Part VI

From McCulloch-Pitts to Backpropagation

The Complete Arc: 1943–1989

Chapter 20


The Grand Timeline: 1943–1989

Birth of Computational Neuroscience
  • 1943 McCulloch-Pitts: formal neuron
  • 1949 Hebb: learning rule
  • 1958 Rosenblatt: perceptron
AI Winter & Underground Work
  • 1969 Minsky & Papert: Perceptrons book
  • 1974 Werbos: backpropagation
Renaissance
  • 1986 Rumelhart, Hinton & Williams: backpropagation
  • 1989 Cybenko / Hornik: universal approximation

Six Stages of Neural Network History

Era  | Milestone                | Key Figures                  | What Was Solved
1943 | Formal neuron            | McCulloch & Pitts            | Mathematical model of neurons
1949 | Hebbian learning         | Donald Hebb                  | First learning principle
1958 | Perceptron               | Frank Rosenblatt             | First learning machine
1969 | Limitations proved       | Minsky & Papert              | Rigorous impossibility results
1986 | Backpropagation          | Rumelhart, Hinton & Williams | Multi-layer learning
1989 | Universal Approximation  | Hornik, Stinchcombe & White  | Universal representation

Three Recurring Themes

Representation

What can a network compute?

  • M-P: Any Boolean function (see the sketch below)
  • Perceptron: Linearly separable only
  • MLP: Any continuous function (UAT)
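
The first bullet can be made concrete. Here is a minimal sketch (plain Python; the weights and thresholds are hand-picked for illustration, not taken from the chapter) of McCulloch-Pitts threshold units, showing how fixed-weight gates compose into any Boolean function, including XOR, which no single unit computes:

```python
# McCulloch-Pitts style threshold unit: fixed weights, no learning.
def mp_unit(inputs, weights, threshold):
    """Fire (return 1) iff the weighted sum of inputs reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# Hand-designed gates, in the spirit of the 1943 paper.
def AND(x1, x2): return mp_unit([x1, x2], [1, 1], threshold=2)
def OR(x1, x2):  return mp_unit([x1, x2], [1, 1], threshold=1)
def NOT(x):      return mp_unit([x], [-1], threshold=0)

# Composition gives Boolean completeness; XOR needs two layers of units.
def XOR(x1, x2): return AND(OR(x1, x2), NOT(AND(x1, x2)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", XOR(a, b))   # prints 0, 1, 1, 0
```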

Learning

How does it acquire computation?

  • M-P: No learning (hand-designed)
  • Hebb: Unsupervised correlation (see the sketch below)
  • Backprop: Supervised multi-layer GD
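
The difference between these learning styles is easiest to see as update rules. Here is a minimal sketch (NumPy; the inputs, learning rate, and variable names are illustrative, not from the chapter) contrasting the unsupervised Hebbian update with Rosenblatt's supervised perceptron update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hebbian update (unsupervised): strengthen weights when input and output co-fire.
def hebb_step(w, x, lr=0.1):
    y = w @ x                  # post-synaptic activity
    return w + lr * y * x      # delta_w proportional to (pre-synaptic x) * (post-synaptic y)

# Perceptron update (supervised): change weights only when the prediction is wrong.
def perceptron_step(w, x, target, lr=0.1):
    y = 1 if w @ x >= 0 else 0
    return w + lr * (target - y) * x

w = rng.normal(size=3)
x = np.array([1.0, 0.5, -0.3])
print(hebb_step(w, x))                  # needs no teacher signal
print(perceptron_step(w, x, target=1))  # moves toward the supervised target
```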

Universality

Are there fundamental limits?

  • Perceptron: Yes (linear separability)
  • MLP: No (UAT; see the sketch below)
  • Practice: Depth matters, optimization is hard
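
The UAT bullet is an existence statement, but its flavor can be illustrated empirically. Here is a minimal sketch (NumPy; the target function, width, and random-feature shortcut are my own illustrative choices, not the theorem or the chapter's construction) showing that a wide single-hidden-layer network can fit a continuous function closely:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
target = np.sin(3 * x)                            # an arbitrary continuous target

hidden = 200                                      # width chosen arbitrarily
W = rng.normal(scale=2.0, size=(1, hidden))       # random, fixed hidden weights
b = rng.normal(scale=2.0, size=hidden)
H = np.tanh(x @ W + b)                            # hidden-layer features

readout, *_ = np.linalg.lstsq(H, target, rcond=None)   # fit only the output layer
approx = H @ readout
print(float(np.max(np.abs(approx - target))))     # small max error on this grid
```

This only shows representational capacity on a fixed grid; finding such weights by gradient descent is the separate, harder problem flagged in the last bullet.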

Side-by-Side: Three Architectures

Property        | M-P (1943)              | Perceptron (1958)       | MLP + Backprop (1986)
Learning        | None                    | Perceptron rule         | Backpropagation
What it learns  | Nothing (fixed weights) | Linearly separable fns  | Any continuous fn
XOR?            | Yes (manual)            | No                      | Yes
Theory          | Boolean completeness    | Convergence thm         | UAT
Key limitation  | No learning             | Linear separability     | Vanishing gradient
Key insight: Each generation solved the previous one's limitation — but introduced a new one.
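
The perceptron column's "No" for XOR can be checked directly. A toy experiment (NumPy; the epoch count is arbitrary) runs Rosenblatt's rule on the four XOR points; since they are not linearly separable, no single threshold unit classifies all of them, and the rule keeps cycling:

```python
import numpy as np

# XOR inputs and targets: not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w, b = np.zeros(2), 0.0
for _ in range(1000):                       # far more passes than separable data would need
    for xi, ti in zip(X, y):
        pred = 1 if xi @ w + b >= 0 else 0
        w += (ti - pred) * xi               # Rosenblatt's update with learning rate 1
        b += (ti - pred)

preds = [1 if xi @ w + b >= 0 else 0 for xi in X]
print("predictions:", preds, "targets:", list(y))   # never all four correct
```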

Three Pivotal Advances

1. Formal Neuron (1943) — McCulloch & Pitts showed that neural computation can be formalized mathematically. Created the field of computational neuroscience.
2. Learning Algorithm (1958) — Rosenblatt demonstrated that machines can learn from data with provable convergence guarantees. First automated learning.
3. Deep Learning (1986) — Rumelhart, Hinton & Williams showed that hidden representations can be learned automatically via backpropagation. Solved the credit assignment problem.

Three Critical Barriers

1. No Learning (1943–1958) — M-P neurons compute but don't learn. Weights must be hand-designed.
Solved by: Rosenblatt's perceptron learning rule
2. Linear Separability (1958–1986) — Single layers can't learn XOR, parity, or connectivity.
Solved by: Multi-layer networks with hidden layers
3. Credit Assignment (1969–1986) — How do we determine which hidden-layer weights caused the error?
Solved by: Backpropagation (chain rule through all layers)
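
Barrier 3 and its resolution fit in a few lines. Here is a minimal sketch (NumPy; hidden width, learning rate, and epoch count are arbitrary choices, not values from the chapter) of a two-layer sigmoid network trained on XOR, where the backward pass uses the chain rule to assign credit to the hidden-layer weights:

```python
import numpy as np

# Minimal sketch: a 2-4-1 sigmoid MLP trained on XOR with hand-written backprop.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the chain rule assigns credit to every weight,
    # including the hidden-layer weights that no local rule could reach.
    d_out = (out - y) * out * (1 - out)      # gradient at the output pre-activation (squared error)
    d_h = (d_out @ W2.T) * h * (1 - h)       # error propagated back to the hidden layer

    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))  # typically close to [0, 1, 1, 0]
```

With the same four points that defeat the perceptron above, the hidden layer plus backpropagation typically reaches the XOR targets.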

Lessons from the AI Winter

Overpromising leads to backlash. "The perceptron will be conscious" → devastating disillusionment when it couldn't even learn XOR.
One negative result can freeze a field. Minsky & Papert's narrow result (single-layer limits) was widely interpreted as condemning all neural networks.
"Dead" ideas can revive. Backpropagation was essentially rediscovered 12 years after Werbos's 1974 thesis. Important ideas survive if a few committed researchers persist.
Existence does not equal algorithm. Everyone knew multi-layer networks could solve XOR. But without a training algorithm, that knowledge was useless.

The Theory-Practice Gap

  • 1943 formal neuron → 1958 perceptron: 15 years from theory to practical learning
  • 1969 XOR barrier → 1986 backpropagation: 17 years to training MLPs
  • 1989 UAT → 2012 AlexNet: 23 years to practical deep learning
Key insight: Knowing something is possible and knowing how to do it efficiently are very different. The gap is always filled by engineering, hardware, data, and persistence.

What Comes Next

  • CNNs (LeCun 1989 → AlexNet 2012) — Weight sharing exploits spatial structure. Convolution replaces full connectivity.
  • RNNs / LSTMs (Elman 1990, Hochreiter & Schmidhuber 1997) — Processing sequences with shared weights across time. Gated memory cells mitigate vanishing gradients.
  • Transformers (Vaswani 2017) — Self-attention replaces recurrence with parallel computation. Foundation of GPT, BERT, and modern LLMs.
Unifying principle: All modern architectures rely on the same core machinery — parameterized differentiable functions trained by gradient descent via backpropagation.
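
To give the Transformer bullet a concrete shape, here is a minimal sketch (NumPy; the dimensions, random projection matrices, and variable names are illustrative only) of scaled dot-product self-attention, the operation that lets every position look at every other position in parallel rather than through recurrence:

```python
import numpy as np

# Minimal sketch of scaled dot-product self-attention, the core Transformer operation.
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))           # one sequence of token vectors

Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d_model)               # every position attends to every other
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
output = weights @ V                              # computed for all positions in parallel

print(output.shape)   # (5, 8): same sequence length, no recurrence needed
```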

Pause & Think

The complete arc — from formal neuron to universal approximator — took 46 years.

It required mathematicians, psychologists, physicists, and computer scientists.

It survived the AI winter.

Which classical problems — generalization, efficiency, biological plausibility — matter most for the future of AI?


Course Overview

Part I: Origins — McCulloch-Pitts formal neuron, Boolean completeness, the first mathematical model of neural computation.
Part II: The Perceptron — Rosenblatt's learning algorithm, convergence theorem, learning from data with guarantees.
Part III: Limitations & Breakthroughs — XOR problem, Minsky-Papert theorem, linear separability, multi-layer solution.
Part IV: Learning Rules — Hebbian learning, Oja's rule, PCA, BCM theory, unsupervised learning foundations.
Part V: Backpropagation — Gradient descent, chain rule, activation functions, UAT, training deep networks.
Part VI: Synthesis — The complete historical arc, recurring themes, breakthroughs, obstacles, and what comes next.