Part VI

Synthesis

The Complete Arc: 1943–1989

Chapter 20

The Grand Timeline

[Timeline, 1943–1989: M-P Neuron (1943) · Hebb's Postulate (1949) · Perceptron (1957) · Minsky-Papert (1969) · AI Winter · Werbos, ignored (1974) · Hopfield revival (1982) · Backprop in Nature (1986) · UAT (1989)]

Six Stages of Development

Year | Milestone        | What Was Achieved
-----|------------------|------------------------------------------------------
1943 | McCulloch-Pitts  | Formal neuron model; Boolean completeness
1949 | Hebb's Postulate | First learning rule (unsupervised)
1957 | Perceptron       | First learning machine with a convergence proof
1969 | Minsky-Papert    | Proved single-layer limits; onset of the AI Winter
1986 | Backpropagation  | Multi-layer learning via the chain rule
1989 | UAT              | One hidden layer suffices for any continuous function
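
Where the 1943 row claims Boolean completeness, the point is that hand-wired threshold units realize AND, OR, and NOT, and therefore any Boolean function. A minimal Python sketch (the weights and thresholds are hand-chosen for illustration; nothing is learned, exactly as in 1943):

    # One McCulloch-Pitts unit: fire iff the weighted sum reaches the threshold.
    def mp_neuron(inputs, weights, threshold):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

    # Hand-set parameters realize the basic gates; {AND, OR, NOT} is
    # functionally complete, hence "Boolean completeness".
    AND = lambda a, b: mp_neuron([a, b], [1, 1], threshold=2)
    OR  = lambda a, b: mp_neuron([a, b], [1, 1], threshold=1)
    NOT = lambda a: mp_neuron([a], [-1], threshold=0)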

Three Recurring Themes

Representation: What can networks compute?

  • M-P: any Boolean function.
  • Perceptron: only linearly separable functions.
  • MLP: any continuous function.

Learning: How do networks acquire their computation? (update rules sketched below)

  • M-P: no learning.
  • Hebb: unsupervised.
  • Perceptron: supervised (single layer).
  • Backprop: supervised (all layers).

Universality: Are there fundamental limits?

  • Perceptron: yes (linear separability).
  • MLP + Backprop: representationally universal.
  • In practice: depth matters.
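
A minimal sketch of the update rules behind "unsupervised" versus "supervised" above (the learning rate, variable names, and the omitted bias term are illustrative simplifications):

    # Hebb (1949), unsupervised: strengthen weights when input and the
    # unit's own output are active together. No teacher signal.
    def hebb_update(w, x, eta=0.1):
        y = sum(wi * xi for wi, xi in zip(w, x))
        return [wi + eta * y * xi for wi, xi in zip(w, x)]

    # Rosenblatt (1957), supervised: update only on error, using a
    # teacher-supplied target.
    def perceptron_update(w, x, target, eta=0.1):
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        return [wi + eta * (target - y) * xi for wi, xi in zip(w, x)]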

Model Comparison

Property                | M-P (1943)            | Perceptron (1957)   | MLP + Backprop (1986)
------------------------|-----------------------|---------------------|----------------------
Architecture            | Single threshold unit | Single layer        | Multiple layers
Learning                | None (hand-set)       | Perceptron rule     | Backpropagation
Can solve XOR?          | Yes (manual design)   | No                  | Yes (learned)
Theory                  | Boolean completeness  | Convergence theorem | UAT
Biological plausibility | High                  | Moderate            | Low
Key limitation          | No learning           | Linear separability | Vanishing gradients
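
The "Yes (manual design)" entry has a concrete witness: a two-layer network of hand-wired threshold units computes XOR as AND(OR, NAND). A sketch (these weights are one standard choice, not the only one):

    def step(z):
        return 1 if z >= 0 else 0

    def xor(a, b):
        h1 = step(a + b - 0.5)      # hidden unit 1: OR(a, b)
        h2 = step(1.5 - a - b)      # hidden unit 2: NAND(a, b)
        return step(h1 + h2 - 1.5)  # output: AND(h1, h2) = XOR(a, b)

    assert [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]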

Three Key Breakthroughs

1943: Formal neuron — neural computation can be described mathematically (McCulloch & Pitts)
1957: Learning from data — machines can automatically find correct weights (Rosenblatt)
1986: Credit assignment — hidden layers can learn useful representations via backpropagation (Rumelhart, Hinton & Williams)
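
The 1986 breakthrough is small enough to show whole. A minimal numpy sketch of backpropagation training a one-hidden-layer sigmoid network on XOR (the hidden width, learning rate, seed, and epoch count here are illustrative choices, not the original paper's setup):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for epoch in range(20000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Backward pass: the chain rule assigns credit to hidden weights.
        dy = (y - t) * y * (1 - y)        # output-layer delta
        dh = (dy @ W2.T) * h * (1 - h)    # hidden-layer delta
        W2 -= 0.5 * h.T @ dy
        b2 -= 0.5 * dy.sum(axis=0)
        W1 -= 0.5 * X.T @ dh
        b1 -= 0.5 * dh.sum(axis=0)

    print(np.round(y, 2))  # typically approaches [[0], [1], [1], [0]]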

Three Key Obstacles

No learning mechanism (1943–1958): M-P neurons had to be hand-designed — solved by the perceptron learning rule
Linear separability barrier (1969): Single-layer networks cannot compute XOR (demonstrated in the sketch after this list) — solved by multi-layer architecture
Credit assignment problem (1969–1986): No algorithm to train hidden layers — solved by backpropagation
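
The second obstacle can be watched directly: the perceptron rule run on XOR keeps making errors forever, because no single line separates the two classes. A sketch (the epoch count and learning rate are arbitrary):

    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    w, b = [0.0, 0.0], 0.0
    for epoch in range(100):
        errors = 0
        for (x1, x2), target in data:
            y = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            # Perceptron rule: adjust only when the prediction is wrong.
            w[0] += 0.1 * (target - y) * x1
            w[1] += 0.1 * (target - y) * x2
            b += 0.1 * (target - y)
            errors += int(y != target)
    print("errors in final epoch:", errors)  # never reaches 0 on XOR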

AI Winter Lessons

  • Valid criticism + institutional power = 17-year research shutdown
  • Minsky-Papert proved limitations of single-layer nets, but the community concluded all neural nets were useless
  • Backpropagation existed (Werbos, 1974) but was ignored during the winter
  • Important ideas can survive decades of neglect
The gap between “can solve” and “can learn to solve” was 17 years and multiple independent rediscoveries.

Beyond 1989: What Comes Next

  • Convolutional Neural Networks (LeCun, 1989 → Krizhevsky/AlexNet, 2012)
  • Recurrent Networks & LSTM (Hochreiter & Schmidhuber, 1997)
  • Deep Learning Revolution (Hinton et al., 2006–2012)
  • Attention & Transformers (Vaswani et al., 2017)
  • Large Language Models (GPT, Claude, 2020s)
All of modern AI rests on: parameterized differentiable functions trained by gradient descent via backpropagation.
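
That one-sentence summary can be shown in its smallest possible instance: one parameter, one differentiable loss, repeated gradient steps (a toy sketch; the data and step size are arbitrary):

    theta = 0.0
    data = [(1.0, 3.0), (2.0, 6.0)]  # pairs (x, y) generated by y = 3x
    for step in range(100):
        # Gradient of the mean squared error of the model y_hat = theta * x.
        grad = sum(2 * (theta * x - y) * x for x, y in data) / len(data)
        theta -= 0.1 * grad          # one gradient-descent step
    print(theta)                     # approaches 3.0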

Pause & Reflect

Questions for Reflection

If Minsky-Papert had been wrong, would multi-layer networks have been developed anyway?

Why was backpropagation discovered at least 4 times before it was widely adopted?

What parallels exist between the 1960s perceptron hype and the 2020s AI hype?


The Complete Arc

Part I: The McCulloch-Pitts neuron — computation as logic (1943)
Part II: The Perceptron — learning from data with a convergence proof (1957)
Part III: Limitations — XOR, Minsky-Papert, AI Winter (1969)
Part IV: Learning Rules — Hebb, Oja, PCA, credit assignment (1949–1982)
Part V: Backpropagation — the four equations, activation functions, UAT (1986)
Part VI: Synthesis — from McCulloch-Pitts to universal approximation (1943–1989)

Thank You

Chapter References

Ch. 8: The XOR Problem
Ch. 9: Minsky & Papert's Analysis
Ch. 10: Linear Separability Deep Dive
Ch. 11: Multi-Layer Networks
Ch. 12: Hebbian Learning
Ch. 13: Oja's Rule & PCA
Ch. 14: The Zoo of Learning Rules
Ch. 15: Gradient Descent
Ch. 16: Backpropagation Derivation
Ch. 17: Activation Functions
Ch. 18: Backpropagation in Practice
Ch. 19: Universal Approximation Theorem
Ch. 20: The Complete Arc
