# Classical Foundations of Artificial Neural Networks

> “What is a number, that a man may know it, and a man, that he may know a number?” — Warren S. McCulloch
## Welcome
This interactive book traces the intellectual arc of artificial neural networks from their birth in mathematical logic (1943) through the development of practical learning algorithms (1986). It is designed as a rigorous, hands-on course for computer science students who want to understand not just how neural networks work, but why they work — and the deep mathematical theory behind them.
You will read the foundational papers, work through the original proofs, implement the algorithms from scratch in Python, and build your own neural networks with as few as 2–5 neurons to understand the core principles before scaling up.
## What You Will Learn

### Part I: Origins (1943)
How McCulloch and Pitts created the first mathematical model of a neuron, proved that suitably arranged networks could compute any Boolean function on binary inputs, and connected the model to logical computation.
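As a small preview of Part I, a McCulloch-Pitts unit fits in a few lines of NumPy. The weights and thresholds below are illustrative hand-picked values (not taken from the 1943 paper's notation) that make a single threshold unit realize AND and OR:

```python
import numpy as np

def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fires (outputs 1) iff the weighted sum
    of binary inputs reaches the threshold."""
    return int(np.dot(inputs, weights) >= threshold)

# Two Boolean functions from one threshold unit with unit weights:
# AND needs both inputs active (threshold 2); OR needs any (threshold 1).
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    and_out = mp_neuron(np.array(x), np.array([1, 1]), threshold=2)
    or_out = mp_neuron(np.array(x), np.array([1, 1]), threshold=1)
    print(x, "AND:", and_out, "OR:", or_out)
```

Chapter 2 shows how such units compose into networks computing any Boolean function.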
### Part II: The Perceptron (1958)

How Rosenblatt added learning to the neuron model, proved that his algorithm converges on linearly separable data, and why a perceptron’s decision boundary is, geometrically, a separating hyperplane.
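The learning rule itself takes only a dozen lines. The sketch below is our own minimal rendering (labels in {−1, +1}, toy OR data chosen for illustration), not Rosenblatt's original formulation:

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Rosenblatt-style rule: on each mistake, nudge the weights
    toward (or away from) the misclassified input. Labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on boundary)
                w += yi * xi
                b += yi
    return w, b

# A linearly separable toy set: the OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b, "predictions:", np.sign(X @ w + b))
```

Because the data are linearly separable, the convergence theorem of Part II guarantees this loop stops making updates after finitely many mistakes.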
### Part III: Limitations and Breakthroughs (1969)
Why a single perceptron cannot compute XOR, what Minsky and Papert proved about the limits of linear classifiers, and how adding a single hidden layer breaks through these limitations.
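To preview the breakthrough: with hand-picked weights (illustrative, not learned), one hidden layer of two threshold units computes XOR, which no single threshold unit can:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

def xor_net(x):
    """XOR via one hidden layer: h1 fires like OR, h2 fires like AND,
    and the output unit computes h1 AND NOT h2."""
    x = np.asarray(x, dtype=float)
    W_hidden = np.array([[1.0, 1.0],   # h1: OR-like unit
                         [1.0, 1.0]])  # h2: AND-like unit
    b_hidden = np.array([-0.5, -1.5])  # thresholds 0.5 and 1.5
    h = step(W_hidden @ x + b_hidden)
    return int(step(np.array([1.0, -1.0]) @ h - 0.5))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", xor_net(x))
```

Part III explains why no choice of weights for a single unit can reproduce this truth table.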
### Part IV: Learning Rules (1949–1982)
Hebb’s postulate about synaptic modification, Oja’s rule for extracting principal components, and the biological evidence for Hebbian learning.
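Oja's rule can also be previewed in a few lines. The covariance matrix, learning rate, and sample count below are illustrative choices; the point is that the Hebbian term plus Oja's decay term drives the weight vector (up to sign) toward the data's leading principal component, with norm near 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data; the top eigenvector of C is (1, 1)/sqrt(2)
C = np.array([[2.0, 1.0], [1.0, 2.0]])
X = rng.multivariate_normal([0.0, 0.0], C, size=5000)

# Oja's rule: Hebbian growth y*x, minus a decay y^2 * w that bounds ||w||
w = rng.normal(size=2)
eta = 0.01
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)

print("w:", w, "||w||:", np.linalg.norm(w))
```

Chapter 12's derivation shows why the decay term normalizes the weights, turning plain Hebbian growth into principal component extraction.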
### Part V: Backpropagation (1974–1986)
The complete mathematical derivation of backpropagation, activation functions and the vanishing gradient problem, and the Universal Approximation Theorem.
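As a taste of the derivation, the chain-rule gradients of a tiny 2-2-1 sigmoid network can be checked against finite differences. The architecture and inputs below are illustrative (biases omitted for brevity); the agreement of the two gradients is exactly what the backpropagation derivation in Part V establishes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2, x, target):
    """Squared error of a 2-2-1 sigmoid network (no biases)."""
    h = sigmoid(W1 @ x)
    out = sigmoid(W2 @ h)
    return 0.5 * (out[0] - target) ** 2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))
W2 = rng.normal(size=(1, 2))
x = np.array([1.0, 0.5])
target = 1.0

# Backprop: chain rule applied layer by layer, output to input
h = sigmoid(W1 @ x)
out = sigmoid(W2 @ h)
d_out = (out - target) * out * (1 - out)   # dL/d(output pre-activation)
grad_W2 = np.outer(d_out, h)
d_h = (W2.T @ d_out) * h * (1 - h)         # dL/d(hidden pre-activation)
grad_W1 = np.outer(d_h, x)

# Finite-difference check: perturb each entry of W1 and compare
eps = 1e-6
num_W1 = np.zeros_like(W1)
for i in range(2):
    for j in range(2):
        Wp = W1.copy(); Wp[i, j] += eps
        Wm = W1.copy(); Wm[i, j] -= eps
        num_W1[i, j] = (loss(Wp, W2, x, target) - loss(Wm, W2, x, target)) / (2 * eps)

print("max |backprop - numerical|:", np.max(np.abs(grad_W1 - num_W1)))
```

The `h * (1 - h)` factors are the sigmoid derivatives whose repeated multiplication across layers produces the vanishing gradient problem discussed in this part.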
### Part VI: Synthesis
The complete intellectual arc from McCulloch-Pitts to modern deep learning, and what comes next.
## Interactive Papers
Deep, guided walkthroughs of key research papers with interactive applets that illuminate every step of the proofs. The first entry covers Monico (2024), an elementary proof of the Universal Approximation Theorem using only undergraduate analysis — a perfect companion to the functional-analytic proof in Chapter 19.
## Lecture Slides
Interactive presentation slides are available for all parts of the course. See the Lecture Slides page for the full collection.
## Prerequisites

- **Linear algebra:** vectors, matrices, dot products, eigenvalues
- **Calculus:** derivatives, partial derivatives, chain rule, gradients
- **Probability:** basic probability, expected value
- **Programming:** Python (NumPy, Matplotlib)
- **Mathematical maturity:** comfort with proofs, formal definitions, and theorems
## How to Use This Book

Each chapter contains:

- **Historical context:** who, when, why
- **Mathematical theory:** definitions, theorems, complete proofs
- **Python implementations:** working code you can run and modify
- **Experiments:** parameter exploration, visualization, empirical verification
- **Exercises and challenges:** from routine to research-level
The code cells are meant to be executed interactively. Modify the parameters, change the data, break things — that is how you learn.
## Key Papers

Throughout this course, we engage directly with the foundational papers:

- McCulloch & Pitts (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. *Bulletin of Mathematical Biophysics*, 5(4), 115–133.
- Hebb (1949). *The Organization of Behavior*. Wiley.
- Rosenblatt (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. *Psychological Review*, 65(6), 386–408.
- Minsky & Papert (1969). *Perceptrons: An Introduction to Computational Geometry*. MIT Press.
- Rumelhart, Hinton & Williams (1986). Learning Representations by Back-Propagating Errors. *Nature*, 323, 533–536.
- Hornik, Stinchcombe & White (1989). Multilayer Feedforward Networks Are Universal Approximators. *Neural Networks*, 2(5), 359–366.
## Technical Setup

```bash
pip install numpy scipy matplotlib jupyter-book
```
All code in this book uses NumPy, SciPy, and Matplotlib — no deep learning frameworks. You will build everything from scratch.