# Classical Foundations of Artificial Neural Networks

> “What is a number, that a man may know it, and a man, that he may know a number?” — Warren S. McCulloch

## Welcome

This interactive book traces the intellectual arc of artificial neural networks from their birth in mathematical logic (1943) through the development of practical learning algorithms (1986). It is designed as a rigorous, hands-on course for computer science students who want to understand not just how neural networks work, but why they work — and the deep mathematical theory behind them.

You will read the foundational papers, work through the original proofs, implement the algorithms from scratch in Python, and build your own neural networks with as few as 2–5 neurons to understand the core principles before scaling up.

## What You Will Learn

### Part I: Origins (1943)

How McCulloch and Pitts created the first mathematical model of a neuron, proved that suitably arranged networks could compute any Boolean function on binary inputs, and connected the model to logical computation.
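A preview of the idea in code: a McCulloch-Pitts unit is nothing more than a weighted sum of binary inputs compared against a threshold. The weights and thresholds below are illustrative choices, not values from the 1943 paper:

```python
import numpy as np

def mp_neuron(x, weights, threshold):
    """McCulloch-Pitts unit: output 1 iff the weighted input sum reaches the threshold."""
    return int(np.dot(weights, x) >= threshold)

# With equal unit weights, the threshold alone selects the Boolean function:
# threshold 2 gives AND, threshold 1 gives OR.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array(x)
    print(x, mp_neuron(x, np.array([1, 1]), 2), mp_neuron(x, np.array([1, 1]), 1))
```

Chapter by chapter, we build up from such single units to networks that realize arbitrary Boolean functions.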

### Part II: The Perceptron (1958)

How Rosenblatt added learning to the neuron model, proved that his algorithm converges, and what the geometric meaning of a perceptron’s decision boundary is.
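The learning rule itself fits in a few lines. A minimal sketch of the Rosenblatt-style update, with labels in {−1, +1}; the toy data and epoch count here are illustrative:

```python
import numpy as np

def perceptron_train(X, y, epochs=20):
    """On each misclassified example, move the weights toward (or away from) it."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # wrong side of (or on) the decision boundary
                w += yi * xi
                b += yi
    return w, b

# OR is linearly separable, so the convergence theorem applies
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # matches y
```

Geometrically, `w` and `b` define the separating hyperplane whose meaning Part II makes precise.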

### Part III: Limitations and Breakthroughs (1969)

Why a single perceptron cannot compute XOR, what Minsky and Papert proved about the limits of linear classifiers, and how adding a single hidden layer breaks through these limitations.
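The breakthrough can be seen in miniature: with two hidden threshold units, XOR becomes “OR but not AND”. The weights below are hand-chosen for illustration, not learned:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden unit 1: fires unless both inputs are 0
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires only when both inputs are 1
    return step(h_or - h_and - 0.5)  # output: OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

No single threshold unit can produce this truth table; the hidden layer is what makes it possible.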

### Part IV: Learning Rules (1949–1982)

Hebb’s postulate about synaptic modification, Oja’s rule for extracting principal components, and the biological evidence for Hebbian learning.
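As a preview of Oja's rule, the sketch below extracts the leading principal component of synthetic correlated data; the covariance structure, learning rate, and sample count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data; the leading principal component is close to [1, 1]/sqrt(2)
A = np.array([[2.0, 1.5], [1.5, 2.0]])
X = rng.normal(size=(5000, 2)) @ A

w = rng.normal(size=2)
eta = 0.002
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)  # Hebbian term y*x, stabilized by the decay term y^2 * w

# Compare with the top eigenvector of the sample covariance (up to sign)
top = np.linalg.eigh(np.cov(X.T))[1][:, -1]
print(abs(w @ top))  # close to 1 once w has aligned with the principal direction
```

The decay term is exactly what distinguishes Oja's rule from plain Hebbian learning, whose weights would otherwise grow without bound.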

### Part V: Backpropagation (1974–1986)

The complete mathematical derivation of backpropagation, activation functions and the vanishing gradient problem, and the Universal Approximation Theorem.
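As a taste of the derivation, the chain rule for the output-layer weights of a tiny network can be checked numerically. All the values below (weights, input, target) are an illustrative example, not taken from the papers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])                 # single input example
w1 = np.array([[0.1, 0.4], [-0.3, 0.2]])  # input-to-hidden weights (2 hidden units)
w2 = np.array([0.7, -0.5])                # hidden-to-output weights
target = 1.0

def loss(w2):
    h = sigmoid(w1 @ x)  # hidden activations
    y = w2 @ h           # linear output
    return h, y, 0.5 * (y - target) ** 2

# Backprop for the output layer: dL/dw2 = (dL/dy) * (dy/dw2) = (y - target) * h
h, y, L = loss(w2)
grad_w2 = (y - target) * h

# Finite-difference check of the analytic gradient
eps = 1e-6
num = np.zeros_like(w2)
for i in range(len(w2)):
    wp = w2.copy()
    wp[i] += eps
    num[i] = (loss(wp)[2] - L) / eps
print(np.allclose(grad_w2, num, atol=1e-4))
```

Part V extends this same chain-rule bookkeeping through the hidden layer, where the sigmoid's derivative enters and the vanishing gradient problem first appears.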

### Part VI: Synthesis

The complete intellectual arc from McCulloch-Pitts to modern deep learning, and what comes next.

## Interactive Papers

Deep, guided walkthroughs of key research papers with interactive applets that illuminate every step of the proofs. The first entry covers Monico (2024), an elementary proof of the Universal Approximation Theorem using only undergraduate analysis — a perfect companion to the functional-analytic proof in Chapter 19.

## Lecture Slides

Interactive presentation slides are available for all parts of the course. See the Lecture Slides page for the full collection.

## Prerequisites

- Linear algebra: vectors, matrices, dot products, eigenvalues
- Calculus: derivatives, partial derivatives, chain rule, gradients
- Probability: basic probability, expected value
- Programming: Python (NumPy, Matplotlib)
- Mathematical maturity: comfort with proofs, formal definitions, and theorems

## How to Use This Book

Each chapter contains:

- Historical context — who, when, why
- Mathematical theory — definitions, theorems, complete proofs
- Python implementations — working code you can run and modify
- Experiments — parameter exploration, visualization, empirical verification
- Exercises and challenges — from routine to research-level

The code cells are meant to be executed interactively. Modify the parameters, change the data, break things — that is how you learn.

## Key Papers

Throughout this course, we engage directly with the foundational papers:

1. McCulloch, W. S., & Pitts, W. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. *Bulletin of Mathematical Biophysics*, 5(4), 115–133.
2. Hebb, D. O. (1949). *The Organization of Behavior*. Wiley.
3. Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. *Psychological Review*, 65(6), 386–408.
4. Minsky, M., & Papert, S. (1969). *Perceptrons: An Introduction to Computational Geometry*. MIT Press.
5. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Representations by Back-Propagating Errors. *Nature*, 323, 533–536.
6. Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward Networks Are Universal Approximators. *Neural Networks*, 2(5), 359–366.

## Technical Setup

```bash
pip install numpy scipy matplotlib jupyter-book
```

All code in this book uses NumPy, SciPy, and Matplotlib — no deep learning frameworks. You will build everything from scratch.