{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": "# Chapter 20: From McCulloch-Pitts to Backpropagation -- The Complete Arc\n\n\nThis final chapter synthesizes the journey we have taken through the classical foundations\nof neural networks. We trace the arc from McCulloch and Pitts's formal neuron (1943)\nthrough Rosenblatt's perceptron (1958), the downturn in neural network research associated with Minsky and Papert (1969),\nand the resurrection via backpropagation (1986), culminating in the Universal Approximation\nTheorem (1989). We identify the recurring themes, the key breakthroughs, and the obstacles\nthat shaped the field."
  },
  {
   "cell_type": "markdown",
   "id": "cell-0b",
   "metadata": {},
   "source": [
    "```{admonition} Historical Perspective\n",
    ":class: note\n",
    "\n",
    "The story of neural networks is one of the most dramatic in all of science. It spans\n",
    "nearly half a century of breakthroughs, setbacks, and comebacks -- a narrative shaped as\n",
    "much by scientific politics and funding cycles as by theorems and algorithms. Understanding\n",
    "this history is not mere trivia: the patterns of hype, disappointment, and renewal continue\n",
    "to repeat in modern AI.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-1",
   "metadata": {},
   "source": [
    "## 20.1 The Historical Timeline\n",
    "\n",
    "| Year | Milestone | Key Figure(s) | What Was Achieved |\n",
    "|------|-----------|---------------|-------------------|\n",
    "| **1943** | McCulloch-Pitts neuron | McCulloch & Pitts | Formal model of a neuron; any Boolean function can be computed |\n",
    "| **1949** | Hebbian learning | Donald Hebb | First learning rule: \"neurons that fire together wire together\" |\n",
    "| **1958** | The Perceptron | Frank Rosenblatt | First machine that learns from data (perceptron convergence theorem) |\n",
    "| **1969** | *Perceptrons* | Minsky & Papert | Proved linear separability limitations; triggered the AI winter |\n",
    "| **1974** | Backpropagation | Paul Werbos | Applied reverse-mode automatic differentiation to neural networks |\n",
    "| **1982** | Hopfield networks | John Hopfield | Revived interest in neural networks via physics connections |\n",
    "| **1986** | Backprop popularized | Rumelhart, Hinton & Williams | Demonstrated that hidden layers could learn useful representations |\n",
    "| **1989** | Universal Approximation | Hornik, Stinchcombe & White; Cybenko | Proved one hidden layer suffices for any continuous function |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 20.2 What Was Solved at Each Stage\n",
    "\n",
    "### 1943: The Formal Neuron (McCulloch & Pitts)\n",
    "\n",
    "**Solved**: How to model neural computation mathematically.\n",
    "**Left open**: How do the weights get set? (No learning rule.)\n",
    "\n",
    "### 1949: Hebbian Learning (Hebb)\n",
    "\n",
    "**Solved**: A biologically plausible principle for synaptic modification.\n",
    "**Left open**: How to use it for specific tasks? Stability? Multi-layer learning?\n",
    "\n",
    "### 1958: The Perceptron (Rosenblatt)\n",
    "\n",
    "**Solved**: A concrete learning algorithm with convergence guarantee.\n",
    "**Left open**: Only single-layer networks. What about non-linearly-separable problems?\n",
    "\n",
    "### 1969: The Limitations (Minsky & Papert)\n",
    "\n",
    "**Solved** (in a negative sense): Proved that single-layer perceptrons cannot compute\n",
    "XOR, parity, or connectivity. Established rigorous limits of linear classifiers.\n",
    "**Left open**: Can multi-layer networks overcome these limits? How to train them?\n",
    "\n",
    "### 1974/1986: Backpropagation (Werbos / Rumelhart-Hinton-Williams)\n",
    "\n",
    "**Solved**: The credit assignment problem. Efficient gradient computation for multi-layer\n",
    "networks. Demonstrated that hidden layers learn useful internal representations.\n",
    "**Left open**: Why is this so hard to train deep networks? Can networks approximate anything?\n",
    "\n",
    "### 1989: Universal Approximation (Hornik et al. / Cybenko)\n",
    "\n",
    "**Solved**: Neural networks are universal function approximators.\n",
    "**Left open**: Practical training of deep networks. Generalization. Efficiency."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2b",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**Key Takeaway: 1943--1958 -- The Birth of Computational Neuroscience**\n",
    "\n",
    "This era established the radical idea that the brain's computation can be formalized\n",
    "mathematically. McCulloch and Pitts showed *what* neurons can compute (any Boolean function),\n",
    "and Hebb proposed *how* they might learn (correlation-based synaptic modification). The\n",
    "gap between these two -- a computable model with no learning, and a learning principle\n",
    "with no concrete algorithm -- would drive the next decade of research.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2c",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**Key Takeaway: 1958--1969 -- The Perceptron Golden Age**\n",
    "\n",
    "Rosenblatt's perceptron bridged the gap: a concrete algorithm that provably learns from\n",
    "data. The convergence theorem gave mathematical certainty. But the excitement outpaced\n",
    "the reality -- the perceptron could only learn linearly separable functions, and the\n",
    "media hype (\"the embryo of an electronic computer that the Navy expects will be able to\n",
    "walk, talk, see, write, reproduce itself and be conscious of its existence\") set the\n",
    "stage for a devastating backlash.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2d",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**Key Takeaway: 1969--1986 -- The AI Winter and Underground Work**\n",
    "\n",
    "The Minsky-Papert book did not just prove a theorem -- it changed the sociology of an\n",
    "entire field. Funding dried up, researchers moved to other areas, and neural networks\n",
    "became unfashionable. But crucial work continued underground: Werbos developed backpropagation\n",
    "(1974), Hopfield connected neural networks to physics (1982), and scattered researchers\n",
    "kept the flame alive. The lesson: important ideas can survive decades of neglect if a\n",
    "few committed researchers persist.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2e",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**Key Takeaway: 1986+ -- The Backpropagation Renaissance**\n",
    "\n",
    "Rumelhart, Hinton, and Williams did not just present an algorithm -- they demonstrated\n",
    "empirically that multi-layer networks could learn meaningful internal representations.\n",
    "Combined with the Universal Approximation Theorem (1989), this provided both the tool\n",
    "(backprop) and the theoretical guarantee (UAT) that the tool was powerful enough. The\n",
    "lesson: a breakthrough needs both an algorithm that works in practice *and* a theory\n",
    "that explains why.\n",
    "```"
   ]
  },
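  {
   "cell_type": "markdown",
   "id": "cell-2f",
   "metadata": {},
   "source": "The Universal Approximation Theorem discussed above can be illustrated numerically. Below is a minimal sketch, assuming nothing beyond NumPy: we fix random weights for a single hidden sigmoid layer, fit only the output weights by least squares, and watch the approximation error of a continuous target shrink as the layer widens. The random-feature construction is our illustrative choice here -- it is an illustration of the theorem's statement, not Cybenko's (non-constructive) proof."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-2f-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "# Target: a continuous function on [0, 2*pi]\n",
    "x = np.linspace(0, 2 * np.pi, 400)\n",
    "f = np.sin(x) + 0.3 * np.sin(3 * x)\n",
    "\n",
    "for n_hidden in [3, 10, 100]:\n",
    "    W = rng.normal(scale=3.0, size=n_hidden)       # random hidden weights\n",
    "    x0 = rng.uniform(0, 2 * np.pi, size=n_hidden)  # transition points in-range\n",
    "    b = -W * x0                                    # biases placing the transitions\n",
    "    H = np.column_stack([sigmoid(np.outer(x, W) + b), np.ones_like(x)])\n",
    "    c, *_ = np.linalg.lstsq(H, f, rcond=None)      # fit the output layer only\n",
    "    err = np.max(np.abs(H @ c - f))\n",
    "    print(f'{n_hidden:4d} hidden units: max |f - approx| = {err:.4f}')"
   ]
  },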
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## 20.3 The Recurring Themes\n",
    "\n",
    "Three themes recur throughout the entire history:\n",
    "\n",
    "### 1. Representation\n",
    "\n",
    "*What can a network compute?*\n",
    "\n",
    "- McCulloch-Pitts: Any Boolean function (with hand-set weights)\n",
    "- Perceptron: Linearly separable functions only\n",
    "- MLP: Any continuous function (UAT)\n",
    "\n",
    "### 2. Learning\n",
    "\n",
    "*How does a network acquire its computation?*\n",
    "\n",
    "- McCulloch-Pitts: No learning (weights fixed by design)\n",
    "- Hebb: Unsupervised correlation-based learning\n",
    "- Perceptron: Supervised, single-layer learning with convergence guarantee\n",
    "- Backpropagation: Supervised, multi-layer learning via gradient descent\n",
    "\n",
    "### 3. Universality\n",
    "\n",
    "*Are there fundamental limits?*\n",
    "\n",
    "- Perceptron: Yes -- linear separability barrier\n",
    "- MLP with backprop: Representationally universal (UAT)\n",
    "- In practice: Depth matters, optimization is hard, generalization is subtle"
   ]
  },
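  {
   "cell_type": "markdown",
   "id": "cell-3-mp",
   "metadata": {},
   "source": "To make the \"representation without learning\" theme concrete, here is a minimal sketch of a McCulloch-Pitts threshold unit. The weights and thresholds are hand-set illustrative choices (not notation from the 1943 paper): AND, OR, and NOT each need a single unit, while XOR has to be wired by hand from two layers -- exactly the composition a single-layer perceptron cannot learn."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3-mp-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def mp_neuron(x, w, theta):\n",
    "    \"\"\"McCulloch-Pitts unit: fire iff the weighted sum reaches the threshold.\"\"\"\n",
    "    return int(np.dot(w, x) >= theta)\n",
    "\n",
    "# Hand-set weights and thresholds -- designed, not learned\n",
    "AND = lambda x: mp_neuron(x, w=[1, 1], theta=2)\n",
    "OR = lambda x: mp_neuron(x, w=[1, 1], theta=1)\n",
    "NOT = lambda x: mp_neuron([x], w=[-1], theta=0)\n",
    "\n",
    "# XOR requires composing two layers by hand:\n",
    "#   x1 XOR x2 = (x1 OR x2) AND NOT(x1 AND x2)\n",
    "XOR = lambda x: AND([OR(x), NOT(AND(x))])\n",
    "\n",
    "for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:\n",
    "    print(f'x={x}  AND={AND(x)}  OR={OR(x)}  XOR={XOR(x)}')"
   ]
  },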
  {
   "cell_type": "markdown",
   "id": "cell-3b",
   "metadata": {},
   "source": [
    "```{admonition} Historical Reflection: The Sociology of Science\n",
    ":class: note\n",
    "\n",
    "The history of neural networks teaches us that scientific progress is not purely a function\n",
    "of ideas and evidence. **Funding**, **fashion**, and **personalities** play enormous roles.\n",
    "Minsky's outsized influence at MIT was as important as his mathematics in shaping the AI\n",
    "winter. Hinton's persistence during the dark years was as important as backpropagation\n",
    "itself in enabling the revival. Students of science should study not just the theorems,\n",
    "but the humans who proved (or failed to prove) them.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": "## 20.4 Side-by-Side Comparison\n\n| Property | M-P Neuron (1943) | Perceptron (1958) | MLP + Backprop (1986) |\n|----------|-------------------|-------------------|-----------------------|\n| **Architecture** | Single threshold unit | Single layer | Multiple layers |\n| **Activation** | Binary step | Binary step | Sigmoid (later ReLU) |\n| **Learning** | None | Perceptron rule | Backpropagation |\n| **Can learn?** | No | Yes (linearly separable) | Can approximate any continuous function on compact domains, in principle |\n| **XOR?** | Yes (manual) | No | Yes |\n| **Theory** | Boolean completeness | Convergence theorem | Universal approximation |\n| **Biological basis** | High | Moderate | Low |\n| **Parameters** | Hand-designed | Learned (single layer) | Learned (all layers) |\n| **Key limitation** | No learning | Linear separability | Vanishing gradient (for sigmoid) |"
  },
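  {
   "cell_type": "markdown",
   "id": "cell-4-perceptron-md",
   "metadata": {},
   "source": "The \"Can learn?\" row of the table can be demonstrated directly. Below is a minimal sketch of the perceptron learning rule in its standard textbook form (not Rosenblatt's original formulation): on AND, which is linearly separable, it converges after a few epochs, as the convergence theorem guarantees; on XOR it cycles indefinitely and never reaches zero errors."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-4-perceptron",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def train_perceptron(X, y, epochs=100, eta=1.0):\n",
    "    \"\"\"Perceptron rule: w += eta * (target - prediction) * x, with a bias input.\"\"\"\n",
    "    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant bias input\n",
    "    w = np.zeros(Xb.shape[1])\n",
    "    for epoch in range(epochs):\n",
    "        errors = 0\n",
    "        for xi, yi in zip(Xb, y):\n",
    "            pred = int(xi @ w >= 0)\n",
    "            w += eta * (yi - pred) * xi\n",
    "            errors += int(pred != yi)\n",
    "        if errors == 0:                        # a full error-free pass: converged\n",
    "            return w, epoch + 1\n",
    "    return w, None                             # never converged\n",
    "\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "for name, y in [('AND', np.array([0, 0, 0, 1])),\n",
    "                ('XOR', np.array([0, 1, 1, 0]))]:\n",
    "    w, n_epochs = train_perceptron(X, y)\n",
    "    status = f'converged in {n_epochs} epochs' if n_epochs else 'did NOT converge in 100 epochs'\n",
    "    print(f'{name}: {status}')"
   ]
  },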
  {
   "cell_type": "markdown",
   "id": "cell-5",
   "metadata": {},
   "source": [
    "## 20.5 The Three Key Breakthroughs\n",
    "\n",
    "### Breakthrough 1: The Formal Neuron (1943)\n",
    "\n",
    "McCulloch and Pitts showed that neural computation could be formalized mathematically.\n",
    "This was the foundational insight: the brain's computation can be modeled, analyzed, and\n",
    "potentially replicated.\n",
    "\n",
    "**Impact**: Created the field of computational neuroscience and inspired AI.\n",
    "\n",
    "### Breakthrough 2: Learning Algorithms (1958)\n",
    "\n",
    "Rosenblatt's perceptron showed that machines could learn from examples. The convergence\n",
    "theorem provided the first mathematical guarantee for a learning algorithm.\n",
    "\n",
    "**Impact**: Demonstrated that learning -- not just computation -- could be automated.\n",
    "\n",
    "### Breakthrough 3: Deep Learning via Backpropagation (1986)\n",
    "\n",
    "Rumelhart, Hinton, and Williams showed that hidden-layer representations could be learned\n",
    "automatically. Backpropagation solved the credit assignment problem.\n",
    "\n",
    "**Impact**: Enabled the training of multi-layer networks, overcoming the linear separability\n",
    "barrier and eventually leading to the deep learning revolution."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5b",
   "metadata": {},
   "source": [
    "```{danger}\n",
    "**Lessons from AI Winters That Are STILL Relevant Today**\n",
    "\n",
    "The AI winter of 1969--1986 was not just a historical curiosity. Its causes are structural\n",
    "and recurring:\n",
    "\n",
    "1. **Overpromising capabilities leads to backlash.** Rosenblatt and the media promised\n",
    "   machines that could \"see, walk, talk, and be conscious.\" When the perceptron could\n",
    "   not even learn XOR, the disillusionment was proportional to the hype. Today's claims\n",
    "   about AGI invite the same risk.\n",
    "\n",
    "2. **A single negative result can derail an entire field.** Minsky and Papert's book\n",
    "   proved a narrow result (limitations of *single-layer* perceptrons), but it was widely\n",
    "   interpreted as proving neural networks were fundamentally flawed. One influential\n",
    "   critique, amplified by institutional power, froze a generation of research.\n",
    "\n",
    "3. **Fundamental advances often come from revisiting \"dead\" ideas.** Backpropagation was\n",
    "   essentially reverse-mode automatic differentiation applied to neural networks -- an idea\n",
    "   that could have been developed decades earlier. The key insight (Werbos, 1974) came from\n",
    "   someone willing to work on an \"unfashionable\" topic.\n",
    "\n",
    "4. **The gap between \"existence proof\" and \"practical algorithm\" can be decades.** Everyone\n",
    "   knew multi-layer networks could solve XOR. But without a training algorithm, that\n",
    "   knowledge was useless. The UAT (1989) proved universality, but practical deep learning\n",
    "   took another 20+ years.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5c",
   "metadata": {},
   "source": [
    "```{warning}\n",
    "**History Repeats: The Current AI Hype Cycle Has Parallels to the 1960s**\n",
    "\n",
    "Consider the parallels:\n",
    "\n",
    "| 1960s | 2020s |\n",
    "|-------|-------|\n",
    "| \"The perceptron will be conscious\" | \"AGI is 2--5 years away\" |\n",
    "| Media amplifies modest results | Media amplifies benchmark scores |\n",
    "| Funding pours into a narrow approach | Billions flow into scaling LLMs |\n",
    "| Fundamental limitations ignored | Hallucination, reasoning limits ignored |\n",
    "| One negative result triggers winter | What will be this era's *Perceptrons* book? |\n",
    "\n",
    "This is not to say current AI is overhyped -- the capabilities are genuinely remarkable.\n",
    "But the *pattern* of hype-backlash-winter is a sociological dynamic that operates\n",
    "independently of technical merit. Wise practitioners manage expectations carefully.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-6",
   "metadata": {},
   "source": [
    "## 20.6 The Three Key Obstacles\n",
    "\n",
    "### Obstacle 1: No Learning Mechanism (1943--1958)\n",
    "\n",
    "McCulloch-Pitts neurons could compute but not learn. Weights had to be set by hand.\n",
    "\n",
    "**Solved by**: Rosenblatt's perceptron learning rule (1958).\n",
    "\n",
    "### Obstacle 2: Linear Separability Barrier (1958--1986)\n",
    "\n",
    "Minsky and Papert proved that single-layer perceptrons cannot learn non-linearly-separable\n",
    "functions. Hidden layers were needed but could not be trained.\n",
    "\n",
    "**Solved by**: Multi-layer networks trained with backpropagation (1986).\n",
    "\n",
    "### Obstacle 3: Credit Assignment Problem (1969--1986)\n",
    "\n",
    "Given an error at the output, how do we determine which hidden-layer weights are responsible?\n",
    "\n",
    "**Solved by**: Backpropagation (chain rule applied to compute exact gradients through all layers)."
   ]
  },
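  {
   "cell_type": "markdown",
   "id": "cell-6-gradcheck-md",
   "metadata": {},
   "source": "Backpropagation's claim to compute *exact* gradients via the chain rule can be checked numerically. The following self-contained sketch (independent of the NeuralNetwork class used later in this chapter) builds a tiny 2-2-1 sigmoid network and compares the backpropagated gradient of the first-layer weights against a centered finite-difference estimate; the two should agree to within floating-point error."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6-gradcheck",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "# A tiny 2-2-1 sigmoid network, squared-error loss, one training example\n",
    "x = rng.normal(size=(2, 1))\n",
    "y = np.array([[1.0]])\n",
    "W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))\n",
    "W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))\n",
    "\n",
    "def loss_given(W1_trial):\n",
    "    \"\"\"Loss as a function of the first-layer weights (everything else fixed).\"\"\"\n",
    "    h = sigmoid(W1_trial @ x + b1)\n",
    "    out = sigmoid(W2 @ h + b2)\n",
    "    return 0.5 * float(np.sum((out - y) ** 2))\n",
    "\n",
    "# Backprop: chain rule from the loss back to W1 (credit assignment)\n",
    "h = sigmoid(W1 @ x + b1)\n",
    "out = sigmoid(W2 @ h + b2)\n",
    "delta2 = (out - y) * out * (1 - out)    # output-layer error signal\n",
    "delta1 = (W2.T @ delta2) * h * (1 - h)  # error propagated back through W2\n",
    "grad_W1 = delta1 @ x.T\n",
    "\n",
    "# Centered finite differences: perturb each entry of W1 by +/- eps\n",
    "eps = 1e-6\n",
    "fd = np.zeros_like(W1)\n",
    "for i in range(W1.shape[0]):\n",
    "    for j in range(W1.shape[1]):\n",
    "        Wp, Wm = W1.copy(), W1.copy()\n",
    "        Wp[i, j] += eps\n",
    "        Wm[i, j] -= eps\n",
    "        fd[i, j] = (loss_given(Wp) - loss_given(Wm)) / (2 * eps)\n",
    "\n",
    "print('max |backprop - finite diff| =', np.abs(grad_W1 - fd).max())"
   ]
  },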
  {
   "cell_type": "markdown",
   "id": "cell-6b",
   "metadata": {},
   "source": [
    "```{admonition} Historical Reflection: The Unsung Heroes\n",
    ":class: note\n",
    "\n",
    "The standard narrative credits McCulloch-Pitts, Rosenblatt, and Rumelhart-Hinton-Williams.\n",
    "But several crucial contributors are often overlooked:\n",
    "\n",
    "- **Paul Werbos** (1974) developed backpropagation in his PhD thesis -- 12 years before\n",
    "  RHW's famous *Nature* paper. He did this during the AI winter, when neural networks\n",
    "  were considered a dead end.\n",
    "- **Seppo Linnainmaa** (1970) invented reverse-mode automatic differentiation -- the\n",
    "  mathematical foundation of backpropagation -- as part of his Master's thesis.\n",
    "- **John Hopfield** (1982) revived interest in neural networks from outside the AI\n",
    "  community, using physics (energy functions, Boltzmann distributions) to make neural\n",
    "  networks respectable again.\n",
    "- **Yann LeCun** (1985) independently developed backpropagation in France, before the\n",
    "  RHW paper.\n",
    "\n",
    "The lesson: breakthroughs often have multiple independent discoverers, and priority does\n",
    "not always go to the first but to the most effectively communicated.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "## 20.7 What Comes Next\n",
    "\n",
    "The classical foundations covered in this course (1943--1989) established the core principles.\n",
    "The modern era builds upon them:\n",
    "\n",
    "### Convolutional Neural Networks (CNNs)\n",
    "- LeCun et al. (1989): LeNet for handwritten digit recognition\n",
    "- Krizhevsky et al. (2012): AlexNet -- the deep learning revolution in computer vision\n",
    "- Key idea: weight sharing exploits spatial structure\n",
    "\n",
    "### Recurrent Neural Networks (RNNs)\n",
    "- Elman (1990), Jordan (1986): Processing sequences\n",
    "- Backpropagation through time (BPTT)\n",
    "- Key idea: shared weights across time steps\n",
    "\n",
    "### Long Short-Term Memory (LSTM)\n",
    "- Hochreiter & Schmidhuber (1997)\n",
    "- Solved the vanishing gradient problem for sequences\n",
    "- Key idea: gated memory cells\n",
    "\n",
    "### Attention and Transformers\n",
    "- Bahdanau et al. (2014): Attention mechanism\n",
    "- Vaswani et al. (2017): \"Attention Is All You Need\" -- the Transformer\n",
    "- Key idea: self-attention replaces recurrence with parallel computation\n",
    "- Foundation of GPT, BERT, and modern large language models\n",
    "\n",
    "All of these architectures rely on the same core machinery: **parameterized differentiable\n",
    "functions trained by gradient descent via backpropagation**."
   ]
  },
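  {
   "cell_type": "markdown",
   "id": "cell-7-template-md",
   "metadata": {},
   "source": "That shared machinery fits in a dozen lines. The sketch below is schematic -- the forward pass is just a linear map standing in for any architecture -- but the loop of forward pass, differentiable loss, gradient, and update is the template that CNNs, RNNs, LSTMs, and Transformers all instantiate."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-7-template",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "\n",
    "# The template shared by all the architectures above:\n",
    "# parameters -> differentiable forward pass -> loss -> gradient -> update\n",
    "X = rng.normal(size=(100, 3))                # inputs\n",
    "true_w = np.array([2.0, -1.0, 0.5])\n",
    "y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy targets\n",
    "\n",
    "w = np.zeros(3)                              # parameters (here: a linear map)\n",
    "for step in range(200):\n",
    "    y_hat = X @ w                            # forward pass ('the architecture')\n",
    "    grad = X.T @ (y_hat - y) / len(y)        # gradient of 0.5 * mean squared error\n",
    "    w -= 0.1 * grad                          # gradient descent update\n",
    "\n",
    "print('recovered parameters:', np.round(w, 2), '| true:', true_w)"
   ]
  },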
  {
   "cell_type": "markdown",
   "id": "cell-7b",
   "metadata": {},
   "source": [
    "```{admonition} Historical Reflection: The Long Road from Theory to Practice\n",
    ":class: note\n",
    "\n",
    "Consider the timeline from theoretical possibility to practical impact:\n",
    "\n",
    "- **1943**: McCulloch-Pitts prove Boolean completeness.\n",
    "  - **Time to practical learning**: 15 years (perceptron, 1958).\n",
    "- **1969**: Everyone knows multi-layer networks can solve XOR.\n",
    "  - **Time to practical training**: 17 years (backpropagation popularized, 1986).\n",
    "- **1989**: Universal Approximation Theorem proved.\n",
    "  - **Time to practical deep learning**: 23 years (AlexNet, 2012).\n",
    "\n",
    "Knowing something is *possible* and knowing how to *do it efficiently* are very different.\n",
    "The gap is always filled by engineering, hardware, data, and persistence.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# ============================================================\n",
    "# Final Comprehensive Demo: Two Moons Classification\n",
    "# ============================================================\n",
    "\n",
    "np.random.seed(42)\n",
    "\n",
    "def make_moons(n_samples=500, noise=0.1):\n",
    "    \"\"\"Generate two-moons dataset.\"\"\"\n",
    "    n = n_samples // 2\n",
    "    # Upper moon\n",
    "    theta1 = np.linspace(0, np.pi, n)\n",
    "    x1 = np.cos(theta1) + np.random.randn(n) * noise\n",
    "    y1 = np.sin(theta1) + np.random.randn(n) * noise\n",
    "    # Lower moon (shifted)\n",
    "    theta2 = np.linspace(0, np.pi, n)\n",
    "    x2 = 1 - np.cos(theta2) + np.random.randn(n) * noise\n",
    "    y2 = 1 - np.sin(theta2) - 0.5 + np.random.randn(n) * noise\n",
    "    \n",
    "    X = np.vstack([np.hstack([x1, x2]), np.hstack([y1, y2])])\n",
    "    Y = np.hstack([np.zeros(n), np.ones(n)]).reshape(1, -1)\n",
    "    return X, Y\n",
    "\n",
    "X_moons, Y_moons = make_moons(n_samples=600, noise=0.15)\n",
    "\n",
    "# Build a Neural Network (reusing the class from Chapter 18)\n",
    "class NeuralNetwork:\n",
    "    def __init__(self, layer_sizes, activation='sigmoid'):\n",
    "        self.layer_sizes = layer_sizes\n",
    "        self.L = len(layer_sizes) - 1\n",
    "        self.activation_name = activation\n",
    "        self.weights = []\n",
    "        self.biases = []\n",
    "        for l in range(self.L):\n",
    "            n_in, n_out = layer_sizes[l], layer_sizes[l+1]\n",
    "            W = np.random.randn(n_out, n_in) * np.sqrt(2.0 / (n_in + n_out))\n",
    "            b = np.zeros((n_out, 1))\n",
    "            self.weights.append(W)\n",
    "            self.biases.append(b)\n",
    "        self.z_cache = []\n",
    "        self.a_cache = []\n",
    "    \n",
    "    def _activation(self, z):\n",
    "        if self.activation_name == 'sigmoid':\n",
    "            return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))\n",
    "        elif self.activation_name == 'relu':\n",
    "            return np.maximum(0, z)\n",
    "    \n",
    "    def _activation_derivative(self, z):\n",
    "        if self.activation_name == 'sigmoid':\n",
    "            s = self._activation(z)\n",
    "            return s * (1 - s)\n",
    "        elif self.activation_name == 'relu':\n",
    "            return (z > 0).astype(float)\n",
    "    \n",
    "    def forward(self, X):\n",
    "        self.z_cache = []\n",
    "        self.a_cache = [X]\n",
    "        a = X\n",
    "        for l in range(self.L):\n",
    "            z = self.weights[l] @ a + self.biases[l]\n",
    "            a = self._activation(z)\n",
    "            self.z_cache.append(z)\n",
    "            self.a_cache.append(a)\n",
    "        return a\n",
    "    \n",
    "    def compute_loss(self, y_hat, Y):\n",
    "        m = Y.shape[1]\n",
    "        return 0.5 * np.sum((y_hat - Y)**2) / m\n",
    "    \n",
    "    def backward(self, Y):\n",
    "        m = Y.shape[1]\n",
    "        dW = [None] * self.L\n",
    "        db = [None] * self.L\n",
    "        a_L = self.a_cache[-1]\n",
    "        dL_da = (a_L - Y) / m\n",
    "        sigma_prime = self._activation_derivative(self.z_cache[-1])\n",
    "        delta = dL_da * sigma_prime\n",
    "        dW[-1] = delta @ self.a_cache[-2].T\n",
    "        db[-1] = np.sum(delta, axis=1, keepdims=True)\n",
    "        for l in range(self.L - 2, -1, -1):\n",
    "            sigma_prime = self._activation_derivative(self.z_cache[l])\n",
    "            delta = (self.weights[l+1].T @ delta) * sigma_prime\n",
    "            dW[l] = delta @ self.a_cache[l].T\n",
    "            db[l] = np.sum(delta, axis=1, keepdims=True)\n",
    "        return dW, db\n",
    "    \n",
    "    def train(self, X, Y, epochs, eta, verbose=True):\n",
    "        losses = []\n",
    "        for epoch in range(epochs):\n",
    "            y_hat = self.forward(X)\n",
    "            loss = self.compute_loss(y_hat, Y)\n",
    "            losses.append(loss)\n",
    "            dW, db = self.backward(Y)\n",
    "            for l in range(self.L):\n",
    "                self.weights[l] -= eta * dW[l]\n",
    "                self.biases[l] -= eta * db[l]\n",
    "            if verbose and (epoch % 500 == 0 or epoch == epochs - 1):\n",
    "                print(f\"Epoch {epoch:5d}: Loss = {loss:.6f}\")\n",
    "        return losses\n",
    "\n",
    "# Train on two-moons\n",
    "print(\"Training a 2-16-8-1 network on the two-moons dataset...\\n\")\n",
    "nn = NeuralNetwork([2, 16, 8, 1], activation='sigmoid')\n",
    "losses = nn.train(X_moons, Y_moons, epochs=5000, eta=5.0)\n",
    "\n",
    "# Compute accuracy\n",
    "y_pred = nn.forward(X_moons)\n",
    "accuracy = np.mean((y_pred > 0.5).astype(float) == Y_moons)\n",
    "print(f\"\\nFinal accuracy: {accuracy*100:.1f}%\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-9",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Visualization: Decision Boundary + Hidden Representations + Timeline\n",
    "\n",
    "fig, axes = plt.subplots(2, 2, figsize=(16, 14))\n",
    "\n",
    "# (1) Training loss\n",
    "axes[0, 0].plot(losses, linewidth=2, color='navy')\n",
    "axes[0, 0].set_xlabel('Epoch', fontsize=12)\n",
    "axes[0, 0].set_ylabel('MSE Loss', fontsize=12)\n",
    "axes[0, 0].set_title('Training Loss', fontsize=13)\n",
    "axes[0, 0].set_yscale('log')\n",
    "axes[0, 0].grid(True, alpha=0.3)\n",
    "\n",
    "# (2) Decision boundary\n",
    "xx, yy = np.meshgrid(np.linspace(-1.5, 2.5, 300), np.linspace(-1.5, 2.0, 300))\n",
    "grid = np.c_[xx.ravel(), yy.ravel()].T\n",
    "z_grid = nn.forward(grid).reshape(xx.shape)\n",
    "\n",
    "axes[0, 1].contourf(xx, yy, z_grid, levels=50, cmap='RdBu_r', alpha=0.8)\n",
    "axes[0, 1].contour(xx, yy, z_grid, levels=[0.5], colors='black', linewidths=2)\n",
    "colors = ['red' if y == 0 else 'blue' for y in Y_moons[0]]\n",
    "axes[0, 1].scatter(X_moons[0], X_moons[1], c=colors, s=10, alpha=0.5, edgecolors='none')\n",
    "axes[0, 1].set_xlabel('$x_1$', fontsize=12)\n",
    "axes[0, 1].set_ylabel('$x_2$', fontsize=12)\n",
    "axes[0, 1].set_title('Learned Decision Boundary', fontsize=13)\n",
    "\n",
    "# (3) Hidden layer representations (layer 1 activations)\n",
    "_ = nn.forward(X_moons)  # populate cache\n",
    "h1 = nn.a_cache[1]  # first hidden layer activations, shape (16, 600)\n",
    "\n",
    "# Use first 2 hidden units for visualization\n",
    "axes[1, 0].scatter(h1[0], h1[1], c=colors, s=10, alpha=0.5, edgecolors='none')\n",
    "axes[1, 0].set_xlabel('Hidden unit 1', fontsize=12)\n",
    "axes[1, 0].set_ylabel('Hidden unit 2', fontsize=12)\n",
    "axes[1, 0].set_title('Hidden Layer 1 Representation (units 1 & 2)', fontsize=13)\n",
    "axes[1, 0].grid(True, alpha=0.3)\n",
    "\n",
    "# (4) Historical timeline\n",
    "ax_timeline = axes[1, 1]\n",
    "ax_timeline.set_xlim(1940, 2000)\n",
    "ax_timeline.set_ylim(-1, 1)\n",
    "ax_timeline.axhline(y=0, color='black', linewidth=2)\n",
    "\n",
    "milestones = [\n",
    "    (1943, 'McCulloch-\\nPitts', 0.5),\n",
    "    (1949, 'Hebb', -0.5),\n",
    "    (1958, 'Perceptron', 0.5),\n",
    "    (1969, 'Minsky-\\nPapert', -0.5),\n",
    "    (1974, 'Werbos\\n(backprop)', 0.5),\n",
    "    (1982, 'Hopfield', -0.5),\n",
    "    (1986, 'RHW\\n(backprop)', 0.5),\n",
    "    (1989, 'UAT', -0.5),\n",
    "]\n",
    "\n",
    "for year, label, y_pos in milestones:\n",
    "    color = 'green' if y_pos > 0 else 'darkorange'\n",
    "    ax_timeline.plot(year, 0, 'o', color=color, markersize=10, zorder=5)\n",
    "    ax_timeline.plot([year, year], [0, y_pos * 0.7], '-', color=color, linewidth=1.5)\n",
    "    ax_timeline.text(year, y_pos * 0.85, label, ha='center', va='center', fontsize=9,\n",
    "                     fontweight='bold', color=color)\n",
    "\n",
    "# AI Winter shading\n",
    "ax_timeline.axvspan(1969, 1982, alpha=0.1, color='blue', label='AI Winter')\n",
    "ax_timeline.text(1975.5, 0.9, 'AI Winter', ha='center', fontsize=10, color='blue', style='italic')\n",
    "\n",
    "ax_timeline.set_xlabel('Year', fontsize=12)\n",
    "ax_timeline.set_title('The Complete Timeline: 1943-1989', fontsize=13)\n",
    "ax_timeline.set_yticks([])\n",
    "ax_timeline.grid(True, alpha=0.2, axis='x')\n",
    "\n",
    "plt.suptitle('Chapter 20: From McCulloch-Pitts to Backpropagation',\n",
    "             fontsize=15, fontweight='bold', y=1.01)\n",
    "plt.tight_layout()\n",
    "plt.savefig('synthesis_final.png', dpi=150, bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-arc",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as mpatches\n",
    "from matplotlib.patches import FancyArrowPatch\n",
    "\n",
    "# ============================================================\n",
    "# Complete Arc Diagram: The Full Neural Network Timeline\n",
    "# Color-coded periods with key milestones and connecting arrows\n",
    "# ============================================================\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(12, 8))\n",
    "\n",
    "# Define eras with colors\n",
    "eras = [\n",
    "    (1943, 1958, 'Birth of\\nComputational\\nNeuroscience', '#2196F3', 0.12),\n",
    "    (1958, 1969, 'Perceptron\\nGolden Age', '#4CAF50', 0.12),\n",
    "    (1969, 1986, 'AI Winter &\\nUnderground\\nWork', '#9E9E9E', 0.12),\n",
    "    (1986, 1995, 'Backpropagation\\nRenaissance', '#FF9800', 0.12),\n",
    "]\n",
    "\n",
    "# Draw era backgrounds\n",
    "for start, end, label, color, alpha in eras:\n",
    "    ax.axvspan(start, end, alpha=alpha, color=color, zorder=0)\n",
    "    mid = (start + end) / 2\n",
    "    ax.text(mid, 9.2, label, ha='center', va='center', fontsize=9,\n",
    "            fontweight='bold', color=color, alpha=0.9,\n",
    "            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor=color, alpha=0.9))\n",
    "\n",
    "# Main timeline axis\n",
    "ax.axhline(y=5, color='#333333', linewidth=3, zorder=2)\n",
    "\n",
    "# Milestones: (year, label, y_offset_direction, description, color)\n",
    "milestones = [\n",
    "    (1943, 'McCulloch-Pitts\\nFormal Neuron', 1, 'Boolean completeness\\nproved', '#1565C0'),\n",
    "    (1949, 'Hebb\\nLearning Rule', -1, '\"Fire together,\\nwire together\"', '#1565C0'),\n",
    "    (1958, 'Rosenblatt\\nPerceptron', 1, 'First learning\\nmachine', '#2E7D32'),\n",
    "    (1962, 'Novikoff\\nConvergence Proof', -1, 'Finite-step\\nguarantee', '#2E7D32'),\n",
    "    (1969, 'Minsky & Papert\\nPerceptrons', 1, 'XOR impossibility\\nfor single layer', '#C62828'),\n",
    "    (1974, 'Werbos\\nBackpropagation', -1, 'Reverse-mode AD\\nfor neural nets', '#616161'),\n",
    "    (1982, 'Hopfield\\nNetworks', 1, 'Physics revives\\nneural nets', '#616161'),\n",
    "    (1986, 'Rumelhart, Hinton\\n& Williams', -1, 'Backprop in\\nNature', '#E65100'),\n",
    "    (1989, 'Cybenko / Hornik\\nUAT', 1, 'Universal\\napproximation', '#E65100'),\n",
    "]\n",
    "\n",
    "for i, (year, label, direction, desc, color) in enumerate(milestones):\n",
    "    y_dot = 5\n",
    "    y_label = 5 + direction * 2.8\n",
    "    y_desc = 5 + direction * 1.6\n",
    "    \n",
    "    # Milestone dot\n",
    "    ax.plot(year, y_dot, 'o', color=color, markersize=14, zorder=5,\n",
    "            markeredgecolor='white', markeredgewidth=2)\n",
    "    \n",
    "    # Connecting line\n",
    "    ax.plot([year, year], [y_dot, y_label - direction * 0.3], '-',\n",
    "            color=color, linewidth=1.5, zorder=3, alpha=0.7)\n",
    "    \n",
    "    # Year label\n",
    "    ax.text(year, y_dot - direction * 0.4, str(year), ha='center', va='center',\n",
    "            fontsize=8, fontweight='bold', color='#333333')\n",
    "    \n",
    "    # Milestone name\n",
    "    ax.text(year, y_label, label, ha='center', va='center',\n",
    "            fontsize=8, fontweight='bold', color=color)\n",
    "    \n",
    "    # Description\n",
    "    ax.text(year, y_desc, desc, ha='center', va='center',\n",
    "            fontsize=7, color='#555555', style='italic')\n",
    "\n",
    "# Draw connecting arrows between key breakthroughs\n",
    "arrow_connections = [\n",
    "    (1943, 1958, 'Adds learning', 3.2),\n",
    "    (1958, 1969, 'Proves limits', 7.0),\n",
    "    (1969, 1986, 'Overcomes limits', 3.2),\n",
    "    (1986, 1989, 'Proves universality', 7.0),\n",
    "]\n",
    "\n",
    "for start_yr, end_yr, label, y_arc in arrow_connections:\n",
    "    mid = (start_yr + end_yr) / 2\n",
    "    # Draw curved arrow\n",
    "    arrow = FancyArrowPatch(\n",
    "        (start_yr, y_arc), (end_yr, y_arc),\n",
    "        connectionstyle=f'arc3,rad={0.3 if y_arc > 5 else -0.3}',\n",
    "        arrowstyle='->', color='#888888', linewidth=1.5,\n",
    "        mutation_scale=15, zorder=1\n",
    "    )\n",
    "    ax.add_patch(arrow)\n",
    "    # Arrow label\n",
    "    y_text = y_arc + (0.5 if y_arc > 5 else -0.5)\n",
    "    ax.text(mid, y_text, label, ha='center', va='center',\n",
    "            fontsize=7, color='#888888', style='italic')\n",
    "\n",
    "# Formatting\n",
    "ax.set_xlim(1939, 1997)\n",
    "ax.set_ylim(0.5, 10.5)\n",
    "ax.set_xlabel('Year', fontsize=12)\n",
    "ax.set_yticks([])\n",
    "ax.set_title('The Complete Arc: From McCulloch-Pitts to Universal Approximation',\n",
    "             fontsize=14, fontweight='bold', pad=15)\n",
    "\n",
    "# Legend\n",
    "legend_patches = [\n",
    "    mpatches.Patch(color='#2196F3', alpha=0.3, label='Birth (1943-1958)'),\n",
    "    mpatches.Patch(color='#4CAF50', alpha=0.3, label='Golden Age (1958-1969)'),\n",
    "    mpatches.Patch(color='#9E9E9E', alpha=0.3, label='AI Winter (1969-1986)'),\n",
    "    mpatches.Patch(color='#FF9800', alpha=0.3, label='Renaissance (1986+)'),\n",
    "]\n",
    "ax.legend(handles=legend_patches, loc='lower right', fontsize=9,\n",
    "          framealpha=0.9, edgecolor='#cccccc')\n",
    "\n",
    "ax.spines['top'].set_visible(False)\n",
    "ax.spines['right'].set_visible(False)\n",
    "ax.spines['left'].set_visible(False)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-table",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# ============================================================\n",
    "# Summary Table of ALL Key Results in the Book\n",
    "# ============================================================\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(12, 8))\n",
    "ax.axis('off')\n",
    "\n",
    "# Table data\n",
    "columns = ['Year', 'Author(s)', 'Result', 'Ch. Ref.', 'Significance']\n",
    "data = [\n",
    "    ['1943', 'McCulloch & Pitts', 'Formal neuron model', 'Ch. 1-3',\n",
    "     'Any Boolean function computable'],\n",
    "    ['1949', 'Hebb', 'Hebbian learning rule', 'Ch. 4-6',\n",
    "     'First learning principle'],\n",
    "    ['1958', 'Rosenblatt', 'Perceptron algorithm', 'Ch. 7-9',\n",
    "     'First learning machine'],\n",
    "    ['1962', 'Novikoff', 'Convergence theorem', 'Ch. 10',\n",
    "     'Finite-step guarantee'],\n",
    "    ['1969', 'Minsky & Papert', 'Linear separability limits', 'Ch. 11-12',\n",
    "     'XOR impossibility (single layer)'],\n",
    "    ['1970', 'Linnainmaa', 'Reverse-mode AD', 'Ch. 14',\n",
    "     'Mathematical basis for backprop'],\n",
    "    ['1974', 'Werbos', 'Backprop for neural nets', 'Ch. 14-15',\n",
    "     'Credit assignment solved'],\n",
    "    ['1982', 'Oja', 'PCA via Hebbian learning', 'Ch. 6',\n",
    "     'Stabilized Hebb rule'],\n",
    "    ['1982', 'Hopfield', 'Energy-based networks', 'Ch. 13',\n",
    "     'Physics revives the field'],\n",
    "    ['1986', 'Rumelhart et al.', 'Backprop popularized', 'Ch. 15-17',\n",
    "     'Hidden representations learned'],\n",
    "    ['1989', 'Cybenko', 'UAT (sigmoidal)', 'Ch. 18-19',\n",
    "     'One hidden layer suffices'],\n",
    "    ['1989', 'Hornik et al.', 'UAT (general)', 'Ch. 18-19',\n",
    "     'Universal approximation proved'],\n",
    "]\n",
    "\n",
    "# Color rows by era\n",
    "era_colors = {\n",
    "    'birth': '#E3F2FD',       # blue - light\n",
    "    'golden': '#E8F5E9',      # green - light\n",
    "    'winter': '#F5F5F5',      # grey - light\n",
    "    'renaissance': '#FFF3E0', # orange - light\n",
    "}\n",
    "\n",
    "row_colors = [\n",
    "    era_colors['birth'],      # 1943 M-P\n",
    "    era_colors['birth'],      # 1949 Hebb\n",
    "    era_colors['golden'],     # 1958 Rosenblatt\n",
    "    era_colors['golden'],     # 1962 Novikoff\n",
    "    era_colors['winter'],     # 1969 M&P\n",
    "    era_colors['winter'],     # 1970 Linnainmaa\n",
    "    era_colors['winter'],     # 1974 Werbos\n",
    "    era_colors['winter'],     # 1982 Oja\n",
    "    era_colors['winter'],     # 1982 Hopfield\n",
    "    era_colors['renaissance'],# 1986 RHW\n",
    "    era_colors['renaissance'],# 1989 Cybenko\n",
    "    era_colors['renaissance'],# 1989 Hornik\n",
    "]\n",
    "\n",
    "table = ax.table(\n",
    "    cellText=data,\n",
    "    colLabels=columns,\n",
    "    cellLoc='center',\n",
    "    loc='center',\n",
    "    colWidths=[0.06, 0.16, 0.22, 0.08, 0.30]\n",
    ")\n",
    "\n",
    "# Style the table\n",
    "table.auto_set_font_size(False)\n",
    "table.set_fontsize(9)\n",
    "table.scale(1.0, 1.8)\n",
    "\n",
    "# Header styling\n",
    "for j in range(len(columns)):\n",
    "    cell = table[0, j]\n",
    "    cell.set_facecolor('#37474F')\n",
    "    cell.set_text_props(color='white', fontweight='bold', fontsize=10)\n",
    "\n",
    "# Row styling\n",
    "for i in range(len(data)):\n",
    "    for j in range(len(columns)):\n",
    "        cell = table[i + 1, j]\n",
    "        cell.set_facecolor(row_colors[i])\n",
    "        cell.set_edgecolor('#BDBDBD')\n",
    "        if j == 0:  # Year column bold\n",
    "            cell.set_text_props(fontweight='bold')\n",
    "\n",
    "ax.set_title('Summary of Key Results Across All Chapters',\n",
    "             fontsize=14, fontweight='bold', pad=20)\n",
    "\n",
    "# Era legend below table\n",
    "legend_text = ('Color coding:  '\n",
    "               'Blue = Birth (1943-1958)  |  '\n",
    "               'Green = Golden Age (1958-1969)  |  '\n",
    "               'Grey = AI Winter (1969-1986)  |  '\n",
    "               'Orange = Renaissance (1986+)')\n",
    "fig.text(0.5, 0.02, legend_text, ha='center', fontsize=9, style='italic', color='#555555')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-capability",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as mpatches\n",
    "\n",
    "# ============================================================\n",
    "# Capability Evolution Plot\n",
    "# What could be computed at each historical stage?\n",
    "# ============================================================\n",
    "\n",
    "np.random.seed(42)\n",
    "\n",
    "fig, axes = plt.subplots(2, 4, figsize=(12, 8))\n",
    "\n",
    "# ---- Top row: The function classes at each stage ----\n",
    "\n",
    "# 1. McCulloch-Pitts: Logic gates (AND, OR, NOT)\n",
    "ax = axes[0, 0]\n",
    "ax.set_title('McCulloch-Pitts (1943)\\nLogic Gates', fontsize=9, fontweight='bold',\n",
    "             color='#1565C0')\n",
    "# Draw AND gate truth table\n",
    "inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]\n",
    "and_out = [0, 0, 0, 1]\n",
    "for (x1, x2), y in zip(inputs, and_out):\n",
    "    color = '#1565C0' if y == 1 else '#BBDEFB'\n",
    "    marker = 's' if y == 1 else 'o'\n",
    "    ax.plot(x1, x2, marker, color=color, markersize=20, markeredgecolor='#333')\n",
    "ax.set_xlim(-0.5, 1.5)\n",
    "ax.set_ylim(-0.5, 1.5)\n",
    "ax.set_xlabel('$x_1$', fontsize=9)\n",
    "ax.set_ylabel('$x_2$', fontsize=9)\n",
    "ax.text(0.5, -0.35, 'AND gate\\n(hand-wired)', ha='center', fontsize=8, color='#555')\n",
    "ax.set_xticks([0, 1])\n",
    "ax.set_yticks([0, 1])\n",
    "ax.grid(True, alpha=0.2)\n",
    "\n",
    "# 2. Perceptron: Linearly separable functions\n",
    "ax = axes[0, 1]\n",
    "ax.set_title('Perceptron (1958)\\nLinear Separation', fontsize=9, fontweight='bold',\n",
    "             color='#2E7D32')\n",
    "# Generate linearly separable data\n",
    "n_pts = 40\n",
    "class0 = np.random.randn(2, n_pts) * 0.4 + np.array([[-1], [0.5]])\n",
    "class1 = np.random.randn(2, n_pts) * 0.4 + np.array([[1], [-0.5]])\n",
    "ax.scatter(class0[0], class0[1], c='#C62828', s=20, alpha=0.7, label='Class 0')\n",
    "ax.scatter(class1[0], class1[1], c='#1565C0', s=20, alpha=0.7, label='Class 1')\n",
    "x_line = np.linspace(-2.5, 2.5, 100)\n",
    "ax.plot(x_line, x_line * 0.5, '--', color='#2E7D32', linewidth=2)\n",
    "ax.fill_between(x_line, x_line * 0.5, 2.5, alpha=0.05, color='#1565C0')\n",
    "ax.fill_between(x_line, -2.5, x_line * 0.5, alpha=0.05, color='#C62828')\n",
    "ax.set_xlim(-2.5, 2.5)\n",
    "ax.set_ylim(-2.5, 2.5)\n",
    "ax.set_xlabel('$x_1$', fontsize=9)\n",
    "ax.set_ylabel('$x_2$', fontsize=9)\n",
    "ax.text(0, -2.2, 'Linear boundary\\n(learned)', ha='center', fontsize=8, color='#555')\n",
    "ax.grid(True, alpha=0.2)\n",
    "\n",
    "# 3. MLP: Any Boolean function (XOR)\n",
    "ax = axes[0, 2]\n",
    "ax.set_title('MLP (multi-layer)\\nAny Boolean Function', fontsize=9, fontweight='bold',\n",
    "             color='#E65100')\n",
    "# XOR\n",
    "xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]\n",
    "xor_out = [0, 1, 1, 0]\n",
    "for (x1, x2), y in zip(xor_inputs, xor_out):\n",
    "    color = '#E65100' if y == 1 else '#FFE0B2'\n",
    "    marker = 's' if y == 1 else 'o'\n",
    "    ax.plot(x1, x2, marker, color=color, markersize=20, markeredgecolor='#333')\n",
    "# Draw XOR boundary (two lines)\n",
    "x_line = np.linspace(-0.3, 1.3, 100)\n",
    "ax.plot(x_line, 0.5 + 0.8 * (x_line - 0.5), '--', color='#E65100', linewidth=1.5, alpha=0.6)\n",
    "ax.plot(x_line, 0.5 - 0.8 * (x_line - 0.5), '--', color='#E65100', linewidth=1.5, alpha=0.6)\n",
    "ax.set_xlim(-0.5, 1.5)\n",
    "ax.set_ylim(-0.5, 1.5)\n",
    "ax.set_xlabel('$x_1$', fontsize=9)\n",
    "ax.set_ylabel('$x_2$', fontsize=9)\n",
    "ax.text(0.5, -0.35, 'XOR: non-linear\\nboundary needed', ha='center', fontsize=8, color='#555')\n",
    "ax.set_xticks([0, 1])\n",
    "ax.set_yticks([0, 1])\n",
    "ax.grid(True, alpha=0.2)\n",
    "\n",
    "# 4. Backprop + MLP: Any continuous function (UAT)\n",
    "ax = axes[0, 3]\n",
    "ax.set_title('Backprop + MLP (1986+)\\nAny Continuous Function', fontsize=9,\n",
    "             fontweight='bold', color='#6A1B9A')\n",
    "# Show a complex 1D function and its neural net approximation\n",
    "x_func = np.linspace(0, 2 * np.pi, 200)\n",
    "y_target = np.sin(x_func) + 0.3 * np.sin(3 * x_func) + 0.1 * np.cos(7 * x_func)\n",
    "# Simulate a neural net approximation (smooth version)\n",
    "y_approx = np.sin(x_func) + 0.28 * np.sin(3 * x_func) + 0.08 * np.cos(7 * x_func)\n",
    "ax.plot(x_func, y_target, '-', color='#333333', linewidth=2, label='Target $f(x)$')\n",
    "ax.plot(x_func, y_approx, '--', color='#6A1B9A', linewidth=2, label='NN approx.')\n",
    "ax.fill_between(x_func, y_target, y_approx, alpha=0.15, color='#6A1B9A')\n",
    "ax.set_xlabel('$x$', fontsize=9)\n",
    "ax.set_ylabel('$f(x)$', fontsize=9)\n",
    "ax.text(np.pi, -1.5, 'Arbitrary continuous\\nfunction (learned)', ha='center',\n",
    "        fontsize=8, color='#555')\n",
    "ax.legend(fontsize=7, loc='upper right')\n",
    "ax.grid(True, alpha=0.2)\n",
    "\n",
    "# ---- Bottom row: Capability summary bar chart ----\n",
    "\n",
    "# Merge bottom 4 axes into one\n",
    "for a in axes[1, :]:\n",
    "    a.remove()\n",
    "ax_bottom = fig.add_subplot(2, 1, 2)\n",
    "\n",
    "# Capability categories\n",
    "categories = ['Logic\\nGates', 'Linearly\\nSeparable', 'XOR /\\nParity', 'Any\\nBoolean',\n",
    "              'Smooth\\nFunctions', 'Universal\\nApprox.']\n",
    "n_cat = len(categories)\n",
    "\n",
    "# Models and their capabilities (1 = yes, 0 = no, 0.5 = partial)\n",
    "models = {\n",
    "    'McCulloch-Pitts (1943)': [1, 1, 1, 1, 0, 0],\n",
    "    'Perceptron (1958)':      [1, 1, 0, 0, 0, 0],\n",
    "    'MLP - no training':      [1, 1, 1, 1, 0.5, 0],\n",
    "    'MLP + Backprop (1986)':  [1, 1, 1, 1, 1, 1],\n",
    "}\n",
    "\n",
    "model_colors = ['#1565C0', '#2E7D32', '#E65100', '#6A1B9A']\n",
    "bar_width = 0.18\n",
    "x_pos = np.arange(n_cat)\n",
    "\n",
    "for i, (model_name, caps) in enumerate(models.items()):\n",
    "    offset = (i - 1.5) * bar_width\n",
    "    bars = ax_bottom.bar(x_pos + offset, caps, bar_width, label=model_name,\n",
    "                         color=model_colors[i], alpha=0.8, edgecolor='white')\n",
    "\n",
    "ax_bottom.set_xticks(x_pos)\n",
    "ax_bottom.set_xticklabels(categories, fontsize=9)\n",
    "ax_bottom.set_ylabel('Capability', fontsize=10)\n",
    "ax_bottom.set_yticks([0, 0.5, 1])\n",
    "ax_bottom.set_yticklabels(['No', 'Partial', 'Yes'], fontsize=9)\n",
    "ax_bottom.set_ylim(0, 1.3)\n",
    "ax_bottom.legend(fontsize=8, loc='upper left', ncol=2, framealpha=0.9)\n",
    "ax_bottom.grid(True, alpha=0.2, axis='y')\n",
    "ax_bottom.set_title('Capability Comparison Across Historical Stages', fontsize=11,\n",
    "                     fontweight='bold', pad=10)\n",
    "\n",
    "# Note about M-P\n",
    "ax_bottom.text(3.5, 1.2, 'Note: McCulloch-Pitts can compute Boolean functions\\n'\n",
    "               'but requires hand-designed weights (no learning).',\n",
    "               fontsize=8, style='italic', color='#777',\n",
    "               bbox=dict(boxstyle='round', facecolor='#f9f9f9', edgecolor='#ddd'))\n",
    "\n",
    "plt.suptitle('Capability Evolution: What Could Be Computed at Each Stage?',\n",
    "             fontsize=14, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10",
   "metadata": {},
   "source": [
    "## 20.8 Reflection Questions\n",
    "\n",
    "1. **Why did it take 17 years** from the Minsky-Papert critique (1969) to the backpropagation\n",
    "   renaissance (1986)? What sociological and scientific factors contributed to the delay?\n",
    "\n",
    "2. **Biological plausibility vs. engineering utility**: Hebbian learning is biologically\n",
    "   plausible but limited. Backpropagation is powerful but biologically implausible. What\n",
    "   does this tension tell us about the relationship between neuroscience and AI?\n",
    "\n",
    "3. **The role of proofs**: How important were the formal proofs (perceptron convergence,\n",
    "   Minsky-Papert impossibility, universal approximation) in shaping the field's direction?\n",
    "   Could the field have progressed faster with more or fewer theoretical results?\n",
    "\n",
    "4. **Depth vs. width**: The Universal Approximation Theorem guarantees that width alone\n",
    "   suffices. Yet modern practice favors deep, narrow networks over wide, shallow ones.\n",
    "   Why? What does this say about the gap between existence proofs and practical algorithms?\n",
    "\n",
    "5. **Looking forward**: Which of the unsolved problems from the classical era (generalization,\n",
    "   efficiency, biological plausibility) do you think is most important for the future of AI?\n",
    "\n",
    "6. **The credit assignment problem revisited**: Backpropagation solves credit assignment\n",
    "   computationally. But does the brain solve the same problem? If so, how? If not, what\n",
    "   problem does it solve instead?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10b",
   "metadata": {},
   "source": [
    "```{admonition} Final Reflection: The Arc of Understanding\n",
    ":class: note\n",
    "\n",
    "We began this course with a question: *Can the brain's computation be formalized?*\n",
    "McCulloch and Pitts answered yes, in 1943. Each subsequent decade added another piece:\n",
    "learning (Hebb, Rosenblatt), the understanding of limits (Minsky-Papert), the ability\n",
    "to train deep networks (Werbos, Rumelhart-Hinton-Williams), and the proof of universality\n",
    "(Cybenko, Hornik).\n",
    "\n",
    "The full arc -- from formal neuron to universal approximator -- took 46 years.\n",
    "It required mathematicians, psychologists, physicists, and computer scientists.\n",
    "It survived two world wars' aftermath, an AI winter, and the rise and fall of\n",
    "multiple competing paradigms. And it produced the theoretical foundation upon which\n",
    "all of modern deep learning rests.\n",
    "\n",
    "That foundation -- **parameterized differentiable functions trained by gradient descent** --\n",
    "is the subject of this course, and the starting point for everything that comes next.\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}