{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# Chapter 12: Hebbian Learning Theory\n",
    "\n",
    "\n",
    "## 12.1 Historical Context: Donald Hebb and the Birth of Synaptic Learning\n",
    "\n",
    "Donald Olding Hebb (1904--1985) was a Canadian neuropsychologist whose work laid the\n",
    "theoretical foundation for understanding how neural connections strengthen through experience.\n",
    "His magnum opus, *The Organization of Behavior* (1949), proposed a neurophysiological\n",
    "theory of learning that would profoundly influence both neuroscience and artificial intelligence.\n",
    "\n",
    "Hebb was trained under Karl Lashley at Harvard and later worked with Wilder Penfield at the\n",
    "Montreal Neurological Institute. His clinical observations of patients with brain lesions,\n",
    "combined with his theoretical inclinations, led him to propose a mechanistic account of\n",
    "learning at the synaptic level.\n",
    "\n",
    "At the time, the dominant view in psychology was behaviorism (Skinner, Watson), which\n",
    "deliberately avoided theorizing about internal brain mechanisms. Hebb broke with this\n",
    "tradition by proposing an explicit neural mechanism for associative learning."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-1",
   "metadata": {},
   "source": [
    "## 12.2 Hebb's Postulate\n",
    "\n",
    "### The Exact Statement\n",
    "\n",
    "```{admonition} Definition (Hebb's Learning Postulate, 1949)\n",
    ":class: note\n",
    "\n",
    "From *The Organization of Behavior* (1949, p. 62):\n",
    "\n",
    "> \"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently\n",
    "> takes part in firing it, some growth process or metabolic change takes place in one or\n",
    "> both cells such that A's efficiency, as one of the cells firing B, is increased.\"\n",
    "```\n",
    "\n",
    "### The Popular Paraphrase\n",
    "\n",
    "```{tip}\n",
    "**\"Neurons that fire together wire together.\"** -- This is the most famous summary of Hebb's postulate, coined by Carla Shatz in 1992. While memorable, it loses several important nuances of Hebb's original statement (see the five key aspects below).\n",
    "```\n",
    "\n",
    "### Five Key Aspects of the Postulate\n",
    "\n",
    "A careful reading reveals five critical properties:\n",
    "\n",
    "| # | Aspect | Description |\n",
    "|---|--------|-------------|\n",
    "| 1 | **Directionality** | Cell A must contribute to firing cell B. The connection is from presynaptic (A) to postsynaptic (B). |\n",
    "| 2 | **Persistence** | The co-activation must be \"repeated or persistent\" -- not a single coincidence. |\n",
    "| 3 | **Locality** | The rule depends only on local information: the activities of the pre- and postsynaptic neurons. No global error signal is required. |\n",
    "| 4 | **Mechanism Agnosticism** | Hebb says \"some growth process or metabolic change\" -- he does not specify the biological mechanism. |\n",
    "| 5 | **Asymmetry** | A must take part in firing B. Mere simultaneous firing is not sufficient; A must causally contribute. |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 12.3 Cell Assemblies and Phase Sequences\n",
    "\n",
    "Hebb did not stop at the single-synapse level. He proposed that Hebbian learning leads to\n",
    "the formation of **cell assemblies** -- groups of neurons that become strongly interconnected\n",
    "through repeated co-activation.\n",
    "\n",
    "**Cell Assembly**: A set of neurons that, due to repeated Hebbian strengthening, can\n",
    "activate each other in a self-sustaining pattern. Once a subset of the assembly is activated,\n",
    "the entire assembly tends to become active.\n",
    "\n",
    "**Phase Sequence**: A temporal chain of cell assemblies, where the activation of one assembly\n",
    "triggers the next. This was Hebb's model for the stream of thought.\n",
    "\n",
    "These concepts anticipated modern ideas about:\n",
    "- **Attractor networks** in computational neuroscience\n",
    "- **Distributed representations** in connectionism\n",
    "- **Content-addressable memory** (Hopfield networks, 1982)"
   ]
  },
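  {
   "cell_type": "markdown",
   "id": "cell-2b",
   "metadata": {},
   "source": [
    "### A Minimal Pattern-Completion Sketch\n",
    "\n",
    "The claim that activating a subset of an assembly tends to recruit the whole assembly can be\n",
    "illustrated in a few lines of code. The sketch below is not Hebb's own model: the network size,\n",
    "the outer-product weight matrix, and the firing threshold are illustrative assumptions chosen to\n",
    "show pattern completion in its simplest form."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Minimal cell-assembly sketch (illustrative parameters, not Hebb's own model).\n",
    "# 20 binary neurons; neurons 0-7 form an assembly that was repeatedly co-active,\n",
    "# modeled here as a Hebbian outer-product weight matrix.\n",
    "n_neurons = 20\n",
    "assembly = np.zeros(n_neurons)\n",
    "assembly[:8] = 1.0\n",
    "\n",
    "W = np.outer(assembly, assembly)   # strong mutual connections within the assembly\n",
    "np.fill_diagonal(W, 0)             # no self-connections\n",
    "\n",
    "# Activate only a small SUBSET of the assembly (a partial cue)\n",
    "state = np.zeros(n_neurons)\n",
    "state[:3] = 1.0\n",
    "\n",
    "theta = 1.5                        # firing threshold (assumed value)\n",
    "for step in range(3):\n",
    "    drive = W @ state                             # recurrent input to each neuron\n",
    "    state = np.where(drive > theta, 1.0, state)   # recruited neurons switch on and stay on\n",
    "    print(f'step {step + 1}: active neurons = {np.flatnonzero(state)}')\n",
    "\n",
    "print('Full assembly recovered:',\n",
    "      np.array_equal(np.flatnonzero(state), np.arange(8)))"
   ]
  },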
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## 12.4 The Basic Hebbian Learning Rule\n",
    "\n",
    "### Mathematical Formulation\n",
    "\n",
    "Consider a single neuron with inputs $x_1, x_2, \\ldots, x_n$, weights $w_1, w_2, \\ldots, w_n$,\n",
    "and linear output:\n",
    "\n",
    "$$y = \\sum_{i=1}^n w_i x_i = \\mathbf{w}^\\top \\mathbf{x}$$\n",
    "\n",
    "```{admonition} Definition (Hebbian Update Rule)\n",
    ":class: note\n",
    "\n",
    "The **basic Hebbian learning rule** updates each weight proportionally to the product of the\n",
    "presynaptic input and the postsynaptic output:\n",
    "\n",
    "$$\\Delta w_i = \\eta \\, x_i \\, y$$\n",
    "\n",
    "where $\\eta > 0$ is the learning rate.\n",
    "\n",
    "In vector form:\n",
    "\n",
    "$$\\Delta \\mathbf{w} = \\eta \\, y \\, \\mathbf{x}$$\n",
    "```\n",
    "\n",
    "### Interpretation\n",
    "\n",
    "- If both $x_i > 0$ and $y > 0$ (both \"fire\"), then $\\Delta w_i > 0$ (synapse strengthens).\n",
    "- The update is proportional to the correlation between input and output.\n",
    "- This is a purely **local** rule: each synapse only needs to know the activities of the\n",
    "  neurons it connects.\n",
    "- No target value or error signal is required -- this is **unsupervised learning**.\n",
    "\n",
    "```{warning}\n",
    "Hebb's rule is **unsupervised** -- it does not use error signals. The rule only captures correlations in the input data and has no concept of a \"correct\" output. This means it cannot be used for classification, regression, or any task that requires a target signal.\n",
    "```"
   ]
  },
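  {
   "cell_type": "markdown",
   "id": "cell-3b",
   "metadata": {},
   "source": [
    "### A Minimal Numerical Sketch of the Update\n",
    "\n",
    "Before watching the rule solve a real problem in the next section, it helps to see a single\n",
    "update in code. The numbers below are purely illustrative: one update\n",
    "$\\Delta \\mathbf{w} = \\eta \\, y \\, \\mathbf{x}$ for a single sample, followed by a check that the\n",
    "update averaged over zero-mean inputs equals $\\eta \\, \\mathbf{C} \\mathbf{w}$ -- the quantity that\n",
    "drives the instability analysis in Section 12.5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "np.random.seed(0)\n",
    "\n",
    "# One Hebbian update for a single sample (all numbers are illustrative)\n",
    "eta = 0.1\n",
    "w = np.array([0.2, -0.1, 0.4])\n",
    "x = np.array([1.0, 0.5, -0.3])\n",
    "\n",
    "y = w @ x                 # linear output y = w^T x\n",
    "dw = eta * y * x          # Hebbian update: Delta w_i = eta * x_i * y\n",
    "print('y  =', y)\n",
    "print('dw =', dw)\n",
    "\n",
    "# Averaged over many zero-mean samples, the update is driven by C = E[x x^T]:\n",
    "#   E[Delta w] = eta * C w   (the starting point of the Section 12.5 analysis)\n",
    "X = np.random.randn(10000, 3) @ np.diag([2.0, 1.0, 0.5])\n",
    "C = X.T @ X / len(X)\n",
    "\n",
    "mean_update = eta * np.mean((X @ w)[:, None] * X, axis=0)\n",
    "print('mean of eta*y*x over samples:', np.round(mean_update, 4))\n",
    "print('eta * C @ w                 :', np.round(eta * C @ w, 4))"
   ]
  },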
  {
   "cell_type": "markdown",
   "id": "lbcpgk75qug",
   "source": "## 12.4b Hebbian Learning in Action: Classical Conditioning\n\nThe Hebbian learning rule may seem abstract, so let us see it solve a real problem:\n**classical conditioning** — exactly the phenomenon Hebb set out to explain.\n\n### Pavlov's Dog as a Neural Network\n\nIn Pavlov's famous experiment (1927), a dog learns to salivate at the sound of a bell\nafter the bell is repeatedly paired with food. We can model this with a single neuron:\n\n```\n  food ──[w_food = 1.0 (innate)]──→ ╭──────╮\n                                     │saliva│──→ y (response)\n  bell ──[w_bell = 0.0 (learned)]──→ ╰──────╯\n```\n\n- The food→saliva connection is **innate** (fixed weight $w_{\\text{food}} = 1$).\n- The bell→saliva connection is **initially zero** ($w_{\\text{bell}} = 0$).\n- Only the bell synapse learns via the Hebbian rule: $\\Delta w_{\\text{bell}} = \\eta \\cdot \\text{bell} \\cdot y$.\n\n### Why It Works\n\nDuring conditioning, both bell and food are presented together:\n\n$$y = w_{\\text{food}} \\cdot 1 + w_{\\text{bell}} \\cdot 1 = 1 + w_{\\text{bell}} > 0$$\n\nSince the bell is active ($\\text{bell} = 1$) and the output is positive ($y > 0$),\nthe Hebbian rule strengthens the bell synapse:\n\n$$\\Delta w_{\\text{bell}} = \\eta \\cdot 1 \\cdot y > 0$$\n\nAfter enough pairings, $w_{\\text{bell}}$ grows large enough that the bell **alone**\ncan drive the response — the network has learned to **predict** food from the bell.\n\n```{note}\nThis is not merely an analogy. Hebb explicitly designed his rule to account for\nassociative learning phenomena like classical conditioning. The simulation below\nshows the complete acquisition curve.\n```",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "id": "ulrh586cndo",
   "source": "import numpy as np\nimport matplotlib.pyplot as plt\n\nnp.random.seed(42)\n\n# ── Classical conditioning as Hebbian learning ──────────────────\n# A single neuron learns to predict food from a bell signal.\n\neta = 0.05      # learning rate\nn_trials = 40   # total trials\n\n# Training protocol: (bell, food) per trial\n# Phase 1: food only     (baseline — bell has no effect)\n# Phase 2: bell + food   (conditioning — Hebbian pairing)\n# Phase 3: bell only     (test — can the bell predict food?)\n# Phase 4: food only     (the bell weight persists)\nprotocol = (\n    [(0, 1)] * 5  +   # Phase 1: baseline (trials 1-5)\n    [(1, 1)] * 20 +   # Phase 2: conditioning (trials 6-25)\n    [(1, 0)] * 10 +   # Phase 3: test (trials 26-35)\n    [(0, 1)] * 5       # Phase 4: return to food only (trials 36-40)\n)\n\n# Weights\nw_food = 1.0    # innate (fixed) — hardwired response to food\nw_bell = 0.0    # learned — initially no association\n\nw_bell_history = [w_bell]\nresponse_history = []\n\nfor bell, food in protocol:\n    y = w_food * food + w_bell * bell   # neuron response\n    response_history.append(y)\n    w_bell += eta * bell * y            # Hebbian update (bell synapse only)\n    w_bell_history.append(w_bell)\n\n# ── Visualization ───────────────────────────────────────────────\nfig, axes = plt.subplots(1, 2, figsize=(14, 5))\n\ntrials = np.arange(1, n_trials + 1)\n\n# Phase background shading\nphase_specs = [\n    (1, 5,  '#E3F2FD', 'Baseline\\n(food only)'),\n    (6, 25, '#FFF3E0', 'Conditioning\\n(bell + food)'),\n    (26, 35,'#E8F5E9', 'Test\\n(bell only)'),\n    (36, 40,'#E3F2FD', 'Return\\n(food only)'),\n]\n\nfor ax in axes:\n    for start, end, color, label in phase_specs:\n        ax.axvspan(start - 0.5, end + 0.5, alpha=0.4, color=color)\n\n# Panel 1: Learned weight w_bell\naxes[0].plot(trials, w_bell_history[1:], 'o-', color='#1565C0',\n             markersize=5, linewidth=2)\naxes[0].set_xlabel('Trial', fontsize=12)\naxes[0].set_ylabel(r'$w_{\\mathrm{bell}}$', fontsize=13)\naxes[0].set_title('Hebbian Weight Growth (Bell Synapse)', fontsize=13)\naxes[0].axhline(y=0, color='gray', linestyle='--', alpha=0.5)\naxes[0].grid(True, alpha=0.3)\n\n# Phase labels\nfor start, end, _, label in phase_specs:\n    axes[0].text((start + end) / 2, axes[0].get_ylim()[1] * 0.95,\n                 label, ha='center', va='top', fontsize=8, style='italic')\n\n# Panel 2: Neuron response\naxes[1].bar(trials, response_history, color=[\n    '#1565C0' if bell == 0 else '#E65100'\n    for bell, food in protocol\n], alpha=0.7, width=0.8)\naxes[1].set_xlabel('Trial', fontsize=12)\naxes[1].set_ylabel('Response $y$', fontsize=13)\naxes[1].set_title('Salivation Response', fontsize=13)\naxes[1].axhline(y=0, color='gray', linestyle='--', alpha=0.5)\naxes[1].grid(True, alpha=0.3)\n\n# Custom legend\nfrom matplotlib.patches import Patch\naxes[1].legend(handles=[\n    Patch(facecolor='#1565C0', alpha=0.7, label='Food present (no bell)'),\n    Patch(facecolor='#E65100', alpha=0.7, label='Bell present'),\n], fontsize=9)\n\nplt.tight_layout()\nplt.show()\n\n# Summary\nprint(\"Classical Conditioning via Hebbian Learning\")\nprint(\"=\" * 50)\nprint(f\"  Baseline (bell only):     response = {w_bell_history[0]:.2f}  (no effect)\")\nprint(f\"  After conditioning:       w_bell   = {w_bell_history[25]:.2f}\")\nprint(f\"  Test (bell alone, trial 26): response = {response_history[25]:.2f}  (prediction!)\")\nprint(f\"  Final w_bell:             {w_bell_history[-1]:.2f}  (weight persists)\")\nprint()\nprint(\"The 
network learned to PREDICT food from the bell signal.\")\nprint(\"Note: w_bell keeps growing — this is the instability problem (§12.5)!\")",
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "id": "i75ol2i3m9p",
   "source": "### What the Simulation Reveals\n\nThe simulation demonstrates three fundamental properties of Hebbian learning:\n\n1. **Association through co-occurrence**: The bell synapse strengthens because bell and\n   response are repeatedly co-active during conditioning. This is Hebb's rule at work.\n\n2. **Prediction without supervision**: No teacher told the network that the bell predicts\n   food. The association emerged purely from temporal co-occurrence — an **unsupervised**\n   learning process.\n\n3. **The instability flaw**: Notice that $w_{\\text{bell}}$ grows *accelerating* during\n   conditioning — each trial makes the response larger, which makes the next update larger.\n   During the test phase (bell only), the weight continues to grow even without food.\n   This runaway growth is the **instability problem** we analyze next.\n\n```{warning}\nThe simulation also reveals what Hebbian learning *cannot* do: there is no mechanism for\n**extinction** (unlearning). In real conditioning, presenting the bell without food\neventually weakens the association. Pure Hebbian learning cannot weaken synapses —\nthis requires extensions like LTD or the covariance rule (§12.6).\n```",
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": "## 12.5 The Instability Problem\n\n```{admonition} Theorem (Weight Instability)\n:class: important\n\nUnder the basic Hebbian rule with stationary input statistics, the weight vector $\\mathbf{w}$\ndiverges to infinity: $\\|\\mathbf{w}\\| \\to \\infty$ as $t \\to \\infty$.\n```\n\n```{admonition} Proof\n:class: dropdown\n\nConsider the continuous-time version of the Hebbian rule with a linear neuron:\n\n$$\\frac{d\\mathbf{w}}{dt} = \\eta \\, y \\, \\mathbf{x} = \\eta \\, (\\mathbf{w}^\\top \\mathbf{x}) \\, \\mathbf{x} = \\eta \\, \\mathbf{x} \\mathbf{x}^\\top \\mathbf{w}$$\n\nTaking the expectation over the input distribution:\n\n$$\\frac{d\\mathbf{w}}{dt} = \\eta \\, \\mathbb{E}[\\mathbf{x}\\mathbf{x}^\\top] \\, \\mathbf{w} = \\eta \\, \\mathbf{C} \\, \\mathbf{w}$$\n\nwhere $\\mathbf{C} = \\mathbb{E}[\\mathbf{x}\\mathbf{x}^\\top]$ is the input second-moment matrix; for centered data, this equals the covariance matrix.\n\nSince $\\mathbf{C}$ is symmetric and positive semi-definite, it has an eigendecomposition:\n\n$$\\mathbf{C} = \\sum_{i=1}^n \\lambda_i \\mathbf{e}_i \\mathbf{e}_i^\\top$$\nwhere $\\lambda_1 \\geq \\lambda_2 \\geq \\cdots \\geq \\lambda_n \\geq 0$ are the eigenvalues with corresponding orthonormal eigenvectors $\\mathbf{e}_1, \\ldots, \\mathbf{e}_n$.\n\nDecomposing $\\mathbf{w}(t)$ in the eigenbasis:\n\n$$\\mathbf{w}(t) = \\sum_{i=1}^n c_i(t) \\, \\mathbf{e}_i$$\n\nSubstituting into the differential equation:\n\n$$\\frac{dc_i}{dt} = \\eta \\lambda_i c_i$$\n\nThis has the solution:\n\n$$c_i(t) = c_i(0) \\, e^{\\eta \\lambda_i t}$$\n\nTherefore:\n\n$$\\mathbf{w}(t) = \\sum_{i=1}^n c_i(0) \\, e^{\\eta \\lambda_i t} \\, \\mathbf{e}_i$$\n\nFor any eigenvalue $\\lambda_i > 0$ with nonzero initial component $c_i(0) \\neq 0$, the\ncorresponding component grows exponentially. In particular:\n\n$$\\|\\mathbf{w}(t)\\| \\geq |c_1(0)| \\, e^{\\eta \\lambda_1 t} \\to \\infty \\quad \\text{as } t \\to \\infty$$\n\nprovided $\\lambda_1 > 0$ and $c_1(0) \\neq 0$. $\\blacksquare$\n```\n\n**Key insight**: The weight vector grows fastest along the direction of the first eigenvector\n$\\mathbf{e}_1$ (the leading eigenvector of $\\mathbf{C}$; for centered data, the first principal component), but its norm diverges.\n\n```{danger}\nPure Hebbian learning leads to unbounded weight growth! Without normalization, weights diverge to infinity. This is a fundamental flaw that took decades to resolve.\n```"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "np.random.seed(42)\n",
    "\n",
    "# Generate 2D correlated data\n",
    "n_samples = 1000\n",
    "# Correlation matrix with eigenvalues 2.0 and 0.5\n",
    "angle = np.pi / 6  # 30 degrees\n",
    "R = np.array([[np.cos(angle), -np.sin(angle)],\n",
    "              [np.sin(angle),  np.cos(angle)]])\n",
    "Lambda = np.diag([2.0, 0.5])\n",
    "C = R @ Lambda @ R.T\n",
    "\n",
    "# Generate data from this covariance\n",
    "X = np.random.multivariate_normal([0, 0], C, n_samples)\n",
    "\n",
    "# Basic Hebbian learning\n",
    "eta = 0.001\n",
    "w = np.array([0.5, 0.5])  # initial weights\n",
    "\n",
    "# Track weight history\n",
    "w_history = [w.copy()]\n",
    "norm_history = [np.linalg.norm(w)]\n",
    "\n",
    "n_epochs = 5\n",
    "for epoch in range(n_epochs):\n",
    "    for i in range(n_samples):\n",
    "        x = X[i]\n",
    "        y = w @ x          # linear output\n",
    "        w = w + eta * y * x  # Hebbian update\n",
    "        w_history.append(w.copy())\n",
    "        norm_history.append(np.linalg.norm(w))\n",
    "\n",
    "w_history = np.array(w_history)\n",
    "norm_history = np.array(norm_history)\n",
    "\n",
    "# Compute true principal component for reference\n",
    "eigenvalues, eigenvectors = np.linalg.eigh(C)\n",
    "pc1 = eigenvectors[:, -1]  # largest eigenvalue\n",
    "print(f\"True PC1 direction: {pc1}\")\n",
    "print(f\"Final weight direction: {w / np.linalg.norm(w)}\")\n",
    "print(f\"Final weight norm: {np.linalg.norm(w):.2f}\")\n",
    "\n",
    "# Plot\n",
    "fig, axes = plt.subplots(1, 3, figsize=(16, 5))\n",
    "\n",
    "# (1) Weight norm over time\n",
    "axes[0].plot(norm_history)\n",
    "axes[0].set_xlabel('Iteration')\n",
    "axes[0].set_ylabel('||w||')\n",
    "axes[0].set_title('Weight Norm (Exploding!)')\n",
    "axes[0].set_yscale('log')\n",
    "axes[0].grid(True, alpha=0.3)\n",
    "\n",
    "# (2) Data with weight trajectory\n",
    "axes[1].scatter(X[:, 0], X[:, 1], alpha=0.2, s=5, color='gray')\n",
    "# Show weight direction at several time points\n",
    "n_arrows = 10\n",
    "indices = np.linspace(0, len(w_history)-1, n_arrows, dtype=int)\n",
    "colors = plt.cm.viridis(np.linspace(0, 1, n_arrows))\n",
    "for idx, color in zip(indices, colors):\n",
    "    w_dir = w_history[idx] / np.linalg.norm(w_history[idx]) * 3\n",
    "    axes[1].annotate('', xy=w_dir, xytext=[0, 0],\n",
    "                     arrowprops=dict(arrowstyle='->', color=color, lw=2))\n",
    "# Show true PC1\n",
    "axes[1].annotate('', xy=pc1*3, xytext=[0, 0],\n",
    "                 arrowprops=dict(arrowstyle='->', color='red', lw=2, linestyle='--'))\n",
    "axes[1].set_xlabel('$x_1$')\n",
    "axes[1].set_ylabel('$x_2$')\n",
    "axes[1].set_title('Data and Weight Direction Over Time')\n",
    "axes[1].set_aspect('equal')\n",
    "axes[1].set_xlim(-5, 5)\n",
    "axes[1].set_ylim(-5, 5)\n",
    "axes[1].grid(True, alpha=0.3)\n",
    "axes[1].legend(['PC1 (true)'], loc='upper left')\n",
    "\n",
    "# (3) Angle to PC1 over time\n",
    "angles = []\n",
    "for w_t in w_history:\n",
    "    cos_angle = np.abs(np.dot(w_t, pc1)) / (np.linalg.norm(w_t) * np.linalg.norm(pc1))\n",
    "    cos_angle = np.clip(cos_angle, -1, 1)\n",
    "    angles.append(np.degrees(np.arccos(cos_angle)))\n",
    "\n",
    "axes[2].plot(angles)\n",
    "axes[2].set_xlabel('Iteration')\n",
    "axes[2].set_ylabel('Angle to PC1 (degrees)')\n",
    "axes[2].set_title('Convergence of Direction to PC1')\n",
    "axes[2].axhline(y=0, color='r', linestyle='--', alpha=0.5)\n",
    "axes[2].grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.savefig('hebbian_instability.png', dpi=150, bbox_inches='tight')\n",
    "plt.show()\n",
    "\n",
    "print(\"\\nKey observation: The weight DIRECTION converges to PC1,\")\n",
    "print(\"but the weight NORM diverges exponentially!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5b",
   "metadata": {},
   "source": [
    "### Weight Growth Over Time: The Exponential Explosion\n",
    "\n",
    "The following visualization demonstrates the exponential weight growth under pure Hebbian\n",
    "learning for several different initial conditions and learning rates. The log-scale plot\n",
    "makes the exponential growth appear linear, confirming the theoretical prediction\n",
    "$\\|\\mathbf{w}(t)\\| \\sim e^{\\eta \\lambda_1 t}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "np.random.seed(42)\n",
    "\n",
    "# Generate 2D correlated data\n",
    "n_samples = 1000\n",
    "angle = np.pi / 6\n",
    "R = np.array([[np.cos(angle), -np.sin(angle)],\n",
    "              [np.sin(angle),  np.cos(angle)]])\n",
    "C = R @ np.diag([2.0, 0.5]) @ R.T\n",
    "X = np.random.multivariate_normal([0, 0], C, n_samples)\n",
    "\n",
    "# Theoretical prediction\n",
    "eigenvalues_C = np.linalg.eigvalsh(C)\n",
    "lambda_max = np.max(eigenvalues_C)\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "\n",
    "learning_rates = [0.0005, 0.001, 0.002]\n",
    "colors = ['#2196F3', '#F44336', '#4CAF50']\n",
    "\n",
    "for eta, color in zip(learning_rates, colors):\n",
    "    w = np.array([0.5, 0.5])\n",
    "    norms = [np.linalg.norm(w)]\n",
    "    \n",
    "    for epoch in range(5):\n",
    "        for i in range(n_samples):\n",
    "            x = X[i]\n",
    "            y = w @ x\n",
    "            w = w + eta * y * x\n",
    "            norms.append(np.linalg.norm(w))\n",
    "    \n",
    "    iterations = np.arange(len(norms))\n",
    "    ax.plot(iterations, norms, color=color, linewidth=2,\n",
    "            label=f'$\\\\eta = {eta}$')\n",
    "    \n",
    "    # Theoretical exponential envelope\n",
    "    theoretical = norms[0] * np.exp(eta * lambda_max * iterations)\n",
    "    ax.plot(iterations, theoretical, color=color, linewidth=1,\n",
    "            linestyle='--', alpha=0.5)\n",
    "\n",
    "ax.set_yscale('log')\n",
    "ax.set_xlabel('Iteration', fontsize=12)\n",
    "ax.set_ylabel('$||\\\\mathbf{w}||$ (log scale)', fontsize=12)\n",
    "ax.set_title('Weight Growth Under Pure Hebbian Learning\\n'\n",
    "             'Solid: actual | Dashed: theoretical $||w_0|| \\\\cdot e^{\\\\eta \\\\lambda_1 t}$',\n",
    "             fontsize=13)\n",
    "ax.legend(fontsize=12)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"Largest eigenvalue lambda_1 = {lambda_max:.3f}\")\n",
    "print(\"Higher learning rates => faster exponential divergence.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-6",
   "metadata": {},
   "source": [
    "## 12.6 The Covariance Rule\n",
    "\n",
    "One natural modification of the basic Hebbian rule is to use deviations from the means\n",
    "rather than raw activities:\n",
    "\n",
    "$$\\Delta w_{ij} = \\eta (x_i - \\bar{x}_i)(y_j - \\bar{y}_j)$$\n",
    "\n",
    "where $\\bar{x}_i = \\mathbb{E}[x_i]$ and $\\bar{y}_j = \\mathbb{E}[y_j]$ are the mean\n",
    "activities.\n",
    "\n",
    "### Properties\n",
    "\n",
    "1. **Allows weakening**: If $x_i$ is above average but $y_j$ is below average, the weight\n",
    "   decreases. The basic Hebbian rule can only strengthen synapses (for positive activities).\n",
    "\n",
    "2. **Invariance to mean**: The rule depends only on the covariance structure of the inputs\n",
    "   and outputs, not their means.\n",
    "\n",
    "3. **Still unstable**: The covariance rule inherits the instability problem of the basic\n",
    "   Hebbian rule -- weights still diverge.\n",
    "\n",
    "The covariance rule is sometimes called the **\"centered Hebbian rule\"** and is closely\n",
    "related to PCA on centered data."
   ]
  },
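  {
   "cell_type": "markdown",
   "id": "cell-6b",
   "metadata": {},
   "source": [
    "### Sketch: Covariance Rule vs. Basic Hebbian Rule\n",
    "\n",
    "The sketch below compares the two rules on a one-dimensional input with a large positive mean.\n",
    "The learning rate, the running-mean estimates of $\\bar{x}$ and $\\bar{y}$, and the input\n",
    "statistics are illustrative assumptions. It is meant to show property 2 above: the basic Hebbian\n",
    "update is dominated by the raw mean of the input, while the covariance rule responds only to the\n",
    "fluctuations around the mean -- yet both weights keep growing, consistent with property 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "# 1-D input with a large mean (5.0) and small fluctuations (std 0.5)\n",
    "n_steps = 2000\n",
    "x_data = 5.0 + 0.5 * np.random.randn(n_steps)\n",
    "\n",
    "eta = 0.001\n",
    "w_hebb = 0.1       # weight under the basic Hebbian rule\n",
    "w_cov = 0.1        # weight under the covariance rule\n",
    "x_bar, y_bar = 0.0, 0.0\n",
    "alpha = 0.01       # rate of the running-mean estimates (assumed)\n",
    "\n",
    "hebb_hist, cov_hist = [], []\n",
    "for xi in x_data:\n",
    "    # Basic Hebbian rule: driven mostly by the large mean of x\n",
    "    y = w_hebb * xi\n",
    "    w_hebb += eta * xi * y\n",
    "    hebb_hist.append(w_hebb)\n",
    "\n",
    "    # Covariance rule: uses deviations from running mean estimates\n",
    "    y = w_cov * xi\n",
    "    x_bar += alpha * (xi - x_bar)\n",
    "    y_bar += alpha * (y - y_bar)\n",
    "    w_cov += eta * (xi - x_bar) * (y - y_bar)\n",
    "    cov_hist.append(w_cov)\n",
    "\n",
    "plt.figure(figsize=(8, 4))\n",
    "plt.plot(hebb_hist, label='basic Hebbian')\n",
    "plt.plot(cov_hist, label='covariance rule')\n",
    "plt.yscale('log')\n",
    "plt.xlabel('Step')\n",
    "plt.ylabel('w (log scale)')\n",
    "plt.title('Nonzero-mean input: raw-activity vs covariance-based updates')\n",
    "plt.legend()\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f'Final basic Hebbian weight:   {hebb_hist[-1]:.3g}')\n",
    "print(f'Final covariance-rule weight: {cov_hist[-1]:.3g}')\n",
    "print('Both grow without bound, but only the basic rule is driven by the mean of x.')"
   ]
  },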
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "## 12.7 Biological Support for Hebbian Learning\n",
    "\n",
    "### Long-Term Potentiation (LTP)\n",
    "\n",
    "The first direct evidence for Hebb's postulate came 24 years after its publication.\n",
    "\n",
    "**Bliss & Lomo (1973)** stimulated the perforant pathway in the hippocampus of anesthetized\n",
    "rabbits with high-frequency bursts (tetanic stimulation). They observed that synaptic\n",
    "transmission was enhanced for hours afterwards. This **long-term potentiation (LTP)** was\n",
    "the first experimental demonstration of a Hebbian synaptic modification.\n",
    "\n",
    "### The NMDA Receptor as a Coincidence Detector\n",
    "\n",
    "The molecular mechanism underlying LTP involves the **NMDA receptor**, which functions as\n",
    "an AND gate:\n",
    "\n",
    "- It requires **glutamate binding** (presynaptic activity).\n",
    "- It requires **postsynaptic depolarization** (to relieve the Mg$^{2+}$ block).\n",
    "- Only when **both** conditions are met does Ca$^{2+}$ flow in, triggering LTP.\n",
    "\n",
    "This is precisely the coincidence detection Hebb described: the synapse strengthens only\n",
    "when the presynaptic neuron contributes to firing the postsynaptic neuron.\n",
    "\n",
    "### Long-Term Depression (LTD)\n",
    "\n",
    "Hebb's original postulate only described strengthening. However, the brain also exhibits\n",
    "**long-term depression (LTD)** -- weakening of synaptic connections. This occurs when:\n",
    "- Presynaptic activity occurs without sufficient postsynaptic response.\n",
    "- Low-frequency stimulation is applied.\n",
    "\n",
    "LTD provides the weakening mechanism that the basic Hebbian rule lacks.\n",
    "\n",
    "### Spike-Timing-Dependent Plasticity (STDP)\n",
    "\n",
    "```{note}\n",
    "**Biological Evidence: STDP (Spike-Timing-Dependent Plasticity)**, discovered in the 1990s by Markram et al. (1997) and Bi & Poo (1998), confirms Hebb's intuition with remarkable precision. STDP shows that the *exact timing* between pre- and postsynaptic spikes determines whether a synapse is strengthened or weakened -- providing a temporally precise, causal version of Hebb's \"taking part in firing\" condition.\n",
    "```\n",
    "\n",
    "A more refined view emerged with STDP, which\n",
    "depends on the precise timing between pre- and postsynaptic spikes:\n",
    "\n",
    "$$\\Delta w = \\begin{cases}\n",
    "A_+ \\exp\\left(-\\dfrac{\\Delta t}{\\tau_+}\\right) & \\text{if } \\Delta t > 0 \\text{ (pre before post: LTP)} \\\\\n",
    "-A_- \\exp\\left(\\dfrac{\\Delta t}{\\tau_-}\\right) & \\text{if } \\Delta t < 0 \\text{ (post before pre: LTD)}\n",
    "\\end{cases}$$\n",
    "\n",
    "where $\\Delta t = t_{\\text{post}} - t_{\\text{pre}}$.\n",
    "\n",
    "Typical values: $\\tau_+ \\approx \\tau_- \\approx 20$ ms, $A_+ \\approx A_- \\approx 0.01$."
   ]
  },
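  {
   "cell_type": "markdown",
   "id": "cell-7b",
   "metadata": {},
   "source": [
    "### Sketch: The NMDA Receptor as an AND Gate\n",
    "\n",
    "The next cell plots the STDP window itself; the short sketch here is a toy truth-table model of\n",
    "the NMDA-receptor coincidence detection described above. Treating glutamate binding and\n",
    "postsynaptic depolarization as booleans, and the resulting Ca$^{2+}$ influx as a single number,\n",
    "are deliberate simplifications, not measured quantities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy model of NMDA-receptor coincidence detection (illustrative only).\n",
    "# LTP is gated by the conjunction of presynaptic glutamate release and\n",
    "# postsynaptic depolarization, which relieves the Mg2+ block.\n",
    "\n",
    "def nmda_calcium_influx(glutamate_bound, postsyn_depolarized):\n",
    "    \"\"\"Toy Ca2+ influx: nonzero only when BOTH conditions hold.\"\"\"\n",
    "    mg_block_relieved = postsyn_depolarized\n",
    "    channel_open = glutamate_bound and mg_block_relieved\n",
    "    return 1.0 if channel_open else 0.0\n",
    "\n",
    "print('glutamate  depolarized  Ca2+  outcome')\n",
    "for pre in (False, True):\n",
    "    for post in (False, True):\n",
    "        ca = nmda_calcium_influx(pre, post)\n",
    "        outcome = 'LTP triggered' if ca > 0 else 'no change'\n",
    "        print(f'{str(pre):<10} {str(post):<12} {ca:.1f}   {outcome}')"
   ]
  },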
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Visualize the STDP learning window\n",
    "\n",
    "A_plus = 1.0\n",
    "A_minus = 0.8\n",
    "tau_plus = 20.0   # ms\n",
    "tau_minus = 20.0  # ms\n",
    "\n",
    "dt = np.linspace(-80, 80, 1000)  # ms\n",
    "\n",
    "dw = np.where(dt > 0,\n",
    "              A_plus * np.exp(-dt / tau_plus),\n",
    "              -A_minus * np.exp(dt / tau_minus))\n",
    "dw[dt == 0] = 0\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "ax.fill_between(dt[dt >= 0], 0, dw[dt >= 0], alpha=0.3, color='green', label='LTP')\n",
    "ax.fill_between(dt[dt <= 0], 0, dw[dt <= 0], alpha=0.3, color='red', label='LTD')\n",
    "ax.plot(dt, dw, 'k-', linewidth=2)\n",
    "ax.axhline(y=0, color='gray', linestyle='-', linewidth=0.5)\n",
    "ax.axvline(x=0, color='gray', linestyle='-', linewidth=0.5)\n",
    "\n",
    "ax.set_xlabel(r'$\\Delta t = t_{\\mathrm{post}} - t_{\\mathrm{pre}}$ (ms)', fontsize=12)\n",
    "ax.set_ylabel(r'$\\Delta w$', fontsize=12)\n",
    "ax.set_title('Spike-Timing-Dependent Plasticity (STDP) Window', fontsize=14)\n",
    "ax.legend(fontsize=12)\n",
    "ax.set_xlim(-80, 80)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "ax.annotate('Pre before Post\\n(causal: strengthen)',\n",
    "            xy=(15, A_plus * np.exp(-15/tau_plus)),\n",
    "            xytext=(40, 0.7),\n",
    "            arrowprops=dict(arrowstyle='->', color='green'),\n",
    "            fontsize=10, color='green')\n",
    "ax.annotate('Post before Pre\\n(acausal: weaken)',\n",
    "            xy=(-15, -A_minus * np.exp(-15/tau_minus)),\n",
    "            xytext=(-70, -0.6),\n",
    "            arrowprops=dict(arrowstyle='->', color='red'),\n",
    "            fontsize=10, color='red')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.savefig('stdp_window.png', dpi=150, bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-8b",
   "metadata": {},
   "source": [
    "### STDP Timing Window Curve\n",
    "\n",
    "The following visualization explores how the STDP curve changes with different parameter\n",
    "choices, showing the effect of varying the time constants $\\tau_+$, $\\tau_-$ and the\n",
    "amplitude ratio $A_+/A_-$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# STDP timing window curve: parameter exploration\n",
    "\n",
    "dt = np.linspace(-100, 100, 2000)\n",
    "\n",
    "fig, axes = plt.subplots(1, 3, figsize=(16, 5))\n",
    "\n",
    "# Panel 1: Standard STDP window with biological parameters\n",
    "params_list = [\n",
    "    {'A_plus': 1.0, 'A_minus': 1.0, 'tau_plus': 20, 'tau_minus': 20, 'label': 'Symmetric ($A_+=A_-$)'},\n",
    "    {'A_plus': 1.0, 'A_minus': 0.5, 'tau_plus': 20, 'tau_minus': 20, 'label': 'LTP-dominant ($A_+>A_-$)'},\n",
    "    {'A_plus': 0.5, 'A_minus': 1.0, 'tau_plus': 20, 'tau_minus': 20, 'label': 'LTD-dominant ($A_+<A_-$)'},\n",
    "]\n",
    "\n",
    "for params in params_list:\n",
    "    dw = np.where(dt > 0,\n",
    "                  params['A_plus'] * np.exp(-dt / params['tau_plus']),\n",
    "                  -params['A_minus'] * np.exp(dt / params['tau_minus']))\n",
    "    dw[np.abs(dt) < 0.1] = 0\n",
    "    axes[0].plot(dt, dw, linewidth=2, label=params['label'])\n",
    "\n",
    "axes[0].axhline(y=0, color='gray', linewidth=0.5)\n",
    "axes[0].axvline(x=0, color='gray', linewidth=0.5)\n",
    "axes[0].set_xlabel(r'$\\Delta t$ (ms)', fontsize=11)\n",
    "axes[0].set_ylabel(r'$\\Delta w$', fontsize=11)\n",
    "axes[0].set_title('Effect of Amplitude Ratio', fontsize=12)\n",
    "axes[0].legend(fontsize=9)\n",
    "axes[0].grid(True, alpha=0.3)\n",
    "\n",
    "# Panel 2: Effect of time constants\n",
    "tau_values = [10, 20, 40]\n",
    "colors = ['#E91E63', '#2196F3', '#4CAF50']\n",
    "for tau, color in zip(tau_values, colors):\n",
    "    dw = np.where(dt > 0,\n",
    "                  1.0 * np.exp(-dt / tau),\n",
    "                  -0.8 * np.exp(dt / tau))\n",
    "    dw[np.abs(dt) < 0.1] = 0\n",
    "    axes[1].plot(dt, dw, linewidth=2, color=color,\n",
    "                label=f'$\\\\tau_\\\\pm = {tau}$ ms')\n",
    "\n",
    "axes[1].axhline(y=0, color='gray', linewidth=0.5)\n",
    "axes[1].axvline(x=0, color='gray', linewidth=0.5)\n",
    "axes[1].set_xlabel(r'$\\Delta t$ (ms)', fontsize=11)\n",
    "axes[1].set_ylabel(r'$\\Delta w$', fontsize=11)\n",
    "axes[1].set_title('Effect of Time Constants', fontsize=12)\n",
    "axes[1].legend(fontsize=10)\n",
    "axes[1].grid(True, alpha=0.3)\n",
    "\n",
    "# Panel 3: Net effect - integrate STDP over correlated pre/post spike trains\n",
    "np.random.seed(42)\n",
    "n_trials = 5000\n",
    "correlation_values = np.linspace(-0.5, 1.0, 30)\n",
    "net_dw = []\n",
    "\n",
    "A_plus, A_minus = 1.0, 0.8\n",
    "tau_plus, tau_minus = 20.0, 20.0\n",
    "\n",
    "for corr in correlation_values:\n",
    "    # Generate correlated spike time differences\n",
    "    # Higher correlation -> more positive dt (pre before post)\n",
    "    mean_dt = corr * 10  # mean time difference in ms\n",
    "    dts = np.random.normal(mean_dt, 15, n_trials)\n",
    "    \n",
    "    dw_vals = np.where(dts > 0,\n",
    "                       A_plus * np.exp(-dts / tau_plus),\n",
    "                       -A_minus * np.exp(dts / tau_minus))\n",
    "    net_dw.append(np.mean(dw_vals))\n",
    "\n",
    "axes[2].plot(correlation_values, net_dw, 'k-', linewidth=2)\n",
    "axes[2].fill_between(correlation_values, 0, net_dw,\n",
    "                     where=[d > 0 for d in net_dw], alpha=0.3, color='green', label='Net LTP')\n",
    "axes[2].fill_between(correlation_values, 0, net_dw,\n",
    "                     where=[d <= 0 for d in net_dw], alpha=0.3, color='red', label='Net LTD')\n",
    "axes[2].axhline(y=0, color='gray', linewidth=0.5)\n",
    "axes[2].set_xlabel('Pre-Post Correlation', fontsize=11)\n",
    "axes[2].set_ylabel(r'Net $\\langle \\Delta w \\rangle$', fontsize=11)\n",
    "axes[2].set_title('Net Weight Change vs Correlation', fontsize=12)\n",
    "axes[2].legend(fontsize=10)\n",
    "axes[2].grid(True, alpha=0.3)\n",
    "\n",
    "plt.suptitle('STDP Timing Window: Parameter Exploration', fontsize=14, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"Key insight: STDP is a temporally precise, causal version of Hebb's rule.\")\n",
    "print(\"Correlated firing (pre before post) leads to net potentiation.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-8d",
   "metadata": {},
   "source": [
    "### Hebbian Learning on Simple Patterns\n",
    "\n",
    "The following code demonstrates how the Hebbian weight matrix captures input correlations.\n",
    "We present several distinct patterns to a network of neurons and observe how the learned\n",
    "weight matrix reflects the correlation structure of the input data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8e",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "np.random.seed(42)\n",
    "\n",
    "# Hebbian learning on simple patterns: weight matrix captures correlations\n",
    "\n",
    "# Define 3 binary patterns (8-dimensional)\n",
    "n_neurons = 8\n",
    "patterns = np.array([\n",
    "    [1, 1, 1, 1, 0, 0, 0, 0],   # Pattern A: first half active\n",
    "    [0, 0, 0, 0, 1, 1, 1, 1],   # Pattern B: second half active\n",
    "    [1, 0, 1, 0, 1, 0, 1, 0],   # Pattern C: alternating\n",
    "], dtype=float)\n",
    "\n",
    "# Convert to +1/-1 encoding for cleaner Hebbian learning\n",
    "patterns_bipolar = 2 * patterns - 1\n",
    "\n",
    "# Apply basic Hebbian learning: W = sum of outer products\n",
    "W_hebb = np.zeros((n_neurons, n_neurons))\n",
    "for p in patterns_bipolar:\n",
    "    W_hebb += np.outer(p, p)\n",
    "\n",
    "# Zero the diagonal (no self-connections)\n",
    "np.fill_diagonal(W_hebb, 0)\n",
    "\n",
    "# Also compute the true correlation matrix of the patterns\n",
    "C_patterns = patterns_bipolar.T @ patterns_bipolar / len(patterns_bipolar)\n",
    "np.fill_diagonal(C_patterns, 0)\n",
    "\n",
    "# Online Hebbian learning with noisy presentations\n",
    "eta = 0.01\n",
    "W_online = np.zeros((n_neurons, n_neurons))\n",
    "n_presentations = 3000\n",
    "\n",
    "for _ in range(n_presentations):\n",
    "    idx = np.random.randint(len(patterns_bipolar))\n",
    "    x = patterns_bipolar[idx] + np.random.randn(n_neurons) * 0.1  # noisy\n",
    "    # Hebbian: dW = eta * x * x^T\n",
    "    W_online += eta * np.outer(x, x)\n",
    "\n",
    "np.fill_diagonal(W_online, 0)\n",
    "\n",
    "# Visualization\n",
    "fig, axes = plt.subplots(1, 4, figsize=(18, 4))\n",
    "\n",
    "# Panel 1: The patterns\n",
    "im0 = axes[0].imshow(patterns_bipolar, cmap='RdBu_r', aspect='auto', vmin=-1.5, vmax=1.5)\n",
    "axes[0].set_xlabel('Neuron index')\n",
    "axes[0].set_ylabel('Pattern')\n",
    "axes[0].set_yticks([0, 1, 2])\n",
    "axes[0].set_yticklabels(['A', 'B', 'C'])\n",
    "axes[0].set_title('Input Patterns', fontsize=12)\n",
    "plt.colorbar(im0, ax=axes[0], shrink=0.8)\n",
    "\n",
    "# Panel 2: Batch Hebbian weight matrix\n",
    "im1 = axes[1].imshow(W_hebb, cmap='RdBu_r', aspect='equal')\n",
    "axes[1].set_xlabel('Neuron j')\n",
    "axes[1].set_ylabel('Neuron i')\n",
    "axes[1].set_title('Batch Hebbian $W = \\\\sum \\\\mathbf{p}\\\\mathbf{p}^T$', fontsize=12)\n",
    "plt.colorbar(im1, ax=axes[1], shrink=0.8)\n",
    "\n",
    "# Panel 3: Online Hebbian weight matrix\n",
    "im2 = axes[2].imshow(W_online, cmap='RdBu_r', aspect='equal')\n",
    "axes[2].set_xlabel('Neuron j')\n",
    "axes[2].set_ylabel('Neuron i')\n",
    "axes[2].set_title('Online Hebbian (noisy)', fontsize=12)\n",
    "plt.colorbar(im2, ax=axes[2], shrink=0.8)\n",
    "\n",
    "# Panel 4: Pattern correlation matrix\n",
    "im3 = axes[3].imshow(C_patterns, cmap='RdBu_r', aspect='equal')\n",
    "axes[3].set_xlabel('Neuron j')\n",
    "axes[3].set_ylabel('Neuron i')\n",
    "axes[3].set_title('Pattern Correlation Matrix', fontsize=12)\n",
    "plt.colorbar(im3, ax=axes[3], shrink=0.8)\n",
    "\n",
    "plt.suptitle('Hebbian Learning Captures Correlation Structure',\n",
    "             fontsize=14, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Test pattern recall\n",
    "print(\"Pattern recall test (using sign of W @ pattern):\")\n",
    "for i, (name, p) in enumerate(zip(['A', 'B', 'C'], patterns_bipolar)):\n",
    "    recalled = np.sign(W_hebb @ p)\n",
    "    match = np.all(recalled == p)\n",
    "    print(f\"  Pattern {name}: recalled correctly = {match}\")\n",
    "\n",
    "print(\"\\nKey insight: The Hebbian weight matrix is proportional to the\")\n",
    "print(\"input correlation matrix. Neurons that co-activate in patterns\")\n",
    "print(\"develop strong positive connections (red), while anti-correlated\")\n",
    "print(\"neurons develop negative connections (blue).\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "## 12.8 Limitations of Hebbian Learning\n",
    "\n",
    "Despite its elegance and biological plausibility, Hebbian learning has significant limitations:\n",
    "\n",
    "### 1. Unsupervised Only\n",
    "\n",
    "Hebbian learning is inherently **unsupervised**: it extracts statistical structure from inputs\n",
    "but cannot learn to map inputs to desired outputs. There is no mechanism for incorporating\n",
    "a target signal or error feedback.\n",
    "\n",
    "### 2. No Credit Assignment\n",
    "\n",
    "In a multi-layer network, Hebbian learning provides no way to determine how hidden-layer\n",
    "weights should change to reduce output error. This is the **credit assignment problem**,\n",
    "which will be solved by backpropagation (Chapter 16).\n",
    "\n",
    "### 3. Linear Projections Only\n",
    "\n",
    "With a linear neuron, Hebbian learning can only learn linear projections (principal components).\n",
    "It cannot learn nonlinear decision boundaries or nonlinear feature extraction.\n",
    "\n",
    "### 4. Instability\n",
    "\n",
    "As proven in Section 12.5, basic Hebbian learning leads to unbounded weight growth.\n",
    "This requires modifications such as Oja's rule (Chapter 13) or BCM theory (Chapter 14).\n",
    "\n",
    "### Summary\n",
    "\n",
    "| Property | Hebbian Learning |\n",
    "|----------|------------------|\n",
    "| Learning type | Unsupervised |\n",
    "| Biological plausibility | High |\n",
    "| Stability | Unstable (without modification) |\n",
    "| What it learns | Principal components / correlations |\n",
    "| Error signal needed | No |\n",
    "| Can solve XOR | No |\n",
    "| Credit assignment | No |"
   ]
  },
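  {
   "cell_type": "markdown",
   "id": "cell-9b",
   "metadata": {},
   "source": [
    "### Sketch: Why a Correlation-Based Rule Cannot Learn XOR\n",
    "\n",
    "The \"Can solve XOR: No\" entry in the summary table can be checked directly. The sketch below\n",
    "uses a supervised Hebbian-style update $\\Delta w_i = \\eta \\, x_i \\, t$, with the target $t$\n",
    "standing in for the postsynaptic activity purely for illustration -- strictly speaking, plain\n",
    "Hebbian learning has no target at all. With $\\pm 1$ encoding, each XOR input is uncorrelated\n",
    "with the target, so a correlation-driven rule learns zero weights and its output carries no\n",
    "information about the class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-9c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# XOR in +/-1 encoding\n",
    "X = np.array([[-1, -1],\n",
    "              [-1,  1],\n",
    "              [ 1, -1],\n",
    "              [ 1,  1]], dtype=float)\n",
    "t = np.array([-1, 1, 1, -1], dtype=float)   # XOR targets in +/-1\n",
    "\n",
    "# Correlation of each input with the target: both components are exactly zero\n",
    "print('E[x_i * t] per input:', (X * t[:, None]).mean(axis=0))\n",
    "\n",
    "# Supervised Hebbian-style weights (batch sum of input-target products)\n",
    "eta = 0.1\n",
    "w = eta * X.T @ t\n",
    "print('learned weights     :', w)\n",
    "\n",
    "# The resulting linear output is zero for every pattern: no discrimination\n",
    "print('outputs w^T x       :', X @ w)\n",
    "print('targets             :', t)"
   ]
  },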
  {
   "cell_type": "markdown",
   "id": "cell-10",
   "metadata": {},
   "source": "## Exercises\n\n**Exercise 12.1.** Implement the covariance rule $\\Delta w_{ij} = \\eta(x_i - \\bar{x}_i)(y_j - \\bar{y}_j)$\nusing running estimates of the means. Compare its behavior with the basic Hebbian rule on\ndata with nonzero mean.\n\n**Exercise 12.2.** Prove that the weight direction under basic Hebbian learning converges to the\nleading eigenvector of $\\mathbf{C}$. That is, show $\\mathbf{w}(t)/\\|\\mathbf{w}(t)\\| \\to \\pm \\mathbf{e}_1$\nas $t \\to \\infty$, assuming $|c_1(0)| > 0$ and $\\lambda_1 > \\lambda_2$.\n\n**Exercise 12.3.** Simulate STDP learning with Poisson spike trains. Generate pre- and\npostsynaptic spike trains at different rates and correlations. Compute the expected weight\nchange and verify it matches the analytical prediction.\n\n**Exercise 12.4.** What happens to Hebbian learning if the inputs are whitened (decorrelated\nwith unit variance)? What does the correlation matrix $\\mathbf{C}$ become? What are the\nimplications for the instability theorem?\n\n**Exercise 12.5.** Implement Hebbian learning with a nonlinear activation function $y = \\tanh(\\mathbf{w}^\\top \\mathbf{x})$.\nDoes the saturation of tanh prevent unbounded weight growth? Run experiments to find out."
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}