{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "62353118",
   "metadata": {},
   "source": [
    "# Chapter 3: Historical Context and Timeline\n",
    "\n",
    "\n",
    "## 1. Introduction: The Intellectual Currents\n",
    "\n",
    "The story of neural networks is not a simple linear progression. It is a complex web of ideas drawn from **neurophysiology**, **mathematical logic**, **psychology**, **electrical engineering**, and **computer science** --- fields that, in the mid-20th century, were just beginning to recognize their deep connections.\n",
    "\n",
    "Three broad currents converged to create the field:\n",
    "\n",
    "1. **The logic of the brain.** Starting with the philosophical question of how the nervous system implements thought, McCulloch and Pitts (1943) showed that idealized neurons could compute any logical function. This line of thinking continued through Hebb's learning postulate, Rosenblatt's Perceptron, and eventually the connectionist revolution of the 1980s.\n",
    "\n",
    "2. **The mathematics of learning.** From the statistical tradition (Fisher, Neyman-Pearson), through the engineering tradition of adaptive filters (Widrow-Hoff), to the optimization perspective (gradient descent, backpropagation), a separate thread developed the mathematical tools for making systems that learn from data.\n",
    "\n",
    "3. **The theory of computation.** Turing (1936), von Neumann, and the cybernetics movement (Wiener 1948) provided the conceptual framework for understanding what can be computed and how. The question of what neural networks *cannot* do --- culminating in Minsky and Papert's devastating critique (1969) --- belongs to this tradition.\n",
    "\n",
    "This chapter presents the key milestones, the intellectual lineages connecting them, brief biographies of the major figures, and an analysis of the \"AI winter\" that nearly killed the field.\n",
    "\n",
    "Understanding this history is not mere antiquarianism. Many of the debates that shaped the field --- the power and limits of single-layer networks, the credit assignment problem, the tension between symbolic and subsymbolic approaches --- recur in new forms today. As the saying goes: those who do not study the history of neural networks are doomed to reinvent the Perceptron.\n",
    "\n",
    "```{note}\n",
    "**Historical context matters.** The McCulloch-Pitts neuron was conceived in the same decade as the Manhattan Project, Shannon's information theory, and Turing's code-breaking at Bletchley Park. The intellectual ferment of the 1940s --- where mathematicians, physicists, biologists, and engineers were being thrown together by wartime necessity --- created the conditions for radical cross-disciplinary thinking. The Macy Conferences on Cybernetics (1946--1953) were a direct product of this environment, bringing together McCulloch, Wiener, von Neumann, Shannon, Bateson, and others in freewheeling discussions that would have been impossible in peacetime academic silos.\n",
    "```\n",
    "\n",
    "```{tip}\n",
    "When studying the history of any scientific field, pay attention to the **social and institutional factors** as much as the technical ones. Funding decisions, academic politics, publication venues, and personal relationships all shaped which ideas flourished and which were suppressed. The AI winter is a cautionary tale about how a single influential book (*Perceptrons*) can redirect an entire field --- for better or worse.\n",
    "```"
   ]
  },
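  {
   "cell_type": "markdown",
   "id": "3f7a9c02",
   "metadata": {},
   "source": [
    "Before turning to the timeline, it is worth seeing how little machinery the first current above actually requires. The cell below is a minimal sketch written for this chapter (the helpers `mp_neuron` and `xor_two_layer` are our names, not anything from the 1943 paper): a McCulloch-Pitts unit is just a binary threshold function, a single unit suffices for AND and OR, and XOR, which is not linearly separable, needs a second layer. That last observation foreshadows Minsky and Papert's critique."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d2c4e91",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def mp_neuron(inputs, weights, threshold):\n",
    "    \"\"\"McCulloch-Pitts unit: fires (1) iff the weighted sum of its\n",
    "    binary inputs reaches the threshold.\"\"\"\n",
    "    return int(np.dot(inputs, weights) >= threshold)\n",
    "\n",
    "patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]\n",
    "\n",
    "# A single unit suffices for AND and OR (negative weights act as inhibition).\n",
    "gates = {\n",
    "    'AND': dict(weights=[1, 1], threshold=2),\n",
    "    'OR':  dict(weights=[1, 1], threshold=1),\n",
    "}\n",
    "for name, params in gates.items():\n",
    "    print(name, {p: mp_neuron(p, **params) for p in patterns})\n",
    "\n",
    "# XOR is not linearly separable: no single threshold unit computes it\n",
    "# (the limitation Minsky and Papert later made rigorous). Two layers do:\n",
    "def xor_two_layer(x1, x2):\n",
    "    h1 = mp_neuron([x1, x2], [1, -1], 1)   # x1 AND NOT x2\n",
    "    h2 = mp_neuron([x1, x2], [-1, 1], 1)   # x2 AND NOT x1\n",
    "    return mp_neuron([h1, h2], [1, 1], 1)  # h1 OR h2\n",
    "\n",
    "print('XOR', {p: xor_two_layer(*p) for p in patterns})"
   ]
  },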
  {
   "cell_type": "markdown",
   "id": "1a871d80",
   "metadata": {},
   "source": [
    "## 2. Timeline Data\n",
    "\n",
    "Below we define the key milestones in the development of neural networks from 1943 to 1989."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "858c2f8e",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as mpatches\n",
    "from matplotlib.patches import FancyBboxPatch\n",
    "%matplotlib inline\n",
    "\n",
    "milestones = [\n",
    "    {\"year\": 1943, \"who\": \"McCulloch & Pitts\", \"what\": \"Formal neuron model\",\n",
    "     \"detail\": \"First mathematical model of a neuron as a binary threshold unit. \"\n",
    "               \"Proved that networks of such neurons can compute any Boolean function.\",\n",
    "     \"category\": \"theory\"},\n",
    "    {\"year\": 1948, \"who\": \"Norbert Wiener\", \"what\": \"Cybernetics published\",\n",
    "     \"detail\": \"Established the interdisciplinary field studying feedback and control \"\n",
    "               \"in both biological and artificial systems.\",\n",
    "     \"category\": \"theory\"},\n",
    "    {\"year\": 1949, \"who\": \"Donald Hebb\", \"what\": \"Hebbian learning postulate\",\n",
    "     \"detail\": \"Proposed that when neuron A repeatedly helps fire neuron B, the \"\n",
    "               \"connection A->B strengthens. First learning rule.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1956, \"who\": \"Dartmouth Conference\", \"what\": \"AI as a field is born\",\n",
    "     \"detail\": \"McCarthy, Minsky, Rochester, Shannon organize the Dartmouth Summer Research \"\n",
    "               \"Project on Artificial Intelligence. The term 'artificial intelligence' is coined.\",\n",
    "     \"category\": \"theory\"},\n",
    "    {\"year\": 1957, \"who\": \"Frank Rosenblatt\", \"what\": \"The Perceptron\",\n",
    "     \"detail\": \"First neural network with adjustable weights and a learning algorithm. \"\n",
    "               \"The Mark I Perceptron was a physical machine with photocells and potentiometers.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1958, \"who\": \"Rosenblatt\", \"what\": \"Perceptron learning result\",\n",
    "     \"detail\": \"Gave an early finite-step learning guarantee for linearly separable data; \"\n",
    "               \"later analyses such as Novikoff (1962) gave the standard clean theorem.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1960, \"who\": \"Widrow & Hoff\", \"what\": \"ADALINE and delta rule\",\n",
    "     \"detail\": \"Introduced the Adaptive Linear Neuron (ADALINE) and the LMS learning rule. \"\n",
    "               \"Gradient-based alternative to the perceptron rule.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1965, \"who\": \"Ivakhnenko & Lapa\", \"what\": \"GMDH: early deep learning\",\n",
    "     \"detail\": \"Group Method of Data Handling (GMDH). A method for training networks \"\n",
    "               \"with multiple layers by training layer-by-layer.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1967, \"who\": \"Shun'ichi Amari\", \"what\": \"SGD for MLPs\",\n",
    "     \"detail\": \"Proposed using stochastic gradient descent to train multi-layer networks. \"\n",
    "               \"Published in Japanese, largely unknown in the West until much later.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1969, \"who\": \"Minsky & Papert\", \"what\": \"'Perceptrons' published\",\n",
    "     \"detail\": \"Rigorously proved limitations of single-layer perceptrons. \"\n",
    "               \"Widely (mis)interpreted as proving neural networks in general are limited. \"\n",
    "               \"Became one important factor in the first AI winter for NNs.\",\n",
    "     \"category\": \"theory\"},\n",
    "    {\"year\": 1970, \"who\": \"Seppo Linnainmaa\", \"what\": \"Automatic differentiation\",\n",
    "     \"detail\": \"Master's thesis describing reverse-mode automatic differentiation --- \"\n",
    "               \"the mathematical core of backpropagation.\",\n",
    "     \"category\": \"backprop\"},\n",
    "    {\"year\": 1974, \"who\": \"Paul Werbos\", \"what\": \"Backpropagation (PhD thesis)\",\n",
    "     \"detail\": \"First clear application of reverse-mode differentiation to train \"\n",
    "               \"multi-layer networks.\",\n",
    "     \"category\": \"backprop\"},\n",
    "    {\"year\": 1980, \"who\": \"Kunihiko Fukushima\", \"what\": \"Neocognitron\",\n",
    "     \"detail\": \"Hierarchical neural network for visual pattern recognition. \"\n",
    "               \"Direct precursor to modern convolutional neural networks.\",\n",
    "     \"category\": \"architecture\"},\n",
    "    {\"year\": 1982, \"who\": \"John Hopfield\", \"what\": \"Hopfield networks\",\n",
    "     \"detail\": \"Recurrent networks with symmetric connections and an energy function. \"\n",
    "               \"Reinvigorated interest in neural networks among physicists.\",\n",
    "     \"category\": \"architecture\"},\n",
    "    {\"year\": 1985, \"who\": \"Hinton & Sejnowski\", \"what\": \"Boltzmann machines\",\n",
    "     \"detail\": \"Stochastic recurrent networks that can learn internal representations. \"\n",
    "               \"Based on statistical mechanics.\",\n",
    "     \"category\": \"learning\"},\n",
    "    {\"year\": 1986, \"who\": \"Rumelhart, Hinton & Williams\", \"what\": \"Backpropagation popularized\",\n",
    "     \"detail\": \"Clearly demonstrated that backpropagation could train multi-layer networks \"\n",
    "               \"to learn useful internal representations. Helped end the AI winter for NNs.\",\n",
    "     \"category\": \"backprop\"},\n",
    "    {\"year\": 1989, \"who\": \"Hornik, Stinchcombe & White\", \"what\": \"Universal approximation theorem\",\n",
    "     \"detail\": \"Proved that feedforward networks with a single hidden layer can approximate \"\n",
    "               \"any continuous function on compact sets.\",\n",
    "     \"category\": \"theory\"},\n",
    "    {\"year\": 1989, \"who\": \"Yann LeCun\", \"what\": \"Convolutional networks for digits\",\n",
    "     \"detail\": \"Applied backpropagation to convolutional neural networks for handwritten \"\n",
    "               \"digit recognition. One of the first practical successes of deep learning.\",\n",
    "     \"category\": \"architecture\"},\n",
    "]\n",
    "\n",
    "print(f\"{'Year':<6} {'Who':<30} {'What'}\")\n",
    "print(\"=\" * 80)\n",
    "for m in milestones:\n",
    "    print(f\"{m['year']:<6} {m['who']:<30} {m['what']}\")\n",
    "\n",
    "print(f\"\\nTotal milestones: {len(milestones)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4724381d",
   "metadata": {},
   "source": [
    "## 3. Timeline Visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70280cf6",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# --- Timeline Visualization ---\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as mpatches\n",
    "%matplotlib inline\n",
    "\n",
    "category_colors = {\n",
    "    'theory': '#2196F3', 'learning': '#4CAF50',\n",
    "    'backprop': '#FF5722', 'architecture': '#9C27B0',\n",
    "}\n",
    "category_labels = {\n",
    "    'theory': 'Theory & Foundations', 'learning': 'Learning Rules',\n",
    "    'backprop': 'Backpropagation Lineage', 'architecture': 'Network Architectures',\n",
    "}\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(20, 12))\n",
    "\n",
    "years = [m['year'] for m in milestones]\n",
    "min_year, max_year = min(years) - 2, max(years) + 2\n",
    "ax.plot([min_year, max_year], [0, 0], 'k-', linewidth=2, zorder=1)\n",
    "\n",
    "positions = []\n",
    "prev_year = None\n",
    "above = True\n",
    "for i, m in enumerate(milestones):\n",
    "    if m['year'] == prev_year:\n",
    "        above = not above\n",
    "    else:\n",
    "        above = not above if i > 0 else True\n",
    "    positions.append(1 if above else -1)\n",
    "    prev_year = m['year']\n",
    "\n",
    "y_offsets = []\n",
    "above_count = 0\n",
    "below_count = 0\n",
    "for p in positions:\n",
    "    if p > 0:\n",
    "        above_count += 1\n",
    "        y_offsets.append(0.6 + (above_count % 3) * 0.4)\n",
    "    else:\n",
    "        below_count += 1\n",
    "        y_offsets.append(-0.6 - (below_count % 3) * 0.4)\n",
    "\n",
    "for i, m in enumerate(milestones):\n",
    "    color = category_colors[m['category']]\n",
    "    y = y_offsets[i]\n",
    "    ax.plot([m['year'], m['year']], [0, y * 0.85], color=color,\n",
    "            linewidth=1.5, linestyle='-', alpha=0.7, zorder=2)\n",
    "    ax.scatter(m['year'], 0, s=80, c=color, zorder=3, edgecolors='white', linewidth=1.5)\n",
    "    label = f\"{m['year']}\\n{m['who']}\\n{m['what']}\"\n",
    "    va = 'bottom' if y > 0 else 'top'\n",
    "    ax.annotate(label, xy=(m['year'], y), fontsize=8, ha='center', va=va,\n",
    "                fontweight='bold',\n",
    "                bbox=dict(boxstyle='round,pad=0.4', facecolor=color, alpha=0.15,\n",
    "                          edgecolor=color, linewidth=1.5), color='black')\n",
    "\n",
    "for year in range(1945, 1990, 5):\n",
    "    ax.annotate(str(year), xy=(year, 0), xytext=(year, -0.15),\n",
    "                fontsize=9, ha='center', va='top', color='gray')\n",
    "    ax.plot([year, year], [-0.05, 0.05], 'k-', linewidth=1)\n",
    "\n",
    "ax.axvspan(1969, 1982, alpha=0.08, color='gray', zorder=0)\n",
    "ax.annotate('\"AI Winter\" for Neural Networks', xy=(1975.5, -1.7),\n",
    "            fontsize=11, ha='center', va='center', fontstyle='italic', color='gray',\n",
    "            bbox=dict(boxstyle='round,pad=0.3', facecolor='lightyellow',\n",
    "                      edgecolor='gray', alpha=0.8))\n",
    "\n",
    "legend_patches = [mpatches.Patch(color=color, label=category_labels[cat], alpha=0.7)\n",
    "                  for cat, color in category_colors.items()]\n",
    "ax.legend(handles=legend_patches, loc='upper left', fontsize=10,\n",
    "          framealpha=0.9, edgecolor='gray')\n",
    "\n",
    "ax.set_xlim(min_year, max_year)\n",
    "ax.set_ylim(-2.2, 2.2)\n",
    "ax.set_xlabel('Year', fontsize=13)\n",
    "ax.set_title('Key Milestones in Neural Network History (1943-1989)',\n",
    "             fontsize=16, fontweight='bold', pad=20)\n",
    "ax.spines['left'].set_visible(False)\n",
    "ax.spines['right'].set_visible(False)\n",
    "ax.spines['top'].set_visible(False)\n",
    "ax.get_yaxis().set_visible(False)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91d1c510",
   "metadata": {},
   "source": [
    "### Enhanced Timeline: Color-Coded Eras\n",
    "\n",
    "The following visualization divides the history into four distinct eras, each with its own color scheme, showing how the field evolved through periods of excitement, critique, dormancy, and renaissance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad47ff04",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as mpatches\n",
    "%matplotlib inline\n",
    "\n",
    "# Define the four eras\n",
    "eras = [\n",
    "    {'name': 'Pioneers Era',   'start': 1940, 'end': 1956, 'color': '#1565C0'},\n",
    "    {'name': 'Perceptron Era', 'start': 1956, 'end': 1969, 'color': '#2E7D32'},\n",
    "    {'name': 'AI Winter',      'start': 1969, 'end': 1982, 'color': '#B71C1C'},\n",
    "    {'name': 'Renaissance',    'start': 1982, 'end': 1992, 'color': '#FF6F00'},\n",
    "]\n",
    "\n",
    "def get_era_color(year):\n",
    "    for era in eras:\n",
    "        if era['start'] <= year < era['end']:\n",
    "            return era['color']\n",
    "    return '#888888'\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(20, 10))\n",
    "\n",
    "# Draw era backgrounds\n",
    "for era in eras:\n",
    "    ax.axvspan(era['start'], era['end'], alpha=0.12, color=era['color'], zorder=0)\n",
    "    mid = (era['start'] + era['end']) / 2\n",
    "    ax.text(mid, 4.5, era['name'], ha='center', va='center', fontsize=13,\n",
    "            fontweight='bold', color=era['color'], alpha=0.8,\n",
    "            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor=era['color'],\n",
    "                      alpha=0.9, linewidth=1.5))\n",
    "\n",
    "# Draw central timeline\n",
    "ax.plot([1940, 1992], [0, 0], 'k-', linewidth=3, zorder=2)\n",
    "\n",
    "# Place milestones\n",
    "side = 1\n",
    "prev_year = None\n",
    "\n",
    "for i, m in enumerate(milestones):\n",
    "    year = m['year']\n",
    "    color = get_era_color(year)\n",
    "\n",
    "    if year != prev_year:\n",
    "        side *= -1\n",
    "    prev_year = year\n",
    "\n",
    "    y_base = side * (1.0 + (i % 3) * 0.8)\n",
    "\n",
    "    ax.plot([year, year], [0, y_base * 0.7], color=color, linewidth=1.5, alpha=0.6, zorder=2)\n",
    "    ax.scatter(year, 0, s=60, c=color, zorder=3, edgecolors='white', linewidth=1)\n",
    "\n",
    "    label_text = f\"{year}: {m['who']}\\n{m['what']}\"\n",
    "    va = 'bottom' if y_base > 0 else 'top'\n",
    "    ax.text(year, y_base, label_text, ha='center', va=va, fontsize=7.5,\n",
    "            fontweight='bold', color='black',\n",
    "            bbox=dict(boxstyle='round,pad=0.3', facecolor=color, alpha=0.15,\n",
    "                      edgecolor=color, linewidth=1))\n",
    "\n",
    "# Year axis ticks\n",
    "for year in range(1945, 1992, 5):\n",
    "    ax.plot([year, year], [-0.15, 0.15], 'k-', linewidth=1)\n",
    "    ax.text(year, -0.3, str(year), ha='center', va='top', fontsize=9, color='gray')\n",
    "\n",
    "era_patches = [mpatches.Patch(color=e['color'], alpha=0.5,\n",
    "               label=f\"{e['name']} ({e['start']}-{e['end']})\") for e in eras]\n",
    "ax.legend(handles=era_patches, loc='lower left', fontsize=10,\n",
    "          framealpha=0.95, edgecolor='gray', title='Historical Eras', title_fontsize=11)\n",
    "\n",
    "ax.set_xlim(1938, 1993)\n",
    "ax.set_ylim(-4.5, 5.5)\n",
    "ax.set_xlabel('Year', fontsize=13)\n",
    "ax.set_title('Neural Network History: Four Eras (1943-1989)', fontsize=16, fontweight='bold', pad=15)\n",
    "ax.spines['left'].set_visible(False)\n",
    "ax.spines['right'].set_visible(False)\n",
    "ax.spines['top'].set_visible(False)\n",
    "ax.get_yaxis().set_visible(False)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "916d3260",
   "metadata": {},
   "source": [
    "### Key Papers Reference Table\n",
    "\n",
    "The following table summarizes the most important papers in early neural network research, showing year, authors, title, and primary contribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "719b0eb3",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(18, 10))\n",
    "ax.axis('off')\n",
    "ax.set_title('Key Papers in Neural Network History (1943-1989)',\n",
    "             fontsize=16, fontweight='bold', pad=20)\n",
    "\n",
    "headers = ['Year', 'Author(s)', 'Title / Contribution', 'Impact']\n",
    "\n",
    "papers = [\n",
    "    ['1943', 'McCulloch & Pitts',\n",
    "     'A Logical Calculus of the Ideas\\nImmanent in Nervous Activity',\n",
    "     'First formal neuron model;\\nuniversality of M-P networks'],\n",
    "    ['1949', 'Donald Hebb',\n",
    "     'The Organization of Behavior',\n",
    "     'First learning rule;\\n\"fire together, wire together\"'],\n",
    "    ['1956', 'S.C. Kleene',\n",
    "     'Representation of Events in\\nNerve Nets and Finite Automata',\n",
    "     'M-P nets = finite automata;\\nintroduced regular expressions'],\n",
    "    ['1958', 'Frank Rosenblatt',\n",
    "     'The Perceptron: A Probabilistic\\nModel for Information Storage',\n",
    "     'First trainable neural network;\\nconvergence theorem'],\n",
    "    ['1960', 'Widrow & Hoff',\n",
    "     'Adaptive Switching Circuits\\n(ADALINE)',\n",
    "     'LMS / delta rule;\\nstill used in signal processing'],\n",
    "    ['1969', 'Minsky & Papert',\n",
    "     'Perceptrons: An Introduction\\nto Computational Geometry',\n",
    "     'Proved single-layer limits;\\ntriggered AI winter'],\n",
    "    ['1974', 'Paul Werbos',\n",
    "     'Beyond Regression\\n(PhD thesis)',\n",
    "     'First backpropagation\\nfor neural networks'],\n",
    "    ['1982', 'John Hopfield',\n",
    "     'Neural Networks and Physical\\nSystems with Emergent Properties',\n",
    "     'Energy-based networks;\\nreinvigorated the field'],\n",
    "    ['1986', 'Rumelhart, Hinton\\n& Williams',\n",
    "     'Learning Representations by\\nBack-Propagating Errors',\n",
    "     'Popularized backprop;\\nended the AI winter'],\n",
    "    ['1989', 'Hornik, Stinchcombe\\n& White',\n",
    "     'Multilayer Feedforward Networks\\nAre Universal Approximators',\n",
    "     'Theoretical foundation for\\nmulti-layer networks'],\n",
    "    ['1989', 'Yann LeCun et al.',\n",
    "     'Backpropagation Applied to\\nHandwritten Zip Code Recognition',\n",
    "     'First practical CNN success;\\nprecursor to modern deep learning'],\n",
    "]\n",
    "\n",
    "table = ax.table(cellText=papers, colLabels=headers, loc='center', cellLoc='center')\n",
    "table.auto_set_font_size(False)\n",
    "table.set_fontsize(9)\n",
    "table.scale(1.0, 2.2)\n",
    "\n",
    "for j in range(len(headers)):\n",
    "    cell = table[0, j]\n",
    "    cell.set_facecolor('#1A237E')\n",
    "    cell.set_text_props(color='white', fontweight='bold', fontsize=10)\n",
    "    cell.set_edgecolor('#E8EAF6')\n",
    "\n",
    "era_colors_by_year = {\n",
    "    '1943': '#E3F2FD', '1949': '#E3F2FD', '1956': '#E8F5E9',\n",
    "    '1958': '#E8F5E9', '1960': '#E8F5E9', '1969': '#FFEBEE',\n",
    "    '1974': '#FFEBEE', '1982': '#FFF3E0', '1986': '#FFF3E0',\n",
    "    '1989': '#FFF3E0',\n",
    "}\n",
    "\n",
    "for i in range(1, len(papers) + 1):\n",
    "    year = papers[i-1][0]\n",
    "    bg = era_colors_by_year.get(year, 'white')\n",
    "    for j in range(len(headers)):\n",
    "        cell = table[i, j]\n",
    "        cell.set_facecolor(bg)\n",
    "        cell.set_edgecolor('#E0E0E0')\n",
    "\n",
    "col_widths = [0.06, 0.15, 0.35, 0.25]\n",
    "for j, w in enumerate(col_widths):\n",
    "    for i in range(len(papers) + 1):\n",
    "        cell = table[i, j]\n",
    "        cell.set_width(w)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef22dc32",
   "metadata": {},
   "source": [
    "## 4. The Intellectual Lineages\n",
    "\n",
    "The milestones above are not isolated events. They form three major **lineages** of ideas:\n",
    "\n",
    "### Lineage 1: Backpropagation\n",
    "\n",
    "$$\\text{McCulloch-Pitts (1943)} \\to \\text{Rosenblatt (1957)} \\to \\text{Widrow-Hoff (1960)} \\to \\text{Linnainmaa (1970)} \\to \\text{Werbos (1974)} \\to \\text{RHW (1986)}$$\n",
    "\n",
    "This lineage traces the development of the key algorithm for training neural networks: **backpropagation of errors**.\n",
    "\n",
    "### Lineage 2: Learning Rules\n",
    "\n",
    "$$\\text{Hebb (1949)} \\to \\text{Rosenblatt (1958)} \\to \\text{Hopfield (1982)} \\to \\text{Boltzmann machines (1985)}$$\n",
    "\n",
    "This lineage focuses on the **principles** of learning --- how connections between neurons should change in response to experience.\n",
    "\n",
    "### Lineage 3: Theory and Limits\n",
    "\n",
    "$$\\text{McCulloch-Pitts (1943)} \\to \\text{Minsky-Papert (1969)} \\to \\text{Hornik/Cybenko (1989)}$$\n",
    "\n",
    "This lineage addresses the fundamental question: **what can neural networks compute?**\n",
    "\n",
    "Let us visualize these lineages.\n",
    "\n",
    "```{tip}\n",
    "Notice how the three lineages **converge** in the mid-1980s. Backpropagation provided the *algorithm*, the learning rule tradition provided the *principles*, and the universality theorems provided the *theoretical justification*. This convergence is what made the neural network renaissance possible. In research, breakthroughs often happen when multiple independent threads finally come together.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "568e9eef",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# --- Intellectual Lineage Diagram ---\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(18, 10))\n",
    "\n",
    "lineages = {\n",
    "    'Backpropagation': {\n",
    "        'color': '#FF5722', 'y': 3.0,\n",
    "        'nodes': [\n",
    "            (1943, 'McCulloch\\n& Pitts'),\n",
    "            (1957, 'Rosenblatt\\nPerceptron'),\n",
    "            (1960, 'Widrow\\n& Hoff'),\n",
    "            (1970, 'Linnainmaa\\nAuto-diff'),\n",
    "            (1974, 'Werbos\\nBackprop'),\n",
    "            (1986, 'Rumelhart,\\nHinton,\\nWilliams'),\n",
    "        ]\n",
    "    },\n",
    "    'Learning Rules': {\n",
    "        'color': '#4CAF50', 'y': 0.0,\n",
    "        'nodes': [\n",
    "            (1949, 'Hebb\\nLearning\\nPostulate'),\n",
    "            (1958, 'Rosenblatt\\nConvergence\\nTheorem'),\n",
    "            (1982, 'Hopfield\\nNetworks'),\n",
    "            (1985, 'Boltzmann\\nMachines'),\n",
    "        ]\n",
    "    },\n",
    "    'Theory & Limits': {\n",
    "        'color': '#2196F3', 'y': -3.0,\n",
    "        'nodes': [\n",
    "            (1943, 'McCulloch\\n& Pitts\\nUniversality'),\n",
    "            (1969, 'Minsky &\\nPapert\\nLimitations'),\n",
    "            (1989, 'Universal\\nApproximation\\nTheorem'),\n",
    "        ]\n",
    "    }\n",
    "}\n",
    "\n",
    "for name, lineage in lineages.items():\n",
    "    color = lineage['color']\n",
    "    y = lineage['y']\n",
    "    nodes = lineage['nodes']\n",
    "\n",
    "    for i in range(len(nodes) - 1):\n",
    "        x1, _ = nodes[i]\n",
    "        x2, _ = nodes[i + 1]\n",
    "        ax.annotate('', xy=(x2 - 1.5, y), xytext=(x1 + 1.5, y),\n",
    "                    arrowprops=dict(arrowstyle='->', color=color, lw=2.5, alpha=0.6))\n",
    "\n",
    "    for year, label in nodes:\n",
    "        ax.annotate(f\"{year}\\n{label}\", xy=(year, y),\n",
    "                    fontsize=8.5, ha='center', va='center', fontweight='bold',\n",
    "                    bbox=dict(boxstyle='round,pad=0.5', facecolor=color,\n",
    "                              alpha=0.2, edgecolor=color, linewidth=2))\n",
    "\n",
    "    ax.text(1940, y, name, fontsize=12, fontweight='bold', color=color,\n",
    "            ha='right', va='center',\n",
    "            bbox=dict(boxstyle='round,pad=0.3', facecolor='white',\n",
    "                      edgecolor=color, linewidth=1.5))\n",
    "\n",
    "cross_connections = [\n",
    "    ((1943, 3.0), (1943, -3.0), 'gray'),\n",
    "    ((1957, 3.0), (1958, 0.0), '#888888'),\n",
    "    ((1969, -3.0), (1982, 0.0), '#888888'),\n",
    "    ((1949, 0.0), (1957, 3.0), '#888888'),\n",
    "]\n",
    "\n",
    "for (x1, y1), (x2, y2), color in cross_connections:\n",
    "    ax.annotate('', xy=(x2, y2 + 0.5 * np.sign(y1 - y2)),\n",
    "                xytext=(x1, y1 - 0.5 * np.sign(y1 - y2)),\n",
    "                arrowprops=dict(arrowstyle='->', color=color, lw=1.5,\n",
    "                                linestyle='dashed', alpha=0.4))\n",
    "\n",
    "ax.set_xlim(1935, 1995)\n",
    "ax.set_ylim(-5, 5)\n",
    "ax.set_xlabel('Year', fontsize=13)\n",
    "ax.set_title('Three Intellectual Lineages of Neural Networks',\n",
    "             fontsize=16, fontweight='bold')\n",
    "ax.spines['left'].set_visible(False)\n",
    "ax.spines['right'].set_visible(False)\n",
    "ax.spines['top'].set_visible(False)\n",
    "ax.get_yaxis().set_visible(False)\n",
    "\n",
    "for year in range(1940, 1995, 5):\n",
    "    ax.axvline(x=year, color='lightgray', linestyle=':', alpha=0.5, zorder=0)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
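  {
   "cell_type": "markdown",
   "id": "a5e81b37",
   "metadata": {},
   "source": [
    "As a numerical postscript to Lineage 3, the next cell illustrates (it does not prove) the 1989 universal approximation result: a single hidden layer of sigmoid units can approximate a continuous function on an interval. To keep the sketch short we cheat on the training side: the hidden weights are random and only the output weights are fitted, by least squares (a random-features shortcut, not backpropagation). The target function and every parameter here are arbitrary choices made for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b94f2c60",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "def sigmoid(z):\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "# Target: an arbitrary continuous function on [-3, 3].\n",
    "x = np.linspace(-3, 3, 400)\n",
    "f = np.sin(2 * x) + 0.5 * x\n",
    "\n",
    "# One hidden layer with random input weights and biases; only the\n",
    "# output weights are fitted, by ordinary least squares.\n",
    "n_hidden = 50\n",
    "W = rng.normal(scale=3.0, size=n_hidden)\n",
    "b = rng.uniform(-9, 9, size=n_hidden)      # spreads transitions across the interval\n",
    "H = sigmoid(np.outer(x, W) + b)            # hidden activations, shape (400, 50)\n",
    "coef, *_ = np.linalg.lstsq(H, f, rcond=None)\n",
    "f_hat = H @ coef\n",
    "\n",
    "plt.figure(figsize=(8, 4))\n",
    "plt.plot(x, f, 'k-', label='target f(x) = sin(2x) + x/2')\n",
    "plt.plot(x, f_hat, 'r--', label=f'1 hidden layer, {n_hidden} sigmoid units')\n",
    "plt.legend()\n",
    "plt.title('Illustrating universal approximation (Hornik et al., 1989)')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },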
  {
   "cell_type": "markdown",
   "id": "7920666f",
   "metadata": {},
   "source": [
    "## 5. Key Figures: Biographical Sketches\n",
    "\n",
    "```{note}\n",
    "**On the human side of science.** The biographies below reveal that the history of neural networks was shaped as much by personality, circumstance, and tragedy as by mathematical insight. Walter Pitts's brilliance was matched by personal suffering. Frank Rosenblatt died before seeing his ideas vindicated. The AI winter was driven as much by academic politics as by theoretical limitations. Keep these human dimensions in mind as you study the technical material.\n",
    "```\n",
    "\n",
    "### Warren McCulloch (1898--1969) and Walter Pitts (1923--1969)\n",
    "\n",
    "**McCulloch** was a polymath: philosopher, neurophysiologist, poet, and intellectual provocateur. Educated at Haverford (philosophy), Columbia (medicine), and trained in neurology, he spent decades thinking about the \"embodiment\" of mind in the nervous system. He was a central figure in the **Macy Conferences** on cybernetics (1946--1953), where he helped shape the interdisciplinary field of systems theory.\n",
    "\n",
    "**Pitts** was a tragic genius. Self-educated, he fled an abusive home in Detroit at 15 and found intellectual refuge at the University of Chicago, where he impressed both Carnap and Rashevsky. After the 1943 paper, Pitts moved to MIT to work with Wiener and McCulloch, producing another landmark paper with Jerome Lettvin (\"What the Frog's Eye Tells the Frog's Brain,\" 1959). But a rupture with Wiener in the mid-1950s --- apparently caused by personal misunderstandings and Wiener's volatile temperament --- devastated Pitts. He withdrew from academic life, began drinking heavily, and never completed his PhD. He died in 1969 of a pulmonary embolism at the age of 46, on the same year as McCulloch.\n",
    "\n",
    "### Donald Olding Hebb (1904--1985)\n",
    "\n",
    "A Canadian psychologist who studied under Karl Lashley at Harvard. Hebb was interested in how learning changes the brain at the cellular level. In his 1949 book *The Organization of Behavior*, he proposed the famous learning rule: \"When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.\"\n",
    "\n",
    "This \"Hebbian learning\" is often summarized as **\"neurons that fire together, wire together.\"** While biologically oversimplified, this principle remains foundational in both computational neuroscience and machine learning.\n",
    "\n",
    "### Frank Rosenblatt (1928--1971)\n",
    "\n",
    "A psychologist at Cornell Aeronautical Laboratory who built the **Perceptron** --- the first neural network that could **learn** from data. Rosenblatt established the perceptron as a learning machine and gave an early finite-step guarantee for linearly separable data; later analyses such as Novikoff (1962) supplied the standard clean convergence theorem.\n",
    "\n",
    "The **Mark I Perceptron** was a physical machine built by the Navy: it used 400 photocells as inputs, randomly wired to a layer of \"association units,\" which fed into a set of output units with adjustable potentiometers for weights.\n",
    "\n",
    "Rosenblatt died in a boating accident on Chesapeake Bay in 1971, at the age of 43. He did not live to see the vindication of neural networks in the 1980s.\n",
    "\n",
    "### Marvin Minsky (1927--2016) and Seymour Papert (1928--2016)\n",
    "\n",
    "**Minsky** was one of the founders of the MIT AI Lab. Ironically, he had built one of the earliest neural network machines (the SNARC, 1951) before turning against the approach. Their 1969 book *Perceptrons* provided rigorous proofs that single-layer perceptrons cannot compute functions like XOR, parity, and connectedness. The book is widely credited (or blamed) for triggering the **first neural network winter**.\n",
    "\n",
    "### Paul Werbos (b. 1947)\n",
    "\n",
    "A Harvard graduate student who, in his 1974 PhD thesis, described how to apply **backpropagation** to train multi-layer neural networks. His work was largely ignored at the time.\n",
    "\n",
    "### David Rumelhart (1942--2011), Geoffrey Hinton (b. 1947), and Ronald Williams (b. 1949)\n",
    "\n",
    "The trio who, in their celebrated 1986 Nature paper, demonstrated that backpropagation could train multi-layer neural networks to learn useful **internal representations**. **Hinton** became the most prominent figure in modern deep learning and shared the 2024 Nobel Prize in Physics with John Hopfield."
   ]
  },
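  {
   "cell_type": "markdown",
   "id": "5e9a0c44",
   "metadata": {},
   "source": [
    "The two learning rules at the heart of these biographies are short enough to state in code. The next cell is a minimal sketch with toy data and an arbitrary learning rate (the variable names and numbers are ours, chosen for illustration): first Hebb's correlation-based update, then Rosenblatt's error-driven perceptron rule converging on the linearly separable AND function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b81d6f92",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# --- Hebb (1949): correlation-based growth, dw = eta * x * y ---\n",
    "eta = 0.1\n",
    "x = np.array([1.0, 0.0, 1.0])    # presynaptic activities (toy pattern)\n",
    "y = 1.0                          # postsynaptic activity\n",
    "w = np.zeros(3)\n",
    "for _ in range(5):\n",
    "    w += eta * x * y             # co-active connections strengthen\n",
    "print('Hebbian weights after 5 presentations:', w)\n",
    "\n",
    "# --- Rosenblatt (1957/58): error-driven perceptron rule ---\n",
    "# Learn AND on {0,1}^2 with a bias input. The data is linearly\n",
    "# separable, so the rule stops making mistakes after finitely many\n",
    "# updates (the content of the convergence theorem).\n",
    "X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])  # last column = bias\n",
    "t = np.array([0, 0, 0, 1])                                  # AND targets\n",
    "w = np.zeros(3)\n",
    "for epoch in range(10):\n",
    "    mistakes = 0\n",
    "    for xi, ti in zip(X, t):\n",
    "        yi = int(xi @ w > 0)       # threshold prediction\n",
    "        if yi != ti:\n",
    "            w += (ti - yi) * xi    # update on mistakes only\n",
    "            mistakes += 1\n",
    "    if mistakes == 0:\n",
    "        print(f'Perceptron converged after epoch {epoch}: w = {w}')\n",
    "        break"
   ]
  },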
  {
   "cell_type": "markdown",
   "id": "5bce2c84",
   "metadata": {},
   "source": [
    "## 6. The AI Winter (1969--1982)\n",
    "\n",
    "### What Happened\n",
    "\n",
    "```{warning}\n",
    "**The AI Winter: A Cautionary Tale About Funding and Groupthink**\n",
    "\n",
    "The publication of Minsky and Papert's *Perceptrons* in 1969 did not merely point out theoretical limitations of single-layer networks. It sent a **signal** to funding agencies, university departments, and young researchers that neural networks were a discredited approach. The consequences were severe and lasted over a decade.\n",
    "\n",
    "The key lesson: **a technically correct but narrowly scoped critique can have outsized negative effects** when amplified by institutional dynamics. Minsky and Papert proved limitations of *single-layer* perceptrons, but the community interpreted this as a condemnation of *all* neural networks. Funding agencies, always looking for reasons to cut programs, seized on the book as justification.\n",
    "```\n",
    "\n",
    "The consequences were severe:\n",
    "\n",
    "1. **Funding dried up.** DARPA and other major funders redirected money from neural network research to symbolic AI approaches.\n",
    "\n",
    "2. **Careers were damaged.** Researchers working on neural networks found it difficult to publish in top venues, get grants, or find faculty positions.\n",
    "\n",
    "3. **The narrative hardened.** The nuanced mathematical results of *Perceptrons* were reduced to a slogan: \"Neural networks don't work.\"\n",
    "\n",
    "4. **Alternative approaches dominated.** The 1970s and early 1980s were the era of **symbolic AI**: expert systems, production rules, frames, and logic-based reasoning.\n",
    "\n",
    "### What Survived\n",
    "\n",
    "```{note}\n",
    "**Persistence in the face of adversity.** Despite the hostile funding environment, a small number of researchers continued working on neural networks throughout the winter. Their persistence --- often at significant personal and professional cost --- is what kept the field alive long enough for the renaissance of the 1980s. Science depends on researchers who are willing to work on unfashionable problems.\n",
    "```\n",
    "\n",
    "Despite the winter, neural network research did not entirely stop:\n",
    "\n",
    "- **James Anderson** (1972) and **Teuvo Kohonen** (1972) independently developed **associative memory** models.\n",
    "- **Stephen Grossberg** continued developing neural network models at Boston University.\n",
    "- **Kunihiko Fukushima** (1980) developed the **Neocognitron** in Japan.\n",
    "- **Paul Werbos** (1974, 1982) continued developing backpropagation.\n",
    "- **John Hopfield** (1982) provided the crucial spark that reignited the field.\n",
    "\n",
    "### The Thaw\n",
    "\n",
    "The winter began to thaw around 1982--1985. By 1987--1988, neural networks were back. The first NeurIPS conference was held in 1987.\n",
    "\n",
    "```{tip}\n",
    "**What we can learn from the AI winter:** (1) Single-layer limitations do not imply multi-layer limitations. (2) The absence of a known training algorithm does not mean no algorithm exists. (3) Academic consensus can be wrong for extended periods. (4) Cross-disciplinary perspectives can break intellectual logjams. (5) Persistence matters.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d88df7b6",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# --- Visualization: Research Activity Over Time ---\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(14, 6))\n",
    "\n",
    "years_curve = np.arange(1943, 1990, 0.1)\n",
    "\n",
    "def interest_curve(t):\n",
    "    val = 0.0\n",
    "    val += 20 * np.exp(-0.5 * ((t - 1943) / 3)**2)\n",
    "    val += 40 * np.exp(-0.5 * ((t - 1958) / 3)**2)\n",
    "    if t > 1963:\n",
    "        val += 10 / (1 + np.exp(0.5 * (t - 1963)))\n",
    "    val += -30 * np.exp(-0.5 * ((t - 1972) / 4)**2)\n",
    "    val = max(val, 5)\n",
    "    val += 15 * np.exp(-0.5 * ((t - 1982) / 2)**2)\n",
    "    val += 60 * np.exp(-0.5 * ((t - 1987) / 2)**2)\n",
    "    return val\n",
    "\n",
    "interest_vals = np.array([interest_curve(t) for t in years_curve])\n",
    "\n",
    "ax.fill_between(years_curve, 0, interest_vals, alpha=0.3, color='steelblue')\n",
    "ax.plot(years_curve, interest_vals, color='steelblue', linewidth=2)\n",
    "\n",
    "annotations = [\n",
    "    (1943, 22, 'McCulloch &\\nPitts (1943)', 'above'),\n",
    "    (1957, 45, 'Perceptron\\n(1957)', 'above'),\n",
    "    (1969, 18, 'Minsky & Papert\\n(1969)', 'below'),\n",
    "    (1975, 8, 'AI Winter', 'below'),\n",
    "    (1982, 22, 'Hopfield\\n(1982)', 'above'),\n",
    "    (1986, 55, 'Backpropagation\\n(1986)', 'above'),\n",
    "]\n",
    "\n",
    "for year, y_target, label, pos in annotations:\n",
    "    if pos == 'above':\n",
    "        y_text = y_target + 10\n",
    "        va = 'bottom'\n",
    "    else:\n",
    "        y_text = max(y_target - 8, 1)\n",
    "        va = 'top'\n",
    "    ax.annotate(label, xy=(year, y_target), xytext=(year, y_text),\n",
    "                fontsize=9, ha='center', va=va, fontweight='bold',\n",
    "                arrowprops=dict(arrowstyle='->', color='black', lw=1),\n",
    "                bbox=dict(boxstyle='round,pad=0.3', facecolor='lightyellow',\n",
    "                          edgecolor='gray', alpha=0.9))\n",
    "\n",
    "ax.axvspan(1969, 1982, alpha=0.1, color='red', zorder=0)\n",
    "ax.set_xlabel('Year', fontsize=13)\n",
    "ax.set_ylabel('Relative Research Activity\\n(stylized)', fontsize=12)\n",
    "ax.set_title('The Rise, Fall, and Resurgence of Neural Networks',\n",
    "             fontsize=15, fontweight='bold')\n",
    "ax.set_xlim(1940, 1992)\n",
    "ax.set_ylim(0, 80)\n",
    "ax.grid(True, alpha=0.2)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dff5a0f8",
   "metadata": {},
   "source": [
    "## 7. Contested Attributions\n",
    "\n",
    "```{note}\n",
    "**On priority disputes in science.** The history of neural networks is plagued by contested attributions. This is not unique to neural networks --- similar disputes occur in virtually every scientific field. The pattern is recurring: an idea is discovered independently by multiple researchers, often in different countries and languages. The researcher who *popularizes* the idea (typically by publishing in a high-profile English-language venue) receives the credit, while earlier discoverers are forgotten.\n",
    "```\n",
    "\n",
    "### Backpropagation\n",
    "\n",
    "The algorithm universally associated with Rumelhart, Hinton, and Williams (1986) was independently discovered multiple times:\n",
    "\n",
    "1. **Henry J. Kelley** (1960) --- optimal control theory\n",
    "2. **Arthur E. Bryson** (1961) --- dynamic programming for gradients\n",
    "3. **Seppo Linnainmaa** (1970) --- reverse-mode automatic differentiation\n",
    "4. **Paul Werbos** (1974) --- first application to neural networks\n",
    "5. **David Parker** (1985) --- independent rediscovery\n",
    "6. **Yann LeCun** (1985) --- independent proposal\n",
    "7. **Rumelhart, Hinton, and Williams** (1986) --- popularization\n",
    "\n",
    "### The Broader Lesson\n",
    "\n",
    "These contested attributions remind us that:\n",
    "\n",
    "1. **Important ideas are often discovered independently** by researchers in different fields and countries.\n",
    "2. **Publication venue and language matter** --- work published in obscure journals or in languages other than English may be overlooked.\n",
    "3. **Popularization is different from invention** --- making an idea influential is a genuine contribution, but it should not erase the earlier discoverers.\n",
    "4. **History is written by the victors** --- the narrative that emerges is shaped by who becomes famous, not necessarily by who did the work first."
   ]
  },
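  {
   "cell_type": "markdown",
   "id": "d3c70b16",
   "metadata": {},
   "source": [
    "Since reverse-mode automatic differentiation is the technical crux of this priority dispute, it is worth seeing how small the core idea is. The next cell is an illustrative sketch, not anyone's historical code: a scalar `Var` class (a name invented here) records the computation graph during the forward pass and then propagates adjoints backwards. This is the same mechanism, in miniature, that Linnainmaa described in 1970 and that backpropagation applies to network weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e04a7f3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "class Var:\n",
    "    \"\"\"Minimal reverse-mode autodiff on scalars (illustrative sketch).\"\"\"\n",
    "    def __init__(self, value, parents=()):\n",
    "        self.value = value\n",
    "        self.parents = parents   # pairs (parent_var, local_gradient)\n",
    "        self.grad = 0.0\n",
    "\n",
    "    def __add__(self, other):\n",
    "        other = other if isinstance(other, Var) else Var(other)\n",
    "        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])\n",
    "\n",
    "    def __mul__(self, other):\n",
    "        other = other if isinstance(other, Var) else Var(other)\n",
    "        return Var(self.value * other.value,\n",
    "                   [(self, other.value), (other, self.value)])\n",
    "\n",
    "    def tanh(self):\n",
    "        t = math.tanh(self.value)\n",
    "        return Var(t, [(self, 1.0 - t * t)])\n",
    "\n",
    "    def backward(self):\n",
    "        # Topologically order the graph, then apply the chain rule\n",
    "        # once per node, from the output back towards the inputs.\n",
    "        order, seen = [], set()\n",
    "        def visit(v):\n",
    "            if id(v) not in seen:\n",
    "                seen.add(id(v))\n",
    "                for parent, _ in v.parents:\n",
    "                    visit(parent)\n",
    "                order.append(v)\n",
    "        visit(self)\n",
    "        self.grad = 1.0\n",
    "        for v in reversed(order):\n",
    "            for parent, local_grad in v.parents:\n",
    "                parent.grad += v.grad * local_grad\n",
    "\n",
    "# One 'neuron': y = tanh(w1*x1 + w2*x2 + b); gradients w.r.t. the weights.\n",
    "w1, w2, b = Var(0.5), Var(-0.3), Var(0.1)\n",
    "x1, x2 = Var(1.0), Var(2.0)\n",
    "y = (w1 * x1 + w2 * x2 + b).tanh()\n",
    "y.backward()\n",
    "print(f'y = {y.value:.4f}, dy/dw1 = {w1.grad:.4f}, dy/dw2 = {w2.grad:.4f}')"
   ]
  },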
  {
   "cell_type": "markdown",
   "id": "7687b104",
   "metadata": {},
   "source": [
    "## 8. Reading Guide\n",
    "\n",
    "For the serious student, here is a recommended reading order for the primary sources:\n",
    "\n",
    "### Essential Reading (in order)\n",
    "\n",
    "1. **McCulloch & Pitts (1943).** Read the first 3 pages carefully for the axioms and the definition of the neuron.\n",
    "\n",
    "2. **Hebb (1949).** *The Organization of Behavior*, Chapter 4. Focus on the postulate about synaptic modification.\n",
    "\n",
    "3. **Rosenblatt (1958).** \"The Perceptron.\" Read the introduction and the convergence theorem.\n",
    "\n",
    "4. **Minsky & Papert (1969).** *Perceptrons.* Read Introduction and Chapter 1, then Chapters 5 (parity) and 11 (connectedness).\n",
    "\n",
    "5. **Rumelhart, Hinton & Williams (1986).** \"Learning Representations by Back-Propagating Errors.\" Only 4 pages. Read every word.\n",
    "\n",
    "6. **Hornik, Stinchcombe & White (1989).** \"Multilayer Feedforward Networks Are Universal Approximators.\"\n",
    "\n",
    "### Historical Analysis\n",
    "\n",
    "- **Olazaran (1996).** \"A Sociological Study of the Official History of the Perceptrons Controversy.\"\n",
    "- **Schmidhuber (2015).** \"Deep Learning in Neural Networks: An Overview.\" "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c59949c",
   "metadata": {},
   "source": [
    "## 9. Exercises\n",
    "\n",
    "### Exercise 3.1: McCulloch & Pitts Abstract\n",
    "\n",
    "Read the abstract of McCulloch and Pitts (1943). In your own words (approximately 200 words), answer:\n",
    "- What problem are McCulloch and Pitts trying to solve?\n",
    "- What is their key insight?\n",
    "- What are the main results?\n",
    "- What are the main limitations they acknowledge?\n",
    "\n",
    "### Exercise 3.2: The Mark I Perceptron\n",
    "\n",
    "Research the **Mark I Perceptron** built by Frank Rosenblatt. Answer:\n",
    "1. What were the physical components of the machine?\n",
    "2. What kind of tasks could it learn?\n",
    "3. How did the popular press report on it?\n",
    "4. In what ways was the physical Mark I different from Rosenblatt's theoretical model?\n",
    "\n",
    "### Exercise 3.3: The Post-1969 Stagnation\n",
    "\n",
    "Explain in approximately 300 words why the field of neural networks stagnated after Minsky and Papert's *Perceptrons* (1969). Address:\n",
    "1. What *exactly* did Minsky and Papert prove?\n",
    "2. How were their results interpreted by the broader AI community?\n",
    "3. What role did funding politics play?\n",
    "4. What research continued during the \"winter\"?\n",
    "5. What ultimately ended the winter?\n",
    "\n",
    "```{hint}\n",
    ":class: dropdown\n",
    "\n",
    "For Exercise 3.3, consider the distinction between what Minsky and Papert *proved* (limitations of single-layer perceptrons for specific function classes like parity and connectedness) and what the community *concluded* (that all neural networks were useless). The expanded 1988 edition of *Perceptrons* includes an epilogue where the authors reflect on how their work was interpreted. Also consider the role of DARPA funding decisions and the concurrent rise of symbolic AI (expert systems) as a competing paradigm.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a028ada0",
   "metadata": {},
   "source": [
    "## 10. References\n",
    "\n",
    "### Primary Sources (Chronological)\n",
    "\n",
    "1. **McCulloch, W.S. and Pitts, W.** (1943). \"A Logical Calculus of the Ideas Immanent in Nervous Activity.\" *Bulletin of Mathematical Biophysics*, 5(4), 115--133.\n",
    "\n",
    "2. **Wiener, N.** (1948). *Cybernetics: Or Control and Communication in the Animal and the Machine*. MIT Press.\n",
    "\n",
    "3. **Hebb, D.O.** (1949). *The Organization of Behavior: A Neuropsychological Theory*. Wiley.\n",
    "\n",
    "4. **Kleene, S.C.** (1956). \"Representation of Events in Nerve Nets and Finite Automata.\" In Shannon and McCarthy (eds.), *Automata Studies*, Princeton University Press.\n",
    "\n",
    "5. **Rosenblatt, F.** (1958). \"The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.\" *Psychological Review*, 65(6), 386--408.\n",
    "\n",
    "6. **Widrow, B. and Hoff, M.E.** (1960). \"Adaptive Switching Circuits.\" *IRE WESCON Convention Record*, Part 4, 96--104.\n",
    "\n",
    "7. **Ivakhnenko, A.G. and Lapa, V.G.** (1965). *Cybernetic Predicting Devices*. CCM Information Corporation.\n",
    "\n",
    "8. **Amari, S.** (1967). \"A Theory of Adaptive Pattern Classifiers.\" *IEEE Transactions on Electronic Computers*, EC-16(3), 299--307.\n",
    "\n",
    "9. **Minsky, M.L. and Papert, S.A.** (1969). *Perceptrons: An Introduction to Computational Geometry*. MIT Press. (Expanded edition 1988.)\n",
    "\n",
    "10. **Linnainmaa, S.** (1970). \"The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors.\" Master's thesis, University of Helsinki.\n",
    "\n",
    "11. **Werbos, P.J.** (1974). \"Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.\" PhD thesis, Harvard University.\n",
    "\n",
    "12. **Fukushima, K.** (1980). \"Neocognitron.\" *Biological Cybernetics*, 36(4), 193--202.\n",
    "\n",
    "13. **Hopfield, J.J.** (1982). \"Neural Networks and Physical Systems with Emergent Collective Computational Abilities.\" *PNAS*, 79(8), 2554--2558.\n",
    "\n",
    "14. **Hinton, G.E. and Sejnowski, T.J.** (1985). \"Learning and Relearning in Boltzmann Machines.\" In *PDP*, Vol. 1, Chapter 7.\n",
    "\n",
    "15. **Rumelhart, D.E., Hinton, G.E., and Williams, R.J.** (1986). \"Learning Representations by Back-Propagating Errors.\" *Nature*, 323(6088), 533--536.\n",
    "\n",
    "16. **Hornik, K., Stinchcombe, M., and White, H.** (1989). \"Multilayer Feedforward Networks Are Universal Approximators.\" *Neural Networks*, 2(5), 359--366.\n",
    "\n",
    "17. **LeCun, Y. et al.** (1989). \"Backpropagation Applied to Handwritten Zip Code Recognition.\" *Neural Computation*, 1(4), 541--551.\n",
    "\n",
    "### Historical and Philosophical Sources\n",
    "\n",
    "18. **Piccinini, G.** (2004). \"The First Computational Theory of Mind and Brain.\" *Synthese*, 141(2), 175--215.\n",
    "\n",
    "19. **Abraham, T.H.** (2002). \"(Physio)logical Circuits.\" *Journal of the History of the Behavioral Sciences*, 38(1), 3--25.\n",
    "\n",
    "20. **Olazaran, M.** (1996). \"A Sociological Study of the Official History of the Perceptrons Controversy.\" *Social Studies of Science*, 26(3), 611--659.\n",
    "\n",
    "21. **Schmidhuber, J.** (2015). \"Deep Learning in Neural Networks: An Overview.\" *Neural Networks*, 61, 85--117.\n",
    "\n",
    "22. **Anderson, J.A. and Rosenfeld, E.** (eds.) (1988). *Neurocomputing: Foundations of Research*. MIT Press."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
