{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# Chapter 9: Minsky and Papert's Analysis\n",
    "\n",
    "## The Book That Changed AI History\n",
    "\n",
    "In 1969, Marvin Minsky and Seymour Papert published *Perceptrons: An Introduction to Computational Geometry*, a rigorous mathematical analysis of what single-layer perceptrons can and cannot compute. The book's impact was enormous and, in many ways, disproportionate to its actual claims. While the theorems were correct, their interpretation by the broader AI community led to a near-complete abandonment of neural network research for over a decade.\n",
    "\n",
    "### The Authors\n",
    "\n",
    "**Marvin Minsky** (1927--2016) was a co-founder of the MIT AI Lab, a pioneer in artificial intelligence, and one of the most influential figures in computer science. He had actually built one of the first neural network machines (SNARC) in 1951, but by the 1960s had become skeptical of the perceptron approach.\n",
    "\n",
    "**Seymour Papert** (1928--2016) was a mathematician and computer scientist who had worked with Jean Piaget in Geneva before joining MIT. He brought a geometric and topological perspective to the analysis of perceptrons, which became the foundation of the book's theoretical framework.\n",
    "\n",
    "Together, they produced a masterpiece of mathematical analysis that was, unfortunately, read more for its conclusions than its nuances."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-0b",
   "metadata": {},
   "source": "```{danger}\n**AI Winter context** --- After this book, skepticism toward neural networks grew and funding declined substantially for ~15 years. The downturn had multiple causes, but *Perceptrons* became a focal point for the skepticism.\n\nThe publication of *Perceptrons* in 1969 did not merely slow down neural network research --- it nearly killed it. DARPA, the primary funder of AI research in the United States, redirected its resources almost entirely toward symbolic AI approaches (expert systems, logic programming, knowledge representation). Academic departments stopped hiring neural network researchers. Conference papers on the topic were rejected. Graduate students were actively discouraged from pursuing neural network dissertations. The field entered a period now called the **first AI Winter**, lasting roughly from 1969 to 1982.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-0c",
   "metadata": {},
   "source": [
    "```{warning}\n",
    "**What Minsky & Papert overstated** --- They proved limitations of **single-layer** perceptrons, but the public (and funders) interpreted this as limitations of **ALL** neural networks.\n",
    "\n",
    "The mathematical results in *Perceptrons* apply strictly to single-layer perceptrons (or, more precisely, to perceptrons of bounded order). Minsky and Papert were aware that multi-layer networks could overcome many of these limitations, but they expressed strong skepticism about the trainability of such networks. The nuance was lost in translation: what reached funding agencies and the broader AI community was the simple message that \"perceptrons don't work.\" This conflation of \"single-layer perceptrons have provable limitations\" with \"neural networks are a dead end\" was the most consequential misinterpretation in the history of AI.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-0d",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**What they actually proved vs. what people thought they proved:**\n",
    "\n",
    "| What Minsky & Papert **proved** | What people **believed** |\n",
    "|:------|:------|\n",
    "| Single-layer perceptrons cannot compute XOR | Neural networks cannot compute XOR |\n",
    "| PARITY requires examining all inputs simultaneously | Neural networks cannot learn complex patterns |\n",
    "| Bounded-order perceptrons cannot detect connectedness | Neural networks are useless for vision |\n",
    "| No known efficient learning algorithm for multi-layer nets (in 1969) | No learning algorithm **exists** for multi-layer nets |\n",
    "\n",
    "The left column consists of correct, important mathematical theorems. The right column consists of false conclusions that the community drew. The gap between the two cost the field over a decade of progress.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-1",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from itertools import combinations\n",
    "\n",
    "plt.rcParams.update({\n",
    "    'figure.figsize': (8, 6),\n",
    "    'font.size': 12,\n",
    "    'axes.grid': True,\n",
    "    'grid.alpha': 0.3\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 1. The Formal Framework\n",
    "\n",
    "Minsky and Papert studied a generalized form of the perceptron that is more powerful than Rosenblatt's original model.\n",
    "\n",
    "### The Minsky-Papert Perceptron\n",
    "\n",
    "Let $R$ be a finite set (the **retina** --- think of it as a set of pixel positions). An input is a subset $X \\subseteq R$ (the set of \"active\" pixels). The perceptron computes a predicate $\\psi$ on inputs:\n",
    "\n",
    "$$\\psi(X) = 1 \\quad \\text{iff} \\quad \\sum_{i} \\alpha_i \\varphi_i(X) > \\theta$$\n",
    "\n",
    "where:\n",
    "- Each $\\varphi_i$ is a **partial predicate** that depends only on a bounded subset $S_i \\subseteq R$.\n",
    "- Each $\\alpha_i \\in \\mathbb{R}$ is a weight.\n",
    "- $\\theta \\in \\mathbb{R}$ is a threshold.\n",
    "\n",
    "### Difference from Rosenblatt\n",
    "\n",
    "In Rosenblatt's perceptron, each $\\varphi_i$ examines a **single** input point: $\\varphi_i(X) = \\mathbf{1}[r_i \\in X]$. In the Minsky-Papert framework, each $\\varphi_i$ can be any Boolean function of the inputs within its receptive field $S_i$. This makes the model **strictly more powerful** than Rosenblatt's, so any limitation proved for this model applies a fortiori to the simpler version.\n",
    "\n",
    "### Key Insight\n",
    "\n",
    "The central question is: **how large must the receptive fields $S_i$ be** to compute various predicates? If every predicate of interest requires looking at the entire retina, then the perceptron architecture provides no computational savings over a lookup table."
   ]
  },
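  {
   "cell_type": "markdown",
   "id": "cell-2x-md",
   "metadata": {},
   "source": [
    "The framework is easy to make concrete. The sketch below (our illustration, not code from the book) represents each partial predicate as an arbitrary Boolean function of a small receptive field and thresholds the weighted sum. As a sanity check, a single order-2 predicate on a 2-point retina computes XOR directly: the limitation theorems only bite when the receptive fields are *smaller* than the structure the predicate must detect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-2x-code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Minimal Minsky-Papert perceptron (illustrative sketch).\n",
    "# Each partial predicate phi_i sees only its receptive field S_i.\n",
    "\n",
    "def mp_perceptron(X, partials, theta):\n",
    "    \"\"\"X: frozenset of active retina points.\n",
    "    partials: list of (S_i, phi_i, alpha_i); phi_i is applied to X & S_i.\"\"\"\n",
    "    total = sum(alpha * phi(X & S) for (S, phi, alpha) in partials)\n",
    "    return int(total > theta)\n",
    "\n",
    "# Order-2 example on retina R = {0, 1}: one partial predicate that\n",
    "# directly tests \"exactly one of the two points is active\" (i.e., XOR).\n",
    "partials = [(frozenset({0, 1}), lambda V: int(len(V) == 1), 1.0)]\n",
    "\n",
    "for X in [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]:\n",
    "    print(sorted(X), '->', mp_perceptron(X, partials, theta=0.5))"
   ]
  },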
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## 2. Order of a Perceptron\n",
    "\n",
    "The **order** of a perceptron representation is the key complexity measure in Minsky and Papert's analysis.\n",
    "\n",
    "**Definition.** The *order* of a perceptron $\\psi$ is:\n",
    "\n",
    "$$\\text{ord}(\\psi) = \\min\\{k : \\text{there exists a representation with all } |S_i| \\leq k\\}$$\n",
    "\n",
    "In other words, the order is the size of the largest receptive field needed by any partial predicate $\\varphi_i$.\n",
    "\n",
    "### Examples\n",
    "\n",
    "- **Order 1** = classical Rosenblatt perceptron. Each $\\varphi_i$ looks at a single input.\n",
    "- **Order 2** = each partial predicate can look at pairs of inputs.\n",
    "- **Order $k$** = each partial predicate can examine up to $k$ inputs simultaneously.\n",
    "- **Order $n$** (where $n = |R|$) = the perceptron can look at all inputs. This is equivalent to a threshold function of arbitrary Boolean features.\n",
    "\n",
    "### The Hierarchy\n",
    "\n",
    "Higher order means more power. An order-$k$ perceptron can simulate any order-$(k-1)$ perceptron (by ignoring extra inputs in its receptive fields). The question is: which predicates require high order?"
   ]
  },
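  {
   "cell_type": "markdown",
   "id": "cell-3x-md",
   "metadata": {},
   "source": [
    "For tiny retinas the hierarchy can be probed empirically. The sketch below (our illustration, not the book's) uses the fact that every Boolean partial predicate on a set $S$ is a linear combination of AND-monomials over subsets of $S$, so a predicate has order $\\leq k$ exactly when it is a linear threshold function of the monomials of degree at most $k$. We test linear separability with the classical perceptron learning rule under a fixed epoch cap: convergence *proves* order $\\leq k$, while hitting the cap only suggests the opposite --- though for PARITY the failure below order $n$ is guaranteed by the Parity Theorem of the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3x-code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from itertools import combinations\n",
    "\n",
    "def monomial_features(x, k):\n",
    "    \"\"\"Constant term plus all AND-monomials of degree 1..k of the bits of x.\"\"\"\n",
    "    feats = [1.0]\n",
    "    for d in range(1, k + 1):\n",
    "        for S in combinations(range(len(x)), d):\n",
    "            feats.append(float(all(x[i] for i in S)))\n",
    "    return np.array(feats)\n",
    "\n",
    "def separable_at_order(pred, n, k, epochs=2000):\n",
    "    \"\"\"Perceptron rule on degree-<=k monomial features.\n",
    "    Returns True iff it converges (a certificate that order <= k).\"\"\"\n",
    "    X = [[(i >> b) & 1 for b in range(n)] for i in range(2 ** n)]\n",
    "    F = np.array([monomial_features(x, k) for x in X])\n",
    "    y = np.array([1 if pred(x) else -1 for x in X])\n",
    "    w = np.zeros(F.shape[1])\n",
    "    for _ in range(epochs):\n",
    "        errors = 0\n",
    "        for f, t in zip(F, y):\n",
    "            if t * (w @ f) <= 0:\n",
    "                w += t * f\n",
    "                errors += 1\n",
    "        if errors == 0:\n",
    "            return True\n",
    "    return False\n",
    "\n",
    "parity = lambda x: sum(x) % 2 == 1\n",
    "for k in (1, 2, 3):\n",
    "    print(f\"PARITY on 3 bits, order-{k} features: separable = {separable_at_order(parity, 3, k)}\")"
   ]
  },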
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": [
    "## 3. The Parity Theorem\n",
    "\n",
    "This is the most important result in the book and the one most directly relevant to the XOR problem.\n",
    "\n",
    "```{admonition} Theorem (Parity Theorem --- Minsky & Papert, 1969)\n",
    ":class: important\n",
    "\n",
    "The PARITY predicate on $n$ bits requires order $n$. That is, any perceptron computing PARITY must have at least one partial predicate that examines **all** $n$ inputs.\n",
    "\n",
    "Formally: $\\text{ord}(\\text{PARITY}) = n$.\n",
    "\n",
    "This means that no matter how cleverly you design the partial predicates, no matter how many of them you use, and no matter what weights you assign --- if each partial predicate looks at fewer than $n$ inputs, the perceptron **cannot** compute parity.\n",
    "```\n",
    "\n",
    "```{admonition} Proof\n",
    ":class: dropdown\n",
    "\n",
    "Let $R = \\{1, 2, \\ldots, n\\}$ be the set of input positions. For $X \\subseteq R$, define:\n",
    "\n",
    "$$\\text{PARITY}(X) = |X| \\mod 2$$\n",
    "\n",
    "Let $P_0 = \\{X \\subseteq R : |X| \\text{ is even}\\}$ and $P_1 = \\{X \\subseteq R : |X| \\text{ is odd}\\}$.\n",
    "\n",
    "Note that $|P_0| = |P_1| = 2^{n-1}$ (exactly half of all subsets have even size).\n",
    "\n",
    "Suppose for contradiction that PARITY can be computed by a perceptron of order $k < n$:\n",
    "\n",
    "$$\\text{PARITY}(X) = 1 \\iff \\sum_i \\alpha_i \\varphi_i(X) > \\theta$$\n",
    "\n",
    "where each $\\varphi_i$ depends only on $S_i \\subseteq R$ with $|S_i| \\leq k < n$.\n",
    "\n",
    "Define the **linear functional**:\n",
    "\n",
    "$$L(X) = \\sum_i \\alpha_i \\varphi_i(X)$$\n",
    "\n",
    "**Key Lemma.** For each partial predicate $\\varphi_i$ with $|S_i| < n$, there exists a bit $j_i \\in R \\setminus S_i$ (a bit not examined by $\\varphi_i$). Consider the involution $\\tau_{j_i}: 2^R \\to 2^R$ that flips bit $j_i$:\n",
    "\n",
    "$$\\tau_{j_i}(X) = X \\triangle \\{j_i\\} = \\begin{cases} X \\cup \\{j_i\\} & \\text{if } j_i \\notin X \\\\ X \\setminus \\{j_i\\} & \\text{if } j_i \\in X \\end{cases}$$\n",
    "\n",
    "This operation:\n",
    "1. **Toggles parity**: $\\text{PARITY}(\\tau_{j_i}(X)) = 1 - \\text{PARITY}(X)$, so $\\tau_{j_i}$ maps $P_0 \\to P_1$ bijectively.\n",
    "2. **Preserves $\\varphi_i$**: Since $j_i \\notin S_i$, the predicate $\\varphi_i$ cannot distinguish $X$ from $\\tau_{j_i}(X)$, so $\\varphi_i(\\tau_{j_i}(X)) = \\varphi_i(X)$.\n",
    "\n",
    "Using property 2, for each $\\varphi_i$:\n",
    "\n",
    "$$\\sum_{X \\in P_0} \\varphi_i(X) = \\sum_{X \\in P_0} \\varphi_i(\\tau_{j_i}(X)) = \\sum_{Y \\in P_1} \\varphi_i(Y)$$\n",
    "\n",
    "where the last step uses the bijection $\\tau_{j_i}: P_0 \\to P_1$.\n",
    "\n",
    "Therefore, by linearity:\n",
    "\n",
    "$$\\sum_{X \\in P_0} L(X) = \\sum_i \\alpha_i \\sum_{X \\in P_0} \\varphi_i(X) = \\sum_i \\alpha_i \\sum_{Y \\in P_1} \\varphi_i(Y) = \\sum_{Y \\in P_1} L(Y)$$\n",
    "\n",
    "But for PARITY to be correctly computed, we need:\n",
    "- $L(X) > \\theta$ for all $X \\in P_1$ (odd parity classified as 1)\n",
    "- $L(X) \\leq \\theta$ for all $X \\in P_0$ (even parity classified as 0)\n",
    "\n",
    "This gives:\n",
    "\n",
    "$$\\sum_{Y \\in P_1} L(Y) > |P_1| \\cdot \\theta = 2^{n-1} \\cdot \\theta$$\n",
    "\n",
    "$$\\sum_{X \\in P_0} L(X) \\leq |P_0| \\cdot \\theta = 2^{n-1} \\cdot \\theta$$\n",
    "\n",
    "So $\\sum_{Y \\in P_1} L(Y) > \\sum_{X \\in P_0} L(X)$.\n",
    "\n",
    "But we just showed $\\sum_{X \\in P_0} L(X) = \\sum_{Y \\in P_1} L(Y)$.\n",
    "\n",
    "$$\\boxed{\\text{Contradiction. Therefore } \\text{ord}(\\text{PARITY}) = n. \\quad \\blacksquare}$$\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Demonstration: the parity symmetry argument\n",
    "# For n=4, show that flipping any bit preserves partial predicate sums\n",
    "\n",
    "n = 4\n",
    "all_subsets = []\n",
    "for i in range(2**n):\n",
    "    subset = set()\n",
    "    for bit in range(n):\n",
    "        if i & (1 << bit):\n",
    "            subset.add(bit)\n",
    "    all_subsets.append(frozenset(subset))\n",
    "\n",
    "P0 = [X for X in all_subsets if len(X) % 2 == 0]  # even parity\n",
    "P1 = [X for X in all_subsets if len(X) % 2 == 1]  # odd parity\n",
    "\n",
    "print(f\"n = {n}\")\n",
    "print(f\"Total subsets: {len(all_subsets)}\")\n",
    "print(f\"|P0| (even parity) = {len(P0)}\")\n",
    "print(f\"|P1| (odd parity) = {len(P1)}\")\n",
    "\n",
    "# Define a partial predicate on S = {0, 1} (order 2 < n=4)\n",
    "S = frozenset({0, 1})\n",
    "\n",
    "def phi(X, S):\n",
    "    \"\"\"A partial predicate: returns 1 if X intersects S in exactly 1 element.\"\"\"\n",
    "    return int(len(X & S) == 1)\n",
    "\n",
    "sum_P0 = sum(phi(X, S) for X in P0)\n",
    "sum_P1 = sum(phi(X, S) for X in P1)\n",
    "\n",
    "print(f\"\\nPartial predicate phi on S = {set(S)}: 'exactly one element of S is in X'\")\n",
    "print(f\"Sum over P0: {sum_P0}\")\n",
    "print(f\"Sum over P1: {sum_P1}\")\n",
    "print(f\"Equal? {sum_P0 == sum_P1}  (as predicted by the theorem)\")\n",
    "\n",
    "# Verify for multiple partial predicates\n",
    "print(\"\\nVerification for all order-2 partial predicates on pairs:\")\n",
    "for pair in combinations(range(n), 2):\n",
    "    S_test = frozenset(pair)\n",
    "    # Try different predicates on this pair\n",
    "    for name, pred in [\n",
    "        (\"both in X\", lambda X, S: int(S.issubset(X))),\n",
    "        (\"at least one in X\", lambda X, S: int(len(X & S) >= 1)),\n",
    "        (\"exactly one in X\", lambda X, S: int(len(X & S) == 1)),\n",
    "        (\"neither in X\", lambda X, S: int(len(X & S) == 0))\n",
    "    ]:\n",
    "        s0 = sum(pred(X, S_test) for X in P0)\n",
    "        s1 = sum(pred(X, S_test) for X in P1)\n",
    "        assert s0 == s1, f\"FAILED for {set(S_test)}, {name}\"\n",
    "\n",
    "print(\"All partial predicates of order 2 have equal sums over P0 and P1.\")\n",
    "print(\"No linear combination can distinguish the two classes!\")"
   ]
  },
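  {
   "cell_type": "markdown",
   "id": "cell-5x-md",
   "metadata": {},
   "source": [
    "The lower bound is tight: an order-$n$ perceptron *does* compute parity. One explicit representation (a standard algebraic identity; the verification below is ours) assigns every nonempty AND-monomial $\\prod_{i \\in A} x_i$ the weight $(-2)^{|A|-1}$, because\n",
    "\n",
    "$$\\sum_{\\emptyset \\neq A \\subseteq X} (-2)^{|A|-1} = \\frac{1 - (-1)^{|X|}}{2} = \\text{PARITY}(X)$$\n",
    "\n",
    "Thresholding this sum at $1/2$ therefore computes PARITY, and the single monomial on the full retina $A = R$ is exactly the order-$n$ feature that, by the theorem, cannot be avoided."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5x-code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "from itertools import combinations\n",
    "\n",
    "def parity_via_monomials(x):\n",
    "    \"\"\"Sum over nonempty A of (-2)^(|A|-1) * AND_A(x).\"\"\"\n",
    "    n = len(x)\n",
    "    total = 0\n",
    "    for d in range(1, n + 1):\n",
    "        for A in combinations(range(n), d):\n",
    "            if all(x[i] for i in A):\n",
    "                total += (-2) ** (d - 1)\n",
    "    return total\n",
    "\n",
    "n = 4\n",
    "for i in range(2 ** n):\n",
    "    x = [(i >> b) & 1 for b in range(n)]\n",
    "    assert parity_via_monomials(x) == sum(x) % 2\n",
    "print(f\"Verified for all {2**n} inputs on n = {n} bits:\")\n",
    "print(\"the weighted monomial sum equals PARITY exactly, so thresholding\")\n",
    "print(\"it at 1/2 yields an order-n perceptron for parity.\")"
   ]
  },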
  {
   "cell_type": "markdown",
   "id": "cell-5b",
   "metadata": {},
   "source": [
    "## 3a. Visualizing Why Parity Requires All Inputs\n",
    "\n",
    "The parity function has a unique structure: knowing **any** proper subset of the inputs tells you **nothing** about the output. Flipping any single unobserved bit toggles the parity. This is what makes it maximally hard for a bounded-order perceptron."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Visualization: Parity function showing why it requires examining ALL inputs\n",
    "# For each subset of inputs observed, show that both parities are equally likely\n",
    "\n",
    "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
    "\n",
    "n = 4\n",
    "all_inputs = np.array([[int(b) for b in format(i, f'0{n}b')] for i in range(2**n)])\n",
    "parities = np.sum(all_inputs, axis=1) % 2\n",
    "\n",
    "# Panel 1: Observing 1 bit\n",
    "ax = axes[0]\n",
    "observed_bit = 0\n",
    "for val in [0, 1]:\n",
    "    mask = all_inputs[:, observed_bit] == val\n",
    "    p_even = np.sum(parities[mask] == 0)\n",
    "    p_odd = np.sum(parities[mask] == 1)\n",
    "    x_pos = [val - 0.15, val + 0.15]\n",
    "    bars = ax.bar(x_pos, [p_even, p_odd], width=0.25,\n",
    "                  color=['blue', 'red'], alpha=0.7,\n",
    "                  edgecolor='black', linewidth=1.5)\n",
    "    ax.text(val - 0.15, p_even + 0.2, str(p_even), ha='center', fontsize=11, fontweight='bold')\n",
    "    ax.text(val + 0.15, p_odd + 0.2, str(p_odd), ha='center', fontsize=11, fontweight='bold')\n",
    "\n",
    "ax.set_xticks([0, 1])\n",
    "ax.set_xticklabels(['$x_1=0$', '$x_1=1$'], fontsize=12)\n",
    "ax.set_ylabel('Count', fontsize=12)\n",
    "ax.set_title('Observe 1 bit: $x_1$\\nParity is 50/50', fontsize=13)\n",
    "ax.legend(['Even parity', 'Odd parity'], fontsize=10)\n",
    "ax.set_ylim(0, 6)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "# Panel 2: Observing 2 bits\n",
    "ax = axes[1]\n",
    "positions = []\n",
    "labels = []\n",
    "for v1 in [0, 1]:\n",
    "    for v2 in [0, 1]:\n",
    "        mask = (all_inputs[:, 0] == v1) & (all_inputs[:, 1] == v2)\n",
    "        p_even = np.sum(parities[mask] == 0)\n",
    "        p_odd = np.sum(parities[mask] == 1)\n",
    "        pos = v1 * 2 + v2\n",
    "        x_pos = [pos - 0.15, pos + 0.15]\n",
    "        ax.bar(x_pos, [p_even, p_odd], width=0.25,\n",
    "               color=['blue', 'red'], alpha=0.7,\n",
    "               edgecolor='black', linewidth=1.5)\n",
    "        ax.text(pos - 0.15, p_even + 0.1, str(p_even), ha='center', fontsize=10, fontweight='bold')\n",
    "        ax.text(pos + 0.15, p_odd + 0.1, str(p_odd), ha='center', fontsize=10, fontweight='bold')\n",
    "        labels.append(f'({v1},{v2})')\n",
    "\n",
    "ax.set_xticks(range(4))\n",
    "ax.set_xticklabels(labels, fontsize=11)\n",
    "ax.set_xlabel('$(x_1, x_2)$', fontsize=12)\n",
    "ax.set_ylabel('Count', fontsize=12)\n",
    "ax.set_title('Observe 2 bits: $x_1, x_2$\\nStill 50/50!', fontsize=13)\n",
    "ax.set_ylim(0, 4)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "# Panel 3: Observing 3 bits\n",
    "ax = axes[2]\n",
    "labels3 = []\n",
    "for v1 in [0, 1]:\n",
    "    for v2 in [0, 1]:\n",
    "        for v3 in [0, 1]:\n",
    "            mask = (all_inputs[:, 0] == v1) & (all_inputs[:, 1] == v2) & (all_inputs[:, 2] == v3)\n",
    "            p_even = np.sum(parities[mask] == 0)\n",
    "            p_odd = np.sum(parities[mask] == 1)\n",
    "            pos = v1 * 4 + v2 * 2 + v3\n",
    "            x_pos = [pos - 0.15, pos + 0.15]\n",
    "            ax.bar(x_pos, [p_even, p_odd], width=0.25,\n",
    "                   color=['blue', 'red'], alpha=0.7,\n",
    "                   edgecolor='black', linewidth=1.5)\n",
    "            ax.text(pos - 0.15, p_even + 0.05, str(p_even), ha='center', fontsize=9, fontweight='bold')\n",
    "            ax.text(pos + 0.15, p_odd + 0.05, str(p_odd), ha='center', fontsize=9, fontweight='bold')\n",
    "            labels3.append(f'{v1}{v2}{v3}')\n",
    "\n",
    "ax.set_xticks(range(8))\n",
    "ax.set_xticklabels(labels3, fontsize=9)\n",
    "ax.set_xlabel('$(x_1, x_2, x_3)$', fontsize=12)\n",
    "ax.set_ylabel('Count', fontsize=12)\n",
    "ax.set_title('Observe 3 of 4 bits\\nSTILL 50/50!', fontsize=13)\n",
    "ax.set_ylim(0, 2.5)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.suptitle('Parity on 4 bits: Observing any proper subset of inputs gives NO information\\n'\n",
    "             'Blue = even parity, Red = odd parity --- always balanced until you see ALL bits',\n",
    "             fontsize=14, y=1.06)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"Key insight: No matter how many bits you observe (up to n-1),\")\n",
    "print(\"the remaining unseen bit can always flip the parity.\")\n",
    "print(\"This is why the Parity Theorem requires order n.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-6",
   "metadata": {},
   "source": "## 4. The Group Invariance Theorem\n\nThis is the deepest and most technically sophisticated result in Minsky and Papert's book.\n\n### Setting\n\nLet $G$ be a group acting on the retina $R$ (e.g., the group of translations, rotations, or reflections of a grid). The action extends to subsets: for $g \\in G$ and $X \\subseteq R$, define $g(X) = \\{g(r) : r \\in X\\}$.\n\nA predicate $\\psi$ is **$G$-invariant** if $\\psi(g(X)) = \\psi(X)$ for all $g \\in G$ and $X \\subseteq R$.\n\n```{admonition} Theorem (Group Invariance Theorem)\n:class: important\n\nIf $G$ acts **transitively** on $R$ (meaning for any $r_1, r_2 \\in R$, there exists $g \\in G$ with $g(r_1) = r_2$), then any $G$-invariant predicate computable by a bounded-order perceptron depends only on $|X|$ (the number of active points).\n\n**Consequence:** Many interesting visual predicates (e.g., \"is there a vertical line in the image?\") are invariant under translations. The theorem says that if a perceptron has bounded order and computes such a predicate, it can only depend on the total number of active pixels --- a very crude statistic. This means that most **geometrically interesting** predicates cannot be computed by bounded-order perceptrons.\n```\n\n```{admonition} Proof\n:class: dropdown\n\n**Proof Sketch.**\n\n1. **Symmetrization**: For any partial predicate $\\varphi$ on $S \\subseteq R$, define its symmetrization:\n\n$$\\bar{\\varphi}(X) = \\frac{1}{|G|} \\sum_{g \\in G} \\varphi(g^{-1}(X))$$\n\n2. Because the perceptron computes a $G$-invariant function and $G$ acts transitively, the symmetrized predicates suffice.\n\n3. 
**Orbit counting**: By transitivity, symmetrized predicates are strongly constrained: the group action forces bounded-order partial predicates to lose access to fine-grained geometric structure.\n\nThe key insight is that symmetry under a transitive group severely restricts what a bounded-order perceptron can distinguish --- it can only access coarse statistics of the input, not detailed spatial structure. The formal details depend on the specific setup of orbits and receptive fields. $\\blacksquare$\n```"
  },
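  {
   "cell_type": "markdown",
   "id": "cell-6x-md",
   "metadata": {},
   "source": [
    "The symmetrization step can be watched in action. The toy below (our construction, not the book's) starts from a deliberately lopsided order-1 representation of the cyclic-shift-invariant predicate \"at least 2 of 4 points are active\", averages the linear functional over all cyclic shifts, and confirms two things: the averaged weights are constant across positions (one value per orbit), and the averaged perceptron still computes the same predicate, exactly as the averaging argument in the proof promises."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6x-code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Symmetrization demo (our toy construction): average a lopsided order-1\n",
    "# representation of an invariant predicate over the cyclic group.\n",
    "n = 4\n",
    "theta = 1.5\n",
    "w = np.array([1.0, 1.2, 0.9, 1.1])  # lopsided weights; still computes |X| >= 2\n",
    "\n",
    "def psi(x):\n",
    "    \"\"\"The G-invariant target predicate: at least 2 of the 4 points active.\"\"\"\n",
    "    return int(sum(x) >= 2)\n",
    "\n",
    "def output(weights, x):\n",
    "    return int(weights @ np.asarray(x) > theta)\n",
    "\n",
    "inputs = [[(i >> b) & 1 for b in range(n)] for i in range(2 ** n)]\n",
    "assert all(output(w, x) == psi(x) for x in inputs)\n",
    "\n",
    "# Averaging L over the group equals averaging the rotated weight vectors:\n",
    "# L_bar(x) = mean_s w . roll(x, s) = mean_s roll(w, -s) . x\n",
    "w_bar = np.mean([np.roll(w, s) for s in range(n)], axis=0)\n",
    "print(\"symmetrized weights:\", w_bar)  # constant on the (single) orbit\n",
    "assert np.allclose(w_bar, w_bar[0])\n",
    "assert all(output(w_bar, x) == psi(x) for x in inputs)\n",
    "print(\"The symmetrized perceptron still computes the invariant predicate.\")"
   ]
  },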
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": "## 5. Connectedness\n\nThe **connectedness** predicate is perhaps the most visually compelling example of a function that perceptrons cannot compute.\n\n### The Problem\n\nConsider an $N \\times N$ grid of pixels. Given a set $X$ of active pixels, determine whether the active pixels form a **connected** figure (i.e., any two active pixels can be joined by a path through adjacent active pixels).\n\n### Theorem\n\n**Theorem (Minsky and Papert).** Connectedness cannot be computed by diameter-limited perceptrons with uniformly bounded local receptive fields. On larger grids, the required receptive-field size must grow with the geometric size of the image; the exact lower bound depends on the formal setup.\n\n### Proof Sketch\n\nThe key idea is to construct two figures that are **identical within any local neighborhood** but differ globally in connectedness:\n\n1. **Figure A** (connected): A spiral or long winding path that passes through many regions of the grid.\n2. **Figure B** (disconnected): The same path, broken at a single point far from any local neighborhood being examined.\n\nAny partial predicate with receptive field smaller than the distance between the endpoints cannot distinguish these two figures. Therefore, no bounded-order perceptron can compute connectedness.\n\n### Intuition\n\nConnectedness is inherently a **global** property. To determine whether two distant pixels are connected, you must trace a path between them, which may require examining the entire image. No collection of local observations can reliably determine global connectivity."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Visual demonstration: connected vs. disconnected figures\n",
    "# that look identical locally\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "N = 15\n",
    "\n",
    "# Create a winding path\n",
    "def create_path(N):\n",
    "    path = []\n",
    "    # Horizontal across row 2\n",
    "    for j in range(1, N-1):\n",
    "        path.append((2, j))\n",
    "    # Down right side\n",
    "    for i in range(2, N-2):\n",
    "        path.append((i, N-2))\n",
    "    # Horizontal across near bottom\n",
    "    for j in range(N-2, 0, -1):\n",
    "        path.append((N-3, j))\n",
    "    # Down left side\n",
    "    for i in range(N-3, 5, -1):\n",
    "        path.append((i, 1))\n",
    "    # Horizontal in middle\n",
    "    for j in range(1, N-3):\n",
    "        path.append((6, j))\n",
    "    # Down again\n",
    "    for i in range(6, 10):\n",
    "        path.append((i, N-4))\n",
    "    return path\n",
    "\n",
    "path = create_path(N)\n",
    "\n",
    "# Connected version\n",
    "grid_connected = np.zeros((N, N))\n",
    "for (i, j) in path:\n",
    "    if 0 <= i < N and 0 <= j < N:\n",
    "        grid_connected[i, j] = 1\n",
    "\n",
    "# Disconnected version: break the path at one point\n",
    "grid_disconnected = grid_connected.copy()\n",
    "break_point = path[len(path) // 2]  # break in the middle\n",
    "grid_disconnected[break_point[0], break_point[1]] = 0\n",
    "\n",
    "ax1 = axes[0]\n",
    "ax1.imshow(grid_connected, cmap='Blues', interpolation='nearest')\n",
    "ax1.set_title('Connected Figure', fontsize=14)\n",
    "ax1.set_xticks(range(N))\n",
    "ax1.set_yticks(range(N))\n",
    "ax1.grid(True, alpha=0.3)\n",
    "\n",
    "ax2 = axes[1]\n",
    "ax2.imshow(grid_disconnected, cmap='Blues', interpolation='nearest')\n",
    "ax2.set_title('Disconnected Figure (one pixel removed)', fontsize=14)\n",
    "# Mark the break point\n",
    "ax2.plot(break_point[1], break_point[0], 'rx', markersize=15, markeredgewidth=3)\n",
    "ax2.annotate('Break here', (break_point[1], break_point[0]),\n",
    "             xytext=(break_point[1]+1.5, break_point[0]-1.5),\n",
    "             fontsize=11, color='red',\n",
    "             arrowprops=dict(arrowstyle='->', color='red'))\n",
    "ax2.set_xticks(range(N))\n",
    "ax2.set_yticks(range(N))\n",
    "ax2.grid(True, alpha=0.3)\n",
    "\n",
    "plt.suptitle('Connectedness: A single pixel change far from any local view\\n'\n",
    "             'changes the global property. No local predicate can detect this.',\n",
    "             fontsize=13, y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"Total active pixels (connected): {int(grid_connected.sum())}\")\n",
    "print(f\"Total active pixels (disconnected): {int(grid_disconnected.sum())}\")\n",
    "print(f\"Difference: just 1 pixel at position {break_point}\")"
   ]
  },
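  {
   "cell_type": "markdown",
   "id": "cell-8x-md",
   "metadata": {},
   "source": [
    "The global difference between the two figures is easy to confirm programmatically --- but note that the breadth-first search below must, in the worst case, visit the entire image. That is precisely the kind of global computation a bounded-order perceptron cannot perform. (The check is a small self-contained sketch; it uses its own miniature path rather than the plotted one.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8x-code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "from collections import deque\n",
    "import numpy as np\n",
    "\n",
    "def is_connected(grid):\n",
    "    \"\"\"4-neighbour connectivity check on a binary grid via breadth-first search.\"\"\"\n",
    "    pixels = list(zip(*np.nonzero(grid)))\n",
    "    if not pixels:\n",
    "        return True\n",
    "    seen = {pixels[0]}\n",
    "    queue = deque([pixels[0]])\n",
    "    while queue:\n",
    "        i, j = queue.popleft()\n",
    "        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):\n",
    "            nb = (i + di, j + dj)\n",
    "            if (0 <= nb[0] < grid.shape[0] and 0 <= nb[1] < grid.shape[1]\n",
    "                    and grid[nb] == 1 and nb not in seen):\n",
    "                seen.add(nb)\n",
    "                queue.append(nb)\n",
    "    return len(seen) == len(pixels)\n",
    "\n",
    "# A small L-shaped path, then the same path with one interior pixel removed.\n",
    "g = np.zeros((7, 7))\n",
    "g[1, 1:6] = 1  # horizontal segment\n",
    "g[1:6, 5] = 1  # vertical segment\n",
    "broken = g.copy()\n",
    "broken[1, 3] = 0\n",
    "\n",
    "print(\"intact path connected:\", is_connected(g))\n",
    "print(\"broken path connected:\", is_connected(broken))"
   ]
  },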
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "## 6. Order Results: Summary Table\n",
    "\n",
    "The following table summarizes the key results from Minsky and Papert's analysis. It shows the **order** (minimum receptive field size) required for various predicates.\n",
    "\n",
    "| Predicate | Order Required | Notes |\n",
    "|:----------|:-------------:|:------|\n",
    "| OR | 1 | $w_i = 1$, $\\theta = 0.5$ |\n",
    "| AND | 1 | $w_i = 1$, $\\theta = n - 0.5$ |\n",
    "| MAJORITY | 1 | $w_i = 1$, $\\theta = n/2$ |\n",
    "| PARITY | $n$ | Must examine ALL inputs |\n",
    "| Connectedness | $\\Omega(\\sqrt{N})$ for $N \\times N$ grid | Inherently global |\n",
    "| Convexity | $\\Omega(\\sqrt{N})$ | Similar to connectedness |\n",
    "| Euler number | $O(1)$ | Surprisingly local! |\n",
    "\n",
    "### Key Observations\n",
    "\n",
    "1. **OR, AND, MAJORITY** are all order 1: each partial predicate examines a single input. These are exactly the linearly separable functions.\n",
    "\n",
    "2. **PARITY** is worst-case: it requires order $n$ (the maximum possible). No local information helps at all.\n",
    "\n",
    "3. **Connectedness** is between: it requires large but not maximal order. The $\\Omega(\\sqrt{N})$ bound reflects the fact that you must \"see across\" the image.\n",
    "\n",
    "4. **Euler number** (number of connected components minus number of holes) is surprisingly computable by a low-order perceptron, because it can be expressed as a sum of local contributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Demonstrate: OR, AND, MAJORITY are order-1 (linearly separable)\n",
    "# while PARITY is not\n",
    "\n",
    "n_bits = 4\n",
    "\n",
    "# Generate all 2^n binary inputs\n",
    "inputs = np.array([[int(b) for b in format(i, f'0{n_bits}b')] for i in range(2**n_bits)])\n",
    "\n",
    "# Define predicates\n",
    "def or_pred(x): return int(np.sum(x) >= 1)\n",
    "def and_pred(x): return int(np.sum(x) == n_bits)\n",
    "def majority_pred(x): return int(np.sum(x) > n_bits/2)\n",
    "def parity_pred(x): return int(np.sum(x) % 2 == 1)\n",
    "\n",
    "predicates = [\n",
    "    ('OR', or_pred, 'sum >= 1'),\n",
    "    ('AND', and_pred, f'sum = {n_bits}'),\n",
    "    ('MAJORITY', majority_pred, f'sum > {n_bits}/2'),\n",
    "    ('PARITY', parity_pred, 'sum mod 2')\n",
    "]\n",
    "\n",
    "fig, axes = plt.subplots(1, 4, figsize=(18, 4))\n",
    "\n",
    "for idx, (name, pred, rule) in enumerate(predicates):\n",
    "    ax = axes[idx]\n",
    "    sums = np.sum(inputs, axis=1)\n",
    "    outputs = np.array([pred(x) for x in inputs])\n",
    "    \n",
    "    # Plot: x-axis = sum of inputs, y-axis = output\n",
    "    # Jitter for visibility\n",
    "    np.random.seed(42)\n",
    "    jitter = np.random.uniform(-0.1, 0.1, len(sums))\n",
    "    colors = ['blue' if o == 0 else 'red' for o in outputs]\n",
    "    \n",
    "    ax.scatter(sums + jitter, outputs + jitter * 0.3, c=colors, s=60, alpha=0.7, edgecolors='black')\n",
    "    ax.set_xlabel('Sum of inputs', fontsize=11)\n",
    "    ax.set_ylabel('Output', fontsize=11)\n",
    "    ax.set_title(f'{name}\\n(Rule: {rule})', fontsize=12)\n",
    "    ax.set_yticks([0, 1])\n",
    "    ax.set_xticks(range(n_bits + 1))\n",
    "    \n",
    "    # Check if the predicate depends only on the sum (order 1)\n",
    "    sum_consistent = True\n",
    "    for s in range(n_bits + 1):\n",
    "        mask = sums == s\n",
    "        if mask.any():\n",
    "            vals = outputs[mask]\n",
    "            if len(set(vals)) > 1:\n",
    "                sum_consistent = False\n",
    "                break\n",
    "    \n",
    "    order_str = \"Order 1\" if sum_consistent else f\"Order {n_bits}\"\n",
    "    color = 'green' if sum_consistent else 'red'\n",
    "    ax.text(0.5, -0.2, order_str, transform=ax.transAxes, ha='center',\n",
    "            fontsize=13, fontweight='bold', color=color)\n",
    "\n",
    "plt.suptitle(f'Predicates on {n_bits}-bit inputs: Order-1 predicates depend only on the sum',\n",
    "             fontsize=14, y=1.05)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"\\nNotice: OR, AND, MAJORITY all produce consistent outputs for each sum value.\")\n",
    "print(\"PARITY does not: inputs with the same sum can have different parity.\")\n",
    "print(\"This is why PARITY requires order n --- the sum alone is not enough.\")"
   ]
  },
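  {
   "cell_type": "markdown",
   "id": "cell-10x-md",
   "metadata": {},
   "source": [
    "Key observation 4 deserves a demonstration, since it is the book's most counterintuitive *positive* result. One classical local scheme is Gray's quad-count formula (our chosen illustration; Minsky and Papert's own construction differs in detail): for 4-connectivity,\n",
    "\n",
    "$$\\chi = \\frac{Q_1 - Q_3 + 2 Q_D}{4}$$\n",
    "\n",
    "where $Q_1$, $Q_3$, and $Q_D$ count the $2 \\times 2$ windows containing exactly one foreground pixel, exactly three, and the two diagonal patterns, respectively. Each window is a receptive field of only four pixels, so the Euler number really is a weighted sum of low-order partial predicates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10x-code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def euler_number_4conn(grid):\n",
    "    \"\"\"Gray's local quad-count formula for the 4-connectivity Euler number.\"\"\"\n",
    "    g = np.pad(np.asarray(grid, dtype=int), 1)\n",
    "    q1 = q3 = qd = 0\n",
    "    for i in range(g.shape[0] - 1):\n",
    "        for j in range(g.shape[1] - 1):\n",
    "            quad = g[i:i + 2, j:j + 2]\n",
    "            s = quad.sum()\n",
    "            if s == 1:\n",
    "                q1 += 1\n",
    "            elif s == 3:\n",
    "                q3 += 1\n",
    "            elif s == 2 and quad[0, 0] == quad[1, 1]:\n",
    "                qd += 1  # one of the two diagonal patterns\n",
    "    return (q1 - q3 + 2 * qd) // 4\n",
    "\n",
    "blob = np.zeros((6, 6), dtype=int)\n",
    "blob[2:4, 2:4] = 1  # one solid square: chi = 1\n",
    "ring = np.ones((3, 3), dtype=int)\n",
    "ring[1, 1] = 0      # one component with one hole: chi = 0\n",
    "two = np.zeros((6, 6), dtype=int)\n",
    "two[1, 1] = 1\n",
    "two[4, 4] = 1       # two components: chi = 2\n",
    "\n",
    "print(\"solid square:\", euler_number_4conn(blob))\n",
    "print(\"ring:        \", euler_number_4conn(ring))\n",
    "print(\"two pixels:  \", euler_number_4conn(two))"
   ]
  },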
  {
   "cell_type": "markdown",
   "id": "cell-11",
   "metadata": {},
   "source": [
    "## 7. The AI Winter\n",
    "\n",
    "The publication of *Perceptrons* in 1969 had consequences far beyond what a mathematics monograph normally achieves. It contributed to what is now called the **first AI winter** --- a period of dramatically reduced funding, interest, and research activity in neural networks.\n",
    "\n",
    "### The Impact\n",
    "\n",
    "1. **Funding dried up.** Government agencies (especially DARPA in the US) shifted funding away from neural networks toward symbolic AI. The message --- whether accurate or not --- was that perceptrons were a dead end.\n",
    "\n",
    "2. **Careers were derailed.** Researchers working on neural networks found it difficult to publish papers, get grants, or find academic positions. The field became unfashionable.\n",
    "\n",
    "3. **Research went underground.** Neural network work did not stop entirely, but it moved to the margins of the field.\n",
    "\n",
    "### Frank Rosenblatt's Death\n",
    "\n",
    "Frank Rosenblatt, the inventor of the perceptron, died in a boating accident on the Chesapeake Bay on July 11, 1971, at the age of 43. He did not live to see the vindication of his ideas. His death deprived the field of its most passionate advocate at precisely the moment when advocacy was most needed.\n",
    "\n",
    "### What Survived\n",
    "\n",
    "Despite the winter, several researchers continued important work on neural networks:\n",
    "\n",
    "- **Stephen Grossberg** (1970s--): Developed Adaptive Resonance Theory (ART) and continued rigorous mathematical analysis of neural dynamics.\n",
    "- **Teuvo Kohonen** (1970s--): Developed self-organizing maps for unsupervised learning.\n",
    "- **Shun-ichi Amari** (1970s--): Developed mathematical foundations for neural learning in Japan.\n",
    "- **Kunihiko Fukushima** (1980): Created the Neocognitron, a precursor to convolutional neural networks.\n",
    "- **Paul Werbos** (1974): Described the backpropagation algorithm in his PhD thesis, though it went largely unnoticed until the 1980s.\n",
    "\n",
    "These researchers kept the flame alive during the dark years, and their work provided crucial foundations for the renaissance that would come in the 1980s."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-11b",
   "metadata": {},
   "source": [
    "## 7a. Timeline of the AI Winter and Funding Impact\n",
    "\n",
    "The following visualization shows the dramatic impact of *Perceptrons* on the trajectory of neural network research, from the optimistic early years through the winter and eventual revival."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-11c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Timeline of funding cuts and the AI Winter\n",
    "fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16, 10), gridspec_kw={'height_ratios': [2, 1]})\n",
    "\n",
    "# --- Top panel: Estimated relative funding for neural network research ---\n",
    "years = np.arange(1955, 1992)\n",
    "# Approximate relative funding curve (stylized)\n",
    "funding = np.zeros_like(years, dtype=float)\n",
    "for i, yr in enumerate(years):\n",
    "    if yr < 1957:\n",
    "        funding[i] = 10 + (yr - 1955) * 5\n",
    "    elif yr < 1965:\n",
    "        funding[i] = 20 + (yr - 1957) * 10  # Rapid growth\n",
    "    elif yr < 1969:\n",
    "        funding[i] = 100 - (yr - 1965) * 5  # Slight decline\n",
    "    elif yr < 1970:\n",
    "        funding[i] = 40  # Sharp cut after Perceptrons (from ~80 down to 40)\n",
    "    elif yr < 1982:\n",
    "        funding[i] = max(5, 40 - (yr - 1970) * 3)  # Continued decline\n",
    "    elif yr < 1986:\n",
    "        funding[i] = 10 + (yr - 1982) * 5  # Slow recovery\n",
    "    else:\n",
    "        funding[i] = 30 + (yr - 1986) * 15  # Rapid recovery after backprop\n",
    "\n",
    "ax1.fill_between(years, 0, funding, alpha=0.3, color='steelblue')\n",
    "ax1.plot(years, funding, 'b-', linewidth=2)\n",
    "\n",
    "# Mark key events\n",
    "events_top = [\n",
    "    (1957, 'Perceptron\\ninvented', 'green'),\n",
    "    (1969, 'Minsky-Papert\\n\"Perceptrons\"', 'red'),\n",
    "    (1971, 'Rosenblatt\\ndeath', 'black'),\n",
    "    (1974, 'Werbos\\nbackprop\\nthesis', 'orange'),\n",
    "    (1982, 'Hopfield\\nnetworks', 'blue'),\n",
    "    (1986, 'Backprop\\npublished\\nin Nature', 'green'),\n",
    "]\n",
    "\n",
    "for yr, label, color in events_top:\n",
    "    idx = yr - 1955\n",
    "    if 0 <= idx < len(funding):\n",
    "        ax1.axvline(x=yr, color=color, linestyle='--', alpha=0.5, linewidth=1.5)\n",
    "        ax1.annotate(label, (yr, funding[idx]),\n",
    "                     xytext=(yr, funding[idx] + 15),\n",
    "                     fontsize=9, ha='center', color=color, fontweight='bold',\n",
    "                     arrowprops=dict(arrowstyle='->', color=color, linewidth=1.5))\n",
    "\n",
    "# Shade the AI Winter\n",
    "ax1.axvspan(1969, 1982, alpha=0.1, color='red', label='AI Winter (~1969-1982)')\n",
    "\n",
    "ax1.set_ylabel('Relative Funding Level\\n(stylized)', fontsize=12)\n",
    "ax1.set_title('Neural Network Research Funding: The AI Winter', fontsize=14)\n",
    "ax1.legend(fontsize=11, loc='upper left')\n",
    "ax1.set_xlim(1955, 1991)\n",
    "ax1.set_ylim(0, 140)\n",
    "ax1.grid(True, alpha=0.3)\n",
    "\n",
    "# --- Bottom panel: Key publications and milestones ---\n",
    "milestones = [\n",
    "    (1943, 'McCulloch-Pitts model'),\n",
    "    (1949, 'Hebb\\'s learning rule'),\n",
    "    (1957, 'Rosenblatt: Perceptron'),\n",
    "    (1958, 'Convergence Theorem'),\n",
    "    (1960, 'ADALINE (Widrow)'),\n",
    "    (1969, 'Perceptrons (book)'),\n",
    "    (1971, 'Rosenblatt dies'),\n",
    "    (1974, 'Werbos: backprop'),\n",
    "    (1980, 'Neocognitron'),\n",
    "    (1982, 'Hopfield networks'),\n",
    "    (1985, 'Boltzmann machine'),\n",
    "    (1986, 'Backprop in Nature'),\n",
    "    (1988, 'Perceptrons epilogue'),\n",
    "    (1989, 'Universal Approx. Thm'),\n",
    "]\n",
    "\n",
    "for i, (yr, label) in enumerate(milestones):\n",
    "    side = 1 if i % 2 == 0 else -1\n",
    "    color = 'red' if 1969 <= yr < 1982 else 'green' if yr >= 1982 else 'steelblue'\n",
    "    ax2.plot(yr, 0, 'o', color=color, markersize=8, zorder=5)\n",
    "    ax2.annotate(label, (yr, 0), xytext=(yr, side * 0.5),\n",
    "                fontsize=8, ha='center', va='center', color=color,\n",
    "                fontweight='bold',\n",
    "                arrowprops=dict(arrowstyle='-', color=color, linewidth=0.5))\n",
    "\n",
    "ax2.axhline(y=0, color='black', linewidth=2)\n",
    "ax2.axvspan(1969, 1982, alpha=0.1, color='red')\n",
    "ax2.set_xlim(1940, 1992)\n",
    "ax2.set_ylim(-1.2, 1.2)\n",
    "ax2.set_xlabel('Year', fontsize=12)\n",
    "ax2.set_title('Key Milestones in Neural Network History', fontsize=13)\n",
    "ax2.set_yticks([])\n",
    "ax2.grid(True, axis='x', alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-12",
   "metadata": {},
   "source": [
    "## 8. What Was Correct vs. What Was Overstated\n",
    "\n",
    "With the benefit of hindsight, we can evaluate Minsky and Papert's claims more carefully.\n",
    "\n",
    "### What Was Correct\n",
    "\n",
    "The mathematical theorems in the book are **entirely correct**. An order-1 perceptron genuinely cannot compute XOR or PARITY, and no perceptron of bounded order can compute connectedness on arbitrarily large retinas. The proofs are rigorous and elegant, and these results remain valid and important today.\n",
    "\n",
    "### What Was Acknowledged but Dismissed\n",
    "\n",
    "Minsky and Papert were well aware that multi-layer networks could overcome the limitations they proved. In the original 1969 edition, they wrote:\n",
    "\n",
    "> *\"The perceptron has shown itself worthy of study despite (and even because of!) its severe limitations. It has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version.\"*\n",
    "\n",
    "The key phrase is the last sentence: they **acknowledged** that multi-layer networks existed but **expressed skepticism** that the desirable properties (especially learnability) would carry over. This was a reasonable concern in 1969, when no efficient learning algorithm for multi-layer networks was known.\n",
    "\n",
    "### The \"No Learning Algorithm\" Claim\n",
    "\n",
    "The most consequential claim was that there was no known way to **train** multi-layer networks. This was:\n",
    "- **Technically true** in 1969: no practical learning algorithm for multi-layer networks was widely known.\n",
    "- **Wrong in spirit**: the backpropagation algorithm had been described in various forms (Kelley 1960, Bryson 1961, Dreyfus 1962), and Werbos would describe it explicitly for neural networks in 1974.\n",
    "\n",
    "### The 1988 Epilogue\n",
    "\n",
    "In the expanded 1988 edition, Minsky and Papert added a new chapter addressing the backpropagation revolution. They acknowledged the success of multi-layer networks but raised new concerns:\n",
    "- Training time might be impractical for large networks.\n",
    "- Local minima might prevent convergence.\n",
    "- The representations learned might not generalize well.\n",
    "\n",
    "Some of these concerns proved valid (training time is indeed a challenge), while others were overstated (local minima are less problematic than feared, and generalization has been remarkably successful)."
   ]
  },
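  {
   "cell_type": "markdown",
   "id": "cell-12b",
   "metadata": {},
   "source": [
    "Both halves of this assessment are easy to check by hand. The sketch below brute-forces a grid of weights and thresholds for a single linear threshold unit on 2-bit inputs (the grid is an illustrative search, not the book's proof technique) and finds none that computes XOR; it then wires up a two-layer network by hand that computes XOR exactly, the very configuration Minsky and Papert acknowledged but doubted could be *learned*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import itertools\n",
    "import numpy as np\n",
    "\n",
    "X = np.array(list(itertools.product([0, 1], repeat=2)))  # all 2-bit inputs\n",
    "XOR = np.array([0, 1, 1, 0])\n",
    "\n",
    "# 1) Scan a coarse grid of (w1, w2, theta) for a single unit w.x > theta.\n",
    "#    The grid bounds and resolution are illustrative choices; by Minsky and\n",
    "#    Papert's theorem, no setting anywhere can reproduce the XOR table.\n",
    "grid = np.linspace(-2, 2, 17)\n",
    "hits = [(w1, w2, t) for w1, w2, t in itertools.product(grid, repeat=3)\n",
    "        if np.array_equal((X @ np.array([w1, w2]) > t).astype(int), XOR)]\n",
    "print(f'Single-layer settings computing XOR: {len(hits)}')\n",
    "\n",
    "# 2) A two-layer network with hand-set weights: the hidden units compute OR\n",
    "#    and AND of the inputs; the output unit computes OR AND-NOT AND, i.e. XOR.\n",
    "step = lambda z: (z > 0).astype(int)\n",
    "H = step(X @ np.array([[1, 1], [1, 1]]) - np.array([0.5, 1.5]))  # [OR, AND]\n",
    "y = step(H @ np.array([1, -1]) - 0.5)\n",
    "print(f'Two-layer output: {y.tolist()} (target {XOR.tolist()})')"
   ]
  },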
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-13",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Timeline visualization: The AI Winter and Recovery\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(16, 6))\n",
    "\n",
    "# Timeline events\n",
    "events = [\n",
    "    (1943, 'McCulloch-Pitts\\nneuron model', 'green'),\n",
    "    (1949, 'Hebb\\'s\\nlearning rule', 'green'),\n",
    "    (1957, 'Rosenblatt\\'s\\nPerceptron', 'green'),\n",
    "    (1958, 'Perceptron\\nConvergence\\nTheorem', 'green'),\n",
    "    (1969, 'Minsky & Papert\\n\"Perceptrons\"', 'red'),\n",
    "    (1971, 'Rosenblatt\\ndeath', 'black'),\n",
    "    (1974, 'Werbos:\\nbackprop\\n(PhD thesis)', 'orange'),\n",
    "    (1980, 'Fukushima:\\nNeocognitron', 'orange'),\n",
    "    (1982, 'Hopfield\\nnetworks', 'blue'),\n",
    "    (1986, 'Rumelhart et al.:\\nbackprop\\npublished', 'green'),\n",
    "    (1988, 'Minsky-Papert\\nepilogue', 'purple'),\n",
    "    (1989, 'Universal\\nApprox.\\nTheorem', 'green'),\n",
    "]\n",
    "\n",
    "# Draw timeline\n",
    "years = [e[0] for e in events]\n",
    "ax.plot([min(years)-2, max(years)+2], [0, 0], 'k-', linewidth=2)\n",
    "\n",
    "for i, (year, label, color) in enumerate(events):\n",
    "    side = 1 if i % 2 == 0 else -1\n",
    "    ax.plot(year, 0, 'o', color=color, markersize=10, zorder=5)\n",
    "    ax.plot([year, year], [0, side * 0.5], '-', color=color, linewidth=1.5)\n",
    "    ax.text(year, side * 0.6, label, ha='center', va='bottom' if side > 0 else 'top',\n",
    "            fontsize=8.5, color=color, fontweight='bold')\n",
    "    ax.text(year, side * 0.15, str(year), ha='center', va='center', fontsize=8, color='gray')\n",
    "\n",
    "# Shade the AI Winter\n",
    "ax.axvspan(1969, 1982, alpha=0.1, color='red', label='AI Winter (approx. 1969-1982)')\n",
    "\n",
    "ax.set_xlim(1940, 1992)\n",
    "ax.set_ylim(-1.5, 1.5)\n",
    "ax.set_xlabel('Year', fontsize=12)\n",
    "ax.set_title('Timeline: The Rise, Fall, and Revival of Neural Networks', fontsize=14)\n",
    "ax.legend(loc='lower right', fontsize=11)\n",
    "ax.set_yticks([])\n",
    "ax.grid(True, axis='x', alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-14",
   "metadata": {},
   "source": [
    "## 9. Exercises\n",
    "\n",
    "### Exercise 9.1: MAJORITY is Order 1\n",
    "\n",
    "The MAJORITY predicate on $n$ bits returns 1 if more than half the bits are 1:\n",
    "\n",
    "$$\\text{MAJORITY}(x_1, \\ldots, x_n) = \\begin{cases} 1 & \\text{if } \\sum_i x_i > n/2 \\\\ 0 & \\text{otherwise} \\end{cases}$$\n",
    "\n",
    "Explain why MAJORITY can be computed by an order-1 perceptron (i.e., a classical Rosenblatt perceptron). Give the explicit weights and threshold.\n",
    "\n",
    "*Hint:* What are the partial predicates $\\varphi_i$ for an order-1 perceptron?\n",
    "\n",
    "\n",
    "### Exercise 9.2: Connectedness Requires Global Information\n",
    "\n",
    "Give an **intuitive explanation** (not a formal proof) of why determining whether a set of pixels forms a connected figure requires looking at the entire image. Use the following thought experiment:\n",
    "\n",
    "Imagine you can only look at small patches of the image (say, $3 \\times 3$ regions). Construct two images that look identical in every $3 \\times 3$ patch but differ in connectivity. What property of the path forces you to look at a large region?\n",
    "\n",
    "\n",
    "### Exercise 9.3: The 1988 Epilogue\n",
    "\n",
    "In the 1988 expanded edition of *Perceptrons*, Minsky and Papert added a chapter responding to the backpropagation revolution. Consider the following questions:\n",
    "\n",
    "1. What specific concerns did they raise about the scalability of backpropagation?\n",
    "2. Which of these concerns have been validated by subsequent research, and which have been resolved?\n",
    "3. In what sense was their original (1969) skepticism about multi-layer learning algorithms justified, and in what sense was it overstated?\n",
    "\n",
    "Write a thoughtful 1-page response addressing these questions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Space for Exercise 9.1: Verify that MAJORITY is order-1\n",
    "\n",
    "n = 5  # number of bits\n",
    "\n",
    "# Generate all 2^n inputs\n",
    "inputs = np.array([[int(b) for b in format(i, f'0{n}b')] for i in range(2**n)])\n",
    "\n",
    "# Order-1 perceptron for MAJORITY:\n",
    "# phi_i(X) = x_i (each partial predicate looks at one bit)\n",
    "# weights: alpha_i = 1 for all i\n",
    "# threshold: theta = n/2\n",
    "\n",
    "w = np.ones(n)  # all weights = 1\n",
    "theta = n / 2    # threshold\n",
    "\n",
    "print(f\"MAJORITY on {n} bits:\")\n",
    "print(f\"Weights: {w}\")\n",
    "print(f\"Threshold: {theta}\")\n",
    "print(f\"Decision: output 1 iff w.x > theta, i.e., sum(x) > {theta}\")\n",
    "print()\n",
    "\n",
    "correct = 0\n",
    "total = 0\n",
    "for x in inputs:\n",
    "    pred = int(np.dot(w, x) > theta)\n",
    "    true = int(np.sum(x) > n/2)\n",
    "    if pred == true:\n",
    "        correct += 1\n",
    "    total += 1\n",
    "\n",
    "print(f\"Accuracy: {correct}/{total} = {correct/total:.1%}\")\n",
    "print(\"\\nMAJORITY is correctly computed by an order-1 perceptron.\")\n",
    "print(\"Each partial predicate examines exactly one input bit.\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}