{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": "# Chapter 8: The XOR Problem\n\n## The Most Famous Impossibility Result in Neural Network Theory\n\nThe XOR (exclusive-or) problem stands as one of the most important results in the history of artificial intelligence. It is the simplest Boolean function that a single-layer perceptron **cannot** compute. This seemingly modest observation had enormous consequences: it helped trigger the first \"AI winter,\" a period of reduced funding and interest in neural network research that lasted from roughly 1969 to 1986.\n\nIn this chapter, we provide two complete proofs that XOR is not linearly separable, visualize the geometric obstruction, watch a perceptron fail to learn XOR, and then show how adding a single hidden layer resolves the problem entirely.\n\n### Why XOR Matters\n\nXOR is not just an abstract curiosity. It is:\n- The **simplest** non-linearly-separable Boolean function (only 2 inputs, 4 data points).\n- **Functionally complete** when combined with AND and a constant 1: {XOR, AND, 1} can generate any Boolean function via algebraic normal form.\n- The **2-bit case** of the PARITY function, which generalizes to $n$ bits.\n- A **pedagogical gem**: the failure is visible in a 2D plot with just 4 points."
  },
  {
   "cell_type": "markdown",
   "id": "cell-0b",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**Why XOR is the simplest non-linearly-separable function.**\n",
    "\n",
    "There are exactly $2^{2^2} = 16$ Boolean functions of two variables. Of these 16, exactly **14 are linearly separable** (AND, OR, NAND, NOR, constant 0/1, identity, negation, implications, etc.). Only **2 are not**: XOR and XNOR (its complement). XOR uses the absolute minimum --- 2 inputs, 4 data points --- to demonstrate the failure of linear separability. You literally cannot construct a simpler counterexample.\n",
    "```"
   ]
  },
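  {
   "cell_type": "markdown",
   "id": "cell-0b-count",
   "metadata": {},
   "source": "The 14-of-16 count is small enough to verify by brute force. The sketch below is an illustrative check, not a proof --- a finite weight grid cannot certify impossibility --- but it agrees exactly with the theory: every pattern except XOR and XNOR is realized by some weight/bias setting on the grid."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-0b-count-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import itertools\n",
    "import numpy as np\n",
    "\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "\n",
    "# Try every weight/bias combination on a coarse grid for each of the\n",
    "# 2^4 = 16 possible output patterns over the four input rows.\n",
    "grid = np.arange(-2.0, 2.25, 0.25)\n",
    "separable = []\n",
    "for outputs in itertools.product([0, 1], repeat=4):\n",
    "    y = np.array(outputs)\n",
    "    found = any(\n",
    "        np.array_equal((X @ np.array([w1, w2]) + b >= 0).astype(int), y)\n",
    "        for w1 in grid for w2 in grid for b in grid\n",
    "    )\n",
    "    separable.append(found)\n",
    "\n",
    "print(f\"Linearly separable: {sum(separable)} of 16\")\n",
    "for outputs, ok in zip(itertools.product([0, 1], repeat=4), separable):\n",
    "    if not ok:\n",
    "        print(f\"  Not separable: outputs {outputs}\")"
   ]
  },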
  {
   "cell_type": "markdown",
   "id": "cell-0c",
   "metadata": {},
   "source": [
    "```{danger}\n",
    "**THE critical limitation** --- XOR impossibility killed perceptron research and triggered the first AI Winter. This is the single most consequential result in early neural network history.\n",
    "\n",
    "When Minsky and Papert published their proof in 1969, funding agencies interpreted it as evidence that neural networks were a dead end. DARPA and other agencies redirected funding to symbolic AI. Researchers who had devoted their careers to neural networks found themselves unable to publish or obtain grants. The field went into hibernation for nearly **17 years**, until the backpropagation revolution of 1986.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-0d",
   "metadata": {},
   "source": [
    "```{note}\n",
    "**The Minsky-Rosenblatt Controversy**\n",
    "\n",
     "The story of the XOR problem is inseparable from the rivalry between **Marvin Minsky** and **Frank Rosenblatt**, contemporaries who had known each other since their school days. In the late 1950s, Rosenblatt, at Cornell, was the optimistic champion of the perceptron, making bold claims about its potential (some justified, some not). Minsky, at MIT, was the skeptic who saw fundamental limitations that Rosenblatt's enthusiasm glossed over.\n",
    "\n",
    "Minsky's 1969 book *Perceptrons*, co-authored with Seymour Papert, was in part a response to what Minsky perceived as overblown hype. The XOR impossibility proof was the book's centerpiece. Rosenblatt died in a boating accident in 1971, at age 43, before he could respond to or outlive the criticism. The field he pioneered would not recover until long after his death.\n",
    "\n",
    "The tragedy is that Minsky and Papert were **mathematically correct** but **strategically devastating**. Their theorems applied only to single-layer perceptrons, yet the broader community interpreted them as condemning all neural networks.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-1",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.patches import Polygon\n",
    "from matplotlib.collections import PatchCollection\n",
    "\n",
    "# Set consistent style\n",
    "plt.rcParams.update({\n",
    "    'figure.figsize': (8, 6),\n",
    "    'font.size': 12,\n",
    "    'axes.grid': True,\n",
    "    'grid.alpha': 0.3\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 1. XOR Definition\n",
    "\n",
    "The **exclusive-or** function returns 1 when exactly one of its inputs is 1, and 0 otherwise.\n",
    "\n",
    "### Truth Table\n",
    "\n",
    "| $x_1$ | $x_2$ | $x_1 \\oplus x_2$ |\n",
    "|:-----:|:-----:|:-----------------:|\n",
    "| 0     | 0     | 0                 |\n",
    "| 0     | 1     | 1                 |\n",
    "| 1     | 0     | 1                 |\n",
    "| 1     | 1     | 0                 |\n",
    "\n",
    "### Algebraic Identities\n",
    "\n",
    "XOR can be expressed algebraically in several equivalent ways:\n",
    "\n",
    "$$x_1 \\oplus x_2 = x_1 + x_2 - 2x_1 x_2$$\n",
    "\n",
    "This is because:\n",
    "- When both are 0: $0 + 0 - 0 = 0$\n",
    "- When one is 1: $1 + 0 - 0 = 1$ or $0 + 1 - 0 = 1$\n",
    "- When both are 1: $1 + 1 - 2 = 0$\n",
    "\n",
    "Equivalently:\n",
    "\n",
    "$$x_1 \\oplus x_2 = (x_1 - x_2)^2$$\n",
    "\n",
    "since $(x_1 - x_2)^2 = x_1^2 - 2x_1 x_2 + x_2^2 = x_1 - 2x_1 x_2 + x_2$ for binary inputs (where $x^2 = x$)."
   ]
  },
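  {
   "cell_type": "markdown",
   "id": "cell-2b",
   "metadata": {},
   "source": "Both identities are easy to confirm on the four binary input pairs. The quick check below evaluates $x_1 + x_2 - 2x_1 x_2$ and $(x_1 - x_2)^2$ against the XOR truth table."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-2b-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_xor = np.array([0, 1, 1, 0])\n",
    "\n",
    "poly = X[:, 0] + X[:, 1] - 2 * X[:, 0] * X[:, 1]  # x1 + x2 - 2*x1*x2\n",
    "square = (X[:, 0] - X[:, 1]) ** 2                 # (x1 - x2)^2\n",
    "\n",
    "print(\"x1 x2 | x1+x2-2x1x2 | (x1-x2)^2 | XOR\")\n",
    "for (x1, x2), p, s, t in zip(X, poly, square, y_xor):\n",
    "    print(f\"{x1:>2} {x2:>2} | {p:>11} | {s:>9} | {t:>3}\")\n",
    "\n",
    "# Both algebraic forms reproduce the truth table exactly\n",
    "assert np.array_equal(poly, y_xor) and np.array_equal(square, y_xor)"
   ]
  },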
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## 2. Proof 1: Algebraic Proof of Non-Separability\n",
    "\n",
    "```{admonition} Theorem (XOR Impossibility)\n",
    ":class: important\n",
    "\n",
    "No single perceptron (with step activation) can compute XOR. That is, there exist no weights $w_1, w_2$ and bias $b$ such that\n",
    "\n",
    "$$f(x_1, x_2) = \\text{step}(w_1 x_1 + w_2 x_2 + b)$$\n",
    "\n",
    "computes the XOR function.\n",
    "```\n",
    "\n",
    "**Proof.** Suppose for contradiction that there exist weights $w_1, w_2$ and bias $b$ such that\n",
    "\n",
    "$$f(x_1, x_2) = \\text{step}(w_1 x_1 + w_2 x_2 + b)$$\n",
    "\n",
    "computes XOR, where $\\text{step}(z) = 1$ if $z \\geq 0$ and $\\text{step}(z) = 0$ if $z < 0$.\n",
    "\n",
    "From the truth table, we require:\n",
    "\n",
     "| Input $\\to$ Output | Inequality | Simplified |\n",
     "|:-----------------:|:----------:|:----------:|\n",
    "| $(0,0) \\to 0$ | $w_1(0) + w_2(0) + b < 0$ | $b < 0$ ... (i) |\n",
    "| $(0,1) \\to 1$ | $w_1(0) + w_2(1) + b \\geq 0$ | $w_2 + b \\geq 0$ ... (ii) |\n",
    "| $(1,0) \\to 1$ | $w_1(1) + w_2(0) + b \\geq 0$ | $w_1 + b \\geq 0$ ... (iii) |\n",
    "| $(1,1) \\to 0$ | $w_1(1) + w_2(1) + b < 0$ | $w_1 + w_2 + b < 0$ ... (iv) |\n",
    "\n",
    "Adding inequalities (ii) and (iii):\n",
    "\n",
    "$$(w_2 + b) + (w_1 + b) \\geq 0$$\n",
    "\n",
    "$$w_1 + w_2 + 2b \\geq 0$$\n",
    "\n",
    "$$w_1 + w_2 \\geq -2b \\quad \\text{...(v)}$$\n",
    "\n",
    "From inequality (iv):\n",
    "\n",
    "$$w_1 + w_2 < -b \\quad \\text{...(vi)}$$\n",
    "\n",
    "Combining (v) and (vi):\n",
    "\n",
    "$$-2b \\leq w_1 + w_2 < -b$$\n",
    "\n",
    "This requires $-2b < -b$, which simplifies to:\n",
    "\n",
    "$$b > 0$$\n",
    "\n",
    "But this **contradicts** inequality (i), which requires $b < 0$.\n",
    "\n",
    "$$\\boxed{\\text{Contradiction. Therefore no such } w_1, w_2, b \\text{ exist. } \\blacksquare}$$"
   ]
  },
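  {
   "cell_type": "markdown",
   "id": "cell-3b",
   "metadata": {},
   "source": "The contradiction can also be felt empirically: a single threshold unit gets at most 3 of the 4 XOR rows right, no matter which weights it uses. The random search below is a demonstration of that ceiling, not a substitute for the proof."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3b-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_xor = np.array([0, 1, 1, 0])\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "best = 0\n",
    "for _ in range(100_000):\n",
    "    w1, w2, b = rng.uniform(-5, 5, size=3)\n",
    "    pred = (X @ np.array([w1, w2]) + b >= 0).astype(int)\n",
    "    best = max(best, int((pred == y_xor).sum()))\n",
    "\n",
    "# The theorem says 4/4 is impossible; 3/4 (e.g., plain OR) is the ceiling\n",
    "print(f\"Best accuracy over 100,000 random perceptrons: {best}/4\")"
   ]
  },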
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": [
    "## 3. Proof 2: Geometric Proof of Non-Separability\n",
    "\n",
    "```{admonition} Proof (Geometric Argument)\n",
    ":class: important\n",
    "\n",
    "**Theorem.** The XOR dataset is not linearly separable.\n",
    "\n",
    "**Proof.** Consider the two classes:\n",
    "- Class 0 (output 0): $C_0 = \\{(0,0), (1,1)\\}$\n",
    "- Class 1 (output 1): $C_1 = \\{(0,1), (1,0)\\}$\n",
    "\n",
    "The **convex hull** of $C_0$ is the line segment from $(0,0)$ to $(1,1)$:\n",
    "\n",
    "$$\\text{conv}(C_0) = \\{\\lambda(0,0) + (1-\\lambda)(1,1) : \\lambda \\in [0,1]\\} = \\{(t, t) : t \\in [0,1]\\}$$\n",
    "\n",
    "The **convex hull** of $C_1$ is the line segment from $(0,1)$ to $(1,0)$:\n",
    "\n",
    "$$\\text{conv}(C_1) = \\{\\lambda(0,1) + (1-\\lambda)(1,0) : \\lambda \\in [0,1]\\} = \\{(1-s, s) : s \\in [0,1]\\}$$\n",
    "\n",
    "These two segments **intersect** at $(0.5, 0.5)$ (set $t = 0.5$ and $s = 0.5$).\n",
    "\n",
     "By the **Separating Hyperplane Theorem**, two finite point sets are linearly separable if and only if their convex hulls are disjoint. Since $\\text{conv}(C_0) \\cap \\text{conv}(C_1) = \\{(0.5, 0.5)\\} \\neq \\emptyset$, the XOR dataset is not linearly separable. $\\blacksquare$\n",
    "```"
   ]
  },
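  {
   "cell_type": "markdown",
   "id": "cell-4b",
   "metadata": {},
   "source": "The intersection point in the proof can be computed directly. A point of $\\text{conv}(C_0)$ has the form $(t, t)$ and a point of $\\text{conv}(C_1)$ has the form $(1-s, s)$; setting them equal gives the $2 \\times 2$ linear system $t + s = 1$, $t - s = 0$, solved below."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-4b-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# (t, t) == (1 - s, s)  =>  t + s = 1  and  t - s = 0\n",
    "A = np.array([[1.0, 1.0],\n",
    "              [1.0, -1.0]])\n",
    "t, s = np.linalg.solve(A, np.array([1.0, 0.0]))\n",
    "\n",
    "print(f\"t = {t}, s = {s}, intersection point = ({t}, {t})\")\n",
    "# Both parameters lie in [0, 1], so the point lies on BOTH segments:\n",
    "# the convex hulls genuinely intersect.\n",
    "assert 0 <= t <= 1 and 0 <= s <= 1"
   ]
  },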
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Geometric proof: Visualize XOR points and convex hulls\n",
    "\n",
    "fig, ax = plt.subplots(1, 1, figsize=(8, 8))\n",
    "\n",
    "# XOR data points\n",
    "class0_x = np.array([[0, 0], [1, 1]])  # output 0\n",
    "class1_x = np.array([[0, 1], [1, 0]])  # output 1\n",
    "\n",
    "# Draw convex hulls (line segments in this case)\n",
    "# Class 0 hull: line from (0,0) to (1,1) - diagonal\n",
    "t = np.linspace(0, 1, 100)\n",
    "ax.plot(t, t, 'b-', linewidth=3, alpha=0.4, label='conv(Class 0)')\n",
    "ax.fill_between(t, t - 0.02, t + 0.02, alpha=0.15, color='blue')\n",
    "\n",
    "# Class 1 hull: line from (0,1) to (1,0) - anti-diagonal  \n",
    "ax.plot(t, 1 - t, 'r-', linewidth=3, alpha=0.4, label='conv(Class 1)')\n",
    "ax.fill_between(t, 1 - t - 0.02, 1 - t + 0.02, alpha=0.15, color='red')\n",
    "\n",
    "# Plot the intersection point\n",
    "ax.plot(0.5, 0.5, 'k*', markersize=20, zorder=10, label='Intersection (0.5, 0.5)')\n",
    "\n",
    "# Plot data points\n",
    "ax.scatter(class0_x[:, 0], class0_x[:, 1], c='blue', s=200, zorder=5,\n",
    "           edgecolors='black', linewidths=2, marker='o', label='Class 0: XOR=0')\n",
    "ax.scatter(class1_x[:, 0], class1_x[:, 1], c='red', s=200, zorder=5,\n",
    "           edgecolors='black', linewidths=2, marker='s', label='Class 1: XOR=1')\n",
    "\n",
    "# Annotate points\n",
    "offsets = {(0, 0): (-0.15, -0.1), (1, 1): (0.05, 0.05),\n",
    "           (0, 1): (-0.15, 0.05), (1, 0): (0.05, -0.1)}\n",
    "for pt, label in [((0, 0), '(0,0)\\u21920'), ((1, 1), '(1,1)\\u21920'),\n",
    "                   ((0, 1), '(0,1)\\u21921'), ((1, 0), '(1,0)\\u21921')]:\n",
    "    ox, oy = offsets[pt]\n",
    "    ax.annotate(label, pt, xytext=(pt[0]+ox, pt[1]+oy), fontsize=13, fontweight='bold')\n",
    "\n",
    "ax.set_xlim(-0.3, 1.3)\n",
    "ax.set_ylim(-0.3, 1.3)\n",
    "ax.set_xlabel('$x_1$', fontsize=14)\n",
    "ax.set_ylabel('$x_2$', fontsize=14)\n",
    "ax.set_title('XOR: Convex Hulls Intersect \\u2192 Not Linearly Separable', fontsize=14)\n",
    "ax.legend(fontsize=11, loc='upper right')\n",
    "ax.set_aspect('equal')\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5b",
   "metadata": {},
   "source": [
    "## 3a. AND vs XOR: Side-by-Side Decision Boundary Comparison\n",
    "\n",
    "To build intuition for why XOR fails where AND succeeds, let us place the two problems side by side. For AND, a single line cleanly separates the classes. For XOR, no line can do the job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_and = np.array([0, 0, 0, 1])\n",
    "y_xor = np.array([0, 1, 1, 0])\n",
    "\n",
    "for ax, y_vals, title, subtitle in [\n",
    "    (axes[0], y_and, 'AND Function', 'Linearly Separable'),\n",
    "    (axes[1], y_xor, 'XOR Function', 'NOT Linearly Separable')\n",
    "]:\n",
    "    # Decision region background\n",
    "    xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 300), np.linspace(-0.5, 1.5, 300))\n",
    "    \n",
    "    if 'AND' in title:\n",
    "        # AND: w1=1, w2=1, b=-1.5 => separating line: x1 + x2 = 1.5\n",
    "        Z = (xx + yy >= 1.5).astype(float)\n",
    "        ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], colors=['#BBDDFF', '#FFBBBB'], alpha=0.4)\n",
    "        ax.contour(xx, yy, Z, levels=[0.5], colors=['green'], linewidths=3)\n",
    "        ax.text(0.2, 1.35, 'Separating line: $x_1 + x_2 = 1.5$', fontsize=11, color='green', fontweight='bold')\n",
    "    else:\n",
    "        # XOR: show several failed attempts\n",
    "        x_line = np.linspace(-0.5, 1.5, 100)\n",
    "        for slope, intercept, ls in [(-1, 0.5, '--'), (-1, 1.5, '-.'), (-0.5, 0.8, ':')]:\n",
    "            ax.plot(x_line, slope * x_line + intercept, color='gray', linestyle=ls,\n",
    "                    linewidth=2, alpha=0.5)\n",
    "        ax.text(0.05, 1.35, 'No single line works!', fontsize=12, color='red', fontweight='bold')\n",
    "\n",
    "    # Plot data points\n",
    "    for i in range(4):\n",
    "        color = 'red' if y_vals[i] == 1 else 'blue'\n",
    "        marker = 's' if y_vals[i] == 1 else 'o'\n",
    "        ax.scatter(X[i, 0], X[i, 1], c=color, s=300, marker=marker,\n",
    "                   edgecolors='black', linewidths=2, zorder=5)\n",
    "        ax.annotate(f'({X[i,0]},{X[i,1]})\\u2192{y_vals[i]}',\n",
    "                    (X[i, 0], X[i, 1]),\n",
    "                    xytext=(X[i, 0] + 0.08, X[i, 1] - 0.12),\n",
    "                    fontsize=11, fontweight='bold')\n",
    "    \n",
    "    ax.set_xlim(-0.5, 1.5)\n",
    "    ax.set_ylim(-0.5, 1.5)\n",
    "    ax.set_xlabel('$x_1$', fontsize=14)\n",
    "    ax.set_ylabel('$x_2$', fontsize=14)\n",
    "    ax.set_title(f'{title}\\n({subtitle})', fontsize=14)\n",
    "    ax.set_aspect('equal')\n",
    "    ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.suptitle('AND vs XOR: Why One Line Is Enough for AND but Not for XOR',\n",
    "             fontsize=15, y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-5d",
   "metadata": {},
   "source": [
    "## 3b. Convex Hull Intersection: Why XOR Fails\n",
    "\n",
    "The convex hull criterion provides the definitive geometric test. Below we show the convex hulls for AND (disjoint) and XOR (intersecting) side by side, making the obstruction visually obvious."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5e",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.patches import Polygon as MplPolygon\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_and = np.array([0, 0, 0, 1])\n",
    "y_xor = np.array([0, 1, 1, 0])\n",
    "\n",
    "for ax, y_vals, title in [\n",
    "    (axes[0], y_and, 'AND: Convex Hulls Are DISJOINT'),\n",
    "    (axes[1], y_xor, 'XOR: Convex Hulls INTERSECT')\n",
    "]:\n",
    "    class0 = X[y_vals == 0]\n",
    "    class1 = X[y_vals == 1]\n",
    "\n",
    "    # Draw convex hulls\n",
    "    if len(class0) >= 3:\n",
    "        # Compute simple convex hull ordering for polygon\n",
    "        centroid = class0.mean(axis=0)\n",
    "        angles = np.arctan2(class0[:, 1] - centroid[1], class0[:, 0] - centroid[0])\n",
    "        order = np.argsort(angles)\n",
    "        hull_pts = class0[order]\n",
     "        poly = MplPolygon(hull_pts, alpha=0.2, facecolor='blue', edgecolor='blue', linewidth=2)\n",
    "        ax.add_patch(poly)\n",
    "    elif len(class0) == 2:\n",
    "        ax.plot(class0[:, 0], class0[:, 1], 'b-', linewidth=3, alpha=0.4)\n",
    "    \n",
    "    if len(class1) >= 3:\n",
    "        centroid = class1.mean(axis=0)\n",
    "        angles = np.arctan2(class1[:, 1] - centroid[1], class1[:, 0] - centroid[0])\n",
    "        order = np.argsort(angles)\n",
    "        hull_pts = class1[order]\n",
     "        poly = MplPolygon(hull_pts, alpha=0.2, facecolor='red', edgecolor='red', linewidth=2)\n",
    "        ax.add_patch(poly)\n",
    "    elif len(class1) == 2:\n",
    "        ax.plot(class1[:, 0], class1[:, 1], 'r-', linewidth=3, alpha=0.4)\n",
    "    elif len(class1) == 1:\n",
    "        pass  # Single point, hull is just the point\n",
    "    \n",
    "    # Check intersection for XOR case\n",
    "    if 'XOR' in title:\n",
    "        ax.plot(0.5, 0.5, 'k*', markersize=20, zorder=10, label='Intersection')\n",
    "        ax.annotate('Hulls intersect\\nat (0.5, 0.5)!', (0.5, 0.5),\n",
    "                    xytext=(0.6, 0.2), fontsize=12, color='black', fontweight='bold',\n",
    "                    arrowprops=dict(arrowstyle='->', color='black', linewidth=2))\n",
    "    else:\n",
    "        ax.annotate('Hulls are\\ndisjoint!', (0.5, 0.5),\n",
    "                    xytext=(0.4, 0.5), fontsize=12, color='green', fontweight='bold')\n",
    "    \n",
    "    # Plot data points\n",
    "    ax.scatter(class0[:, 0], class0[:, 1], c='blue', s=250, zorder=5,\n",
    "               edgecolors='black', linewidths=2, marker='o', label='Class 0')\n",
    "    ax.scatter(class1[:, 0], class1[:, 1], c='red', s=250, zorder=5,\n",
    "               edgecolors='black', linewidths=2, marker='s', label='Class 1')\n",
    "    \n",
    "    # Annotate points\n",
    "    for i in range(4):\n",
    "        ax.annotate(f'({X[i,0]},{X[i,1]})\\u2192{y_vals[i]}',\n",
    "                    (X[i, 0], X[i, 1]),\n",
    "                    xytext=(X[i, 0] + 0.07, X[i, 1] + 0.07),\n",
    "                    fontsize=10, fontweight='bold')\n",
    "    \n",
    "    ax.set_xlim(-0.3, 1.4)\n",
    "    ax.set_ylim(-0.3, 1.4)\n",
    "    ax.set_xlabel('$x_1$', fontsize=14)\n",
    "    ax.set_ylabel('$x_2$', fontsize=14)\n",
    "    sep_status = 'Separable' if 'AND' in title else 'NOT Separable'\n",
    "    ax.set_title(f'{title}\\n\\u2192 {sep_status}', fontsize=13)\n",
    "    ax.legend(fontsize=10, loc='upper right')\n",
    "    ax.set_aspect('equal')\n",
    "    ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.suptitle('Convex Hull Criterion: Disjoint Hulls \\u21D4 Linearly Separable',\n",
    "             fontsize=15, y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-6",
   "metadata": {},
   "source": [
    "## 4. The Perceptron Trying to Learn XOR\n",
    "\n",
    "Let us now watch a perceptron *attempt* to learn XOR using the classical perceptron learning rule. We know from the algebraic proof that convergence is impossible. The perceptron convergence theorem guarantees convergence only when the data is linearly separable. For XOR, the weights will cycle endlessly, and the number of errors per epoch will oscillate without ever reaching zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-7",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Perceptron trying (and failing) to learn XOR\n",
    "\n",
    "# XOR dataset\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y = np.array([0, 1, 1, 0])\n",
    "\n",
    "def step(z):\n",
    "    return (z >= 0).astype(int)\n",
    "\n",
    "# Perceptron learning\n",
    "np.random.seed(42)\n",
    "w = np.random.randn(2) * 0.5\n",
    "b = 0.0\n",
    "lr = 0.1\n",
    "n_epochs = 100\n",
    "\n",
    "errors_per_epoch = []\n",
    "weight_history = [(w.copy(), b)]\n",
    "\n",
    "for epoch in range(n_epochs):\n",
    "    errors = 0\n",
    "    for i in range(len(X)):\n",
    "        z = np.dot(w, X[i]) + b\n",
    "        y_pred = step(z)\n",
    "        error = y[i] - y_pred\n",
    "        if error != 0:\n",
    "            w = w + lr * error * X[i]\n",
    "            b = b + lr * error\n",
    "            errors += 1\n",
    "    errors_per_epoch.append(errors)\n",
    "    weight_history.append((w.copy(), b))\n",
    "\n",
    "print(f\"Final weights: w = {w}, b = {b:.4f}\")\n",
    "print(f\"Errors in last 10 epochs: {errors_per_epoch[-10:]}\")\n",
    "print(f\"Total epochs with zero errors: {sum(1 for e in errors_per_epoch if e == 0)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Visualize the perceptron's failure\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "# Plot 1: Errors per epoch (oscillating)\n",
    "ax1 = axes[0]\n",
    "ax1.plot(range(1, n_epochs + 1), errors_per_epoch, 'r-o', markersize=3, linewidth=1)\n",
    "ax1.set_xlabel('Epoch', fontsize=12)\n",
    "ax1.set_ylabel('Number of Errors', fontsize=12)\n",
    "ax1.set_title('Perceptron on XOR: Errors Never Reach Zero', fontsize=13)\n",
    "ax1.set_ylim(-0.5, max(errors_per_epoch) + 0.5)\n",
    "ax1.axhline(y=0, color='green', linestyle='--', alpha=0.7, label='Target: 0 errors')\n",
    "ax1.legend(fontsize=11)\n",
    "ax1.grid(True, alpha=0.3)\n",
    "\n",
    "# Plot 2: Weight trajectory\n",
    "ax2 = axes[1]\n",
    "w1_hist = [wh[0][0] for wh in weight_history]\n",
    "w2_hist = [wh[0][1] for wh in weight_history]\n",
    "b_hist = [wh[1] for wh in weight_history]\n",
    "\n",
    "ax2.plot(w1_hist, w2_hist, 'b-', alpha=0.5, linewidth=0.8)\n",
    "ax2.plot(w1_hist[0], w2_hist[0], 'go', markersize=12, label='Start', zorder=5)\n",
    "ax2.plot(w1_hist[-1], w2_hist[-1], 'rs', markersize=12, label='End', zorder=5)\n",
    "ax2.set_xlabel('$w_1$', fontsize=14)\n",
    "ax2.set_ylabel('$w_2$', fontsize=14)\n",
    "ax2.set_title('Weight Trajectory: Cycling, Never Settling', fontsize=13)\n",
    "ax2.legend(fontsize=11)\n",
    "ax2.grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-9",
   "metadata": {},
   "source": [
    "As predicted by the theory, the perceptron **never converges**. The error count oscillates between 1 and 4 across epochs, and the weight vector cycles through different configurations without settling. This is not a failure of the learning rate or initialization --- it is a fundamental impossibility."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10",
   "metadata": {},
   "source": "## 5. Why XOR Matters\n\nThe XOR problem is far more than an academic exercise. It occupies a central place in the theory and history of neural networks for several interconnected reasons:\n\n### Simplest Non-Linearly-Separable Function\nAmong all Boolean functions of two variables, XOR is the **simplest** that cannot be computed by a single perceptron. The 16 Boolean functions of 2 inputs include AND, OR, NAND, NOR (all linearly separable) and XOR, XNOR (not separable). XOR uses the minimum number of inputs and data points to exhibit the limitation.\n\n### Functional Completeness\nThe set $\\{\\text{XOR}, \\text{AND}, 1\\}$ is **functionally complete**: with a constant 1 available, XOR and AND generate every Boolean function via algebraic normal form. This means that the inability to compute XOR is not a minor gap --- it locks out an entire class of computations.\n\n### Parity Generalization\nXOR is $\\text{PARITY}(2)$: it outputs 1 when an odd number of inputs are 1. The $n$-bit parity function\n\n$$\\text{PARITY}(x_1, \\ldots, x_n) = x_1 \\oplus x_2 \\oplus \\cdots \\oplus x_n$$\n\nis equally impossible for a single perceptron, and Minsky and Papert proved this requires examining **all** input bits simultaneously (order $n$).\n\n### Pedagogical Clarity\nWith only 4 data points in 2 dimensions, the XOR problem can be fully visualized. The geometric intuition --- two classes whose convex hulls intersect --- is immediate and memorable."
  },
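  {
   "cell_type": "markdown",
   "id": "cell-10b",
   "metadata": {},
   "source": "The parity generalization is a one-liner in Python, since `^` is bitwise XOR on integers. The snippet below folds XOR across the bits and spot-checks that 2-bit parity reproduces the XOR truth table."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10b-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from functools import reduce\n",
    "from itertools import product\n",
    "\n",
    "def parity(*bits):\n",
    "    \"\"\"PARITY(x1, ..., xn): 1 iff an odd number of bits are 1.\"\"\"\n",
    "    return reduce(lambda a, b: a ^ b, bits, 0)\n",
    "\n",
    "# 2-bit parity is exactly XOR\n",
    "assert [parity(x1, x2) for x1, x2 in product([0, 1], repeat=2)] == [0, 1, 1, 0]\n",
    "\n",
    "# 3-bit parity: output is 1 for inputs with an odd number of ones\n",
    "for bits in product([0, 1], repeat=3):\n",
    "    print(bits, '->', parity(*bits))"
   ]
  },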
  {
   "cell_type": "markdown",
   "id": "cell-11",
   "metadata": {},
   "source": "## 6. The Solution: Adding a Hidden Layer\n\nThe key insight is that XOR can be decomposed into operations that **are** individually linearly separable:\n\n$$\\text{XOR}(x_1, x_2) = \\text{OR}(x_1, x_2) \\;\\text{AND}\\; \\text{NAND}(x_1, x_2)$$\n\nThis is because:\n- OR returns 1 when at least one input is 1.\n- NAND returns 0 only when both inputs are 1.\n- Their AND returns 1 exactly when at least one input is 1 but not both --- which is XOR.\n\n### Network Architecture\n\n```\nInput Layer      Hidden Layer       Output Layer\n                 ┌──────┐\n  x₁ ──────┬───► │  OR  │──────┐\n           │     └──────┘      │    ┌─────┐\n           │                   ├──► │ AND │───► y\n           │     ┌──────┐      │    └─────┘\n  x₂ ──────┴───► │ NAND │──────┘\n                 └──────┘\n  (each input feeds both hidden neurons)\n```\n\n### Exact Weights\n\n- **Hidden neuron 1 (OR):** $h_1 = \\text{step}(x_1 + x_2 - 0.5)$\n  - Fires when $x_1 + x_2 \\geq 0.5$, i.e., when at least one input is 1.\n\n- **Hidden neuron 2 (NAND):** $h_2 = \\text{step}(-x_1 - x_2 + 1.5)$\n  - Fires when $-x_1 - x_2 + 1.5 \\geq 0$, i.e., when $x_1 + x_2 \\leq 1.5$, i.e., NOT both 1.\n\n- **Output neuron (AND):** $y = \\text{step}(h_1 + h_2 - 1.5)$\n  - Fires when $h_1 + h_2 \\geq 1.5$, i.e., when both $h_1 = 1$ and $h_2 = 1$."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Implementation: XOR via OR + NAND + AND\n",
    "\n",
    "def step(z):\n",
    "    return (z >= 0).astype(int)\n",
    "\n",
    "# XOR dataset\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_true = np.array([0, 1, 1, 0])\n",
    "\n",
    "# Construction 1: OR + NAND + AND\n",
    "print(\"Construction 1: XOR = OR AND NAND\")\n",
    "print(\"=\" * 55)\n",
    "print(f\"{'x1':>3} {'x2':>3} | {'h1(OR)':>7} {'h2(NAND)':>9} | {'y(AND)':>7} {'target':>7}\")\n",
    "print(\"-\" * 55)\n",
    "\n",
    "for i in range(4):\n",
    "    x1, x2 = X[i]\n",
     "    # Hidden layer: evaluate the OR and NAND units in one vectorized call\n",
     "    h1, h2 = step(np.array([x1 + x2 - 0.5, -x1 - x2 + 1.5]))  # OR, NAND\n",
     "    # Output layer: AND of the two hidden activations\n",
     "    y_pred = step(np.array([h1 + h2 - 1.5]))[0]  # AND\n",
     "\n",
     "    match = '  OK' if y_pred == y_true[i] else '  FAIL'\n",
     "    print(f\"{x1:>3.0f} {x2:>3.0f} | {h1:>7} {h2:>9} | {y_pred:>7} {y_true[i]:>7}{match}\")\n",
    "\n",
    "print(\"\\nAll outputs match! The 2-layer network correctly computes XOR.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-13",
   "metadata": {},
   "source": [
    "## 7. Hidden Space Transformation\n",
    "\n",
    "The hidden layer performs a **nonlinear coordinate transformation** that maps the original 2D input space to a new 2D hidden space where the classes become linearly separable.\n",
    "\n",
    "The mapping $(x_1, x_2) \\mapsto (h_1, h_2)$ is:\n",
    "\n",
    "| $(x_1, x_2)$ | $h_1 = \\text{OR}$ | $h_2 = \\text{NAND}$ | $(h_1, h_2)$ | XOR |\n",
    "|:-------------:|:---------:|:----------:|:------------:|:---:|\n",
    "| $(0, 0)$      | 0         | 1          | $(0, 1)$     | 0   |\n",
    "| $(0, 1)$      | 1         | 1          | $(1, 1)$     | 1   |\n",
    "| $(1, 0)$      | 1         | 1          | $(1, 1)$     | 1   |\n",
    "| $(1, 1)$      | 1         | 0          | $(1, 0)$     | 0   |\n",
    "\n",
    "Notice that in hidden space:\n",
    "- Class 0 maps to $\\{(0,1), (1,0)\\}$\n",
     "- Class 1 maps to the single point $(1,1)$ (both of its inputs collapse onto the same point!)\n",
    "\n",
    "These are trivially linearly separable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-14",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Side-by-side: original space vs. hidden space\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "# Original space\n",
    "ax1 = axes[0]\n",
    "X_class0 = X[y_true == 0]\n",
    "X_class1 = X[y_true == 1]\n",
    "\n",
    "ax1.scatter(X_class0[:, 0], X_class0[:, 1], c='blue', s=250, zorder=5,\n",
    "            edgecolors='black', linewidths=2, marker='o', label='XOR = 0')\n",
    "ax1.scatter(X_class1[:, 0], X_class1[:, 1], c='red', s=250, zorder=5,\n",
    "            edgecolors='black', linewidths=2, marker='s', label='XOR = 1')\n",
    "\n",
    "# Draw convex hulls\n",
    "ax1.plot([0, 1], [0, 1], 'b--', alpha=0.4, linewidth=2)\n",
    "ax1.plot([0, 1], [1, 0], 'r--', alpha=0.4, linewidth=2)\n",
    "ax1.plot(0.5, 0.5, 'k*', markersize=15, zorder=10)\n",
    "\n",
    "for pt, label in zip(X, ['(0,0)\\u21920', '(0,1)\\u21921', '(1,0)\\u21921', '(1,1)\\u21920']):\n",
    "    ax1.annotate(label, pt, xytext=(pt[0]+0.05, pt[1]+0.07), fontsize=11)\n",
    "\n",
    "ax1.set_xlim(-0.3, 1.4)\n",
    "ax1.set_ylim(-0.3, 1.4)\n",
    "ax1.set_xlabel('$x_1$', fontsize=14)\n",
    "ax1.set_ylabel('$x_2$', fontsize=14)\n",
    "ax1.set_title('Original Space: NOT Separable', fontsize=14)\n",
    "ax1.legend(fontsize=11)\n",
    "ax1.set_aspect('equal')\n",
    "ax1.grid(True, alpha=0.3)\n",
    "\n",
    "# Hidden space\n",
    "ax2 = axes[1]\n",
    "\n",
    "# Compute hidden representations\n",
    "H = np.zeros((4, 2))\n",
    "for i in range(4):\n",
    "    x1, x2 = X[i]\n",
    "    H[i, 0] = int(x1 + x2 - 0.5 >= 0)   # OR\n",
    "    H[i, 1] = int(-x1 - x2 + 1.5 >= 0)   # NAND\n",
    "\n",
    "H_class0 = H[y_true == 0]\n",
    "H_class1 = H[y_true == 1]\n",
    "\n",
    "ax2.scatter(H_class0[:, 0], H_class0[:, 1], c='blue', s=250, zorder=5,\n",
    "            edgecolors='black', linewidths=2, marker='o', label='XOR = 0')\n",
    "ax2.scatter(H_class1[:, 0], H_class1[:, 1], c='red', s=350, zorder=5,\n",
    "            edgecolors='black', linewidths=2, marker='s', label='XOR = 1 (both map here)')\n",
    "\n",
    "# Draw a separating line in hidden space: h1 + h2 = 1.5\n",
    "h1_line = np.linspace(-0.3, 1.5, 100)\n",
    "h2_line = 1.5 - h1_line\n",
    "ax2.plot(h1_line, h2_line, 'g-', linewidth=3, label='Separating line: $h_1+h_2=1.5$')\n",
    "\n",
    "# Annotate hidden points\n",
    "ax2.annotate('(0,0)\\u2192(0,1)', (0, 1), xytext=(-0.35, 1.1), fontsize=10)\n",
    "ax2.annotate('(1,1)\\u2192(1,0)', (1, 0), xytext=(1.05, 0.1), fontsize=10)\n",
    "ax2.annotate('(0,1),(1,0)\\u2192(1,1)', (1, 1), xytext=(0.55, 1.1), fontsize=10)\n",
    "\n",
    "ax2.set_xlim(-0.5, 1.6)\n",
    "ax2.set_ylim(-0.5, 1.6)\n",
    "ax2.set_xlabel('$h_1$ (OR)', fontsize=14)\n",
    "ax2.set_ylabel('$h_2$ (NAND)', fontsize=14)\n",
    "ax2.set_title('Hidden Space: Separable!', fontsize=14)\n",
    "ax2.legend(fontsize=10, loc='lower left')\n",
    "ax2.set_aspect('equal')\n",
    "ax2.grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-15",
   "metadata": {},
   "source": [
    "## 8. Alternative Construction: The \"Band\" Approach\n",
    "\n",
    "There is a second elegant way to solve XOR with 2 hidden neurons, using a different geometric idea.\n",
    "\n",
    "### Construction 2: Dual Threshold\n",
    "\n",
    "- **Hidden neuron 1:** $h_1 = \\text{step}(x_1 + x_2 - 0.5)$\n",
    "  - Fires when $x_1 + x_2 \\geq 0.5$ (i.e., when at least one input is 1)\n",
    "\n",
    "- **Hidden neuron 2:** $h_2 = \\text{step}(x_1 + x_2 - 1.5)$\n",
    "  - Fires when $x_1 + x_2 \\geq 1.5$ (i.e., when both inputs are 1)\n",
    "\n",
    "- **Output neuron:** $y = \\text{step}(h_1 - h_2 - 0.5)$\n",
    "  - Fires when $h_1 = 1$ and $h_2 = 0$ (exactly in the \"band\" between the two thresholds)\n",
    "\n",
    "### Geometric Interpretation\n",
    "\n",
    "The two hidden neurons define two **parallel lines** in input space:\n",
    "- Line 1: $x_1 + x_2 = 0.5$\n",
    "- Line 2: $x_1 + x_2 = 1.5$\n",
    "\n",
    "The output neuron selects the **band** between these two lines: it fires for points where $0.5 \\leq x_1 + x_2 < 1.5$. This band contains exactly the XOR=1 points."
   ]
  },
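  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A compact sketch of the dual-threshold construction as a single\n",
    "# reusable function (function names are ours, not fixed notation):\n",
    "\n",
    "def step(z):\n",
    "    \"\"\"Heaviside step: 1 if z >= 0, else 0.\"\"\"\n",
    "    return int(z >= 0)\n",
    "\n",
    "def xor_band(x1, x2):\n",
    "    h1 = step(x1 + x2 - 0.5)    # fires when at least one input is 1\n",
    "    h2 = step(x1 + x2 - 1.5)    # fires when both inputs are 1\n",
    "    return step(h1 - h2 - 0.5)  # fires only in the band: h1 = 1, h2 = 0\n",
    "\n",
    "for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:\n",
    "    print(f\"XOR({a}, {b}) = {xor_band(a, b)}\")"
   ]
  },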
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-16",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Alternative Construction: Band approach\n",
    "\n",
    "print(\"Construction 2: Dual Threshold (Band)\")\n",
    "print(\"=\" * 60)\n",
    "print(f\"{'x1':>3} {'x2':>3} | {'sum':>4} {'h1(>=0.5)':>10} {'h2(>=1.5)':>10} | {'y':>3} {'target':>7}\")\n",
    "print(\"-\" * 60)\n",
    "\n",
    "all_match = True\n",
    "for i in range(4):\n",
    "    x1, x2 = X[i]\n",
    "    s = x1 + x2\n",
    "    h1 = int(s >= 0.5)\n",
    "    h2 = int(s >= 1.5)\n",
    "    y_pred = int(h1 - h2 - 0.5 >= 0)\n",
    "    ok = (y_pred == y_true[i])\n",
    "    all_match = all_match and ok\n",
    "    match = 'OK' if ok else 'FAIL'\n",
    "    print(f\"{x1:>3.0f} {x2:>3.0f} | {s:>4.1f} {h1:>10} {h2:>10} | {y_pred:>3} {y_true[i]:>7}  {match}\")\n",
    "\n",
    "print(\"\\nAll outputs match!\" if all_match else \"\\nSome outputs FAILED!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-17",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Visualize the \"band\" construction\n",
    "\n",
    "fig, ax = plt.subplots(1, 1, figsize=(8, 8))\n",
    "\n",
    "# Draw the two parallel decision boundaries\n",
    "x_range = np.linspace(-0.3, 1.5, 200)\n",
    "\n",
    "# Line 1: x1 + x2 = 0.5\n",
    "ax.plot(x_range, 0.5 - x_range, 'g-', linewidth=2.5, label='$x_1 + x_2 = 0.5$')\n",
    "# Line 2: x1 + x2 = 1.5\n",
    "ax.plot(x_range, 1.5 - x_range, 'purple', linewidth=2.5, linestyle='-', label='$x_1 + x_2 = 1.5$')\n",
    "\n",
    "# Shade the band between the two lines\n",
    "x_fill = np.linspace(-0.3, 1.5, 200)\n",
    "y_lower = np.clip(0.5 - x_fill, -0.5, 1.5)\n",
    "y_upper = np.clip(1.5 - x_fill, -0.5, 1.5)\n",
    "ax.fill_between(x_fill, y_lower, y_upper, alpha=0.15, color='orange', label='Band: XOR = 1 region')\n",
    "\n",
    "# Plot data points\n",
    "X_class0 = X[y_true == 0]\n",
    "X_class1 = X[y_true == 1]\n",
    "ax.scatter(X_class0[:, 0], X_class0[:, 1], c='blue', s=250, zorder=5,\n",
    "           edgecolors='black', linewidths=2, marker='o', label='XOR = 0')\n",
    "ax.scatter(X_class1[:, 0], X_class1[:, 1], c='red', s=250, zorder=5,\n",
    "           edgecolors='black', linewidths=2, marker='s', label='XOR = 1')\n",
    "\n",
    "# Annotations\n",
    "ax.annotate('sum < 0.5', (0, 0), xytext=(-0.25, -0.2), fontsize=12, color='blue')\n",
    "ax.annotate('0.5 \\u2264 sum < 1.5', (0.5, 0.5), xytext=(0.15, 0.3), fontsize=12, color='red')\n",
    "ax.annotate('sum \\u2265 1.5', (1, 1), xytext=(1.05, 1.1), fontsize=12, color='blue')\n",
    "\n",
    "ax.set_xlim(-0.5, 1.5)\n",
    "ax.set_ylim(-0.5, 1.5)\n",
    "ax.set_xlabel('$x_1$', fontsize=14)\n",
    "ax.set_ylabel('$x_2$', fontsize=14)\n",
    "ax.set_title('XOR via Band: Two Parallel Decision Boundaries', fontsize=14)\n",
    "ax.legend(fontsize=11, loc='upper right')\n",
    "ax.set_aspect('equal')\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-18",
   "metadata": {},
   "source": [
    "## 9. Exercises\n",
    "\n",
    "### Exercise 8.1: XNOR with a 2-Layer Network\n",
    "\n",
    "The XNOR function is the complement of XOR:\n",
    "\n",
    "| $x_1$ | $x_2$ | XNOR |\n",
    "|:-----:|:-----:|:----:|\n",
    "| 0 | 0 | 1 |\n",
    "| 0 | 1 | 0 |\n",
    "| 1 | 0 | 0 |\n",
    "| 1 | 1 | 1 |\n",
    "\n",
    "Design a 2-layer network (2 hidden neurons + 1 output neuron) that computes XNOR. Give the exact weights and biases, and verify with a truth table.\n",
    "\n",
    "*Hint:* XNOR = NOT(XOR). How can you negate the output of the XOR network?\n",
    "\n",
    "\n",
    "### Exercise 8.2: Can You Solve XOR with 1 Hidden Neuron?\n",
    "\n",
    "Consider a network with 2 inputs, 1 hidden neuron (with step activation), and 1 output neuron (with step activation). Can this network compute XOR?\n",
    "\n",
    "**Prove** that it cannot. \n",
    "\n",
    "*Hint:* A single hidden neuron maps $\\{0,1\\}^2$ to $\\{0,1\\}$, producing only a 1-dimensional hidden representation. The output neuron must then separate two classes in 1D. What does the hidden neuron compute?\n",
    "\n",
    "\n",
    "### Exercise 8.3: 3-Bit Parity Network\n",
    "\n",
    "Design a network that computes $\\text{PARITY}(x_1, x_2, x_3) = x_1 \\oplus x_2 \\oplus x_3$.\n",
    "\n",
    "The truth table is:\n",
    "\n",
    "| $x_1$ | $x_2$ | $x_3$ | Parity |\n",
    "|:-----:|:-----:|:-----:|:------:|\n",
    "| 0 | 0 | 0 | 0 |\n",
    "| 0 | 0 | 1 | 1 |\n",
    "| 0 | 1 | 0 | 1 |\n",
    "| 0 | 1 | 1 | 0 |\n",
    "| 1 | 0 | 0 | 1 |\n",
    "| 1 | 0 | 1 | 0 |\n",
    "| 1 | 1 | 0 | 0 |\n",
    "| 1 | 1 | 1 | 1 |\n",
    "\n",
    "How many hidden neurons do you need? Can you organize them into layers?\n",
    "\n",
    "*Hint:* Decompose as $\\text{XOR}(\\text{XOR}(x_1, x_2), x_3)$."
   ]
  },
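  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-18b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper for the exercises above (not a solution): a generic\n",
    "# truth-table checker you can point at any candidate network.\n",
    "# `fn` takes one row of inputs; the names here are ours.\n",
    "\n",
    "def check_truth_table(fn, rows, targets):\n",
    "    \"\"\"Return True iff fn reproduces every target on every input row.\"\"\"\n",
    "    return all(fn(row) == t for row, t in zip(rows, targets))\n",
    "\n",
    "# Sanity check of the checker itself against a reference XOR:\n",
    "xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]\n",
    "xor_targets = [0, 1, 1, 0]\n",
    "print(check_truth_table(lambda r: r[0] ^ r[1], xor_rows, xor_targets))"
   ]
  },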
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-19",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Skeleton for Exercise 8.3: 3-bit parity\n",
    "# Students should fill in the weights\n",
    "\n",
    "X3 = np.array([[0,0,0], [0,0,1], [0,1,0], [0,1,1],\n",
    "               [1,0,0], [1,0,1], [1,1,0], [1,1,1]])\n",
    "y3 = np.array([0, 1, 1, 0, 1, 0, 0, 1])\n",
    "\n",
    "def parity_network(x):\n",
    "    \"\"\"Compute 3-bit parity using a multi-layer network.\n",
    "    \n",
    "    TODO: Fill in the weights and biases.\n",
    "    Hint: First compute XOR(x1, x2), then XOR(result, x3).\n",
    "    \"\"\"\n",
    "    x1, x2, x3 = x\n",
    "    \n",
    "    # Layer 1: compute intermediate values for XOR(x1, x2)\n",
    "    # h1 = step(???)\n",
    "    # h2 = step(???)\n",
    "    \n",
    "    # Layer 2: combine to get XOR(x1, x2), then prepare for XOR with x3\n",
    "    # ...\n",
    "    \n",
    "    # Output layer\n",
    "    # y = step(???)\n",
    "    \n",
    "    pass  # Replace with implementation\n",
    "\n",
    "# Once implemented, this check should pass for all 8 rows:\n",
    "if parity_network(X3[0]) is None:\n",
    "    print(\"Exercise 8.3: Implement the parity_network function above.\")\n",
    "else:\n",
    "    for i in range(8):\n",
    "        assert parity_network(X3[i]) == y3[i], f\"Mismatch at row {i}\"\n",
    "    print(\"All 8 parity outputs match!\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}