{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# Chapter 4: Rosenblatt's Perceptron\n",
    "\n",
    "## Part 2: The Perceptron"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-1",
   "metadata": {},
   "source": [
    "## 4.1 Introduction: From Logic to Learning\n",
    "\n",
    "In 1957, **Frank Rosenblatt**, a psychologist at the Cornell Aeronautical Laboratory, proposed a model that would mark a watershed moment in the history of artificial intelligence: the **Perceptron**. While McCulloch and Pitts (1943) had demonstrated that networks of simple binary neurons could, in principle, compute any logical function, their model had a critical limitation: the weights (connections) had to be **designed by hand** for each task. There was no mechanism for the network to *learn* from data.\n",
    "\n",
    "Rosenblatt's key insight was to introduce **adjustable weights** and a **learning rule** that could automatically discover the correct weight settings from labeled examples. This was nothing less than the birth of supervised machine learning.\n",
    "\n",
    "```{note}\n",
    "**Historical Context: Rosenblatt and the Mark I Perceptron**\n",
    "\n",
    "Frank Rosenblatt (1928--1971) was a remarkable polymath -- a psychologist, neuroscientist, and engineer at the Cornell Aeronautical Laboratory. In 1958, he unveiled the **Mark I Perceptron**, a physical machine that could learn to recognize simple visual patterns.\n",
    "\n",
     "The Mark I was not a simulation running on a digital computer; it was a custom-built electromechanical device, whose hardware is described in detail below.\n",
    "\n",
    "Tragically, Rosenblatt died in a boating accident on his 43rd birthday in 1971, just as the field he helped create was entering its first \"AI winter.\" His contributions were only fully appreciated decades later, when neural networks experienced a renaissance in the 1980s and beyond.\n",
    "```\n",
    "\n",
    "### The Mark I Perceptron Hardware\n",
    "\n",
    "Rosenblatt did not merely propose a mathematical model; he built a physical machine. The **Mark I Perceptron** (1958) was a remarkable piece of hardware:\n",
    "\n",
     "- **Input layer**: An array of **400 cadmium sulfide (CdS) photocells** arranged in a $20 \\times 20$ grid, serving as a primitive retina that could \"see\" simple images.\n",
    "- **Association layer**: **512 association units** (\"A-units\"), each randomly connected to a subset of the photocells. These connections were fixed and random, inspired by the apparent randomness of neural connectivity in the brain.\n",
    "- **Output layer**: A set of response units (\"R-units\") that produced the final classification.\n",
    "- **Adjustable weights**: Implemented as **motor-driven potentiometers** -- physical resistors whose values could be changed by small electric motors. When the learning algorithm called for a weight increase, a motor would turn the corresponding potentiometer.\n",
    "\n",
    "The machine could learn to classify simple shapes (such as distinguishing triangles from squares) purely from examples, without explicit programming. The *New York Times* famously reported it as a machine that could \"perceive, recognize, and identify its surroundings without any human training or control.\"\n",
    "\n",
    "### The Intellectual Leap\n",
    "\n",
    "The transition from McCulloch-Pitts to Rosenblatt can be summarized as:\n",
    "\n",
    "| Aspect | McCulloch-Pitts (1943) | Rosenblatt's Perceptron (1957) |\n",
    "|--------|----------------------|-------------------------------|\n",
    "| Weights | Fixed, hand-designed | Adjustable, learned from data |\n",
    "| Purpose | Model of neural computation | Pattern recognition machine |\n",
    "| Design | Logical analysis | Statistical learning |\n",
    "| Key question | What *can* neurons compute? | How can neurons *learn*? |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 4.2 The Mathematical Model\n",
    "\n",
    "```{admonition} Definition (Rosenblatt Perceptron)\n",
    ":class: note\n",
    "\n",
    "The **Rosenblatt Perceptron** is a binary classifier that computes, for an input vector $\\mathbf{x} \\in \\mathbb{R}^n$:\n",
    "\n",
    "$$f(\\mathbf{x}) = \\text{step}(\\mathbf{w} \\cdot \\mathbf{x} + b)$$\n",
    "\n",
    "where the **Heaviside step function** is:\n",
    "\n",
    "$$\\text{step}(z) = \\begin{cases} 1 & \\text{if } z \\geq 0 \\\\ 0 & \\text{if } z < 0 \\end{cases}$$\n",
    "\n",
    "The model is parameterized by a **weight vector** $\\mathbf{w} = (w_1, w_2, \\ldots, w_n) \\in \\mathbb{R}^n$ and a **bias** $b \\in \\mathbb{R}$. The key innovation over the McCulloch-Pitts neuron is that $\\mathbf{w}$ and $b$ are **learned from data** via a supervised learning rule, rather than being hand-designed.\n",
    "```\n",
    "\n",
    "### Components\n",
    "\n",
    "Let us define each component precisely:\n",
    "\n",
    "1. **Input vector**: $\\mathbf{x} = (x_1, x_2, \\ldots, x_n) \\in \\mathbb{R}^n$. This is the feature representation of a single data point. For the Mark I Perceptron, $n = 512$ (the outputs of the association units).\n",
    "\n",
    "2. **Weight vector**: $\\mathbf{w} = (w_1, w_2, \\ldots, w_n) \\in \\mathbb{R}^n$. Each weight $w_i$ represents the strength of the connection from input $x_i$ to the output neuron. Positive weights are **excitatory**; negative weights are **inhibitory**.\n",
    "\n",
    "3. **Bias**: $b \\in \\mathbb{R}$. The bias (also called the **threshold** when written as $-b$) determines how easy it is for the neuron to fire. A positive bias makes the neuron more likely to output 1; a negative bias makes it harder.\n",
    "\n",
    "4. **Dot product (pre-activation)**: The quantity $z = \\mathbf{w} \\cdot \\mathbf{x} + b = \\sum_{i=1}^n w_i x_i + b$ is the **pre-activation** or **net input**. It measures how much the input aligns with the weight vector.\n",
    "\n",
    "5. **Step function**: The step function produces a binary decision:\n",
    "   - If $z \\geq 0$: the perceptron outputs **1** (\"fires\", \"active\", \"positive class\")\n",
    "   - If $z < 0$: the perceptron outputs **0** (\"silent\", \"inactive\", \"negative class\")\n",
    "\n",
    "### Alternative Conventions\n",
    "\n",
    "Some texts use the $\\{-1, +1\\}$ convention for labels and outputs:\n",
    "\n",
    "$$f(\\mathbf{x}) = \\text{sign}(\\mathbf{w} \\cdot \\mathbf{x} + b) = \\begin{cases} +1 & \\text{if } \\mathbf{w} \\cdot \\mathbf{x} + b \\geq 0 \\\\ -1 & \\text{if } \\mathbf{w} \\cdot \\mathbf{x} + b < 0 \\end{cases}$$\n",
    "\n",
    "This convention is common in the convergence theorem analysis and in support vector machines. We will use both conventions as appropriate."
   ]
  },
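  {
   "cell_type": "markdown",
   "id": "cell-2b",
   "metadata": {},
   "source": [
    "To make the two conventions concrete, here is a minimal sketch (the helper names `step` and `sign` are ours, not from any library) showing that both produce the same decision and that labels convert via $y_{\\pm} = 2\\,y_{01} - 1$:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def step(z):\n",
    "    # Heaviside step: 1 if z >= 0, else 0\n",
    "    return (np.asarray(z) >= 0).astype(int)\n",
    "\n",
    "def sign(z):\n",
    "    # {-1, +1} convention, with sign(0) = +1 to match step(0) = 1\n",
    "    return np.where(np.asarray(z) >= 0, 1, -1)\n",
    "\n",
    "w, b = np.array([1.0, 1.0]), -1.5\n",
    "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "z = X @ w + b\n",
    "y01 = step(z)\n",
    "ypm = sign(z)\n",
    "print(y01, ypm)\n",
    "print(np.array_equal(2 * y01 - 1, ypm))  # the two conventions agree\n",
    "```"
   ]
  },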
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## 4.3 The Bias Trick\n",
    "\n",
    "It is often convenient to absorb the bias into the weight vector by **augmenting** the input with a constant feature $x_0 = 1$. Define:\n",
    "\n",
    "$$\\tilde{\\mathbf{x}} = (1, x_1, x_2, \\ldots, x_n) \\in \\mathbb{R}^{n+1}$$\n",
    "\n",
    "$$\\tilde{\\mathbf{w}} = (b, w_1, w_2, \\ldots, w_n) \\in \\mathbb{R}^{n+1}$$\n",
    "\n",
    "Then:\n",
    "\n",
    "$$\\tilde{\\mathbf{w}} \\cdot \\tilde{\\mathbf{x}} = b \\cdot 1 + w_1 x_1 + w_2 x_2 + \\cdots + w_n x_n = \\mathbf{w} \\cdot \\mathbf{x} + b$$\n",
    "\n",
    "So the perceptron model simplifies to:\n",
    "\n",
    "$$f(\\tilde{\\mathbf{x}}) = \\text{step}(\\tilde{\\mathbf{w}} \\cdot \\tilde{\\mathbf{x}})$$\n",
    "\n",
    "This \"bias trick\" has several advantages:\n",
    "\n",
    "- **Notational simplicity**: We deal with a single vector $\\tilde{\\mathbf{w}}$ instead of $\\mathbf{w}$ and $b$ separately.\n",
    "- **Algorithmic uniformity**: The learning rule updates all components of $\\tilde{\\mathbf{w}}$ identically.\n",
    "- **Geometric clarity**: The decision boundary passes through the origin in the augmented $(n+1)$-dimensional space.\n",
    "\n",
    "In practice, we often keep the bias separate for implementation clarity, but the bias trick is indispensable for theoretical analysis."
   ]
  },
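  {
   "cell_type": "markdown",
   "id": "cell-3b",
   "metadata": {},
   "source": [
    "A quick numerical check of the bias trick (a small sketch; the variable names are ours): prepending a constant 1 to $\\mathbf{x}$ and prepending $b$ to $\\mathbf{w}$ reproduces $\\mathbf{w} \\cdot \\mathbf{x} + b$ exactly:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "w = np.array([1.0, 1.0])\n",
    "b = -1.5\n",
    "x = np.array([1.0, 0.25])\n",
    "\n",
    "# Augmented vectors: x_tilde = (1, x), w_tilde = (b, w)\n",
    "x_tilde = np.concatenate(([1.0], x))\n",
    "w_tilde = np.concatenate(([b], w))\n",
    "\n",
    "z_original = w @ x + b            # separate weights and bias\n",
    "z_augmented = w_tilde @ x_tilde   # one dot product, bias absorbed\n",
    "print(z_original, z_augmented)    # both equal -0.25\n",
    "```"
   ]
  },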
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": [
    "## 4.4 Python Implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Set global plotting style\n",
    "plt.rcParams.update({\n",
    "    'figure.figsize': (8, 6),\n",
    "    'font.size': 12,\n",
    "    'axes.grid': True,\n",
    "    'grid.alpha': 0.3\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Perceptron:\n",
    "    \"\"\"Rosenblatt's Perceptron model.\n",
    "    \n",
    "    A single-layer binary classifier that computes:\n",
    "        f(x) = step(w . x + b)\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    n_features : int\n",
    "        Number of input features (dimensionality of input space).\n",
    "    \n",
    "    Attributes\n",
    "    ----------\n",
    "    weights : np.ndarray of shape (n_features,)\n",
    "        Weight vector w.\n",
    "    bias : float\n",
    "        Bias term b.\n",
    "    \"\"\"\n",
    "    \n",
    "    def __init__(self, n_features):\n",
    "        self.n_features = n_features\n",
    "        self.weights = np.zeros(n_features)\n",
    "        self.bias = 0.0\n",
    "    \n",
    "    def decision_function(self, X):\n",
    "        \"\"\"Compute the raw pre-activation value w . x + b.\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        X : np.ndarray of shape (n_samples, n_features) or (n_features,)\n",
    "            Input data.\n",
    "        \n",
    "        Returns\n",
    "        -------\n",
    "        np.ndarray\n",
    "            Pre-activation values z = w . x + b.\n",
    "        \"\"\"\n",
    "        X = np.atleast_2d(X)\n",
    "        return X @ self.weights + self.bias\n",
    "    \n",
    "    def predict(self, X):\n",
    "        \"\"\"Predict binary class labels using the step function.\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        X : np.ndarray of shape (n_samples, n_features) or (n_features,)\n",
    "            Input data.\n",
    "        \n",
    "        Returns\n",
    "        -------\n",
    "        np.ndarray of int\n",
    "            Predicted labels (0 or 1).\n",
    "        \"\"\"\n",
    "        z = self.decision_function(X)\n",
    "        return (z >= 0).astype(int)\n",
    "    \n",
    "    def set_weights(self, weights, bias):\n",
    "        \"\"\"Manually set weights and bias.\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        weights : array-like of shape (n_features,)\n",
    "            Weight vector.\n",
    "        bias : float\n",
    "            Bias term.\n",
    "        \"\"\"\n",
    "        self.weights = np.array(weights, dtype=float)\n",
    "        self.bias = float(bias)\n",
    "    \n",
    "    def __repr__(self):\n",
    "        return (f\"Perceptron(weights={self.weights}, bias={self.bias})\")\n",
    "\n",
    "\n",
    "# Demonstration\n",
    "print(\"=== Perceptron Demonstration ===\")\n",
    "print()\n",
    "\n",
    "# Create a perceptron for 2D inputs\n",
    "p = Perceptron(n_features=2)\n",
    "\n",
    "# Set weights to implement AND-like behavior: w = [1, 1], b = -1.5\n",
    "p.set_weights([1.0, 1.0], -1.5)\n",
    "print(f\"Model: {p}\")\n",
    "print()\n",
    "\n",
    "# Test on all binary inputs\n",
    "test_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "print(\"Input     | w.x + b  | Output\")\n",
    "print(\"-\" * 35)\n",
    "for x in test_inputs:\n",
    "    z = p.decision_function(x)\n",
    "    y = p.predict(x)\n",
    "    print(f\"  {x}    | {z[0]:+.1f}    |   {y[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "The perceptron with $\\mathbf{w} = (1, 1)$ and $b = -1.5$ computes the AND function:\n",
     "- Only when *both* inputs are 1 does $\\mathbf{w} \\cdot \\mathbf{x} + b = 1 + 1 - 1.5 = 0.5 \\geq 0$.\n",
    "- All other inputs give $z < 0$.\n",
    "\n",
     "The pre-activation column in the demonstration above confirms this: only the input $(1, 1)$ yields a nonnegative value."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-8",
   "metadata": {},
   "source": [
    "## 4.5 Geometric Interpretation\n",
    "\n",
    "```{tip}\n",
    "**Geometric Interpretation of the Perceptron**\n",
    "\n",
    "The perceptron partitions the input space $\\mathbb{R}^n$ into two half-spaces separated by a **hyperplane**. The key geometric facts are:\n",
    "\n",
    "1. The **weight vector** $\\mathbf{w}$ is the **normal** (perpendicular) to the decision boundary.\n",
    "2. $\\mathbf{w}$ **points toward** the positive region (output = 1).\n",
    "3. The **bias** $b$ controls the **offset** of the hyperplane from the origin.\n",
    "4. The **distance** from the origin to the hyperplane is $|b|/\\lVert\\mathbf{w}\\rVert$.\n",
    "\n",
    "Changing $\\mathbf{w}$ **rotates** the boundary; changing $b$ **shifts** it.\n",
    "```\n",
    "\n",
    "The perceptron's decision is determined by the sign of the pre-activation $z = \\mathbf{w} \\cdot \\mathbf{x} + b$. The **decision boundary** is the set of points where $z = 0$:\n",
    "\n",
    "$$\\mathcal{H} = \\{\\mathbf{x} \\in \\mathbb{R}^n : \\mathbf{w} \\cdot \\mathbf{x} + b = 0\\}$$\n",
    "\n",
    "This is a **hyperplane** in $\\mathbb{R}^n$:\n",
    "- In $\\mathbb{R}^2$: a **line**\n",
    "- In $\\mathbb{R}^3$: a **plane**\n",
    "- In $\\mathbb{R}^n$ ($n > 3$): a hyperplane (an $(n-1)$-dimensional affine subspace)\n",
    "\n",
    "### Properties of the Decision Hyperplane\n",
    "\n",
    "**Property 1: $\\mathbf{w}$ is the normal vector.** The weight vector $\\mathbf{w}$ is perpendicular (normal) to the decision hyperplane. To see why: if $\\mathbf{x}_1$ and $\\mathbf{x}_2$ are both on the boundary, then\n",
    "\n",
    "$$\\mathbf{w} \\cdot \\mathbf{x}_1 + b = 0 \\quad \\text{and} \\quad \\mathbf{w} \\cdot \\mathbf{x}_2 + b = 0$$\n",
    "\n",
    "Subtracting gives $\\mathbf{w} \\cdot (\\mathbf{x}_1 - \\mathbf{x}_2) = 0$. Any vector along the hyperplane is orthogonal to $\\mathbf{w}$, confirming that $\\mathbf{w}$ is normal to the hyperplane.\n",
    "\n",
    "**Property 2: $\\mathbf{w}$ points toward the positive region.** The positive class ($f(\\mathbf{x}) = 1$) lies on the side of the hyperplane toward which $\\mathbf{w}$ points.\n",
    "\n",
     "**Property 3: Distance from the origin.** The signed distance of the origin from the hyperplane (positive when the origin lies on the side toward which $\\mathbf{w}$ points) is\n",
    "\n",
    "$$d = \\frac{b}{\\lVert\\mathbf{w}\\rVert}$$\n",
    "\n",
    "where $\\lVert\\mathbf{w}\\rVert = \\sqrt{w_1^2 + w_2^2 + \\cdots + w_n^2}$ is the Euclidean norm of the weight vector. The absolute distance from the origin is $|b| / \\lVert\\mathbf{w}\\rVert$.\n",
    "\n",
    "**Property 4: Distance of any point to the hyperplane.** For a point $\\mathbf{x}_0$, the signed distance to the hyperplane is\n",
    "\n",
    "$$d(\\mathbf{x}_0) = \\frac{\\mathbf{w} \\cdot \\mathbf{x}_0 + b}{\\lVert\\mathbf{w}\\rVert}$$\n"
   ]
  },
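  {
   "cell_type": "markdown",
   "id": "cell-8b",
   "metadata": {},
   "source": [
    "These properties are easy to verify numerically. The sketch below (the helper `signed_distance` is ours) checks, for the AND perceptron $\\mathbf{w} = (1, 1)$, $b = -1.5$, that $\\mathbf{w}$ is orthogonal to directions lying in the boundary and that the signed-distance formula behaves as stated:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "w = np.array([1.0, 1.0])\n",
    "b = -1.5\n",
    "w_norm = np.linalg.norm(w)\n",
    "\n",
    "def signed_distance(x0):\n",
    "    # Property 4: signed distance of x0 from the hyperplane w.x + b = 0\n",
    "    return (w @ x0 + b) / w_norm\n",
    "\n",
    "# Two points on the boundary x1 + x2 = 1.5; their difference lies in the hyperplane\n",
    "p1 = np.array([1.5, 0.0])\n",
    "p2 = np.array([0.0, 1.5])\n",
    "print(w @ (p1 - p2))                 # Property 1: 0.0 (w is normal)\n",
    "\n",
    "print(signed_distance(np.zeros(2)))  # Property 3: b/||w|| = -1.5/sqrt(2)\n",
    "print(signed_distance(np.array([1.0, 1.0])))  # positive: (1,1) on the w side\n",
    "```"
   ]
  },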
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-9",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "def plot_perceptron_geometry(weights, bias, data=None, labels=None,\n",
    "                             xlim=(-3, 3), ylim=(-3, 3), title=None):\n",
    "    \"\"\"Visualize the perceptron's decision boundary, weight vector, and regions.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    weights : array-like of shape (2,)\n",
    "        Weight vector [w1, w2].\n",
    "    bias : float\n",
    "        Bias term.\n",
    "    data : np.ndarray of shape (n, 2), optional\n",
    "        Data points to plot.\n",
    "    labels : np.ndarray of shape (n,), optional\n",
    "        Labels (0 or 1) for the data points.\n",
    "    xlim, ylim : tuple\n",
    "        Axis limits.\n",
    "    title : str, optional\n",
    "        Plot title.\n",
    "    \"\"\"\n",
    "    w = np.array(weights, dtype=float)\n",
    "    b = float(bias)\n",
    "    \n",
    "    fig, ax = plt.subplots(1, 1, figsize=(8, 8))\n",
    "    \n",
    "    # Create a mesh for the decision regions\n",
    "    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 300),\n",
    "                         np.linspace(ylim[0], ylim[1], 300))\n",
    "    Z = xx * w[0] + yy * w[1] + b\n",
    "    \n",
    "    # Color the decision regions\n",
    "    ax.contourf(xx, yy, Z, levels=[-1e10, 0, 1e10],\n",
    "                colors=['#FFCCCC', '#CCCCFF'], alpha=0.4)\n",
    "    \n",
    "    # Draw the decision boundary\n",
    "    ax.contour(xx, yy, Z, levels=[0], colors='black', linewidths=2)\n",
    "    \n",
    "    # Draw the weight vector as an arrow from the origin\n",
    "    # Scale it for visualization\n",
    "    w_norm = np.linalg.norm(w)\n",
    "    if w_norm > 0:\n",
    "        # Arrow from origin in the direction of w\n",
    "        scale = min(xlim[1], ylim[1]) * 0.4\n",
    "        w_display = w / w_norm * scale\n",
    "        ax.annotate('', xy=w_display, xytext=(0, 0),\n",
    "                    arrowprops=dict(arrowstyle='->', color='red', lw=2.5))\n",
    "        ax.text(w_display[0] * 1.15, w_display[1] * 1.15, r'$\\mathbf{w}$',\n",
    "                fontsize=16, color='red', fontweight='bold',\n",
    "                ha='center', va='center')\n",
    "        \n",
    "        # Mark the closest point on the boundary to the origin\n",
    "        closest_point = -b * w / (w_norm ** 2)\n",
    "        ax.plot(*closest_point, 'ko', markersize=6)\n",
    "        \n",
    "        # Draw the distance line from origin to the boundary\n",
    "        ax.plot([0, closest_point[0]], [0, closest_point[1]],\n",
    "                'g--', linewidth=1.5, label=f'$|b|/||w|| = {abs(b)/w_norm:.2f}$')\n",
    "    \n",
    "    # Plot data points if provided\n",
    "    if data is not None and labels is not None:\n",
    "        mask_0 = labels == 0\n",
    "        mask_1 = labels == 1\n",
    "        ax.scatter(data[mask_0, 0], data[mask_0, 1], c='red', marker='o',\n",
    "                   s=100, edgecolors='black', zorder=5, label='Class 0')\n",
    "        ax.scatter(data[mask_1, 0], data[mask_1, 1], c='blue', marker='s',\n",
    "                   s=100, edgecolors='black', zorder=5, label='Class 1')\n",
    "    \n",
    "    # Mark origin\n",
    "    ax.plot(0, 0, 'k+', markersize=12, markeredgewidth=2)\n",
    "    \n",
    "    # Labels\n",
    "    ax.set_xlabel('$x_1$', fontsize=14)\n",
    "    ax.set_ylabel('$x_2$', fontsize=14)\n",
    "    ax.set_xlim(xlim)\n",
    "    ax.set_ylim(ylim)\n",
    "    ax.set_aspect('equal')\n",
    "    ax.legend(fontsize=11, loc='upper left')\n",
    "    \n",
     "    # Add region labels, choosing each label by the actual sign of z at that corner\n",
     "    for cx, cy in [(xlim[0] + 0.3, ylim[1] - 0.5), (xlim[1] - 2.0, ylim[0] + 0.3)]:\n",
     "        if cx * w[0] + cy * w[1] + b >= 0:\n",
     "            ax.text(cx, cy, 'Output = 1\\n($\\\\mathbf{w} \\\\cdot \\\\mathbf{x} + b \\\\geq 0$)',\n",
     "                    fontsize=10, color='blue', alpha=0.7)\n",
     "        else:\n",
     "            ax.text(cx, cy, 'Output = 0\\n($\\\\mathbf{w} \\\\cdot \\\\mathbf{x} + b < 0$)',\n",
     "                    fontsize=10, color='red', alpha=0.7)\n",
    "    \n",
    "    if title:\n",
    "        ax.set_title(title, fontsize=14, fontweight='bold')\n",
    "    else:\n",
    "        ax.set_title(f'Perceptron: $\\\\mathbf{{w}}$={w.tolist()}, $b$={b}',\n",
    "                     fontsize=14)\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "# Visualize the AND perceptron\n",
    "AND_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "AND_labels = np.array([0, 0, 0, 1])\n",
    "\n",
    "plot_perceptron_geometry(\n",
    "    weights=[1, 1], bias=-1.5,\n",
    "    data=AND_data, labels=AND_labels,\n",
    "    xlim=(-1, 2.5), ylim=(-1, 2.5),\n",
    "    title='AND Gate: $\\\\mathbf{w} = (1, 1)$, $b = -1.5$'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10",
   "metadata": {},
   "source": [
    "In the plot above, observe that:\n",
    "- The **decision boundary** (black line) is the line $x_1 + x_2 - 1.5 = 0$, i.e., $x_2 = -x_1 + 1.5$.\n",
    "- The **weight vector** $\\mathbf{w} = (1,1)$ (red arrow) is perpendicular to this line.\n",
    "- The **positive region** (blue shading, output = 1) is on the side that $\\mathbf{w}$ points toward.\n",
    "- The **distance from the origin** to the boundary is $|b|/\\|\\mathbf{w}\\| = 1.5/\\sqrt{2} \\approx 1.06$ (green dashed line).\n",
    "- Only the point $(1,1)$ is on the positive side, which is exactly the AND function."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10a",
   "metadata": {},
   "source": [
    "### 3D Weight-Space Visualization\n",
    "\n",
    "To deepen our geometric understanding, let us visualize the perceptron in 3D. The weight vector $\\mathbf{w}$ determines the normal direction of the decision hyperplane. Below, we show the weight vector and the decision plane in 3D space, along with the input data points lifted by their pre-activation values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10b",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "\n",
    "# 3D Weight-Space Visualization\n",
    "# We show the decision plane w1*x1 + w2*x2 + b = 0 in (x1, x2, z) space,\n",
    "# where z = w.x + b is the pre-activation.\n",
    "\n",
    "w = np.array([1.0, 1.0])\n",
    "b = -1.5\n",
    "\n",
    "fig = plt.figure(figsize=(12, 8))\n",
    "ax = fig.add_subplot(111, projection='3d')\n",
    "\n",
    "# Create mesh for the pre-activation surface z = w1*x1 + w2*x2 + b\n",
    "x1_range = np.linspace(-0.5, 2.0, 40)\n",
    "x2_range = np.linspace(-0.5, 2.0, 40)\n",
    "X1, X2 = np.meshgrid(x1_range, x2_range)\n",
    "Z_surface = w[0] * X1 + w[1] * X2 + b\n",
    "\n",
    "# Plot the pre-activation surface\n",
     "# Legend entries for 3D surfaces are unreliable in older Matplotlib versions,\n",
     "# so the surface is left unlabeled and described in the title instead.\n",
     "ax.plot_surface(X1, X2, Z_surface, alpha=0.25, color='steelblue',\n",
     "                edgecolor='none')\n",
    "\n",
    "# Plot the decision plane z = 0\n",
    "Z_zero = np.zeros_like(X1)\n",
    "ax.plot_surface(X1, X2, Z_zero, alpha=0.15, color='gray',\n",
    "                edgecolor='none')\n",
    "\n",
    "# Plot the decision boundary line (intersection of the two planes) on z=0\n",
    "x1_line = np.linspace(-0.5, 2.0, 100)\n",
    "x2_line = (-w[0] * x1_line - b) / w[1]\n",
    "valid = (x2_line >= -0.5) & (x2_line <= 2.0)\n",
    "ax.plot(x1_line[valid], x2_line[valid], np.zeros(valid.sum()),\n",
    "        'k-', linewidth=3, label='Decision boundary ($z=0$)')\n",
    "\n",
    "# Plot the weight vector as an arrow from the origin in the (x1, x2) plane\n",
    "w_norm = np.linalg.norm(w)\n",
    "w_scaled = w / w_norm * 0.8\n",
    "ax.quiver(0, 0, 0, w_scaled[0], w_scaled[1], 0,\n",
    "          color='red', arrow_length_ratio=0.15, linewidth=3,\n",
    "          label=r'Weight vector $\\mathbf{w}$')\n",
    "\n",
    "# Plot the normal to the surface (w1, w2, 1) direction (gradient of z)\n",
    "normal = np.array([w[0], w[1], 1.0])\n",
    "normal_scaled = normal / np.linalg.norm(normal) * 0.8\n",
    "ax.quiver(0.5, 0.5, w[0]*0.5+w[1]*0.5+b, normal_scaled[0], normal_scaled[1], normal_scaled[2],\n",
    "          color='green', arrow_length_ratio=0.15, linewidth=2.5,\n",
    "          label='Surface normal')\n",
    "\n",
    "# Plot the data points (AND gate) at their pre-activation values\n",
    "AND_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "AND_labels = np.array([0, 0, 0, 1])\n",
    "AND_z = AND_data @ w + b\n",
    "\n",
    "for i in range(4):\n",
    "    color = 'blue' if AND_labels[i] == 1 else 'red'\n",
    "    marker = 's' if AND_labels[i] == 1 else 'o'\n",
    "    ax.scatter(AND_data[i, 0], AND_data[i, 1], AND_z[i],\n",
    "              c=color, marker=marker, s=150, edgecolors='black',\n",
    "              zorder=5, linewidths=1.5)\n",
    "    # Vertical line from z=0 to the point\n",
    "    ax.plot([AND_data[i, 0], AND_data[i, 0]],\n",
    "            [AND_data[i, 1], AND_data[i, 1]],\n",
    "            [0, AND_z[i]], '--', color=color, alpha=0.5, linewidth=1)\n",
    "    ax.text(AND_data[i, 0] + 0.08, AND_data[i, 1] + 0.08, AND_z[i] + 0.1,\n",
    "            f'z={AND_z[i]:.1f}', fontsize=9, color=color)\n",
    "\n",
    "ax.set_xlabel('$x_1$', fontsize=13, labelpad=10)\n",
    "ax.set_ylabel('$x_2$', fontsize=13, labelpad=10)\n",
    "ax.set_zlabel('$z = \\\\mathbf{w} \\\\cdot \\\\mathbf{x} + b$', fontsize=13, labelpad=10)\n",
    "ax.set_title('3D Weight-Space Visualization\\n'\n",
    "             'AND Gate: $\\\\mathbf{w}=(1,1)$, $b=-1.5$',\n",
    "             fontsize=14, fontweight='bold')\n",
    "\n",
    "# Add text annotation for z=0 plane\n",
    "ax.text(1.8, 1.8, 0.1, '$z = 0$ plane\\n(decision boundary)',\n",
    "        fontsize=10, color='gray')\n",
    "\n",
    "ax.view_init(elev=25, azim=-60)\n",
    "ax.legend(fontsize=10, loc='upper left')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10c",
   "metadata": {},
   "source": [
    "### Comparison: McCulloch-Pitts Neuron vs. Rosenblatt Perceptron\n",
    "\n",
    "Let us systematically compare the two foundational neural models we have studied so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-10d",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Comparison Table: M-P Neuron vs Perceptron\n",
    "fig, ax = plt.subplots(figsize=(14, 7))\n",
    "ax.axis('off')\n",
    "\n",
    "# Define comparison data\n",
    "headers = ['Feature', 'McCulloch-Pitts Neuron (1943)', 'Rosenblatt Perceptron (1957)']\n",
    "rows = [\n",
    "    ['Inputs', 'Binary $\\\\{0, 1\\\\}$', 'Real-valued $\\\\mathbb{R}^n$'],\n",
    "    ['Weights', 'Fixed: excitatory (+1)\\nor inhibitory (absolute veto)', 'Adjustable: any real value\\n(learned from data)'],\n",
    "    ['Threshold', 'Integer threshold $\\\\theta$\\n(hand-designed)', 'Real-valued bias $b$\\n(learned)'],\n",
    "    ['Learning', 'None -- weights set by\\nhuman designer', 'Perceptron learning rule\\n(automatic from examples)'],\n",
    "    ['Activation', 'Step function\\nwith inhibitory veto', 'Heaviside step function'],\n",
    "    ['Purpose', 'Model of neural logic:\\nwhat CAN neurons compute?', 'Pattern recognition machine:\\nhow can neurons LEARN?'],\n",
    "    ['Expressiveness', 'Any Boolean function\\n(with networks)', 'Only linearly separable\\nfunctions (single layer)'],\n",
    "    ['Convergence\\nGuarantee', 'N/A (no learning)', 'Yes, for linearly separable\\ndata (Convergence Theorem)'],\n",
    "    ['Key Limitation', 'No learning mechanism', 'Cannot learn XOR\\nor non-separable functions'],\n",
    "]\n",
    "\n",
    "# Create the table\n",
    "table = ax.table(\n",
    "    cellText=rows,\n",
    "    colLabels=headers,\n",
    "    cellLoc='center',\n",
    "    loc='center',\n",
    "    colWidths=[0.2, 0.4, 0.4]\n",
    ")\n",
    "\n",
    "# Style the table\n",
    "table.auto_set_font_size(False)\n",
    "table.set_fontsize(10)\n",
    "table.scale(1.0, 2.2)\n",
    "\n",
    "# Header style\n",
    "for j in range(3):\n",
    "    cell = table[0, j]\n",
    "    cell.set_facecolor('#2C3E50')\n",
    "    cell.set_text_props(color='white', fontweight='bold', fontsize=11)\n",
    "    cell.set_height(0.06)\n",
    "\n",
    "# Row styles\n",
    "for i in range(1, len(rows) + 1):\n",
    "    color = '#EBF5FB' if i % 2 == 1 else '#FDFEFE'\n",
    "    for j in range(3):\n",
    "        cell = table[i, j]\n",
    "        cell.set_facecolor(color)\n",
    "        if j == 0:\n",
    "            cell.set_text_props(fontweight='bold')\n",
    "\n",
    "ax.set_title('McCulloch-Pitts Neuron vs. Rosenblatt Perceptron',\n",
    "             fontsize=15, fontweight='bold', pad=20)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-11",
   "metadata": {},
   "source": [
    "## 4.6 Interactive Exploration\n",
    "\n",
    "Let us explore how the decision boundary changes as we modify the weight vector $\\mathbf{w}$ and the bias $b$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Generate random 2D data: two clusters\n",
    "np.random.seed(42)\n",
    "n_per_class = 30\n",
    "\n",
    "# Cluster 1 (class 0): centered at (-1, -1)\n",
    "X0 = np.random.randn(n_per_class, 2) * 0.5 + np.array([-1, -1])\n",
    "# Cluster 2 (class 1): centered at (1, 1)\n",
    "X1 = np.random.randn(n_per_class, 2) * 0.5 + np.array([1, 1])\n",
    "\n",
    "X = np.vstack([X0, X1])\n",
    "y = np.array([0] * n_per_class + [1] * n_per_class)\n",
    "\n",
    "print(f\"Generated {len(X)} data points ({n_per_class} per class).\")\n",
    "print(f\"Class 0 center: {X0.mean(axis=0).round(2)}\")\n",
    "print(f\"Class 1 center: {X1.mean(axis=0).round(2)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-13",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Experiment 1: Changing weights rotates the decision boundary\n",
    "fig, axes = plt.subplots(2, 3, figsize=(18, 12))\n",
    "\n",
    "# Different weight vectors (different angles)\n",
    "weight_configs = [\n",
    "    ([1, 0], 0, '$\\\\mathbf{w} = (1, 0)$\\nVertical boundary'),\n",
    "    ([1, 1], 0, '$\\\\mathbf{w} = (1, 1)$\\n$45^\\\\circ$ boundary'),\n",
    "    ([0, 1], 0, '$\\\\mathbf{w} = (0, 1)$\\nHorizontal boundary'),\n",
    "    ([-1, 1], 0, '$\\\\mathbf{w} = (-1, 1)$\\n$135^\\\\circ$ boundary'),\n",
    "    ([2, 1], 0, '$\\\\mathbf{w} = (2, 1)$\\nSteep boundary'),\n",
    "    ([1, 2], 0, '$\\\\mathbf{w} = (1, 2)$\\nShallow boundary'),\n",
    "]\n",
    "\n",
    "for ax, (w, b, label) in zip(axes.flat, weight_configs):\n",
    "    w = np.array(w, dtype=float)\n",
    "    \n",
    "    # Create mesh\n",
    "    xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))\n",
    "    Z = xx * w[0] + yy * w[1] + b\n",
    "    \n",
    "    ax.contourf(xx, yy, Z, levels=[-1e10, 0, 1e10],\n",
    "                colors=['#FFCCCC', '#CCCCFF'], alpha=0.3)\n",
    "    ax.contour(xx, yy, Z, levels=[0], colors='black', linewidths=2)\n",
    "    \n",
    "    # Plot data\n",
    "    ax.scatter(X[y == 0, 0], X[y == 0, 1], c='red', marker='o',\n",
    "               s=30, edgecolors='black', alpha=0.7, label='Class 0')\n",
    "    ax.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', marker='s',\n",
    "               s=30, edgecolors='black', alpha=0.7, label='Class 1')\n",
    "    \n",
    "    # Draw w as arrow\n",
    "    w_norm = np.linalg.norm(w)\n",
    "    w_disp = w / w_norm * 1.0\n",
    "    ax.annotate('', xy=w_disp, xytext=(0, 0),\n",
    "                arrowprops=dict(arrowstyle='->', color='red', lw=2))\n",
    "    \n",
    "    ax.set_xlim(-3, 3)\n",
    "    ax.set_ylim(-3, 3)\n",
    "    ax.set_aspect('equal')\n",
    "    ax.set_title(label, fontsize=11)\n",
    "    ax.grid(True, alpha=0.3)\n",
    "\n",
    "fig.suptitle('Effect of Changing Weights: Rotation of the Decision Boundary',\n",
    "             fontsize=14, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-14",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Experiment 2: Changing bias shifts the decision boundary\n",
    "fig, axes = plt.subplots(2, 3, figsize=(18, 12))\n",
    "\n",
    "w_fixed = np.array([1.0, 1.0])\n",
    "bias_values = [-3, -1.5, -0.5, 0, 0.5, 1.5]\n",
    "\n",
    "for ax, b in zip(axes.flat, bias_values):\n",
    "    # Create mesh\n",
    "    xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))\n",
    "    Z = xx * w_fixed[0] + yy * w_fixed[1] + b\n",
    "    \n",
    "    ax.contourf(xx, yy, Z, levels=[-1e10, 0, 1e10],\n",
    "                colors=['#FFCCCC', '#CCCCFF'], alpha=0.3)\n",
    "    ax.contour(xx, yy, Z, levels=[0], colors='black', linewidths=2)\n",
    "    \n",
    "    # Plot data\n",
    "    ax.scatter(X[y == 0, 0], X[y == 0, 1], c='red', marker='o',\n",
    "               s=30, edgecolors='black', alpha=0.7)\n",
    "    ax.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', marker='s',\n",
    "               s=30, edgecolors='black', alpha=0.7)\n",
    "    \n",
    "    # Mark distance from origin\n",
    "    w_norm = np.linalg.norm(w_fixed)\n",
    "    dist = abs(b) / w_norm\n",
    "    \n",
    "    ax.set_xlim(-3, 3)\n",
    "    ax.set_ylim(-3, 3)\n",
    "    ax.set_aspect('equal')\n",
    "    ax.set_title(f'$b = {b}$, distance from origin = ${dist:.2f}$', fontsize=11)\n",
    "    ax.grid(True, alpha=0.3)\n",
    "\n",
    "fig.suptitle('Effect of Changing Bias: Shifting the Decision Boundary\\n'\n",
    "             '(Fixed $\\\\mathbf{w} = (1, 1)$)',\n",
    "             fontsize=14, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-15",
   "metadata": {},
   "source": [
    "### Key Observations\n",
    "\n",
    "1. **Changing $\\mathbf{w}$ rotates the boundary**: The direction of $\\mathbf{w}$ determines the *orientation* of the decision line. Different weight vectors lead to different angles.\n",
    "\n",
    "2. **Changing $b$ shifts the boundary**: The bias controls the *position* (offset) of the decision line. Increasing $b$ moves the boundary in the direction *opposite* to $\\mathbf{w}$ (i.e., it enlarges the positive region).\n",
    "\n",
    "3. **Together**, $\\mathbf{w}$ and $b$ fully specify any hyperplane in $\\mathbb{R}^n$."
   ]
  },
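  {
   "cell_type": "markdown",
   "id": "cell-15b",
   "metadata": {},
   "source": [
    "The two experiments above can be summarized numerically. As a minimal sketch in plain NumPy (assuming $w_2 \\neq 0$, so the boundary is not vertical): convert $(\\mathbf{w}, b)$ to slope-intercept form to read off the orientation, and use $|b|/\\|\\mathbf{w}\\|$ for the offset from the origin."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# From (w, b) to slope-intercept form (assumes w[1] != 0)\n",
    "w = np.array([1.0, 1.0])\n",
    "b = -1.5\n",
    "\n",
    "slope = -w[0] / w[1]      # orientation: x2 = slope * x1 + intercept\n",
    "intercept = -b / w[1]\n",
    "angle = np.degrees(np.arctan2(w[1], w[0]))  # direction of w itself\n",
    "dist = abs(b) / np.linalg.norm(w)           # offset from origin\n",
    "\n",
    "print(f\"Boundary: x2 = {slope:.2f} * x1 + {intercept:.2f}\")\n",
    "print(f\"w points at {angle:.1f} degrees; boundary is {dist:.4f} from the origin\")"
   ]
  },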
  {
   "cell_type": "markdown",
   "id": "cell-16",
   "metadata": {},
   "source": [
    "## 4.7 Linear Separability\n",
    "\n",
    "A perceptron can only solve classification problems where the two classes can be separated by a hyperplane. This leads to a fundamental concept.\n",
    "\n",
    "### Definition\n",
    "\n",
    "> Two sets $S_0, S_1 \\subset \\mathbb{R}^n$ are **linearly separable** if there exists a hyperplane $\\mathcal{H} = \\{\\mathbf{x} : \\mathbf{w} \\cdot \\mathbf{x} + b = 0\\}$ such that:\n",
    ">\n",
    "> $$\\mathbf{w} \\cdot \\mathbf{x} + b > 0 \\quad \\text{for all } \\mathbf{x} \\in S_1$$\n",
    ">\n",
    "> $$\\mathbf{w} \\cdot \\mathbf{x} + b < 0 \\quad \\text{for all } \\mathbf{x} \\in S_0$$\n",
    "\n",
    "```{danger}\n",
    "**Fundamental Limitation**: The perceptron can ONLY learn **linearly separable** functions. If the data cannot be separated by a single hyperplane (a line in 2D, a plane in 3D), the perceptron learning algorithm will **never converge** and no set of weights will produce zero training error.\n",
    "\n",
    "The most famous example is the **XOR** function. This single limitation was the focus of Minsky and Papert's devastating 1969 critique, which contributed to the first \"AI winter.\"\n",
    "```\n",
    "\n",
    "### Equivalence with Convex Hull Disjointness\n",
    "\n",
    "An elegant geometric characterization:\n",
    "\n",
    "> **Theorem**: Two finite sets $S_0, S_1 \\subset \\mathbb{R}^n$ are linearly separable if and only if their **convex hulls** are disjoint:\n",
    ">\n",
    "> $$\\text{conv}(S_0) \\cap \\text{conv}(S_1) = \\emptyset$$\n",
    "\n",
    "Recall that the **convex hull** of a set $S$ is the smallest convex set containing $S$:\n",
    "\n",
    "$$\\text{conv}(S) = \\left\\{ \\sum_{i=1}^k \\lambda_i \\mathbf{x}_i : \\mathbf{x}_i \\in S, \\; \\lambda_i \\geq 0, \\; \\sum_{i=1}^k \\lambda_i = 1 \\right\\}$$\n",
    "\n",
    "### Geometric Intuition\n",
    "\n",
    "- **In $\\mathbb{R}^2$** (the plane): A dataset is linearly separable if you can draw a straight line between the two classes.\n",
    "- **In $\\mathbb{R}^3$** (3D space): You need a plane to separate the classes.\n",
    "- **In general $\\mathbb{R}^n$**: An $(n-1)$-dimensional hyperplane.\n",
    "\n",
    "### Not All Datasets Are Linearly Separable\n",
    "\n",
    "The most famous example of a non-linearly-separable function is **XOR** (exclusive or):\n",
    "\n",
    "| $x_1$ | $x_2$ | XOR |\n",
    "|-------|-------|-----|\n",
    "| 0     | 0     | 0   |\n",
    "| 0     | 1     | 1   |\n",
    "| 1     | 0     | 1   |\n",
    "| 1     | 1     | 0   |\n",
    "\n",
    "The positive points $(0,1)$ and $(1,0)$ are on *opposite corners* of the unit square, as are the negative points $(0,0)$ and $(1,1)$. No straight line can separate them. This limitation will be a central topic in our later discussion of Minsky and Papert's analysis."
   ]
  },
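  {
   "cell_type": "markdown",
   "id": "cell-16b",
   "metadata": {},
   "source": [
    "The convex-hull theorem gives a quick numerical test. A minimal sketch in plain NumPy (no hull library needed, since each XOR class hull is just a line segment): the convex combination with $\\lambda = (0.5, 0.5)$ lands on the same point for both classes, so the hulls intersect and, by the theorem, XOR is not linearly separable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-16c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convex-hull check for XOR: the two class hulls intersect at (0.5, 0.5)\n",
    "S0 = np.array([[0, 0], [1, 1]])  # XOR class 0\n",
    "S1 = np.array([[0, 1], [1, 0]])  # XOR class 1\n",
    "\n",
    "# Each hull is a segment; take the convex combination with lambda = 0.5\n",
    "p0 = 0.5 * S0[0] + 0.5 * S0[1]\n",
    "p1 = 0.5 * S1[0] + 0.5 * S1[1]\n",
    "\n",
    "print(f\"Midpoint of conv(S0): {p0}\")\n",
    "print(f\"Midpoint of conv(S1): {p1}\")\n",
    "print(f\"Hulls share a point: {np.allclose(p0, p1)}\")"
   ]
  },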
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-17",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Visualize linearly separable vs. non-separable datasets\n",
    "fig, axes = plt.subplots(1, 3, figsize=(18, 6))\n",
    "\n",
    "# --- Panel 1: AND (linearly separable) ---\n",
    "ax = axes[0]\n",
    "AND_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "AND_y = np.array([0, 0, 0, 1])\n",
    "\n",
    "ax.scatter(AND_X[AND_y == 0, 0], AND_X[AND_y == 0, 1], c='red', marker='o',\n",
    "           s=200, edgecolors='black', zorder=5, label='Class 0')\n",
    "ax.scatter(AND_X[AND_y == 1, 0], AND_X[AND_y == 1, 1], c='blue', marker='s',\n",
    "           s=200, edgecolors='black', zorder=5, label='Class 1')\n",
    "\n",
    "# Draw a separating line\n",
    "x_line = np.linspace(-0.5, 1.5, 100)\n",
    "y_line = -x_line + 1.5  # w=[1,1], b=-1.5\n",
    "ax.plot(x_line, y_line, 'k-', linewidth=2, label='Decision boundary')\n",
    "ax.fill_between(x_line, y_line, 2, alpha=0.1, color='blue')\n",
    "ax.fill_between(x_line, y_line, -1, alpha=0.1, color='red')\n",
    "\n",
    "ax.set_xlim(-0.5, 1.5)\n",
    "ax.set_ylim(-0.5, 1.5)\n",
    "ax.set_aspect('equal')\n",
    "ax.set_title('AND: Linearly Separable', fontsize=13, fontweight='bold',\n",
    "             color='green')\n",
    "ax.legend(fontsize=10)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "# --- Panel 2: OR (linearly separable) ---\n",
    "ax = axes[1]\n",
    "OR_y = np.array([0, 1, 1, 1])\n",
    "\n",
    "ax.scatter(AND_X[OR_y == 0, 0], AND_X[OR_y == 0, 1], c='red', marker='o',\n",
    "           s=200, edgecolors='black', zorder=5, label='Class 0')\n",
    "ax.scatter(AND_X[OR_y == 1, 0], AND_X[OR_y == 1, 1], c='blue', marker='s',\n",
    "           s=200, edgecolors='black', zorder=5, label='Class 1')\n",
    "\n",
    "# Separating line\n",
    "y_line = -x_line + 0.5  # w=[1,1], b=-0.5\n",
    "ax.plot(x_line, y_line, 'k-', linewidth=2, label='Decision boundary')\n",
    "ax.fill_between(x_line, y_line, 2, alpha=0.1, color='blue')\n",
    "ax.fill_between(x_line, y_line, -1, alpha=0.1, color='red')\n",
    "\n",
    "ax.set_xlim(-0.5, 1.5)\n",
    "ax.set_ylim(-0.5, 1.5)\n",
    "ax.set_aspect('equal')\n",
    "ax.set_title('OR: Linearly Separable', fontsize=13, fontweight='bold',\n",
    "             color='green')\n",
    "ax.legend(fontsize=10)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "# --- Panel 3: XOR (NOT linearly separable) ---\n",
    "ax = axes[2]\n",
    "XOR_y = np.array([0, 1, 1, 0])\n",
    "\n",
    "ax.scatter(AND_X[XOR_y == 0, 0], AND_X[XOR_y == 0, 1], c='red', marker='o',\n",
    "           s=200, edgecolors='black', zorder=5, label='Class 0')\n",
    "ax.scatter(AND_X[XOR_y == 1, 0], AND_X[XOR_y == 1, 1], c='blue', marker='s',\n",
    "           s=200, edgecolors='black', zorder=5, label='Class 1')\n",
    "\n",
    "# Try several lines to show none works\n",
    "for angle in [30, 60, 90, 120, 150]:\n",
    "    rad = np.radians(angle)\n",
    "    slope = -np.cos(rad) / (np.sin(rad) + 1e-10)\n",
    "    y_line = slope * (x_line - 0.5) + 0.5\n",
    "    ax.plot(x_line, y_line, '--', linewidth=1, alpha=0.4, color='gray')\n",
    "\n",
    "ax.set_xlim(-0.5, 1.5)\n",
    "ax.set_ylim(-0.5, 1.5)\n",
    "ax.set_aspect('equal')\n",
    "ax.set_title('XOR: NOT Linearly Separable', fontsize=13, fontweight='bold',\n",
    "             color='red')\n",
    "ax.text(0.5, -0.3, 'No single line can\\nseparate the classes!',\n",
    "        ha='center', fontsize=11, color='red', style='italic')\n",
    "ax.legend(fontsize=10)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.suptitle('Linear Separability of Boolean Functions', fontsize=15,\n",
    "             fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-18",
   "metadata": {},
   "source": [
    "## 4.8 Exercises\n",
    "\n",
    "### Exercise 4.1: Manual Prediction\n",
    "\n",
    "Given a perceptron with $\\mathbf{w} = (2, -3)$ and $b = 1$, compute the output $f(\\mathbf{x})$ for each of the following input vectors:\n",
    "\n",
    "1. $\\mathbf{x} = (1, 0)$\n",
    "2. $\\mathbf{x} = (0, 1)$\n",
    "3. $\\mathbf{x} = (1, 1)$\n",
    "4. $\\mathbf{x} = (3, 2)$\n",
    "5. $\\mathbf{x} = (-1, -1)$\n",
    "\n",
    "Show your work: write out $z = \\mathbf{w} \\cdot \\mathbf{x} + b$ for each.\n",
    "\n",
    "```{hint}\n",
    ":class: dropdown\n",
    "For each input, compute the dot product $z = w_1 x_1 + w_2 x_2 + b = 2x_1 - 3x_2 + 1$ and then apply the step function: output is 1 if $z \\geq 0$, otherwise 0. For example, for $\\mathbf{x} = (1, 0)$: $z = 2(1) + (-3)(0) + 1 = 3 \\geq 0$, so $f(\\mathbf{x}) = 1$.\n",
    "```\n",
    "\n",
    "### Exercise 4.2: Decision Boundary Sketch\n",
    "\n",
    "For a perceptron with $\\mathbf{w} = (1, 2)$ and $b = -3$:\n",
    "\n",
    "1. Write the equation of the decision boundary.\n",
    "2. Find the intercepts of this line with the $x_1$ and $x_2$ axes.\n",
    "3. Sketch the line by hand on a piece of paper. Mark the positive and negative regions.\n",
    "4. Compute the distance from the origin to the decision boundary.\n",
    "\n",
    "```{hint}\n",
    ":class: dropdown\n",
    "The decision boundary is $x_1 + 2x_2 - 3 = 0$, i.e., $x_2 = -0.5x_1 + 1.5$. The $x_1$-intercept is at $x_1 = 3$ (set $x_2 = 0$) and the $x_2$-intercept is at $x_2 = 1.5$ (set $x_1 = 0$). The distance from the origin is $|b|/\\|\\mathbf{w}\\| = 3/\\sqrt{5} \\approx 1.34$.\n",
    "```\n",
    "\n",
    "### Exercise 4.3: Scaling Invariance\n",
    "\n",
    "Suppose you have a perceptron with weights $\\mathbf{w}$ and bias $b$. You now multiply **all** parameters by a constant $c > 0$, obtaining $\\mathbf{w}' = c\\mathbf{w}$ and $b' = cb$.\n",
    "\n",
    "1. Does the decision boundary change? Prove your answer.\n",
    "2. Does the distance from the origin to the boundary change?\n",
    "3. What is the practical implication of this observation?\n",
    "\n",
    "```{hint}\n",
    ":class: dropdown\n",
    "Write out the boundary equation $\\mathbf{w}' \\cdot \\mathbf{x} + b' = 0$ and simplify. Since $c\\mathbf{w} \\cdot \\mathbf{x} + cb = c(\\mathbf{w} \\cdot \\mathbf{x} + b) = 0$, and $c > 0$, this is equivalent to $\\mathbf{w} \\cdot \\mathbf{x} + b = 0$. The boundary is unchanged! The distance $|b'|/\\|\\mathbf{w}'\\| = |cb|/(c\\|\\mathbf{w}\\|) = |b|/\\|\\mathbf{w}\\|$ is also unchanged.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-19",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Solution verification for Exercise 4.1\n",
    "print(\"=\" * 50)\n",
    "print(\"Exercise 4.1: Solution Verification\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "p = Perceptron(n_features=2)\n",
    "p.set_weights([2, -3], 1)\n",
    "\n",
    "test_vectors = [\n",
    "    np.array([1, 0]),\n",
    "    np.array([0, 1]),\n",
    "    np.array([1, 1]),\n",
    "    np.array([3, 2]),\n",
    "    np.array([-1, -1]),\n",
    "]\n",
    "\n",
    "print(f\"\\nPerceptron: w = {p.weights}, b = {p.bias}\")\n",
    "print()\n",
    "for i, x in enumerate(test_vectors, 1):\n",
    "    z = p.decision_function(x)[0]\n",
    "    y_hat = p.predict(x)[0]\n",
    "    print(f\"  {i}. x = {x}\")\n",
    "    print(f\"     z = w . x + b = {p.weights[0]}*{x[0]} + ({p.weights[1]})*{x[1]} + {p.bias} = {z}\")\n",
    "    print(f\"     f(x) = step({z}) = {y_hat}\")\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-20",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Solution verification for Exercise 4.2\n",
    "print(\"=\" * 50)\n",
    "print(\"Exercise 4.2: Decision Boundary Analysis\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "w = np.array([1, 2])\n",
    "b = -3\n",
    "\n",
    "print(f\"\\nBoundary equation: {w[0]}*x1 + {w[1]}*x2 + ({b}) = 0\")\n",
    "print(f\"Simplified: x1 + 2*x2 = 3\")\n",
    "print(f\"Slope-intercept: x2 = -0.5*x1 + 1.5\")\n",
    "print(f\"\\nx1-intercept: x1 = {-b/w[0]} (set x2 = 0)\")\n",
    "print(f\"x2-intercept: x2 = {-b/w[1]} (set x1 = 0)\")\n",
    "print(f\"\\nDistance from origin: |b|/||w|| = {abs(b)}/{np.linalg.norm(w):.4f} = {abs(b)/np.linalg.norm(w):.4f}\")\n",
    "\n",
    "# Visualize\n",
    "plot_perceptron_geometry(\n",
    "    weights=w, bias=b,\n",
    "    xlim=(-1, 5), ylim=(-1, 3),\n",
    "    title='Exercise 4.2: $\\\\mathbf{w}=(1,2)$, $b=-3$'\n",
    ")"
   ]
  }
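  ,
  {
   "cell_type": "markdown",
   "id": "cell-20b",
   "metadata": {},
   "source": [
    "A numerical check for Exercise 4.3, in the spirit of the verification cells above (a sketch in plain NumPy): scaling $(\\mathbf{w}, b)$ by any $c > 0$ changes neither the sign of $z$ at any point nor the distance $|b|/\\|\\mathbf{w}\\|$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-20c",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Solution verification for Exercise 4.3\n",
    "print(\"=\" * 50)\n",
    "print(\"Exercise 4.3: Scaling Invariance\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "w = np.array([1.0, 2.0])\n",
    "b = -3.0\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "pts = rng.normal(size=(5, 2))\n",
    "\n",
    "for c in [1.0, 2.0, 10.0]:\n",
    "    wc, bc = c * w, c * b\n",
    "    dist = abs(bc) / np.linalg.norm(wc)\n",
    "    preds = (pts @ wc + bc >= 0).astype(int)\n",
    "    print(f\"c = {c:5.1f}: distance = {dist:.4f}, predictions = {preds}\")"
   ]
  }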
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}