{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# Chapter 5: The Perceptron Learning Algorithm\n",
    "\n",
    "\n",
    "## Part 2: The Perceptron"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-1",
   "metadata": {},
   "source": [
    "## 5.1 Introduction: The Key Innovation\n",
    "\n",
    "The most revolutionary aspect of Rosenblatt's perceptron was not the model itself -- a single neuron computing a weighted sum passed through a threshold had been considered by McCulloch and Pitts in 1943. The breakthrough was the **learning algorithm**: a simple iterative procedure that could *automatically discover* the correct weights from labeled examples.\n",
    "\n",
    "Before Rosenblatt, building a neural network required a human designer to:\n",
    "1. Analyze the problem.\n",
    "2. Determine the correct weights and thresholds by hand.\n",
    "3. Wire the network accordingly.\n",
    "\n",
    "After Rosenblatt, the process became:\n",
    "1. Collect labeled training examples: $(\\mathbf{x}_1, y_1), (\\mathbf{x}_2, y_2), \\ldots, (\\mathbf{x}_m, y_m)$.\n",
    "2. Initialize the weights (e.g., to zero).\n",
    "3. Run the learning algorithm, which adjusts weights automatically.\n",
    "\n",
    "This shift from **design** to **learning** is the philosophical cornerstone of modern machine learning."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 5.2 The Weight Update Rule\n",
    "\n",
    "```{admonition} Algorithm (Perceptron Learning Rule)\n",
    ":class: important\n",
    "\n",
    "Given a training example $(\\mathbf{x}, y)$ where $y \\in \\{0, 1\\}$ is the true label and $\\hat{y} = f(\\mathbf{x})$ is the predicted label, the perceptron **update rule** is:\n",
    "\n",
    "$$\\mathbf{w}_{t+1} = \\mathbf{w}_t + \\eta(y - \\hat{y})\\mathbf{x}$$\n",
    "\n",
    "$$b_{t+1} = b_t + \\eta(y - \\hat{y})$$\n",
    "\n",
    "where $\\eta > 0$ is the **learning rate**. The update is applied **only when the perceptron makes a misclassification** ($\\hat{y} \\neq y$). When the prediction is correct, $(y - \\hat{y}) = 0$ and no change occurs.\n",
    "```\n",
    "\n",
    "```{admonition} Case Analysis\n",
    ":class: note\n",
    "\n",
    "Since both $y$ and $\\hat{y}$ are binary, there are four possibilities:\n",
    "\n",
    "| Case | $y$ | $\\hat{y}$ | $y - \\hat{y}$ | Action | Geometric Effect |\n",
    "|------|-----|-----------|---------------|--------|------------------|\n",
    "| Correct positive | 1 | 1 | 0 | No update | -- |\n",
    "| Correct negative | 0 | 0 | 0 | No update | -- |\n",
    "| **False negative** | 1 | 0 | **+1** | $\\mathbf{w} \\leftarrow \\mathbf{w} + \\eta\\mathbf{x}$ | Rotate $\\mathbf{w}$ *toward* $\\mathbf{x}$ |\n",
    "| **False positive** | 0 | 1 | **-1** | $\\mathbf{w} \\leftarrow \\mathbf{w} - \\eta\\mathbf{x}$ | Rotate $\\mathbf{w}$ *away from* $\\mathbf{x}$ |\n",
    "```\n",
    "\n",
    "```{tip}\n",
    "**Geometric Interpretation of Weight Updates**\n",
    "\n",
    "Think of the weight vector $\\mathbf{w}$ as an arrow pointing toward the \"positive\" side of the decision boundary. Each update **rotates** the hyperplane:\n",
    "\n",
    "- **False negative** ($y=1$, $\\hat{y}=0$): The input $\\mathbf{x}$ *should* be on the positive side but isn't. Adding $\\eta\\mathbf{x}$ to $\\mathbf{w}$ rotates the boundary so that $\\mathbf{w}$ points more toward $\\mathbf{x}$, pulling the positive region to include it.\n",
    "- **False positive** ($y=0$, $\\hat{y}=1$): The input $\\mathbf{x}$ *should* be on the negative side but isn't. Subtracting $\\eta\\mathbf{x}$ from $\\mathbf{w}$ rotates the boundary away from $\\mathbf{x}$, pushing the positive region away.\n",
    "\n",
    "Each update is a **local correction**: it improves classification of the current point without regard for other points. The remarkable fact (proven by the convergence theorem in Chapter 6) is that these local corrections globally converge to a correct solution, *provided one exists*.\n",
    "```\n",
    "\n",
    "```{warning}\n",
    "**Important Conditions for the Perceptron Learning Rule**\n",
    "\n",
    "- The learning rate $\\eta$ must be **strictly positive** ($\\eta > 0$). If $\\eta = 0$, no updates occur and the algorithm cannot learn.\n",
    "- The algorithm is **guaranteed to converge only for linearly separable data**. If the data is not linearly separable, the algorithm will oscillate indefinitely without finding a solution.\n",
    "- For the step activation function, the *value* of $\\eta$ does not affect the number of epochs or whether convergence occurs -- only the magnitude of the final weights changes. This is a special property of the step function.\n",
    "```\n",
    "\n",
    "```{danger}\n",
    "If the training data is **NOT linearly separable**, the perceptron learning algorithm will **loop forever** (or until the maximum epoch limit is reached). It will never achieve zero training error, and the decision boundary will oscillate without settling. There is no built-in mechanism to detect this situation -- the algorithm simply keeps updating weights indefinitely. This is why the convergence theorem's assumption of linear separability is critical.\n",
    "```"
   ]
  },
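  {
   "cell_type": "markdown",
   "id": "cell-2a",
   "metadata": {},
   "source": [
    "Before building the full trainer, it helps to trace the update rule by hand. The sketch below is a minimal, self-contained example (the helper names `step` and `perceptron_update` and the starting weights are illustrative, not part of the implementation in Section 5.4): it applies one update for each of the four cases in the table above and prints the effect on $\\mathbf{w}$ and $b$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def step(z):\n",
    "    \"\"\"Heaviside step: 1 if z >= 0, else 0.\"\"\"\n",
    "    return int(z >= 0)\n",
    "\n",
    "def perceptron_update(w, b, x, y, eta=1.0):\n",
    "    \"\"\"Apply one perceptron update to (w, b) for a single example (x, y).\"\"\"\n",
    "    y_hat = step(np.dot(w, x) + b)\n",
    "    w = w + eta * (y - y_hat) * x  # no change when y_hat == y\n",
    "    b = b + eta * (y - y_hat)\n",
    "    return w, b, y_hat\n",
    "\n",
    "# Illustrative starting point (hypothetical values, not a trained model)\n",
    "w, b = np.array([0.5, -0.5]), 0.0\n",
    "\n",
    "examples = [\n",
    "    (np.array([1.0, 0.0]), 1),  # correct positive: no update\n",
    "    (np.array([0.0, 1.0]), 1),  # false negative: w <- w + eta*x\n",
    "    (np.array([0.0, 1.0]), 0),  # after the previous update, a false positive\n",
    "    (np.array([1.0, 1.0]), 0),  # z = 0 gives y_hat = 1: another false positive\n",
    "]\n",
    "\n",
    "for x, y in examples:\n",
    "    w_new, b_new, y_hat = perceptron_update(w, b, x, y)\n",
    "    print(f\"x={x}, y={y}, y_hat={y_hat}: w {w} -> {w_new}, b {b} -> {b_new}\")\n",
    "    w, b = w_new, b_new"
   ]
  },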
  {
   "cell_type": "markdown",
   "id": "cell-3",
   "metadata": {},
   "source": [
    "## 5.3 The Full Algorithm\n",
    "\n",
    "Here is the complete perceptron learning algorithm in pseudocode:\n",
    "\n",
    "\n",
    "**Algorithm**: Perceptron Learning Algorithm\n",
    "\n",
    "**Input**: Training set $\\{(\\mathbf{x}_1, y_1), \\ldots, (\\mathbf{x}_m, y_m)\\}$ with $y_i \\in \\{0,1\\}$; learning rate $\\eta > 0$; maximum epochs $T$.\n",
    "\n",
    "**Output**: Weight vector $\\mathbf{w}$ and bias $b$.\n",
    "\n",
    "1. Initialize $\\mathbf{w} \\leftarrow \\mathbf{0}$, $b \\leftarrow 0$\n",
    "2. **For** epoch $= 1, 2, \\ldots, T$:\n",
    "   1. $\\text{errors} \\leftarrow 0$\n",
    "   2. **For** each $(\\mathbf{x}_i, y_i)$ in training set:\n",
    "      1. $\\hat{y}_i \\leftarrow \\text{step}(\\mathbf{w} \\cdot \\mathbf{x}_i + b)$\n",
    "      2. **If** $\\hat{y}_i \\neq y_i$:\n",
    "         1. $\\mathbf{w} \\leftarrow \\mathbf{w} + \\eta(y_i - \\hat{y}_i)\\mathbf{x}_i$\n",
    "         2. $b \\leftarrow b + \\eta(y_i - \\hat{y}_i)$\n",
    "         3. $\\text{errors} \\leftarrow \\text{errors} + 1$\n",
    "   3. **If** $\\text{errors} = 0$: **return** $\\mathbf{w}, b$ (converged)\n",
    "3. **Return** $\\mathbf{w}, b$ (did not converge within $T$ epochs)\n",
    "\n",
    "\n",
    "**Notes**:\n",
    "- The algorithm cycles through the training set repeatedly (each full pass is an **epoch**).\n",
    "- Convergence is detected when an entire epoch passes with zero errors.\n",
    "- If the data is linearly separable, convergence is **guaranteed** (Perceptron Convergence Theorem, Chapter 6).\n",
    "- If the data is NOT linearly separable, the algorithm will **never converge** and will oscillate indefinitely."
   ]
  },
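  {
   "cell_type": "markdown",
   "id": "cell-3a",
   "metadata": {},
   "source": [
    "As a direct translation of this pseudocode, here is a minimal functional sketch (the helper name `train_perceptron` is illustrative; the full-featured class used throughout the chapter follows in Section 5.4):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def train_perceptron(X, y, eta=1.0, max_epochs=100):\n",
    "    \"\"\"Minimal perceptron trainer following the pseudocode above.\n",
    "\n",
    "    Returns (w, b, converged).\n",
    "    \"\"\"\n",
    "    n_samples, n_features = X.shape\n",
    "    w, b = np.zeros(n_features), 0.0\n",
    "    for epoch in range(max_epochs):\n",
    "        errors = 0\n",
    "        for x_i, y_i in zip(X, y):\n",
    "            y_hat = int(np.dot(w, x_i) + b >= 0)  # step activation\n",
    "            if y_hat != y_i:\n",
    "                w += eta * (y_i - y_hat) * x_i\n",
    "                b += eta * (y_i - y_hat)\n",
    "                errors += 1\n",
    "        if errors == 0:\n",
    "            return w, b, True  # a full clean pass: converged\n",
    "    return w, b, False  # hit the epoch limit without converging\n",
    "\n",
    "# Quick check on the AND gate\n",
    "X_gate = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)\n",
    "y_gate = np.array([0, 0, 0, 1])\n",
    "w, b, converged = train_perceptron(X_gate, y_gate)\n",
    "print(f\"w={w}, b={b}, converged={converged}\")"
   ]
  },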
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": [
    "## 5.4 Python Implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-5",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.rcParams.update({\n",
    "    'figure.figsize': (8, 6),\n",
    "    'font.size': 12,\n",
    "    'axes.grid': True,\n",
    "    'grid.alpha': 0.3\n",
    "})\n",
    "\n",
    "try:\n",
    "    plt.style.use('seaborn-v0_8-whitegrid')\n",
    "except OSError:\n",
    "    pass  # Fall back to default if style not available"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-6",
   "metadata": {},
   "outputs": [],
   "source": "class Perceptron:\n    \"\"\"Rosenblatt's Perceptron with the learning algorithm.\n    \n    Parameters\n    ----------\n    n_features : int\n        Number of input features.\n    learning_rate : float, default=1.0\n        Learning rate eta.\n    max_epochs : int, default=100\n        Maximum number of training epochs.\n    \n    Attributes\n    ----------\n    weights : np.ndarray of shape (n_features,)\n        Learned weight vector.\n    bias : float\n        Learned bias.\n    weight_history : list of tuples (weights_copy, bias_copy)\n        Weight snapshots after each update (for visualization).\n    errors_per_epoch : list of int\n        Number of misclassifications in each epoch.\n    converged : bool\n        Whether the algorithm converged within max_epochs.\n    \"\"\"\n    \n    def __init__(self, n_features, learning_rate=1.0, max_epochs=100):\n        self.n_features = n_features\n        self.learning_rate = learning_rate\n        self.max_epochs = max_epochs\n        \n        # Initialize weights to zero\n        self.weights = np.zeros(n_features)\n        self.bias = 0.0\n        \n        # Training history\n        self.weight_history = []\n        self.errors_per_epoch = []\n        self.converged = False\n    \n    def decision_function(self, X):\n        \"\"\"Compute pre-activation z = w . x + b.\"\"\"\n        X = np.atleast_2d(X)\n        return X @ self.weights + self.bias\n    \n    def predict(self, X):\n        \"\"\"Predict binary labels using the step function.\"\"\"\n        return (self.decision_function(X) >= 0).astype(int)\n    \n    def fit(self, X, y, verbose=False):\n        \"\"\"Train the perceptron using the learning algorithm.\n        \n        Parameters\n        ----------\n        X : np.ndarray of shape (n_samples, n_features)\n            Training data.\n        y : np.ndarray of shape (n_samples,)\n            Binary labels (0 or 1).\n        verbose : bool\n            If True, print detailed update information.\n        \n        Returns\n        -------\n        self\n        \"\"\"\n        X = np.atleast_2d(X)\n        y = np.asarray(y)\n        n_samples = X.shape[0]\n        \n        # Reset\n        self.weights = np.zeros(self.n_features)\n        self.bias = 0.0\n        self.weight_history = [(self.weights.copy(), self.bias)]\n        self.errors_per_epoch = []\n        self.converged = False\n        \n        for epoch in range(1, self.max_epochs + 1):\n            errors = 0\n            \n            for i in range(n_samples):\n                x_i = X[i]\n                y_i = y[i]\n                y_hat = int(np.dot(self.weights, x_i) + self.bias >= 0)\n                \n                if y_hat != y_i:\n                    # Update weights and bias\n                    update = self.learning_rate * (y_i - y_hat)\n                    self.weights = self.weights + update * x_i\n                    self.bias = self.bias + update\n                    errors += 1\n                    \n                    # Record snapshot\n                    self.weight_history.append((self.weights.copy(), self.bias))\n                    \n                    if verbose:\n                        print(f\"  Epoch {epoch}, sample {i}: \"\n                              f\"x={x_i}, y={y_i}, y_hat={y_hat}, \"\n                              f\"update={update:+.0f} -> \"\n                              f\"w={self.weights}, b={self.bias}\")\n            \n            self.errors_per_epoch.append(errors)\n            \n            
if verbose:\n                print(f\"  Epoch {epoch}: {errors} errors\")\n            \n            if errors == 0:\n                self.converged = True\n                if verbose:\n                    print(f\"\\n  Converged after {epoch} epochs!\")\n                break\n        \n        if not self.converged and verbose:\n            print(f\"\\n  Did NOT converge after {self.max_epochs} epochs.\")\n        \n        return self\n    \n    def plot_decision_boundary(self, X, y, ax=None, title=None):\n        \"\"\"Plot the decision boundary for 2D data.\n        \n        Parameters\n        ----------\n        X : np.ndarray of shape (n_samples, 2)\n        y : np.ndarray of shape (n_samples,)\n        ax : matplotlib axes, optional\n        title : str, optional\n        \"\"\"\n        if ax is None:\n            fig, ax = plt.subplots(figsize=(8, 6))\n        \n        # Determine plot limits\n        margin = 0.5\n        x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin\n        y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin\n        \n        # Create mesh\n        xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),\n                             np.linspace(y_min, y_max, 200))\n        Z = xx * self.weights[0] + yy * self.weights[1] + self.bias\n        \n        # Decision regions\n        ax.contourf(xx, yy, Z, levels=[-1e10, 0, 1e10],\n                    colors=['#FFCCCC', '#CCCCFF'], alpha=0.4)\n        ax.contour(xx, yy, Z, levels=[0], colors='black', linewidths=2)\n        \n        # Data points\n        ax.scatter(X[y == 0, 0], X[y == 0, 1], c='red', marker='o',\n                   s=120, edgecolors='black', zorder=5, label='Class 0')\n        ax.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', marker='s',\n                   s=120, edgecolors='black', zorder=5, label='Class 1')\n        \n        ax.set_xlabel('$x_1$', fontsize=13)\n        ax.set_ylabel('$x_2$', fontsize=13)\n        ax.legend(fontsize=11)\n        ax.grid(True, alpha=0.3)\n        \n        if title:\n            ax.set_title(title, fontsize=13, fontweight='bold')\n        else:\n            ax.set_title(f'$\\\\mathbf{{w}}$={np.round(self.weights, 3).tolist()}, '\n                         f'$b$={self.bias:.3f}', fontsize=12)\n        \n        return ax\n    \n    def plot_convergence(self, ax=None, title=None):\n        \"\"\"Plot the number of errors per epoch.\"\"\"\n        if ax is None:\n            fig, ax = plt.subplots(figsize=(8, 4))\n        \n        epochs = range(1, len(self.errors_per_epoch) + 1)\n        ax.bar(epochs, self.errors_per_epoch, color='steelblue',\n               edgecolor='black', alpha=0.7)\n        ax.set_xlabel('Epoch', fontsize=13)\n        ax.set_ylabel('Errors', fontsize=13)\n        ax.set_xticks(list(epochs))\n        ax.grid(True, alpha=0.3, axis='y')\n        \n        if title:\n            ax.set_title(title, fontsize=13, fontweight='bold')\n        else:\n            status = 'Converged' if self.converged else 'Not converged'\n            ax.set_title(f'Convergence ({status})', fontsize=13)\n        \n        return ax\n    \n    def __repr__(self):\n        return (f\"Perceptron(w={self.weights}, b={self.bias}, \"\n                f\"converged={self.converged})\")"
  },
  {
   "cell_type": "markdown",
   "id": "cell-7",
   "metadata": {},
   "source": [
    "## 5.5 Learning the AND Gate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-8",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# AND gate truth table\n",
    "X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_and = np.array([0, 0, 0, 1])\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"Training Perceptron on AND Gate\")\n",
    "print(\"=\" * 60)\n",
    "print()\n",
    "print(\"Truth table:\")\n",
    "for x, y in zip(X_and, y_and):\n",
    "    print(f\"  {x[0]} AND {x[1]} = {y}\")\n",
    "print()\n",
    "\n",
    "# Train with verbose output\n",
    "p_and = Perceptron(n_features=2, learning_rate=1.0, max_epochs=20)\n",
    "p_and.fit(X_and, y_and, verbose=True)\n",
    "\n",
    "print(f\"\\nFinal model: {p_and}\")\n",
    "print(f\"\\nVerification:\")\n",
    "for x, y in zip(X_and, y_and):\n",
    "    y_hat = p_and.predict(x)[0]\n",
    "    status = 'OK' if y_hat == y else 'WRONG'\n",
    "    print(f\"  {x} -> {y_hat} (expected {y}) [{status}]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-9",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# Visualize AND gate result\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "p_and.plot_decision_boundary(X_and, y_and, ax=ax1,\n",
    "                              title='AND Gate: Learned Decision Boundary')\n",
    "\n",
    "# Enhanced convergence plot with annotations\n",
    "epochs = range(1, len(p_and.errors_per_epoch) + 1)\n",
    "bars = ax2.bar(epochs, p_and.errors_per_epoch, color='steelblue',\n",
    "               edgecolor='black', alpha=0.7)\n",
    "# Annotate each bar\n",
    "for bar_obj, err in zip(bars, p_and.errors_per_epoch):\n",
    "    if err > 0:\n",
    "        ax2.text(bar_obj.get_x() + bar_obj.get_width()/2., bar_obj.get_height() + 0.05,\n",
    "                f'{err}', ha='center', va='bottom', fontsize=10, fontweight='bold')\n",
    "ax2.set_xlabel('Epoch', fontsize=13)\n",
    "ax2.set_ylabel('Errors', fontsize=13)\n",
    "ax2.set_title('AND Gate: Convergence', fontsize=13, fontweight='bold')\n",
    "ax2.set_xticks(list(epochs))\n",
    "ax2.grid(True, alpha=0.3, axis='y')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10",
   "metadata": {},
   "source": [
    "## 5.6 Learning the OR Gate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-11",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# OR gate truth table\n",
    "X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_or = np.array([0, 1, 1, 1])\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"Training Perceptron on OR Gate\")\n",
    "print(\"=\" * 60)\n",
    "print()\n",
    "print(\"Truth table:\")\n",
    "for x, y in zip(X_or, y_or):\n",
    "    print(f\"  {x[0]} OR {x[1]} = {y}\")\n",
    "print()\n",
    "\n",
    "p_or = Perceptron(n_features=2, learning_rate=1.0, max_epochs=20)\n",
    "p_or.fit(X_or, y_or, verbose=True)\n",
    "\n",
    "print(f\"\\nFinal model: {p_or}\")\n",
    "print(f\"\\nVerification:\")\n",
    "for x, y in zip(X_or, y_or):\n",
    "    y_hat = p_or.predict(x)[0]\n",
    "    status = 'OK' if y_hat == y else 'WRONG'\n",
    "    print(f\"  {x} -> {y_hat} (expected {y}) [{status}]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-12",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "p_or.plot_decision_boundary(X_or, y_or, ax=ax1,\n",
    "                             title='OR Gate: Learned Decision Boundary')\n",
    "\n",
    "# Enhanced convergence plot\n",
    "epochs = range(1, len(p_or.errors_per_epoch) + 1)\n",
    "bars = ax2.bar(epochs, p_or.errors_per_epoch, color='#2ecc71',\n",
    "               edgecolor='black', alpha=0.7)\n",
    "for bar_obj, err in zip(bars, p_or.errors_per_epoch):\n",
    "    if err > 0:\n",
    "        ax2.text(bar_obj.get_x() + bar_obj.get_width()/2., bar_obj.get_height() + 0.05,\n",
    "                f'{err}', ha='center', va='bottom', fontsize=10, fontweight='bold')\n",
    "ax2.set_xlabel('Epoch', fontsize=13)\n",
    "ax2.set_ylabel('Errors', fontsize=13)\n",
    "ax2.set_title('OR Gate: Convergence', fontsize=13, fontweight='bold')\n",
    "ax2.set_xticks(list(epochs))\n",
    "ax2.grid(True, alpha=0.3, axis='y')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-13",
   "metadata": {},
   "source": [
    "## 5.7 Learning the NAND Gate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-14",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# NAND gate truth table\n",
    "X_nand = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_nand = np.array([1, 1, 1, 0])\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"Training Perceptron on NAND Gate\")\n",
    "print(\"=\" * 60)\n",
    "print()\n",
    "print(\"Truth table:\")\n",
    "for x, y in zip(X_nand, y_nand):\n",
    "    print(f\"  {x[0]} NAND {x[1]} = {y}\")\n",
    "print()\n",
    "\n",
    "p_nand = Perceptron(n_features=2, learning_rate=1.0, max_epochs=20)\n",
    "p_nand.fit(X_nand, y_nand, verbose=True)\n",
    "\n",
    "print(f\"\\nFinal model: {p_nand}\")\n",
    "print(f\"\\nVerification:\")\n",
    "for x, y in zip(X_nand, y_nand):\n",
    "    y_hat = p_nand.predict(x)[0]\n",
    "    status = 'OK' if y_hat == y else 'WRONG'\n",
    "    print(f\"  {x} -> {y_hat} (expected {y}) [{status}]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-15",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "p_nand.plot_decision_boundary(X_nand, y_nand, ax=ax1,\n",
    "                               title='NAND Gate: Learned Decision Boundary')\n",
    "\n",
    "# Enhanced convergence plot\n",
    "epochs = range(1, len(p_nand.errors_per_epoch) + 1)\n",
    "bars = ax2.bar(epochs, p_nand.errors_per_epoch, color='#e74c3c',\n",
    "               edgecolor='black', alpha=0.7)\n",
    "for bar_obj, err in zip(bars, p_nand.errors_per_epoch):\n",
    "    if err > 0:\n",
    "        ax2.text(bar_obj.get_x() + bar_obj.get_width()/2., bar_obj.get_height() + 0.05,\n",
    "                f'{err}', ha='center', va='bottom', fontsize=10, fontweight='bold')\n",
    "ax2.set_xlabel('Epoch', fontsize=13)\n",
    "ax2.set_ylabel('Errors', fontsize=13)\n",
    "ax2.set_title('NAND Gate: Convergence', fontsize=13, fontweight='bold')\n",
    "ax2.set_xticks(list(epochs))\n",
    "ax2.grid(True, alpha=0.3, axis='y')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-16",
   "metadata": {},
   "source": [
    "## 5.8 Learning the NOR Gate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-17",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# NOR gate truth table\n",
    "X_nor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_nor = np.array([1, 0, 0, 0])\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"Training Perceptron on NOR Gate\")\n",
    "print(\"=\" * 60)\n",
    "print()\n",
    "print(\"Truth table:\")\n",
    "for x, y in zip(X_nor, y_nor):\n",
    "    print(f\"  {x[0]} NOR {x[1]} = {y}\")\n",
    "print()\n",
    "\n",
    "p_nor = Perceptron(n_features=2, learning_rate=1.0, max_epochs=20)\n",
    "p_nor.fit(X_nor, y_nor, verbose=True)\n",
    "\n",
    "print(f\"\\nFinal model: {p_nor}\")\n",
    "print(f\"\\nVerification:\")\n",
    "for x, y in zip(X_nor, y_nor):\n",
    "    y_hat = p_nor.predict(x)[0]\n",
    "    status = 'OK' if y_hat == y else 'WRONG'\n",
    "    print(f\"  {x} -> {y_hat} (expected {y}) [{status}]\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-18",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "p_nor.plot_decision_boundary(X_nor, y_nor, ax=ax1,\n",
    "                              title='NOR Gate: Learned Decision Boundary')\n",
    "\n",
    "# Enhanced convergence plot\n",
    "epochs = range(1, len(p_nor.errors_per_epoch) + 1)\n",
    "bars = ax2.bar(epochs, p_nor.errors_per_epoch, color='#9b59b6',\n",
    "               edgecolor='black', alpha=0.7)\n",
    "for bar_obj, err in zip(bars, p_nor.errors_per_epoch):\n",
    "    if err > 0:\n",
    "        ax2.text(bar_obj.get_x() + bar_obj.get_width()/2., bar_obj.get_height() + 0.05,\n",
    "                f'{err}', ha='center', va='bottom', fontsize=10, fontweight='bold')\n",
    "ax2.set_xlabel('Epoch', fontsize=13)\n",
    "ax2.set_ylabel('Errors', fontsize=13)\n",
    "ax2.set_title('NOR Gate: Convergence', fontsize=13, fontweight='bold')\n",
    "ax2.set_xticks(list(epochs))\n",
    "ax2.grid(True, alpha=0.3, axis='y')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-19",
   "metadata": {},
   "source": [
    "## 5.9 Decision Boundary Evolution\n",
    "\n",
    "One of the most instructive ways to understand the perceptron learning algorithm is to visualize how the decision boundary changes after each weight update. Below, we show the decision boundary at each snapshot in the weight history during training on the AND gate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-20",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Retrain AND gate and show boundary evolution\n",
    "p_evo = Perceptron(n_features=2, learning_rate=1.0, max_epochs=20)\n",
    "p_evo.fit(X_and, y_and)\n",
    "\n",
    "n_snapshots = len(p_evo.weight_history)\n",
    "n_cols = min(4, n_snapshots)\n",
    "n_rows = (n_snapshots + n_cols - 1) // n_cols\n",
    "\n",
    "fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 4 * n_rows))\n",
    "if n_rows == 1 and n_cols == 1:\n",
    "    axes = np.array([axes])\n",
    "axes = np.atleast_2d(axes)\n",
    "\n",
    "# Color map for evolution steps\n",
    "cmap = plt.cm.viridis\n",
    "colors = [cmap(i / max(n_snapshots - 1, 1)) for i in range(n_snapshots)]\n",
    "\n",
    "for idx, (w_snap, b_snap) in enumerate(p_evo.weight_history):\n",
    "    row, col = divmod(idx, n_cols)\n",
    "    ax = axes[row, col]\n",
    "    \n",
    "    # Create mesh\n",
    "    xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 200),\n",
    "                         np.linspace(-0.5, 1.5, 200))\n",
    "    \n",
    "    w_norm = np.linalg.norm(w_snap)\n",
    "    if w_norm > 0:\n",
    "        Z = xx * w_snap[0] + yy * w_snap[1] + b_snap\n",
    "        ax.contourf(xx, yy, Z, levels=[-1e10, 0, 1e10],\n",
    "                    colors=['#FFCCCC', '#CCCCFF'], alpha=0.3)\n",
    "        ax.contour(xx, yy, Z, levels=[0], colors='black', linewidths=1.5)\n",
    "    \n",
    "    ax.scatter(X_and[y_and == 0, 0], X_and[y_and == 0, 1],\n",
    "               c='red', marker='o', s=80, edgecolors='black', zorder=5)\n",
    "    ax.scatter(X_and[y_and == 1, 0], X_and[y_and == 1, 1],\n",
    "               c='blue', marker='s', s=80, edgecolors='black', zorder=5)\n",
    "    \n",
    "    step_label = 'Init' if idx == 0 else f'Update {idx}'\n",
    "    ax.set_title(f'{step_label}\\nw={np.round(w_snap, 1)}, b={b_snap:.1f}',\n",
    "                 fontsize=9)\n",
    "    ax.set_xlim(-0.5, 1.5)\n",
    "    ax.set_ylim(-0.5, 1.5)\n",
    "    ax.set_aspect('equal')\n",
    "    ax.grid(True, alpha=0.3)\n",
    "\n",
    "# Hide empty axes\n",
    "for idx in range(n_snapshots, n_rows * n_cols):\n",
    "    row, col = divmod(idx, n_cols)\n",
    "    axes[row, col].set_visible(False)\n",
    "\n",
    "fig.suptitle('AND Gate: Decision Boundary Evolution',\n",
    "             fontsize=14, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-21",
   "metadata": {},
   "source": [
    "## 5.10 Random Linearly Separable Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-22",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def generate_linearly_separable_data(n_per_class=50, separation=2.0, seed=42):\n",
    "    \"\"\"Generate 2D linearly separable data (two Gaussian clusters).\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    n_per_class : int\n",
    "        Number of points per class.\n",
    "    separation : float\n",
    "        Distance between cluster centers (larger = easier to separate).\n",
    "    seed : int\n",
    "        Random seed.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    X : np.ndarray of shape (2*n_per_class, 2)\n",
    "    y : np.ndarray of shape (2*n_per_class,)\n",
    "    \"\"\"\n",
    "    rng = np.random.RandomState(seed)\n",
    "    center0 = np.array([-separation / 2, 0])\n",
    "    center1 = np.array([separation / 2, 0])\n",
    "    \n",
    "    X0 = rng.randn(n_per_class, 2) * 0.5 + center0\n",
    "    X1 = rng.randn(n_per_class, 2) * 0.5 + center1\n",
    "    \n",
    "    X = np.vstack([X0, X1])\n",
    "    y = np.array([0] * n_per_class + [1] * n_per_class)\n",
    "    \n",
    "    return X, y\n",
    "\n",
    "\n",
    "# Train on random data\n",
    "X_rand, y_rand = generate_linearly_separable_data(n_per_class=50, separation=3.0)\n",
    "\n",
    "p_rand = Perceptron(n_features=2, learning_rate=1.0, max_epochs=100)\n",
    "p_rand.fit(X_rand, y_rand)\n",
    "\n",
    "print(f\"Converged: {p_rand.converged}\")\n",
    "print(f\"Epochs: {len(p_rand.errors_per_epoch)}\")\n",
    "print(f\"Final weights: {p_rand.weights}\")\n",
    "print(f\"Final bias: {p_rand.bias}\")\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))\n",
    "p_rand.plot_decision_boundary(X_rand, y_rand, ax=ax1,\n",
    "                               title='Random Linearly Separable Data')\n",
    "\n",
    "# Enhanced convergence with color gradient\n",
    "epochs = range(1, len(p_rand.errors_per_epoch) + 1)\n",
    "bar_colors = ['#e74c3c' if e > 0 else '#2ecc71' for e in p_rand.errors_per_epoch]\n",
    "ax2.bar(epochs, p_rand.errors_per_epoch, color=bar_colors,\n",
    "        edgecolor='black', alpha=0.7)\n",
    "ax2.set_xlabel('Epoch', fontsize=13)\n",
    "ax2.set_ylabel('Errors', fontsize=13)\n",
    "ax2.set_title('Convergence', fontsize=13, fontweight='bold')\n",
    "ax2.grid(True, alpha=0.3, axis='y')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-23",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": "import numpy as np\nimport matplotlib.pyplot as plt\n\n# Experiment: How does cluster separation affect convergence speed?\nseparations = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0]\nn_trials = 20\nresults = {}\n\nfor sep in separations:\n    epoch_counts = []\n    for trial in range(n_trials):\n        X_trial, y_trial = generate_linearly_separable_data(\n            n_per_class=30, separation=sep, seed=trial * 100 + int(sep * 10))\n        p_trial = Perceptron(n_features=2, learning_rate=1.0, max_epochs=500)\n        p_trial.fit(X_trial, y_trial)\n        if p_trial.converged:\n            epoch_counts.append(len(p_trial.errors_per_epoch))\n    results[sep] = epoch_counts\n\n# Plot results with enhanced styling\nfig, ax = plt.subplots(figsize=(10, 6))\n\npositions = list(range(len(separations)))\nbox_data = [results[s] for s in separations]\n\n# Filter out empty lists for boxplot (matplotlib can't plot empty data)\nnon_empty_positions = [p for p, d in zip(positions, box_data) if len(d) > 0]\nnon_empty_data = [d for d in box_data if len(d) > 0]\nnon_empty_seps = [s for s, d in zip(separations, box_data) if len(d) > 0]\n\nif non_empty_data:\n    bp = ax.boxplot(non_empty_data, positions=non_empty_positions, widths=0.6, patch_artist=True)\n\n    # Color gradient from red (hard) to green (easy)\n    cmap = plt.cm.RdYlGn\n    for i, patch in enumerate(bp['boxes']):\n        color = cmap(non_empty_positions[i] / max(len(separations) - 1, 1))\n        patch.set_facecolor(color)\n        patch.set_alpha(0.7)\n\nax.set_xticks(positions)\nax.set_xticklabels([str(s) for s in separations])\nax.set_xlabel('Cluster Separation', fontsize=13)\nax.set_ylabel('Epochs to Convergence', fontsize=13)\nax.set_title('Effect of Cluster Separation on Convergence Speed',\n             fontsize=14, fontweight='bold')\nax.grid(True, alpha=0.3, axis='y')\n\n# Add annotation only if we have valid data for the endpoints\nlast_data = results[separations[-1]]\nfirst_data = results[separations[0]]\nif last_data and first_data:\n    ax.annotate('Larger separation\\n= faster convergence',\n                xy=(len(separations)-1, np.mean(last_data)),\n                xytext=(len(separations)-2.5, np.mean(first_data) + 3),\n                fontsize=11, ha='center',\n                arrowprops=dict(arrowstyle='->', color='green', lw=1.5),\n                color='green', fontweight='bold')\n\nplt.tight_layout()\nplt.show()\n\nprint(\"\\nSummary:\")\nprint(f\"{'Separation':>12} | {'Mean Epochs':>12} | {'Std':>8} | {'Converged':>10}\")\nprint(\"-\" * 52)\nfor sep in separations:\n    data = results[sep]\n    if data:\n        print(f\"{sep:>12.1f} | {np.mean(data):>12.1f} | {np.std(data):>8.1f} | {len(data):>10}/{n_trials}\")\n    else:\n        print(f\"{sep:>12.1f} | {'N/A':>12} | {'N/A':>8} | {0:>10}/{n_trials}\")"
  },
  {
   "cell_type": "markdown",
   "id": "cell-24",
   "metadata": {},
   "source": [
    "### Observation\n",
    "\n",
    "As cluster separation increases:\n",
    "- The **margin** $\\gamma$ between the classes grows.\n",
    "- By the Perceptron Convergence Theorem, the bound on mistakes is $R^2/\\gamma^2$, so larger $\\gamma$ means fewer mistakes.\n",
    "- The perceptron converges faster, often in just a few epochs."
   ]
  },
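  {
   "cell_type": "markdown",
   "id": "cell-24a",
   "metadata": {},
   "source": [
    "We can sanity-check the bound numerically. The sketch below reuses `X_rand`, `y_rand`, and `p_rand` from Section 5.10, absorbs the bias into an augmented weight vector (the homogeneous form in which the classic bound is usually stated), and compares $R^2/\\gamma^2$ against the number of updates actually made. One caveat: we measure $\\gamma$ with the *learned* separator rather than the maximum-margin one, so the bound we compute is valid but loose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-24b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Absorb the bias: augment each input with a constant 1\n",
    "X_aug = np.hstack([X_rand, np.ones((X_rand.shape[0], 1))])\n",
    "w_aug = np.append(p_rand.weights, p_rand.bias)\n",
    "\n",
    "# Signed margins under the learned separator, labels mapped to {-1, +1}\n",
    "y_pm = 2 * y_rand - 1\n",
    "margins = y_pm * (X_aug @ w_aug) / np.linalg.norm(w_aug)\n",
    "\n",
    "gamma = margins.min()                       # margin of the learned separator\n",
    "R = np.linalg.norm(X_aug, axis=1).max()     # radius of the augmented data\n",
    "n_updates = len(p_rand.weight_history) - 1  # updates actually made (mistakes)\n",
    "\n",
    "print(f\"gamma (learned separator): {gamma:.4f}\")\n",
    "print(f\"R (augmented data radius): {R:.4f}\")\n",
    "if gamma > 0:\n",
    "    print(f\"Bound R^2/gamma^2:         {(R / gamma) ** 2:.1f}\")\n",
    "else:\n",
    "    print(\"A point lies on the learned boundary; margin-based bound undefined.\")\n",
    "print(f\"Actual updates made:       {n_updates}\")"
   ]
  },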
  {
   "cell_type": "markdown",
   "id": "cell-25",
   "metadata": {},
   "source": [
    "## 5.11 Learning Rate Experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-26",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Experiment with different learning rates\n",
    "X_lr, y_lr = generate_linearly_separable_data(n_per_class=40, separation=2.5, seed=42)\n",
    "\n",
    "learning_rates = [0.1, 0.5, 1.0, 5.0, 10.0]\n",
    "\n",
    "fig, axes = plt.subplots(2, len(learning_rates), figsize=(4 * len(learning_rates), 8))\n",
    "\n",
    "for col, eta in enumerate(learning_rates):\n",
    "    p_lr = Perceptron(n_features=2, learning_rate=eta, max_epochs=100)\n",
    "    p_lr.fit(X_lr, y_lr)\n",
    "    \n",
    "    # Decision boundary\n",
    "    ax_top = axes[0, col]\n",
    "    p_lr.plot_decision_boundary(X_lr, y_lr, ax=ax_top,\n",
    "                                title=f'$\\\\eta = {eta}$')\n",
    "    ax_top.legend([], frameon=False)  # Remove legend for cleanliness\n",
    "    \n",
    "    # Convergence\n",
    "    ax_bot = axes[1, col]\n",
    "    epochs_lr = range(1, len(p_lr.errors_per_epoch) + 1)\n",
    "    bar_colors = ['#e74c3c' if e > 0 else '#2ecc71' for e in p_lr.errors_per_epoch]\n",
    "    ax_bot.bar(epochs_lr, p_lr.errors_per_epoch, color=bar_colors,\n",
    "              edgecolor='black', alpha=0.7)\n",
    "    ax_bot.set_xlabel('Epoch', fontsize=10)\n",
    "    ax_bot.set_ylabel('Errors', fontsize=10)\n",
    "    ax_bot.set_title(f'Epochs: {len(p_lr.errors_per_epoch)}', fontsize=11, fontweight='bold')\n",
    "    ax_bot.grid(True, alpha=0.3, axis='y')\n",
    "    \n",
    "    # Print summary\n",
    "    print(f\"eta={eta:>5.1f}: converged={p_lr.converged}, \"\n",
    "          f\"epochs={len(p_lr.errors_per_epoch)}, \"\n",
    "          f\"||w||={np.linalg.norm(p_lr.weights):.2f}, \"\n",
    "          f\"b={p_lr.bias:.2f}\")\n",
    "\n",
    "fig.suptitle('Effect of Learning Rate on Perceptron Training',\n",
    "             fontsize=15, fontweight='bold', y=1.02)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-27",
   "metadata": {},
   "source": [
    "### Key Insight: Learning Rate and the Step Function\n",
    "\n",
    "For the perceptron with a **step activation function**, the learning rate $\\eta$ does not affect whether the algorithm converges or the number of epochs required. Here is why:\n",
    "\n",
    "- The decision boundary depends only on the *direction* of $\\mathbf{w}$ and the *ratio* $b/\\|\\mathbf{w}\\|$.\n",
    "- Scaling all weights and bias by $\\eta$ does not change the decision boundary: $\\text{step}(\\eta(\\mathbf{w} \\cdot \\mathbf{x} + b)) = \\text{step}(\\mathbf{w} \\cdot \\mathbf{x} + b)$.\n",
    "- Different $\\eta$ values produce weight vectors that differ only in scale, not direction.\n",
    "\n",
    "The number of **update steps** (mistakes) is the same regardless of $\\eta$. The norm $\\|\\mathbf{w}\\|$ scales linearly with $\\eta$, but the decision boundary is identical.\n",
    "\n",
    "This is a special property of the step function. For differentiable activation functions (like the sigmoid), the learning rate plays a crucial role in convergence behavior."
   ]
  },
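  {
   "cell_type": "markdown",
   "id": "cell-27a",
   "metadata": {},
   "source": [
    "A quick numerical check of this scaling property (a sketch reusing `X_lr` and `y_lr` from the experiment above; since the `Perceptron` class initializes its weights to zero, the two runs below should make identical predictions at every step):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-27b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Train twice on the same data with learning rates that differ by 10x\n",
    "p_small = Perceptron(n_features=2, learning_rate=0.1, max_epochs=100).fit(X_lr, y_lr)\n",
    "p_large = Perceptron(n_features=2, learning_rate=1.0, max_epochs=100).fit(X_lr, y_lr)\n",
    "\n",
    "print(f\"epochs: eta=0.1 -> {len(p_small.errors_per_epoch)}, \"\n",
    "      f\"eta=1.0 -> {len(p_large.errors_per_epoch)}\")\n",
    "\n",
    "# With zero initialization, the eta=1.0 weights should be exactly 10x\n",
    "# the eta=0.1 weights: same direction, hence the same decision boundary\n",
    "print(\"weights proportional:\", np.allclose(10 * p_small.weights, p_large.weights))\n",
    "print(\"bias proportional:   \", np.isclose(10 * p_small.bias, p_large.bias))"
   ]
  },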
  {
   "cell_type": "markdown",
   "id": "cell-28",
   "metadata": {},
   "source": [
    "## 5.12 Exercises\n",
    "\n",
    "### Exercise 5.1: Custom Dataset\n",
    "\n",
    "Create your own 2D dataset (at least 10 points per class) that is linearly separable. Train the perceptron on it and visualize the result. Try different initializations and orderings of the training data. Does the final boundary change?\n",
    "\n",
    "### Exercise 5.2: The XOR Disaster\n",
    "\n",
    "Train the perceptron on the XOR truth table. Run for 100 epochs.\n",
    "\n",
    "1. Does it converge? Why or why not?\n",
    "2. Plot the errors per epoch. What pattern do you observe?\n",
    "3. Plot the decision boundary at the final epoch. Is any configuration of $\\mathbf{w}$ and $b$ a valid solution?\n",
    "\n",
    "### Exercise 5.3: The {-1, +1} Variant\n",
    "\n",
    "Implement the perceptron learning algorithm using labels $y \\in \\{-1, +1\\}$ and the sign function. The update rule becomes:\n",
    "\n",
    "$$\\mathbf{w}_{t+1} = \\mathbf{w}_t + \\eta \\cdot y_i \\cdot \\mathbf{x}_i \\quad \\text{(on misclassification)}$$\n",
    "\n",
    "Train on the AND gate (relabeling 0 as -1) and verify it gives the same decision boundary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-29",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Exercise 5.2 starter code: XOR\n",
    "X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
    "y_xor = np.array([0, 1, 1, 0])\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"Attempting to Learn XOR\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "p_xor = Perceptron(n_features=2, learning_rate=1.0, max_epochs=100)\n",
    "p_xor.fit(X_xor, y_xor, verbose=False)\n",
    "\n",
    "print(f\"\\nConverged: {p_xor.converged}\")\n",
    "print(f\"Epochs run: {len(p_xor.errors_per_epoch)}\")\n",
    "print(f\"Final weights: {p_xor.weights}\")\n",
    "print(f\"Final bias: {p_xor.bias}\")\n",
    "\n",
    "print(\"\\nVerification:\")\n",
    "for x, y in zip(X_xor, y_xor):\n",
    "    y_hat = p_xor.predict(x)[0]\n",
    "    status = 'OK' if y_hat == y else 'WRONG'\n",
    "    print(f\"  {x} -> {y_hat} (expected {y}) [{status}]\")\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))\n",
    "\n",
    "p_xor.plot_decision_boundary(X_xor, y_xor, ax=ax1,\n",
    "                              title='XOR: Final Decision Boundary (wrong!)')\n",
    "\n",
    "# Enhanced oscillation visualization\n",
    "epochs = range(1, len(p_xor.errors_per_epoch) + 1)\n",
    "ax2.plot(list(epochs), p_xor.errors_per_epoch, 'r-o', linewidth=1.5,\n",
    "         markersize=3, alpha=0.7)\n",
    "ax2.fill_between(list(epochs), p_xor.errors_per_epoch, alpha=0.2, color='red')\n",
    "ax2.set_xlabel('Epoch', fontsize=13)\n",
    "ax2.set_ylabel('Errors', fontsize=13)\n",
    "ax2.set_title('XOR: Errors Never Reach Zero (Oscillation)', fontsize=13,\n",
    "              fontweight='bold', color='red')\n",
    "ax2.grid(True, alpha=0.3)\n",
    "ax2.axhline(y=0, color='green', linestyle='--', linewidth=1, alpha=0.5,\n",
    "            label='Target: 0 errors')\n",
    "ax2.legend(fontsize=11)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  }
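,
  {
   "cell_type": "markdown",
   "id": "cell-29a",
   "metadata": {},
   "source": [
    "Starter sketch for Exercise 5.3 (one possible implementation of the $\\{-1, +1\\}$ variant, assuming the cells above have been run so that `p_and` exists; verifying that the boundary matches is left to you):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-29b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Exercise 5.3 starter: labels in {-1, +1} with the sign-style update\n",
    "def train_perceptron_pm(X, y_pm, eta=1.0, max_epochs=100):\n",
    "    \"\"\"Train with y in {-1, +1} using w <- w + eta*y*x on each mistake.\"\"\"\n",
    "    w, b = np.zeros(X.shape[1]), 0.0\n",
    "    for _ in range(max_epochs):\n",
    "        errors = 0\n",
    "        for x_i, y_i in zip(X, y_pm):\n",
    "            # Predict +1 when z >= 0, matching the step(z) >= 0 convention\n",
    "            y_hat = 1 if np.dot(w, x_i) + b >= 0 else -1\n",
    "            if y_hat != y_i:\n",
    "                w += eta * y_i * x_i\n",
    "                b += eta * y_i\n",
    "                errors += 1\n",
    "        if errors == 0:\n",
    "            break\n",
    "    return w, b\n",
    "\n",
    "# AND gate with 0 relabeled as -1\n",
    "X_and_pm = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)\n",
    "y_and_pm = np.array([-1, -1, -1, 1])\n",
    "w_pm, b_pm = train_perceptron_pm(X_and_pm, y_and_pm)\n",
    "print(f\"{{-1,+1}} variant: w={w_pm}, b={b_pm}\")\n",
    "print(f\"{{0,1}} model:     w={p_and.weights}, b={p_and.bias}\")"
   ]
  }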
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}