{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.0"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-title",
   "metadata": {},
   "source": "# Three Separation Lemmas\n*The mathematical heart of Monico's elementary UAT proof*"
  },
  {
   "cell_type": "markdown",
   "id": "cell-intro",
   "metadata": {},
   "source": "The proof strategy is to show progressively stronger separation results.\nEach lemma builds on the previous one, adding one hidden layer at each step:\n\n$$\\sigma \\text{ separates points} \\;\\longrightarrow\\; \\mathcal{N}_2 \\text{ separates points from sets} \\;\\longrightarrow\\; \\mathcal{N}_3 \\text{ separates sets from sets} \\;\\longrightarrow\\; \\mathcal{N}_4 \\text{ approximates all continuous functions.}$$\n\nThis notebook covers the three separation lemmas (Lemmas 3.1--3.3 of\n[Monico, 2024](https://arxiv.org/abs/2406.10002)). Each lemma is stated formally,\nproved step by step with auxiliary commentary, verified numerically, and illustrated\nwith interactive plots. For notation and prerequisites, see\n[IP01: Overview & Notations](ip01_monico_overview)."
  },
  {
   "cell_type": "markdown",
   "id": "cell-sec1-header",
   "metadata": {},
   "source": "---\n\n## Section 1: Lemma 3.1 --- Point Separation"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-statement",
   "metadata": {},
   "source": "```{admonition} Lemma 3.1 (Point-Point Separation)\n:class: important\n\nLet $x_0$ and $x_1$ be distinct real numbers. For each $\\varepsilon > 0$ there exist\n$s, t \\in \\mathbb{R}$ such that\n\n$$\\sigma(s + t x_0) < \\varepsilon \\qquad\\text{and}\\qquad \\sigma(s + t x_1) > 1 - \\varepsilon.$$\n\nIf, in addition, $x_0 < x_1$ and $\\varepsilon < 1/2$, then\n$\\sigma(s + tx) < \\varepsilon$ on the interval $(-\\infty, x_0]$ and\n$\\sigma(s + tx) > 1 - \\varepsilon$ on the interval $[x_1, \\infty)$.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-intuition",
   "metadata": {},
   "source": "### Intuition\n\nWe want to shove $x_0$ down to $0$ and $x_1$ up to $1$ using $\\sigma$.\nThe trick: an affine pre-map $s + tx$ lets us place $\\sigma$'s steep transition\nzone exactly between $x_0$ and $x_1$. By making $t$ large enough (i.e., making\nthe affine map steep), we can squeeze the transition into an arbitrarily narrow\nwindow, achieving any desired $\\varepsilon$-separation."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-proof-header",
   "metadata": {},
   "source": "### Proof walkthrough"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-step1",
   "metadata": {},
   "source": "**Step 1: Find the IVT targets.**\n\nBy the Intermediate Value Theorem, since $\\sigma$ is continuous with\n$\\lim_{x \\to -\\infty} \\sigma(x) = 0$ and $\\lim_{x \\to +\\infty} \\sigma(x) = 1$,\nfor any target value $c \\in (0,1)$ there exists $y \\in \\mathbb{R}$ with $\\sigma(y) = c$.\n\nChoose $y_0$ with $\\sigma(y_0) = \\varepsilon/2$ and $y_1$ with\n$\\sigma(y_1) = 1 - \\varepsilon/2$.\n\n```{admonition} Auxiliary: Why does $y_0$ exist?\n:class: dropdown tip\n\nBecause $\\sigma$ is continuous, $\\sigma(x) \\to 0$ as $x \\to -\\infty$, and\n$\\sigma(x) \\to 1$ as $x \\to +\\infty$. Since $\\varepsilon/2 \\in (0,1)$,\nthe Intermediate Value Theorem guarantees at least one $y_0 \\in \\mathbb{R}$\nwith $\\sigma(y_0) = \\varepsilon/2$. Furthermore, since $\\sigma$ is\n*strictly increasing* (it is increasing and continuous with distinct limits),\nthis $y_0$ is unique. For the standard sigmoid\n$\\sigma(x) = 1/(1+e^{-x})$, we can compute it explicitly:\n$y_0 = \\log\\bigl(\\varepsilon/(2 - \\varepsilon)\\bigr)$.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-step2",
   "metadata": {},
   "source": "**Step 2: Solve the $2 \\times 2$ linear system.**\n\nWe need $s + t x_0 = y_0$ and $s + t x_1 = y_1$. In matrix form:\n\n$$\\begin{pmatrix} 1 & x_0 \\\\ 1 & x_1 \\end{pmatrix} \\begin{pmatrix} s \\\\ t \\end{pmatrix} = \\begin{pmatrix} y_0 \\\\ y_1 \\end{pmatrix}.$$\n\nThe determinant is $x_1 - x_0 \\neq 0$ (since $x_0 \\neq x_1$), so the system has a\nunique solution:\n\n$$t = \\frac{y_1 - y_0}{x_1 - x_0}, \\qquad s = y_0 - t \\, x_0.$$"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-step3",
   "metadata": {},
   "source": "**Step 3: Verify the separation.**\n\nSubstituting back:\n\n$$\\sigma(s + t x_0) = \\sigma(y_0) = \\varepsilon/2 < \\varepsilon \\;\\checkmark$$\n\n$$\\sigma(s + t x_1) = \\sigma(y_1) = 1 - \\varepsilon/2 > 1 - \\varepsilon \\;\\checkmark$$\n\nThis establishes the first claim of the lemma."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-step4",
   "metadata": {},
   "source": "**Step 4: Extend to intervals (the monotonicity argument).**\n\nSuppose in addition that $x_0 < x_1$ and $\\varepsilon < 1/2$. Then:\n\n$$\\sigma(s + t x_0) < \\varepsilon < \\tfrac{1}{2} < 1 - \\varepsilon < \\sigma(s + t x_1).$$\n\nSince $\\sigma$ is increasing and $s + tx$ is affine (hence monotone), the\ncomposition $x \\mapsto \\sigma(s + tx)$ must be **increasing** (if it were\ndecreasing, we would have $\\sigma(s+tx_0) > \\sigma(s+tx_1)$, contradicting the\ninequality above). Therefore:\n\n- For all $x \\leq x_0$: $\\;\\sigma(s+tx) \\leq \\sigma(s+tx_0) < \\varepsilon$.\n- For all $x \\geq x_1$: $\\;\\sigma(s+tx) \\geq \\sigma(s+tx_1) > 1-\\varepsilon$. $\\;\\square$"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-numerical-header",
   "metadata": {},
   "source": "### Numerical verification"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-lemma31-numerical",
   "metadata": {
    "tags": ["hide-input"]
   },
   "outputs": [],
   "source": "import numpy as np\nimport matplotlib.pyplot as plt\n\ndef sigma(x):\n    \"\"\"Standard sigmoid, numerically stable.\"\"\"\n    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))\n\ndef sigma_inv(y):\n    \"\"\"Inverse of sigmoid: log(y/(1-y)).\"\"\"\n    return np.log(y / (1 - y))\n\n# Example: x0=1, x1=3, eps=0.05\nx0, x1, eps = 1.0, 3.0, 0.05\n\ny0 = sigma_inv(eps / 2)\ny1 = sigma_inv(1 - eps / 2)\n\n# Solve: s + t*x0 = y0, s + t*x1 = y1\nt_val = (y1 - y0) / (x1 - x0)\ns_val = y0 - t_val * x0\n\nprint(f\"Parameters: x\\u2080={x0}, x\\u2081={x1}, \\u03b5={eps}\")\nprint(f\"IVT targets: y\\u2080=\\u03c3\\u207b\\u00b9({eps/2})={y0:.4f}, y\\u2081=\\u03c3\\u207b\\u00b9({1-eps/2})={y1:.4f}\")\nprint(f\"Solution: s={s_val:.4f}, t={t_val:.4f}\")\nprint()\nval0 = sigma(s_val + t_val * x0)\nval1 = sigma(s_val + t_val * x1)\nprint(f\"\\u03c3(s+t\\u00b7x\\u2080) = \\u03c3({s_val + t_val*x0:.4f}) = {val0:.6f} < {eps} \\u2713\" if val0 < eps else f\"\\u2717\")\nprint(f\"\\u03c3(s+t\\u00b7x\\u2081) = \\u03c3({s_val + t_val*x1:.4f}) = {val1:.6f} > {1-eps} \\u2713\" if val1 > 1-eps else f\"\\u2717\")"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-visual-header",
   "metadata": {},
   "source": "### Visual proof"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-lemma31-visual",
   "metadata": {
    "tags": ["hide-input"]
   },
   "outputs": [],
   "source": "# ============================================================\n# Visual proof for Lemma 3.1\n# ============================================================\n\nACCENT = '#4f46e5'\nTHM    = '#059669'\nWARN   = '#d97706'\nDANGER = '#dc2626'\n\nfig, ax = plt.subplots(figsize=(9, 5))\n\n# Plot sigma(s + tx) over a wide range\nx_range = np.linspace(x0 - 3, x1 + 3, 1000)\ny_range = sigma(s_val + t_val * x_range)\n\nax.plot(x_range, y_range, color=ACCENT, linewidth=2.5, label=r'$\\sigma(s + tx)$', zorder=5)\n\n# Vertical lines at x0 and x1\nax.axvline(x0, color=THM, linestyle='--', linewidth=1.2, alpha=0.8, label=f'$x_0 = {x0}$')\nax.axvline(x1, color=DANGER, linestyle='--', linewidth=1.2, alpha=0.8, label=f'$x_1 = {x1}$')\n\n# Horizontal epsilon-bands\nax.axhspan(0, eps, alpha=0.12, color=THM, zorder=1)\nax.axhspan(1 - eps, 1, alpha=0.12, color=DANGER, zorder=1)\nax.axhline(eps, color=THM, linestyle=':', linewidth=1, alpha=0.6)\nax.axhline(1 - eps, color=DANGER, linestyle=':', linewidth=1, alpha=0.6)\n\n# Shade regions where sigma(s+tx) < eps  and  sigma(s+tx) > 1-eps\nmask_low = y_range < eps\nmask_high = y_range > 1 - eps\nax.fill_between(x_range, 0, y_range, where=mask_low, alpha=0.15, color=THM, zorder=2)\nax.fill_between(x_range, y_range, 1, where=mask_high, alpha=0.15, color=DANGER, zorder=2)\n\n# Highlight points (x0, sigma(s+tx0)) and (x1, sigma(s+tx1))\nax.plot(x0, sigma(s_val + t_val * x0), 'o', color=THM, markersize=10, zorder=6,\n        markeredgecolor='white', markeredgewidth=1.5)\nax.plot(x1, sigma(s_val + t_val * x1), 'o', color=DANGER, markersize=10, zorder=6,\n        markeredgecolor='white', markeredgewidth=1.5)\n\n# Annotations\nax.annotate(f'$\\\\sigma(s+tx_0) = {sigma(s_val+t_val*x0):.3f}$',\n            xy=(x0, sigma(s_val + t_val * x0)),\n            xytext=(x0 - 1.5, 0.25), fontsize=10,\n            arrowprops=dict(arrowstyle='->', color=THM, lw=1.5),\n            color=THM, fontweight='bold')\nax.annotate(f'$\\\\sigma(s+tx_1) = {sigma(s_val+t_val*x1):.3f}$',\n            xy=(x1, sigma(s_val + t_val * x1)),\n            xytext=(x1 + 0.5, 0.7), fontsize=10,\n            arrowprops=dict(arrowstyle='->', color=DANGER, lw=1.5),\n            color=DANGER, fontweight='bold')\n\n# Labels\nax.text(x0 - 2.5, eps / 2, f'$< \\\\varepsilon = {eps}$', fontsize=10, color=THM,\n        va='center', fontweight='bold')\nax.text(x1 + 1.5, 1 - eps / 2, f'$> 1-\\\\varepsilon = {1-eps}$', fontsize=10, color=DANGER,\n        va='center', fontweight='bold')\n\nax.set_xlabel('$x$', fontsize=13)\nax.set_ylabel(r'$\\sigma(s + tx)$', fontsize=13)\nax.set_title('Lemma 3.1: Point-Point Separation', fontsize=14, fontweight='bold')\nax.set_ylim(-0.05, 1.08)\nax.legend(loc='center left', fontsize=10, framealpha=0.9)\nax.grid(True, alpha=0.3)\n\nfig.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma31-tryit",
   "metadata": {},
   "source": "**Try it yourself** &rarr; [Squashing Function Lab](../applets/squashing-lab.html) --- drag $x_0$ and $x_1$, adjust $\\varepsilon$, and watch the linear system update in real time."
  },
  {
   "cell_type": "markdown",
   "id": "cell-sec2-header",
   "metadata": {},
   "source": "---\n\n## Section 2: Lemma 3.2 --- Point-Set Separation"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-statement",
   "metadata": {},
   "source": "```{admonition} Lemma 3.2 (Point-Set Separation)\n:class: important\n\nLet $B \\subset K$ be a closed set, and $\\boldsymbol{x}_0 \\in K \\setminus B$.\nFor each $\\varepsilon > 0$ there exists $g \\in \\mathcal{N}_2$ such that\n\n$$g > 1 - \\varepsilon \\;\\text{ on } B \\qquad\\text{and}\\qquad g(\\boldsymbol{x}_0) < \\varepsilon.$$\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-intuition",
   "metadata": {},
   "source": "### Intuition\n\nCover $B$ with finitely many \"alarm bells\" --- each rings near its target\n$\\boldsymbol{b} \\in B$ but stays quiet at $\\boldsymbol{x}_0$. Sum them up: the\ntotal alarm is loud on all of $B$ (at least one bell rings for each point)\nbut quiet at $\\boldsymbol{x}_0$ (each bell contributes only a tiny amount there)."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-proof-header",
   "metadata": {},
   "source": "### Proof walkthrough"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-step1",
   "metadata": {},
   "source": "**Step 1: Separate each $\\boldsymbol{b}$ from $\\boldsymbol{x}_0$ individually.**\n\nWLOG assume $0 < \\varepsilon < 1/3$.\n\nFor each $\\boldsymbol{b} \\in B$, since $\\boldsymbol{b} \\neq \\boldsymbol{x}_0$, they\ndiffer in at least one coordinate. Define $f_{\\boldsymbol{b}}$ as a suitable\nfunction from $\\mathcal{N}_1$ (an affine function composed with $\\sigma$, using\nLemma 3.1) such that\n\n$$f_{\\boldsymbol{b}}(\\boldsymbol{x}_0) < \\varepsilon/2 \\qquad\\text{and}\\qquad f_{\\boldsymbol{b}}(\\boldsymbol{b}) > 1 - \\varepsilon/2.$$\n\n```{admonition} Auxiliary: constructing $f_{\\boldsymbol{b}}$\n:class: dropdown tip\n\nSince $\\boldsymbol{b} \\neq \\boldsymbol{x}_0$ in $\\mathbb{R}^n$, there exists an\nindex $i$ with $b_i \\neq (x_0)_i$. The coordinate projection $\\pi_i(\\boldsymbol{x}) = x_i$\nis an affine function, so $\\pi_i \\in \\mathcal{N}_1$, and it maps $\\boldsymbol{b}$\nand $\\boldsymbol{x}_0$ to distinct real numbers $b_i$ and $(x_0)_i$.\n\nNow apply Lemma 3.1 to these two real numbers: find $s, t$ such that\n$\\sigma(s + t \\cdot (x_0)_i) < \\varepsilon/2$ and $\\sigma(s + t \\cdot b_i) > 1 - \\varepsilon/2$.\nThen $f_{\\boldsymbol{b}}(\\boldsymbol{x}) = \\sigma(s + t \\, x_i)$ is the composition\n$\\sigma \\circ (s + t \\, \\pi_i) \\in \\mathcal{N}_1^\\sigma$, which is what we need.\n\n(Note: the paper writes $f_{\\boldsymbol{b}} \\in \\mathcal{N}_1$ for the *pre-activation*\nfunction, and the post-activation version $\\sigma \\circ f_{\\boldsymbol{b}}$ does the\nseparation. We keep this convention.)\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-step2",
   "metadata": {},
   "source": "**Step 2: Build an open cover of $B$.**\n\nDefine\n\n$$U_{\\boldsymbol{b}} = \\{\\boldsymbol{x} \\in K : f_{\\boldsymbol{b}}(\\boldsymbol{x}) > 1 - \\varepsilon\\}.$$\n\nSince $f_{\\boldsymbol{b}}$ is continuous, $U_{\\boldsymbol{b}}$ is open.\nAnd $\\boldsymbol{b} \\in U_{\\boldsymbol{b}}$ because\n$f_{\\boldsymbol{b}}(\\boldsymbol{b}) > 1 - \\varepsilon/2 > 1 - \\varepsilon$."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-step3",
   "metadata": {},
   "source": "**Step 3: Extract a finite subcover by compactness.**\n\n$\\{U_{\\boldsymbol{b}}\\}_{\\boldsymbol{b} \\in B}$ is an open cover of $B$. Since $B$\nis closed in the compact set $K$, $B$ itself is compact.\n\n```{admonition} Key: Why compactness matters\n:class: dropdown warning\n\nCompactness is **essential** here. $B$ might have infinitely many points, each\nneeding its own alarm bell. Without compactness, we would need infinitely many\nalarm bells --- and the sum might diverge or fail to stay below $\\varepsilon$\nat $\\boldsymbol{x}_0$. Compactness guarantees a **finite** subcover, which is\nwhat makes the epsilon-management in Step 5 work.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-step4",
   "metadata": {},
   "source": "**Step 4: Finite subcover.**\n\nExtract a finite subcover:\n$B \\subset U_{\\boldsymbol{b}_1} \\cup \\cdots \\cup U_{\\boldsymbol{b}_N}$."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-step5",
   "metadata": {},
   "source": "**Step 5: Sharpen with Lemma 3.1 and the $\\varepsilon/N$ trick.**\n\nApply Lemma 3.1 to find $s, t \\in \\mathbb{R}$ such that\n\n$$\\sigma(s + tx) < \\varepsilon/N \\;\\text{ on } (-\\infty, \\varepsilon) \\qquad\\text{and}\\qquad \\sigma(s + tx) > 1 - \\varepsilon \\;\\text{ on } (1-\\varepsilon, \\infty).$$\n\nDefine $F_j = \\sigma(s + t \\cdot f_{\\boldsymbol{b}_j}) \\in \\mathcal{N}_1^\\sigma$\nfor each $1 \\leq j \\leq N$.\n\n```{admonition} Auxiliary: Why $\\varepsilon/N$?\n:class: dropdown tip\n\nEach $F_j$ contributes at most $\\varepsilon/N$ to the sum at $\\boldsymbol{x}_0$\n(because $f_{\\boldsymbol{b}_j}(\\boldsymbol{x}_0) < \\varepsilon/2 < \\varepsilon$, so\nthe input to $\\sigma(s + t \\cdot \\ldots)$ falls in $(-\\infty, \\varepsilon)$).\nWith $N$ terms:\n\n$$g(\\boldsymbol{x}_0) = \\sum_{j=1}^N F_j(\\boldsymbol{x}_0) < N \\cdot \\frac{\\varepsilon}{N} = \\varepsilon.$$\n\nThis is the **epsilon-management trick**: divide the budget equally among the\nfinitely many cover elements.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-step6",
   "metadata": {},
   "source": "**Step 6: Sum and verify.**\n\nLet $g = \\sum_{j=1}^N F_j \\in \\mathcal{N}_2$. Then:\n\n- **At $\\boldsymbol{x}_0$:** $g(\\boldsymbol{x}_0) = \\sum_{j=1}^N F_j(\\boldsymbol{x}_0) < N \\cdot (\\varepsilon/N) = \\varepsilon$. $\\;\\checkmark$\n\n- **On $B$:** For any $\\boldsymbol{b} \\in B$, there exists some $k$ with\n  $\\boldsymbol{b} \\in U_{\\boldsymbol{b}_k}$, so\n  $f_{\\boldsymbol{b}_k}(\\boldsymbol{b}) > 1 - \\varepsilon$, hence\n  $F_k(\\boldsymbol{b}) = \\sigma(s + t \\cdot f_{\\boldsymbol{b}_k}(\\boldsymbol{b})) > 1 - \\varepsilon$,\n  hence $g(\\boldsymbol{b}) \\geq F_k(\\boldsymbol{b}) > 1 - \\varepsilon$. $\\;\\checkmark\\;\\square$"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-layer-counting",
   "metadata": {},
   "source": "### Layer counting\n\nWhy is $g \\in \\mathcal{N}_2$?\n\n- Each $f_{\\boldsymbol{b}_j} \\in \\mathcal{N}_1$ (affine function of coordinates).\n- Each $F_j = \\sigma(s + t \\cdot f_{\\boldsymbol{b}_j})$. Since $s + t \\cdot f_{\\boldsymbol{b}_j} \\in \\mathcal{N}_1$ (affine combination of affine functions), we have $F_j \\in \\mathcal{N}_1^\\sigma$.\n- The sum $g = \\sum F_j$ is an affine combination of elements of $\\mathcal{N}_1^\\sigma$, which by definition is $\\mathcal{N}_2$."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-numerical-header",
   "metadata": {},
   "source": "### Numerical demonstration"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-lemma32-numerical",
   "metadata": {
    "tags": ["hide-input"]
   },
   "outputs": [],
   "source": "# ============================================================\n# Lemma 3.2: Point-Set Separation in K = [0,1]^2\n# B = disk of radius 0.3 centered at (0.5, 0.5)\n# x0 = (0.9, 0.9)\n# eps = 0.1\n# ============================================================\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\ndef sigma(x):\n    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))\n\ndef sigma_inv(y):\n    return np.log(y / (1 - y))\n\nACCENT = '#4f46e5'\nTHM    = '#059669'\nWARN   = '#d97706'\nDANGER = '#dc2626'\n\n# Parameters\ncenter_B = np.array([0.5, 0.5])\nradius_B = 0.3\nx0_pt = np.array([0.9, 0.9])\neps = 0.1\n\n# Choose N=8 evenly-spaced points on B's boundary\nN_cover = 8\nangles = np.linspace(0, 2 * np.pi, N_cover, endpoint=False)\nb_points = np.column_stack([\n    center_B[0] + radius_B * np.cos(angles),\n    center_B[1] + radius_B * np.sin(angles)\n])\n\n# For each b_j, find the coordinate i where |b_j[i] - x0[i]| is maximized\n# Then use Lemma 3.1 on that coordinate\ndef build_separator(b_pt, x0_pt, eps_target):\n    \"\"\"Build f_b using the coordinate that differs most from x0.\"\"\"\n    diffs = np.abs(b_pt - x0_pt)\n    i = np.argmax(diffs)  # coordinate index\n    \n    # Apply Lemma 3.1 to separate b_pt[i] from x0_pt[i]\n    val_x0 = x0_pt[i]\n    val_b = b_pt[i]\n    \n    y0_target = sigma_inv(eps_target / 2)\n    y1_target = sigma_inv(1 - eps_target / 2)\n    \n    # We want sigma(s + t * val_x0) < eps_target  and  sigma(s + t * val_b) > 1-eps_target\n    # So: s + t * val_x0 = y0_target,  s + t * val_b = y1_target\n    if abs(val_b - val_x0) < 1e-12:\n        # Fallback: use the other coordinate\n        i = 1 - i\n        val_x0 = x0_pt[i]\n        val_b = b_pt[i]\n    \n    t_sep = (y1_target - y0_target) / (val_b - val_x0)\n    s_sep = y0_target - t_sep * val_x0\n    \n    return i, s_sep, t_sep\n\n# Build all separators\nseparators = [build_separator(b_points[j], x0_pt, eps) for j in range(N_cover)]\n\n# Apply Lemma 3.1 for sharpening: sigma(s'+t'x) < eps/N on (-inf, eps)\n# and sigma(s'+t'x) > 1-eps on (1-eps, inf)\neps_sharp = eps / N_cover\ny0_sharp = sigma_inv(eps_sharp / 2)   # target: sigma = eps/(2N)\ny1_sharp = sigma_inv(1 - eps / 2)     # target: sigma = 1-eps/2\n\n# Solve for the sharpening affine map: s' + t'*eps = y0_sharp, s' + t'*(1-eps) = y1_sharp\n# Actually: we want separation at points eps and 1-eps on the real line\nval_low = eps       # values of f_bj at x0 are below this\nval_high = 1 - eps  # values of f_bj at b are above this\nt_sharp = (y1_sharp - y0_sharp) / (val_high - val_low)\ns_sharp = y0_sharp - t_sharp * val_low\n\n# Evaluate g on a grid\nresolution = 200\nxx = np.linspace(0, 1, resolution)\nyy = np.linspace(0, 1, resolution)\nXX, YY = np.meshgrid(xx, yy)\npoints = np.column_stack([XX.ravel(), YY.ravel()])\n\ng_total = np.zeros(resolution * resolution)\n\nfor j in range(N_cover):\n    coord_idx, s_sep, t_sep = separators[j]\n    # f_bj(x) = sigma(s_sep + t_sep * x[coord_idx])  (from Lemma 3.1)\n    f_bj = sigma(s_sep + t_sep * points[:, coord_idx])\n    # F_j(x) = sigma(s_sharp + t_sharp * f_bj(x))    (sharpening)\n    F_j = sigma(s_sharp + t_sharp * f_bj)\n    g_total += F_j\n\nG = g_total.reshape(resolution, resolution)\n\n# Verify bounds\n# Check g(x0)\ng_at_x0 = 0\nfor j in range(N_cover):\n    coord_idx, s_sep, t_sep = separators[j]\n    f_val = sigma(s_sep + t_sep * x0_pt[coord_idx])\n    F_val = sigma(s_sharp + t_sharp * f_val)\n    g_at_x0 += F_val\n\nprint(f\"Point-Set Separation (Lemma 3.2)\")\nprint(f\"================================\")\nprint(f\"B = disk of radius {radius_B} at {tuple(center_B)}\")\nprint(f\"x\\u2080 = {tuple(x0_pt)}, \\u03b5 = {eps}, N = {N_cover}\")\nprint(f\"\")\nprint(f\"g(x\\u2080) = {g_at_x0:.6f} < {eps} \\u2713\" if g_at_x0 < eps else f\"g(x\\u2080) = {g_at_x0:.6f} \\u2717\")\n\n# Check on B: sample points on the boundary and interior\ntest_angles = np.linspace(0, 2*np.pi, 100)\ntest_radii = np.linspace(0, radius_B, 10)\nmin_on_B = np.inf\nfor r in test_radii:\n    for a in test_angles:\n        pt = center_B + r * np.array([np.cos(a), np.sin(a)])\n        if 0 <= pt[0] <= 1 and 0 <= pt[1] <= 1:\n            g_val = 0\n            for j in range(N_cover):\n                coord_idx, s_sep, t_sep = separators[j]\n                f_val = sigma(s_sep + t_sep * pt[coord_idx])\n                F_val = sigma(s_sharp + t_sharp * f_val)\n                g_val += F_val\n            min_on_B = min(min_on_B, g_val)\n\nprint(f\"min(g on B) = {min_on_B:.6f} > {1-eps} \\u2713\" if min_on_B > 1-eps else f\"min(g on B) = {min_on_B:.6f} \\u2717\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-lemma32-heatmap",
   "metadata": {
    "tags": ["hide-input"]
   },
   "outputs": [],
   "source": "# ============================================================\n# Heatmap of g over K = [0,1]^2\n# ============================================================\n\nfig, ax = plt.subplots(figsize=(7, 6))\n\nim = ax.imshow(G, extent=[0, 1, 0, 1], origin='lower', cmap='RdYlGn',\n               vmin=0, vmax=max(N_cover * 0.5, G.max()), aspect='equal')\n\n# Draw the disk B\ntheta_circle = np.linspace(0, 2 * np.pi, 200)\ncircle_x = center_B[0] + radius_B * np.cos(theta_circle)\ncircle_y = center_B[1] + radius_B * np.sin(theta_circle)\nax.plot(circle_x, circle_y, color='white', linewidth=2.5, linestyle='-', zorder=4)\nax.plot(circle_x, circle_y, color=DANGER, linewidth=1.5, linestyle='-', zorder=5,\n        label='$B$ (boundary)')\n\n# Mark the point x0\nax.plot(x0_pt[0], x0_pt[1], 'o', color=ACCENT, markersize=12, zorder=6,\n        markeredgecolor='white', markeredgewidth=2, label=f'$\\\\mathbf{{x}}_0 = ({x0_pt[0]}, {x0_pt[1]})$')\n\n# Mark the cover points b_j\nfor j in range(N_cover):\n    label = '$\\\\mathbf{b}_j$ (cover points)' if j == 0 else None\n    ax.plot(b_points[j, 0], b_points[j, 1], 's', color=WARN, markersize=7, zorder=6,\n            markeredgecolor='white', markeredgewidth=1, label=label)\n\n# Contour lines at eps and 1-eps\ncontour = ax.contour(XX, YY, G, levels=[eps, 1-eps], colors=['white', 'white'],\n                     linewidths=[1.5, 1.5], linestyles=['--', '-'], zorder=4)\nax.clabel(contour, fmt={eps: f'$g = \\\\varepsilon$', 1-eps: f'$g = 1-\\\\varepsilon$'},\n          fontsize=9)\n\ncbar = fig.colorbar(im, ax=ax, shrink=0.85, label='$g(\\\\mathbf{x})$')\nax.set_xlabel('$x_1$', fontsize=12)\nax.set_ylabel('$x_2$', fontsize=12)\nax.set_title('Lemma 3.2: Point-Set Separation\\n'\n             f'$g(\\\\mathbf{{x}}_0) = {g_at_x0:.4f} < \\\\varepsilon = {eps}$, '\n             f'$\\\\min_B g = {min_on_B:.4f} > 1 - \\\\varepsilon = {1-eps}$',\n             fontsize=12, fontweight='bold')\nax.legend(loc='lower left', fontsize=9, framealpha=0.9)\nax.set_xlim(0, 1)\nax.set_ylim(0, 1)\n\nfig.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma32-tryit",
   "metadata": {},
   "source": "**Try it yourself** &rarr; [Point-Set Separator](../applets/point-set-separator.html)"
  },
  {
   "cell_type": "markdown",
   "id": "cell-sec3-header",
   "metadata": {},
   "source": "---\n\n## Section 3: Lemma 3.3 --- Set-Set Separation"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-statement",
   "metadata": {},
   "source": "```{admonition} Lemma 3.3 (Set-Set Separation)\n:class: important\n\nLet $A$ and $B$ be disjoint closed subsets of $K$. Then for each $\\varepsilon > 0$:\n\n**(i)** there exists $h \\in \\mathcal{N}_3$ such that $h < \\varepsilon$ on $B$ and $h > 1 - \\varepsilon$ on $A$;\n\n**(ii)** there exists $H \\in \\mathcal{N}_3^\\sigma$ such that $0 \\leq H < \\varepsilon$ on $B$ and $1 - \\varepsilon < H \\leq 1$ on $A$.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-intuition",
   "metadata": {},
   "source": "### Intuition\n\nFor each point $\\boldsymbol{a} \\in A$, Lemma 3.2 gives a \"spotlight\" --- bright\nat $\\boldsymbol{a}$ and dark on $B$. Cover $A$ with finitely many spotlights,\nsum them up, then squash with one more application of $\\sigma$: done.\n\nThe structure is *exactly parallel* to Lemma 3.2, but one level higher:\n\n| Lemma 3.2 | Lemma 3.3 |\n|:---|:---|\n| Separate each $\\boldsymbol{b}$ from $\\boldsymbol{x}_0$ | Separate each $\\boldsymbol{a}$ from $B$ |\n| Using $\\mathcal{N}_1^\\sigma$ elements | Using $\\mathcal{N}_2$ elements |\n| Cover $B$ | Cover $A$ |\n| Sum gives $g \\in \\mathcal{N}_2$ | Sum gives $h \\in \\mathcal{N}_3$ |"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-proof-header",
   "metadata": {},
   "source": "### Proof walkthrough"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step1",
   "metadata": {},
   "source": "**Step 1: Separate each $\\boldsymbol{a}$ from $B$ using Lemma 3.2.**\n\nWLOG assume $\\varepsilon \\in (0, 1/3)$.\n\nFor each $\\boldsymbol{a} \\in A$, apply Lemma 3.2 (with the roles swapped: separate\nthe single point $\\boldsymbol{a}$ from the closed set $B$) to obtain\n$\\widetilde{g}_{\\boldsymbol{a}} \\in \\mathcal{N}_2$ with\n\n$$\\widetilde{g}_{\\boldsymbol{a}} > 1 - \\varepsilon/2 \\;\\text{ on } B \\qquad\\text{and}\\qquad \\widetilde{g}_{\\boldsymbol{a}}(\\boldsymbol{a}) < \\varepsilon/2.$$"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step2",
   "metadata": {},
   "source": "**Step 2: Flip to get the right orientation.**\n\nSet $g_{\\boldsymbol{a}} = 1 - \\widetilde{g}_{\\boldsymbol{a}}$. Then $g_{\\boldsymbol{a}} \\in \\mathcal{N}_2$ and:\n\n$$g_{\\boldsymbol{a}} < \\varepsilon/2 \\;\\text{ on } B \\qquad\\text{and}\\qquad g_{\\boldsymbol{a}}(\\boldsymbol{a}) > 1 - \\varepsilon/2.$$\n\nNow $g_{\\boldsymbol{a}}$ is **high at $\\boldsymbol{a}$** and **low on $B$** --- the\nopposite orientation from $\\widetilde{g}_{\\boldsymbol{a}}$.\n\n```{admonition} Auxiliary: Why does flipping preserve $\\mathcal{N}_2$?\n:class: dropdown tip\n\nIf $\\widetilde{g} \\in \\mathcal{N}_2$ then\n$\\widetilde{g} = a_0 + \\sum a_j \\sigma(f_j)$ for some coefficients $a_j$ and\nfunctions $f_j \\in \\mathcal{N}_1$. Then\n\n$$1 - \\widetilde{g} = (1 - a_0) + \\sum (-a_j) \\sigma(f_j),$$\n\nwhich is also an affine combination of elements of $\\mathcal{N}_1^\\sigma$,\nhence in $\\mathcal{N}_2$. In general, $\\mathcal{N}_k$ is closed under\naffine combinations.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step3",
   "metadata": {},
   "source": "**Step 3: Build an open cover of $A$.**\n\nDefine\n\n$$U_{\\boldsymbol{a}} = \\{\\boldsymbol{x} \\in K : g_{\\boldsymbol{a}}(\\boldsymbol{x}) > 1 - \\varepsilon\\}.$$\n\nSince $g_{\\boldsymbol{a}}$ is continuous, $U_{\\boldsymbol{a}}$ is open.\nAnd $\\boldsymbol{a} \\in U_{\\boldsymbol{a}}$ because\n$g_{\\boldsymbol{a}}(\\boldsymbol{a}) > 1 - \\varepsilon/2 > 1 - \\varepsilon$.\nSo $\\{U_{\\boldsymbol{a}}\\}_{\\boldsymbol{a} \\in A}$ is an open cover of the compact\nset $A$.\n\nBy compactness, extract a finite subcover:\n$A \\subset U_{\\boldsymbol{a}_1} \\cup \\cdots \\cup U_{\\boldsymbol{a}_N}$."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step4",
   "metadata": {},
   "source": "**Step 4: Sharpen with Lemma 3.1.**\n\nUse Lemma 3.1 to find $s, t \\in \\mathbb{R}$ with\n\n$$\\sigma(s + tx) < \\varepsilon/N \\;\\text{ on } (-\\infty, \\varepsilon) \\qquad\\text{and}\\qquad \\sigma(s + tx) > 1 - \\varepsilon \\;\\text{ on } (1-\\varepsilon, \\infty).$$"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step5",
   "metadata": {},
   "source": "**Step 5: Sum to get $h \\in \\mathcal{N}_3$.**\n\nDefine\n\n$$h = \\sum_{j=1}^N \\sigma(s + t \\, g_{\\boldsymbol{a}_j}) \\;\\in \\mathcal{N}_3.$$\n\n```{admonition} Layer counting\n:class: dropdown note\n\nEach $g_{\\boldsymbol{a}_j} \\in \\mathcal{N}_2$, so\n$s + t \\, g_{\\boldsymbol{a}_j} \\in \\mathcal{N}_2$ (affine combination of an\n$\\mathcal{N}_2$ element). Therefore\n$\\sigma(s + t \\, g_{\\boldsymbol{a}_j}) \\in \\mathcal{N}_2^\\sigma$.\n\nThe sum $h = \\sum \\sigma(s + t \\, g_{\\boldsymbol{a}_j})$ is an affine combination\nof elements of $\\mathcal{N}_2^\\sigma$, which by definition is $\\mathcal{N}_3$.\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step6",
   "metadata": {},
   "source": "**Step 6: Verify the bounds.**\n\n- **On $A$:** If $\\boldsymbol{a} \\in A$, then $\\boldsymbol{a} \\in U_{\\boldsymbol{a}_k}$\n  for some $k$, so $g_{\\boldsymbol{a}_k}(\\boldsymbol{a}) > 1 - \\varepsilon$, hence\n  $\\sigma(s + t \\, g_{\\boldsymbol{a}_k}(\\boldsymbol{a})) > 1 - \\varepsilon$, hence\n  $h(\\boldsymbol{a}) \\geq \\sigma(s + t \\, g_{\\boldsymbol{a}_k}(\\boldsymbol{a})) > 1 - \\varepsilon$. $\\;\\checkmark$\n\n- **On $B$:** If $\\boldsymbol{b} \\in B$, then $g_{\\boldsymbol{a}_j}(\\boldsymbol{b}) < \\varepsilon/2 < \\varepsilon$\n  for all $j$, so each $\\sigma(s + t \\, g_{\\boldsymbol{a}_j}(\\boldsymbol{b})) < \\varepsilon/N$,\n  hence $h(\\boldsymbol{b}) < N \\cdot (\\varepsilon/N) = \\varepsilon$. $\\;\\checkmark$\n\nThis proves part **(i)**."
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-step7",
   "metadata": {},
   "source": "**Step 7: Part (ii) --- one more squash.**\n\nApply Lemma 3.1 once more to find $s', t' \\in \\mathbb{R}$ with\n\n$$\\sigma(s' + t'x) < \\varepsilon \\;\\text{ on } (-\\infty, \\varepsilon) \\qquad\\text{and}\\qquad \\sigma(s' + t'x) > 1 - \\varepsilon \\;\\text{ on } (1-\\varepsilon, \\infty).$$\n\nSet $H = \\sigma(s' + t' h) \\in \\mathcal{N}_3^\\sigma$. Since $\\sigma$ is increasing\nand $0 \\leq \\sigma \\leq 1$:\n\n- On $B$: $h < \\varepsilon$, so $H = \\sigma(s' + t'h) < \\varepsilon$ and $H \\geq 0$. $\\;\\checkmark$\n- On $A$: $h > 1 - \\varepsilon$, so $H = \\sigma(s' + t'h) > 1 - \\varepsilon$ and $H \\leq 1$. $\\;\\checkmark\\;\\square$"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-epsilon-cascade",
   "metadata": {},
   "source": "### The epsilon cascade\n\nHere is how the epsilons propagate through the construction:\n\n```\nTarget: \u03b5\n\u2502\n\u251c\u2500\u2500 Lemma 3.3, Step 1: call Lemma 3.2 with \u03b5/2\n\u2502   \u2502\n\u2502   \u251c\u2500\u2500 Lemma 3.2, Step 1: call Lemma 3.1 with \u03b5/2\n\u2502   \u2502   \u2514\u2500\u2500 IVT targets: \u03c3(y\u2080) = \u03b5/4,  \u03c3(y\u2081) = 1 - \u03b5/4\n\u2502   \u2502\n\u2502   \u2514\u2500\u2500 Lemma 3.2, Step 5: sharpen with \u03b5/N\u2081 (N\u2081 = cover size for B)\n\u2502\n\u251c\u2500\u2500 Lemma 3.3, Step 2: flip g\u0303 \u2192 g = 1 - g\u0303\n\u2502\n\u251c\u2500\u2500 Lemma 3.3, Step 4: sharpen with \u03b5/N\u2082 (N\u2082 = cover size for A)\n\u2502\n\u2514\u2500\u2500 Lemma 3.3, Step 7: one more squash for part (ii)\n    \u2514\u2500\u2500 Final H \u2208 N\u2083\u1d5e with 0 \u2264 H < \u03b5 on B, 1-\u03b5 < H \u2264 1 on A\n```"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-numerical-header",
   "metadata": {},
   "source": "### Numerical demonstration"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-lemma33-numerical",
   "metadata": {
    "tags": ["hide-input"]
   },
   "outputs": [],
   "source": "# ============================================================\n# Lemma 3.3: Set-Set Separation in K = [0,1]^2\n# A = rectangle [0, 0.35] x [0.2, 0.8]\n# B = rectangle [0.65, 1] x [0.2, 0.8]\n# eps = 0.15\n# ============================================================\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.patches import Rectangle\n\ndef sigma(x):\n    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))\n\ndef sigma_inv(y):\n    return np.log(y / (1 - y))\n\nACCENT = '#4f46e5'\nTHM    = '#059669'\nWARN   = '#d97706'\nDANGER = '#dc2626'\n\n# Sets\nA_box = [0.0, 0.35, 0.2, 0.8]   # [x_min, x_max, y_min, y_max]\nB_box = [0.65, 1.0, 0.2, 0.8]\neps33 = 0.15\n\ndef in_box(pt, box):\n    return box[0] <= pt[0] <= box[1] and box[2] <= pt[1] <= box[3]\n\n# Grid for evaluation\nresolution = 150\nxx = np.linspace(0, 1, resolution)\nyy = np.linspace(0, 1, resolution)\nXX, YY = np.meshgrid(xx, yy)\npoints = np.column_stack([XX.ravel(), YY.ravel()])\n\n# -----------------------------------------------------------\n# Step 1-2: For each a in A, build g_a = 1 - g_tilde_a\n# g_tilde_a > 1 - eps/2 on B,  g_tilde_a(a) < eps/2\n# => g_a < eps/2 on B,  g_a(a) > 1 - eps/2\n#\n# We implement a simplified version:\n# For each a_j, we separate a_j from B using coordinate\n# projections. Since a_j is in A (x <= 0.35) and B has x >= 0.65,\n# the x-coordinate alone separates them.\n# -----------------------------------------------------------\n\n# Choose cover points a_1, ..., a_N along A\nN_A = 6\na_points = []\nfor i in range(3):\n    for j in range(2):\n        ax_val = A_box[0] + (i + 0.5) / 3 * (A_box[1] - A_box[0])\n        ay_val = A_box[2] + (j + 0.5) / 2 * (A_box[3] - A_box[2])\n        a_points.append(np.array([ax_val, ay_val]))\na_points = np.array(a_points)\nN_A = len(a_points)\n\ndef build_g_a(a_pt, eps_inner):\n    \"\"\"\n    Build g_a in N_2 such that g_a(a) > 1-eps_inner/2 and g_a < eps_inner/2 on B.\n    \n    Since A is on the left (x < 0.35) and B is on the right (x > 0.65),\n    we use the x-coordinate to separate.\n    \n    For each point b in B's boundary, we build f_b that is high at b and low at a.\n    But since B is a rectangle, we can cover it with a few well-chosen points.\n    Then we sum sharpened versions.\n    \"\"\"\n    # Simple approach: use Lemma 3.1 on the x-coordinate\n    # We want sigma(s+t*a[0]) < eps_inner/2 and sigma(s+t*b[0]) > 1-eps_inner/2\n    # for all b in B, i.e., b[0] >= 0.65\n    \n    # For g_tilde_a (high on B, low at a):\n    # sigma(s + t * a[0]) < eps_inner/2  at a[0]\n    # sigma(s + t * 0.65) > 1 - eps_inner/2  (leftmost point of B in x)\n    \n    y0_ivt = sigma_inv(eps_inner / 2)\n    y1_ivt = sigma_inv(1 - eps_inner / 2)\n    \n    t_g = (y1_ivt - y0_ivt) / (0.65 - a_pt[0])\n    s_g = y0_ivt - t_g * a_pt[0]\n    \n    return s_g, t_g\n\n# Build g_a for each a_j\n# g_tilde_a(x) = sigma(s_g + t_g * x[0])   -- high on B, low at a\n# g_a(x) = 1 - g_tilde_a(x)                -- low on B, high at a\n\neps_inner = eps33  # using eps for inner calls\n\ng_a_params = [build_g_a(a_points[j], eps_inner) for j in range(N_A)]\n\n# Evaluate individual g_a functions on the grid\ndef eval_g_a(s_g, t_g, pts):\n    \"\"\"g_a = 1 - sigma(s_g + t_g * x[0])\"\"\"\n    g_tilde = sigma(s_g + t_g * pts[:, 0])\n    return 1 - g_tilde\n\n# Step 4: Sharpening for the sum\n# sigma(s' + t'*x) < eps/N on (-inf, eps) and > 1-eps on (1-eps, inf)\neps_sharp_33 = eps33 / N_A\ny0_s = sigma_inv(eps_sharp_33 / 2)\ny1_s = sigma_inv(1 - eps33 / 2)\nval_low_33 = eps33\nval_high_33 = 1 - eps33\nt_sharp_33 = (y1_s - y0_s) / (val_high_33 - val_low_33)\ns_sharp_33 = y0_s - t_sharp_33 * val_low_33\n\n# Step 5: h = sum sigma(s_sharp + t_sharp * g_aj)\nh_total = np.zeros(len(points))\nfor j in range(N_A):\n    s_g, t_g = g_a_params[j]\n    g_a_vals = eval_g_a(s_g, t_g, points)\n    h_total += sigma(s_sharp_33 + t_sharp_33 * g_a_vals)\n\nH_grid = h_total.reshape(resolution, resolution)\n\n# Step 7: H = sigma(s'' + t'' * h) for part (ii)\ny0_final = sigma_inv(eps33 / 2)\ny1_final = sigma_inv(1 - eps33 / 2)\nt_final = (y1_final - y0_final) / (val_high_33 - val_low_33)\ns_final = y0_final - t_final * val_low_33\n\nH_sigma = sigma(s_final + t_final * h_total).reshape(resolution, resolution)\n\n# Verify bounds\n# On A\nmask_A = np.array([in_box(p, A_box) for p in points])\nmask_B = np.array([in_box(p, B_box) for p in points])\n\nh_on_A = h_total[mask_A]\nh_on_B = h_total[mask_B]\nH_on_A = H_sigma.ravel()[mask_A]\nH_on_B = H_sigma.ravel()[mask_B]\n\nprint(f\"Set-Set Separation (Lemma 3.3)\")\nprint(f\"==============================\")\nprint(f\"A = [{A_box[0]}, {A_box[1]}] \\u00d7 [{A_box[2]}, {A_box[3]}]\")\nprint(f\"B = [{B_box[0]}, {B_box[1]}] \\u00d7 [{B_box[2]}, {B_box[3]}]\")\nprint(f\"\\u03b5 = {eps33}, N = {N_A}\")\nprint()\nprint(f\"Part (i): h \\u2208 N\\u2083\")\nprint(f\"  min(h on A) = {h_on_A.min():.4f} > {1-eps33} \\u2713\" if h_on_A.min() > 1-eps33 else f\"  min(h on A) = {h_on_A.min():.4f} \\u2717\")\nprint(f\"  max(h on B) = {h_on_B.max():.4f} < {eps33} \\u2713\" if h_on_B.max() < eps33 else f\"  max(h on B) = {h_on_B.max():.4f} \\u2717\")\nprint()\nprint(f\"Part (ii): H \\u2208 N\\u2083\\u1d5e\")\nprint(f\"  min(H on A) = {H_on_A.min():.4f} > {1-eps33} \\u2713\" if H_on_A.min() > 1-eps33 else f\"  min(H on A) = {H_on_A.min():.4f} \\u2717\")\nprint(f\"  max(H on B) = {H_on_B.max():.4f} < {eps33} \\u2713\" if H_on_B.max() < eps33 else f\"  max(H on B) = {H_on_B.max():.4f} \\u2717\")\nprint(f\"  H range on A: [{H_on_A.min():.4f}, {H_on_A.max():.4f}] \\u2282 ({1-eps33}, 1]\")\nprint(f\"  H range on B: [{H_on_B.min():.4f}, {H_on_B.max():.4f}] \\u2282 [0, {eps33})\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cell-lemma33-heatmaps",
   "metadata": {
    "tags": ["hide-input"]
   },
   "outputs": [],
   "source": "# ============================================================\n# 2x2 subplot grid: individual g_a, h, H\n# ============================================================\n\nfig, axes = plt.subplots(2, 2, figsize=(11, 10))\n\ndef draw_boxes(ax):\n    \"\"\"Draw rectangles A and B on the given axis.\"\"\"\n    rect_A = Rectangle((A_box[0], A_box[2]), A_box[1]-A_box[0], A_box[3]-A_box[2],\n                        linewidth=2, edgecolor=THM, facecolor='none',\n                        linestyle='-', label='$A$', zorder=5)\n    rect_B = Rectangle((B_box[0], B_box[2]), B_box[1]-B_box[0], B_box[3]-B_box[2],\n                        linewidth=2, edgecolor=DANGER, facecolor='none',\n                        linestyle='-', label='$B$', zorder=5)\n    ax.add_patch(rect_A)\n    ax.add_patch(rect_B)\n    ax.set_xlim(0, 1)\n    ax.set_ylim(0, 1)\n    ax.set_aspect('equal')\n\n# --- Panel (a): g_a for a single a ---\nax = axes[0, 0]\ns_g0, t_g0 = g_a_params[0]\ng_a0_vals = eval_g_a(s_g0, t_g0, points).reshape(resolution, resolution)\nim0 = ax.imshow(g_a0_vals, extent=[0,1,0,1], origin='lower', cmap='RdYlGn', vmin=0, vmax=1)\ndraw_boxes(ax)\nax.plot(a_points[0, 0], a_points[0, 1], '*', color=ACCENT, markersize=14,\n        markeredgecolor='white', markeredgewidth=1, zorder=6)\nax.set_title(f'(a) Single $g_{{\\\\mathbf{{a}}_1}}$: high at $\\\\mathbf{{a}}_1$, low on $B$',\n             fontsize=11, fontweight='bold')\nax.legend(loc='upper right', fontsize=9)\nfig.colorbar(im0, ax=ax, shrink=0.8)\n\n# --- Panel (b): g_a for another a ---\nax = axes[0, 1]\ns_g3, t_g3 = g_a_params[3]\ng_a3_vals = eval_g_a(s_g3, t_g3, points).reshape(resolution, resolution)\nim1 = ax.imshow(g_a3_vals, extent=[0,1,0,1], origin='lower', cmap='RdYlGn', vmin=0, vmax=1)\ndraw_boxes(ax)\nax.plot(a_points[3, 0], a_points[3, 1], '*', color=ACCENT, markersize=14,\n        markeredgecolor='white', markeredgewidth=1, zorder=6)\nax.set_title(f'(b) Single $g_{{\\\\mathbf{{a}}_4}}$: high at $\\\\mathbf{{a}}_4$, low on $B$',\n             fontsize=11, fontweight='bold')\nax.legend(loc='upper right', fontsize=9)\nfig.colorbar(im1, ax=ax, shrink=0.8)\n\n# --- Panel (c): h = sum of sharpened g_a ---\nax = axes[1, 0]\nim2 = ax.imshow(H_grid, extent=[0,1,0,1], origin='lower', cmap='RdYlGn',\n                vmin=0, vmax=max(H_grid.max(), 1.5))\ndraw_boxes(ax)\nfor j in range(N_A):\n    ax.plot(a_points[j, 0], a_points[j, 1], '*', color=ACCENT, markersize=10,\n            markeredgecolor='white', markeredgewidth=0.8, zorder=6)\nax.set_title(f'(c) $h = \\\\sum \\\\sigma(s + t \\\\cdot g_{{\\\\mathbf{{a}}_j}}) \\\\in \\\\mathcal{{N}}_3$',\n             fontsize=11, fontweight='bold')\nax.legend(loc='upper right', fontsize=9)\nfig.colorbar(im2, ax=ax, shrink=0.8)\n\n# --- Panel (d): H = sigma(s' + t'h) ---\nax = axes[1, 1]\nim3 = ax.imshow(H_sigma, extent=[0,1,0,1], origin='lower', cmap='RdYlGn', vmin=0, vmax=1)\ndraw_boxes(ax)\ncontour = ax.contour(XX, YY, H_sigma, levels=[eps33, 1-eps33],\n                     colors=['white', 'white'], linewidths=[1.5, 1.5],\n                     linestyles=['--', '-'], zorder=4)\nax.clabel(contour, fmt={eps33: f'$H = \\\\varepsilon$', 1-eps33: f'$H = 1-\\\\varepsilon$'},\n          fontsize=9)\nax.set_title(f'(d) $H = \\\\sigma(s\\' + t\\'h) \\\\in \\\\mathcal{{N}}_3^\\\\sigma$',\n             fontsize=11, fontweight='bold')\nax.legend(loc='upper right', fontsize=9)\nfig.colorbar(im3, ax=ax, shrink=0.8)\n\nfor ax in axes.flat:\n    ax.set_xlabel('$x_1$', fontsize=11)\n    ax.set_ylabel('$x_2$', fontsize=11)\n\nfig.suptitle('Lemma 3.3: Set-Set Separation', fontsize=14, fontweight='bold', y=1.01)\nfig.tight_layout()\nplt.show()"
  },
  {
   "cell_type": "markdown",
   "id": "cell-lemma33-tryit",
   "metadata": {},
   "source": "**Try it yourself** &rarr; [Set-Set Separator](../applets/set-set-separator.html)"
  },
  {
   "cell_type": "markdown",
   "id": "cell-sec-exercises",
   "metadata": {},
   "source": "---\n\n## Exercises"
  },
  {
   "cell_type": "markdown",
   "id": "cell-exercises",
   "metadata": {},
   "source": "**Exercise 2.1.** For Lemma 3.1 with $x_0 = 0$ and $x_1 = 1$, solve the linear system explicitly and express $s$ and $t$ as functions of $\\varepsilon$. What happens to $t$ as $\\varepsilon \\to 0$?\n\n**Exercise 2.2.** Can you replace $\\sigma$ with ReLU in Lemma 3.1? What goes wrong? (Hint: ReLU is not bounded.)\n\n**Exercise 2.3.** In Lemma 3.2, why must $B$ be closed? Construct a counterexample with $B$ open where the conclusion fails.\n\n**Exercise 2.4.** How does the number of neurons in the network from Lemma 3.2 depend on $N$ (the cover size)? Write a formula for the total parameter count.\n\n**Exercise 2.5.** Trace the epsilon bounds through Lemma 3.3 for $\\varepsilon = 0.1$. How many neurons does the network need? (This is a counting exercise.)\n\n**Exercise 2.6.** $\\star$ The paper notes that the proof is \"wasteful.\" Identify specifically where in Lemma 3.3 the argument uses more network capacity than strictly necessary."
  }
 ]
}