Chapter 10: Linear Separability — A Deep Dive#

Making the Concept Precise#

Linear separability is the central concept governing what a single-layer perceptron can and cannot compute. In previous chapters, we encountered it informally through examples like AND (separable) and XOR (not separable). In this chapter, we develop the theory systematically: formal definitions, the convex hull criterion with proof, Cover’s function counting theorem, VC dimension, and the powerful idea of lifting data into higher-dimensional spaces where separability is recovered.

These ideas form the mathematical backbone connecting perceptrons to support vector machines, kernel methods, and the broader theory of statistical learning.

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from itertools import combinations

plt.rcParams.update({
    'figure.figsize': (8, 6),
    'font.size': 12,
    'axes.grid': True,
    'grid.alpha': 0.3
})

1. Formal Definition#

Definition. Two finite sets \(A, B \subset \mathbb{R}^n\) are linearly separable if there exist a weight vector \(\mathbf{w} \in \mathbb{R}^n\) and a bias \(b \in \mathbb{R}\) such that:

\[\mathbf{w} \cdot \mathbf{x} + b > 0 \quad \forall \mathbf{x} \in A\]
\[\mathbf{w} \cdot \mathbf{x} + b < 0 \quad \forall \mathbf{x} \in B\]

(For finite sets it does not matter whether the inequality on \(B\) is strict: a hyperplane achieving \(\leq 0\) on \(B\) can be shifted slightly to achieve \(< 0\).)

The separating hyperplane is the set:

\[H = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{w} \cdot \mathbf{x} + b = 0\}\]

This is an \((n-1)\)-dimensional affine subspace of \(\mathbb{R}^n\):

  • In \(\mathbb{R}^2\): \(H\) is a line.

  • In \(\mathbb{R}^3\): \(H\) is a plane.

  • In \(\mathbb{R}^n\): \(H\) is a hyperplane.

Geometric Interpretation#

The weight vector \(\mathbf{w}\) is normal (perpendicular) to the hyperplane \(H\). The bias \(b\) controls the offset of the hyperplane from the origin. Points in \(A\) lie on the positive side of \(H\) (where \(\mathbf{w} \cdot \mathbf{x} + b > 0\)), and points in \(B\) lie on the negative side.

Equivalence with Perceptron Computation#

A perceptron with weight vector \(\mathbf{w}\) and bias \(b\) computes:

\[\begin{split}f(\mathbf{x}) = \text{step}(\mathbf{w} \cdot \mathbf{x} + b) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \geq 0 \\ 0 & \text{if } \mathbf{w} \cdot \mathbf{x} + b < 0 \end{cases}\end{split}\]

Therefore, a perceptron can classify \(A\) vs. \(B\) correctly if and only if \(A\) and \(B\) are linearly separable.
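As a minimal illustration of this equivalence, here is the AND function computed by a perceptron with the (non-unique) choice \(\mathbf{w} = (1, 1)\), \(b = -1.5\):

import numpy as np

def perceptron(x, w, b):
    """Threshold unit: 1 if w . x + b >= 0, else 0."""
    return 1 if np.dot(w, x) + b >= 0 else 0

# AND is linearly separable; w = (1, 1), b = -1.5 is one valid choice
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, '->', perceptron(np.array(x), w, b))   # 0, 0, 0, 1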

2. The Convex Hull Criterion#

The most elegant characterization of linear separability uses the concept of convex hulls.

Definition. The convex hull of a finite set \(S = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\} \subset \mathbb{R}^n\) is:

\[\text{conv}(S) = \left\{\sum_{i=1}^{m} \lambda_i \mathbf{x}_i : \lambda_i \geq 0, \; \sum_{i=1}^{m} \lambda_i = 1\right\}\]

It is the smallest convex set containing \(S\), or equivalently, the set of all convex combinations of points in \(S\).

Theorem (Convex Hull Separability)

Two finite sets \(A, B \subset \mathbb{R}^n\) are linearly separable if and only if their convex hulls are disjoint:

\[A, B \text{ linearly separable} \iff \text{conv}(A) \cap \text{conv}(B) = \emptyset\]

This provides a purely geometric characterization: instead of searching over all possible hyperplanes, you only need to check whether two convex bodies overlap.
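There is also a direct algebraic test: searching for \((\mathbf{w}, b)\) is a linear-programming feasibility problem. The sketch below (assuming SciPy is available) asks for \(\mathbf{w} \cdot \mathbf{a} + b \geq 1\) on \(A\) and \(\mathbf{w} \cdot \mathbf{x} + b \leq -1\) on \(B\); any strict separation can be rescaled into this form, so feasibility is equivalent to linear separability.

import numpy as np
from scipy.optimize import linprog

def linearly_separable(A, B):
    """LP feasibility: exists (w, b) with w.a + b >= 1 on A and w.x + b <= -1 on B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    n = A.shape[1]
    # Variables z = [w_1, ..., w_n, b]; constraints rewritten as A_ub @ z <= b_ub
    A_ub = np.vstack([np.hstack([-A, -np.ones((len(A), 1))]),   # -(w.a + b) <= -1
                      np.hstack([B, np.ones((len(B), 1))])])    #   w.x + b  <= -1
    b_ub = -np.ones(len(A) + len(B))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1), method='highs')
    return res.success

print(linearly_separable([[0, 0], [0, 1], [1, 0]], [[1, 1]]))   # True  (AND)
print(linearly_separable([[0, 0], [1, 1]], [[0, 1], [1, 0]]))   # False (XOR)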

3. Convex Hull Visualization#

Let us visualize the convex hull criterion for several 2D datasets, both separable and non-separable.

Hide code cell source
# Compute convex hull of 2D points (Graham scan)

def convex_hull_2d(points):
    """Compute the convex hull of 2D points using Graham scan.
    Returns the hull vertices in counter-clockwise order."""
    points = np.array(points)
    if len(points) <= 1:
        return points
    if len(points) == 2:
        return points
    
    # Find the point with the lowest y-coordinate (leftmost if tie)
    start = np.lexsort((points[:, 0], points[:, 1]))[0]
    pivot = points[start]
    
    # Sort by polar angle with respect to pivot
    def angle_key(p):
        return np.arctan2(p[1] - pivot[1], p[0] - pivot[0])
    
    indices = list(range(len(points)))
    indices.sort(key=lambda i: (angle_key(points[i]), np.linalg.norm(points[i] - pivot)))
    
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    
    hull = []
    for idx in indices:
        while len(hull) >= 2 and cross(points[hull[-2]], points[hull[-1]], points[idx]) <= 0:
            hull.pop()
        hull.append(idx)
    
    return points[hull]


def check_hull_intersection(hull_A, hull_B):
    """Simple check if two convex polygons (given as hull vertices) intersect.
    Uses the Separating Axis Theorem (SAT)."""
    def get_edges(hull):
        edges = []
        n = len(hull)
        for i in range(n):
            edge = hull[(i+1) % n] - hull[i]
            edges.append(edge)
        return edges
    
    def project(hull, axis):
        projections = np.dot(hull, axis)
        return projections.min(), projections.max()
    
    for hull in [hull_A, hull_B]:
        if len(hull) < 2:
            continue
        edges = get_edges(hull)
        for edge in edges:
            # Normal to the edge
            normal = np.array([-edge[1], edge[0]])
            if np.linalg.norm(normal) < 1e-10:
                continue
            normal = normal / np.linalg.norm(normal)
            
            min_A, max_A = project(hull_A, normal)
            min_B, max_B = project(hull_B, normal)
            
            if max_A < min_B - 1e-10 or max_B < min_A - 1e-10:
                return False  # Found a separating axis
    
    return True  # No separating axis found => intersection


print("Convex hull and intersection checking functions defined.")
Convex hull and intersection checking functions defined.
Hide code cell source
# Generate and visualize several datasets

np.random.seed(42)

datasets = [
    # (name, class_A, class_B)
    ("Separable: Two Clusters",
     np.random.randn(8, 2) * 0.5 + np.array([-2, 0]),
     np.random.randn(8, 2) * 0.5 + np.array([2, 0])),
    
    ("Separable: Diagonal",
     np.random.randn(8, 2) * 0.4 + np.array([-1.5, -1.5]),
     np.random.randn(8, 2) * 0.4 + np.array([1.5, 1.5])),
    
    ("NOT Separable: XOR Pattern",
     np.array([[0, 0], [1, 1]]) + np.random.randn(2, 2) * 0.15,
     np.array([[0, 1], [1, 0]]) + np.random.randn(2, 2) * 0.15),
    
    ("NOT Separable: Interleaved",
     np.array([[-1, 0], [0, 1], [1, 0], [0, -1.0]]),
     np.array([[0, 0], [-0.5, 0.5], [0.5, 0.5], [0, -0.3]]))
]

fig, axes = plt.subplots(2, 2, figsize=(14, 12))

for idx, (name, A, B) in enumerate(datasets):
    ax = axes[idx // 2][idx % 2]
    
    # Compute convex hulls
    hull_A = convex_hull_2d(A)
    hull_B = convex_hull_2d(B)
    
    # Check intersection
    intersects = check_hull_intersection(hull_A, hull_B)
    separable = not intersects
    
    # Plot points
    ax.scatter(A[:, 0], A[:, 1], c='blue', s=100, zorder=5,
               edgecolors='black', linewidths=1.5, marker='o', label='Class A')
    ax.scatter(B[:, 0], B[:, 1], c='red', s=100, zorder=5,
               edgecolors='black', linewidths=1.5, marker='s', label='Class B')
    
    # Draw convex hulls
    if len(hull_A) >= 3:
        hull_closed = np.vstack([hull_A, hull_A[0]])
        ax.fill(hull_A[:, 0], hull_A[:, 1], alpha=0.15, color='blue')
        ax.plot(hull_closed[:, 0], hull_closed[:, 1], 'b-', linewidth=2, alpha=0.5)
    elif len(hull_A) == 2:
        ax.plot(hull_A[:, 0], hull_A[:, 1], 'b-', linewidth=2, alpha=0.5)
    
    if len(hull_B) >= 3:
        hull_closed = np.vstack([hull_B, hull_B[0]])
        ax.fill(hull_B[:, 0], hull_B[:, 1], alpha=0.15, color='red')
        ax.plot(hull_closed[:, 0], hull_closed[:, 1], 'r-', linewidth=2, alpha=0.5)
    elif len(hull_B) == 2:
        ax.plot(hull_B[:, 0], hull_B[:, 1], 'r-', linewidth=2, alpha=0.5)
    
    # Title with separability status
    color = 'green' if separable else 'red'
    status = 'SEPARABLE' if separable else 'NOT SEPARABLE'
    ax.set_title(f"{name}\nConvex hulls {'disjoint' if separable else 'overlap'} => {status}",
                 fontsize=12, color=color)
    ax.legend(fontsize=10)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

plt.suptitle('Convex Hull Criterion for Linear Separability', fontsize=15, y=1.01)
plt.tight_layout()
plt.show()
[Figure: Convex Hull Criterion for Linear Separability. Four 2D datasets with class convex hulls: disjoint hulls in the separable cases, overlapping hulls in the XOR and interleaved cases.]

4. Cover’s Function Counting Theorem#

A fundamental question in the theory of linear separability is: how many of the possible labelings of \(m\) points can be achieved by a hyperplane in \(\mathbb{R}^d\)?

Setting#

Given \(m\) points \(\mathbf{x}_1, \ldots, \mathbf{x}_m \in \mathbb{R}^d\) in general position (meaning no \(d+1\) points lie on a common hyperplane), a dichotomy is a partition of the points into two classes. There are \(2^m\) possible dichotomies.

A dichotomy is linearly realizable if there exists a hyperplane separating the two classes.

Theorem (Cover, 1965)

The number of linearly realizable dichotomies of \(m\) points in general position in \(\mathbb{R}^d\) is:

\[C(m, d) = 2 \sum_{k=0}^{d} \binom{m-1}{k}\]

Properties:

  1. When \(m \leq d+1\): \(C(m, d) = 2^m\). All dichotomies are realizable (the points can be shattered).

  2. When \(m = 2(d+1)\): \(C(m, d) = 2^{m-1}\). Exactly half of all dichotomies are realizable.

  3. When \(m \gg d\): \(C(m, d) \ll 2^m\). Almost no dichotomies are realizable.

There is a sharp phase transition around \(m = 2d\): below this threshold, most dichotomies are separable; above it, most are not.

Tip

Why higher dimensions help separability (Cover’s theorem).

Cover’s theorem reveals a profound principle: data that is not separable in low dimensions often becomes separable in high dimensions. The fraction of realizable dichotomies depends on the ratio \(m/d\) (number of points to number of dimensions). At \(m \approx 2d\), about half of all dichotomies are realizable; doubling the dimension while keeping the points fixed lands you in the regime \(m \leq d + 1\), where every dichotomy is realizable (a numeric check follows the list below). This is the mathematical foundation for:

  • Feature engineering: adding computed features increases dimension

  • Kernel methods: implicitly mapping to high-dimensional feature spaces

  • Hidden layers in neural networks: the hidden layer creates a high-dimensional representation where the output layer can separate the classes
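
A quick numeric illustration, using the theorem's counting formula directly (a self-contained sketch; the cover_count helper defined in the next cell computes the same quantity):

from math import comb

def cover_fraction(m, d):
    """Fraction of the 2^m dichotomies of m points in R^d that are linearly realizable."""
    return min(2 * sum(comb(m - 1, k) for k in range(d + 1)) / 2.0**m, 1.0)

m = 20
for d in [5, 10, 20]:
    print(f"m = {m}, d = {d:2d}: fraction realizable = {cover_fraction(m, d):.3f}")
# m = 20, d =  5: 0.032   d = 10: 0.676   d = 20: 1.000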

Hide code cell source
# Cover's Function Counting Theorem: computation and visualization

from math import comb

def cover_count(m, d):
    """Number of linearly realizable dichotomies of m points in R^d."""
    return 2 * sum(comb(m - 1, k) for k in range(d + 1))

# Table of C(m, d)
print("Cover's Function C(m, d): Number of realizable dichotomies")
print("=" * 65)
print(f"{'m':>4}", end="")
for d in range(1, 7):
    print(f"{'d='+str(d):>10}", end="")
print(f"{'2^m':>10}")
print("-" * 65)

for m in range(1, 13):
    print(f"{m:>4}", end="")
    for d in range(1, 7):
        c = cover_count(m, d)
        total = 2**m
        if c >= total:
            print(f"{'ALL':>10}", end="")
        else:
            print(f"{c:>10}", end="")
    print(f"{2**m:>10}")
Cover's Function C(m, d): Number of realizable dichotomies
=================================================================
   m       d=1       d=2       d=3       d=4       d=5       d=6       2^m
-----------------------------------------------------------------
   1       ALL       ALL       ALL       ALL       ALL       ALL         2
   2       ALL       ALL       ALL       ALL       ALL       ALL         4
   3         6       ALL       ALL       ALL       ALL       ALL         8
   4         8        14       ALL       ALL       ALL       ALL        16
   5        10        22        30       ALL       ALL       ALL        32
   6        12        32        52        62       ALL       ALL        64
   7        14        44        84       114       126       ALL       128
   8        16        58       128       198       240       254       256
   9        18        74       186       326       438       494       512
  10        20        92       260       512       764       932      1024
  11        22       112       352       772      1276      1696      2048
  12        24       134       464      1124      2048      2972      4096
Hide code cell source
# Plot: fraction of realizable dichotomies as m grows

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: fraction for several dimensions
ax1 = axes[0]
for d in [1, 2, 3, 5, 10]:
    m_vals = np.arange(1, 8 * d + 1)
    fractions = [min(cover_count(int(m), d) / 2.0**m, 1.0) for m in m_vals]  # float power avoids int64 overflow
    ax1.plot(m_vals / (2*d), fractions, '-', linewidth=2, label=f'd = {d}')

ax1.axvline(x=1, color='black', linestyle='--', alpha=0.5, label='$m = 2d$ (transition)')
ax1.set_xlabel('$m / (2d)$', fontsize=13)
ax1.set_ylabel('Fraction of realizable dichotomies', fontsize=12)
ax1.set_title('Cover\'s Theorem: Phase Transition at $m \\approx 2d$', fontsize=13)
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(-0.05, 1.1)

# Right: absolute count for d=2
ax2 = axes[1]
d = 2
m_vals = np.arange(1, 20)
cover_vals = [cover_count(m, d) for m in m_vals]
total_vals = [2**m for m in m_vals]

ax2.semilogy(m_vals, total_vals, 'r-o', markersize=5, label=f'$2^m$ (all dichotomies)')
ax2.semilogy(m_vals, cover_vals, 'b-s', markersize=5, label=f'$C(m, {d})$ (realizable)')
ax2.axvline(x=2*(d+1), color='green', linestyle='--', alpha=0.7,
            label=f'$m = 2(d+1) = {2*(d+1)}$ (half-point)')
ax2.set_xlabel('$m$ (number of points)', fontsize=13)
ax2.set_ylabel('Number of dichotomies (log scale)', fontsize=12)
ax2.set_title(f'$d = {d}$: Realizable vs. Total Dichotomies', fontsize=13)
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
[Figure: Left: fraction of realizable dichotomies vs. m/(2d), showing the phase transition near m = 2d. Right: realizable vs. total dichotomies for d = 2, log scale.]

4a. Graph of Cover’s Counting Function#

Let us plot the number of linearly separable dichotomies \(C(m,d)\) versus the number of points \(m\) for several dimensions \(d\), overlaid with \(2^m\) to see where the gap opens.

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
from math import comb

def cover_count(m, d):
    """Number of linearly realizable dichotomies of m points in R^d."""
    return 2 * sum(comb(m - 1, k) for k in range(d + 1))

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left panel: C(m,d) for various d on log scale
ax1 = axes[0]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
dims = [1, 2, 3, 5, 10]
for i, d in enumerate(dims):
    m_vals = np.arange(1, 6 * d + 1)
    c_vals = [cover_count(int(m), d) for m in m_vals]
    ax1.semilogy(m_vals, c_vals, '-o', color=colors[i], markersize=3,
                 linewidth=2, label=f'$C(m, {d})$')

# Also plot 2^m
m_all = np.arange(1, 61)
ax1.semilogy(m_all, [2**int(m) for m in m_all], 'k--', linewidth=2, alpha=0.5, label='$2^m$ (all)')

ax1.set_xlabel('$m$ (number of points)', fontsize=13)
ax1.set_ylabel('Number of dichotomies (log scale)', fontsize=12)
ax1.set_title('Cover\'s Counting Function $C(m,d)$\nvs. total dichotomies $2^m$', fontsize=13)
ax1.legend(fontsize=10, loc='lower right')
ax1.grid(True, alpha=0.3)
ax1.set_xlim(1, 60)

# Right panel: fraction C(m,d) / 2^m showing the phase transition
ax2 = axes[1]
for i, d in enumerate(dims):
    m_vals = np.arange(1, 8 * d + 1)
    fractions = [min(cover_count(int(m), d) / 2**int(m), 1.0) for m in m_vals]
    ax2.plot(m_vals, fractions, '-', color=colors[i], linewidth=2, label=f'd = {d}')
    # Mark the transition point m = 2d
    ax2.axvline(x=2*d, color=colors[i], linestyle=':', alpha=0.3)

ax2.set_xlabel('$m$ (number of points)', fontsize=13)
ax2.set_ylabel('Fraction $C(m,d) / 2^m$', fontsize=12)
ax2.set_title('Phase Transition: Fraction of Separable Dichotomies\n'
              'Dotted lines mark $m = 2d$', fontsize=13)
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.set_ylim(-0.05, 1.1)

plt.tight_layout()
plt.show()

print("Key observation: For each dimension d, the fraction of separable dichotomies")
print("drops sharply around m = 2d points. This is the phase transition.")
print("Higher dimensions push the transition to more points --- this is why")
print("mapping to higher dimensions helps with separability.")
[Figure: Left: Cover's counting function C(m, d) vs. 2^m, log scale. Right: fraction C(m, d)/2^m with dotted lines marking m = 2d for each dimension.]
Key observation: For each dimension d, the fraction of separable dichotomies
drops sharply around m = 2d points. This is the phase transition.
Higher dimensions push the transition to more points --- this is why
mapping to higher dimensions helps with separability.

5. VC Dimension#

The Vapnik-Chervonenkis (VC) dimension formalizes the expressive power of a hypothesis class.

Definition (VC Dimension)

Shattering. A hypothesis class \(\mathcal{H}\) shatters a set of points \(S = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}\) if for every possible labeling \((y_1, \ldots, y_m) \in \{0, 1\}^m\), there exists \(h \in \mathcal{H}\) such that \(h(\mathbf{x}_i) = y_i\) for all \(i\). In other words, \(\mathcal{H}\) can realize all \(2^m\) dichotomies of \(S\).

VC Dimension. The VC dimension of \(\mathcal{H}\) is:

\[\text{VCdim}(\mathcal{H}) = \max\{m : \exists S \text{ of size } m \text{ shattered by } \mathcal{H}\}\]

It is the largest number of points that can be arranged so that the hypothesis class can realize every possible labeling. Note the existential quantifier: we only need one arrangement of \(m\) points that can be shattered, not all arrangements.

VC Dimension of Perceptrons#

Theorem. The VC dimension of the class of perceptrons (linear classifiers) in \(\mathbb{R}^n\) is \(n + 1\).

Danger

Curse of dimensionality — more dimensions do NOT always help.

While Cover’s theorem shows that higher dimensions increase the fraction of separable dichotomies, there is a hidden cost: the curse of dimensionality. In high-dimensional spaces:

  • Data becomes sparse: the volume of the space grows exponentially, so the same number of data points covers an exponentially smaller fraction.

  • Distance concentration: all pairwise distances become nearly equal, making nearest-neighbor methods unreliable.

  • Overfitting: with enough dimensions, any dataset becomes trivially separable, but the resulting classifier generalizes poorly.

The VC dimension gives us a quantitative handle on this tradeoff: a hypothesis class with VC dimension \(d\) needs roughly \(O(d / \epsilon^2)\) training examples to generalize well (by the VC theorem). So increasing dimension helps separability but hurts generalization — unless the training set grows proportionally.
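
To see the scaling concretely, here is a rough back-of-the-envelope sketch of the \(O(d/\epsilon^2)\) estimate. Constants and logarithmic factors are omitted, so only the orders of magnitude are meaningful:

# Rough VC-style sample-size scaling: m ~ VCdim / epsilon^2 (constants omitted)
for n in [2, 10, 100]:            # input dimension
    vcdim = n + 1                 # VC dimension of perceptrons in R^n
    for eps in [0.1, 0.01]:
        print(f"n = {n:3d}, eps = {eps:5.2f}: roughly {int(vcdim / eps**2):,} examples")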

Hide code cell source
# VC Dimension visualization: 3 points in R^2, all 8 dichotomies

# Choose 3 points in general position
points = np.array([[0, 0], [1, 0], [0.5, 0.87]])

fig, axes = plt.subplots(2, 4, figsize=(16, 8))

# All 8 labelings of 3 points
for idx in range(8):
    ax = axes[idx // 4][idx % 4]
    labels = [(idx >> i) & 1 for i in range(3)]
    
    # Plot points
    for i, (pt, lbl) in enumerate(zip(points, labels)):
        color = 'red' if lbl == 1 else 'blue'
        marker = 's' if lbl == 1 else 'o'
        ax.scatter(pt[0], pt[1], c=color, s=200, marker=marker,
                   edgecolors='black', linewidths=2, zorder=5)
    
    # Find a separating line (solve for w1*x + w2*y + b = 0)
    # Using a simple approach: try to find weights
    class_pos = points[np.array(labels) == 1]
    class_neg = points[np.array(labels) == 0]
    
    # Find separating line by optimization
    found = False
    if len(class_pos) == 0 or len(class_neg) == 0:
        # Trivial case: all same class
        if len(class_pos) == 0:
            # All negative: any line with all points on negative side
            w = np.array([0, -1])
            b = -1.5
        else:
            # All positive: a line below every point keeps them all on the positive side
            w = np.array([0, 1])
            b = 0.1
        found = True
    else:
        # Try many random directions
        best_w, best_b = None, None
        for angle in np.linspace(0, 2*np.pi, 360):
            w_try = np.array([np.cos(angle), np.sin(angle)])
            proj_pos = [np.dot(w_try, p) for p in class_pos]
            proj_neg = [np.dot(w_try, p) for p in class_neg]
            if min(proj_pos) > max(proj_neg):
                b_try = -(min(proj_pos) + max(proj_neg)) / 2
                best_w = w_try
                best_b = b_try
                found = True
                break
        if found:
            w = best_w
            b = best_b
    
    # Draw separating line if found
    if found:
        x_line = np.linspace(-0.5, 1.5, 100)
        if abs(w[1]) > 1e-10:
            y_line = -(w[0] * x_line + b) / w[1]
            valid = (y_line >= -0.5) & (y_line <= 1.5)
            ax.plot(x_line[valid], y_line[valid], 'g-', linewidth=2, alpha=0.7)
        else:
            ax.axvline(x=-b/w[0], color='g', linewidth=2, alpha=0.7)
    
    label_str = ''.join(str(l) for l in labels)
    ax.set_title(f'Labels: {label_str}', fontsize=11)
    ax.set_xlim(-0.3, 1.3)
    ax.set_ylim(-0.3, 1.2)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.2)
    ax.set_xticks([])
    ax.set_yticks([])

plt.suptitle('VC Dimension: 3 Points in $\\mathbb{R}^2$ --- All 8 Dichotomies are Realizable\n'
             'Therefore VCdim(perceptron in $\\mathbb{R}^2$) $\\geq$ 3',
             fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
[Figure: All 8 dichotomies of 3 points in R^2, each realized by a separating line, so VCdim(perceptron in R^2) >= 3.]

5a. VC Dimension Shatter Diagrams#

To develop deeper intuition for shattering, let us explicitly visualize the shatter diagram: for 3 points in \(\mathbb{R}^2\), we show all 8 dichotomies with their separating lines, and then show that for 4 points, at least one dichotomy (the XOR pattern) cannot be realized.

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 5, figsize=(20, 8))

# ---- Top row: 3 points shattered by lines in 2D ----
pts3 = np.array([[0.2, 0.1], [0.8, 0.1], [0.5, 0.8]])

for idx in range(8):
    ax = axes[0][idx] if idx < 5 else axes[1][idx - 5]
    labels = [(idx >> i) & 1 for i in range(3)]

    for i, (pt, lbl) in enumerate(zip(pts3, labels)):
        color = 'red' if lbl == 1 else 'blue'
        marker = 's' if lbl == 1 else 'o'
        ax.scatter(pt[0], pt[1], c=color, s=250, marker=marker,
                   edgecolors='black', linewidths=2, zorder=5)

    # Find separating line
    class_pos = pts3[np.array(labels) == 1]
    class_neg = pts3[np.array(labels) == 0]

    found = False
    if len(class_pos) == 0 or len(class_neg) == 0:
        found = True
        w = np.array([0, 1])
        b = -1.5 if len(class_pos) == 0 else 0.05
    else:
        for angle in np.linspace(0, 2 * np.pi, 720):
            w_try = np.array([np.cos(angle), np.sin(angle)])
            proj_pos = [np.dot(w_try, p) for p in class_pos]
            proj_neg = [np.dot(w_try, p) for p in class_neg]
            if min(proj_pos) > max(proj_neg) + 1e-6:
                b = -(min(proj_pos) + max(proj_neg)) / 2
                w = w_try
                found = True
                break

    if found:
        x_line = np.linspace(-0.2, 1.2, 200)
        if abs(w[1]) > 1e-10:
            y_line = -(w[0] * x_line + b) / w[1]
            valid = (y_line >= -0.2) & (y_line <= 1.2)
            ax.plot(x_line[valid], y_line[valid], 'g-', linewidth=2.5, alpha=0.8)
        else:
            ax.axvline(x=-b / w[0], color='g', linewidth=2.5, alpha=0.8)

    label_str = ''.join(str(l) for l in labels)
    ax.set_title(f'{label_str}', fontsize=12, fontweight='bold',
                 color='green')
    ax.set_xlim(-0.1, 1.1)
    ax.set_ylim(-0.1, 1.0)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.2)
    ax.set_xticks([])
    ax.set_yticks([])

# ---- Bottom row remaining: 4 points, show XOR failure ----
# Fill remaining bottom-row axes
pts4 = np.array([[0.2, 0.2], [0.8, 0.2], [0.2, 0.8], [0.8, 0.8]])

# Show a separable dichotomy
ax_ok = axes[1][3]
labels_ok = [0, 1, 0, 1]
for i, (pt, lbl) in enumerate(zip(pts4, labels_ok)):
    color = 'red' if lbl == 1 else 'blue'
    marker = 's' if lbl == 1 else 'o'
    ax_ok.scatter(pt[0], pt[1], c=color, s=250, marker=marker,
                  edgecolors='black', linewidths=2, zorder=5)
# Draw separating line
ax_ok.axvline(x=0.5, color='green', linewidth=2.5)
ax_ok.set_title('4pts: 0101\nSeparable', fontsize=11, color='green', fontweight='bold')
ax_ok.set_xlim(-0.1, 1.1)
ax_ok.set_ylim(-0.1, 1.0)
ax_ok.set_aspect('equal')
ax_ok.grid(True, alpha=0.2)
ax_ok.set_xticks([])
ax_ok.set_yticks([])

# Show the XOR failure
ax_fail = axes[1][4]
labels_fail = [0, 1, 1, 0]  # XOR pattern
for i, (pt, lbl) in enumerate(zip(pts4, labels_fail)):
    color = 'red' if lbl == 1 else 'blue'
    marker = 's' if lbl == 1 else 'o'
    ax_fail.scatter(pt[0], pt[1], c=color, s=250, marker=marker,
                    edgecolors='black', linewidths=2, zorder=5)
# Draw the intersecting convex hulls
ax_fail.plot([0.2, 0.8], [0.2, 0.8], 'b--', linewidth=2, alpha=0.5)
ax_fail.plot([0.8, 0.2], [0.2, 0.8], 'r--', linewidth=2, alpha=0.5)
ax_fail.plot(0.5, 0.5, 'kX', markersize=15, zorder=10)
ax_fail.set_title('4pts: 0110 (XOR)\nNOT Separable!', fontsize=11, color='red', fontweight='bold')
ax_fail.set_xlim(-0.1, 1.1)
ax_fail.set_ylim(-0.1, 1.0)
ax_fail.set_aspect('equal')
ax_fail.grid(True, alpha=0.2)
ax_fail.set_xticks([])
ax_fail.set_yticks([])

plt.suptitle('Shatter Diagrams: 3 points can be shattered (all 8 dichotomies work)\n'
             '4 points CANNOT be shattered (XOR dichotomy fails) $\\Rightarrow$ VCdim = 3',
             fontsize=14, y=1.04)
plt.tight_layout()
plt.show()
[Figure: Shatter diagrams. Top: all 8 dichotomies of 3 points realized by lines. Bottom right: a separable 4-point dichotomy (0101) and the non-realizable XOR dichotomy (0110).]
Hide code cell source
# Show that 4 points in R^2 CANNOT all be shattered
# The XOR configuration is the failing dichotomy

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# 4 points in general position
pts4 = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # corners of the unit square

# Show a solvable dichotomy
ax1 = axes[0]
labels_ok = [0, 1, 0, 1]
for i, (pt, lbl) in enumerate(zip(pts4, labels_ok)):
    color = 'red' if lbl == 1 else 'blue'
    marker = 's' if lbl == 1 else 'o'
    ax1.scatter(pt[0], pt[1], c=color, s=200, marker=marker,
               edgecolors='black', linewidths=2, zorder=5)
ax1.set_title('4 points: This dichotomy IS realizable', fontsize=12, color='green')
# Draw a separating line: class 1 (right column) vs. class 0 (left column)
ax1.axvline(x=0.5, color='g', linewidth=2.5)
ax1.set_xlim(-0.3, 1.3)
ax1.set_ylim(-0.3, 1.3)
ax1.set_aspect('equal')
ax1.grid(True, alpha=0.3)

# Show the XOR-like failing dichotomy
ax2 = axes[1]
# Use the standard XOR pattern on a square
pts_xor = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
labels_fail = [0, 1, 1, 0]  # XOR pattern
for i, (pt, lbl) in enumerate(zip(pts_xor, labels_fail)):
    color = 'red' if lbl == 1 else 'blue'
    marker = 's' if lbl == 1 else 'o'
    ax2.scatter(pt[0], pt[1], c=color, s=200, marker=marker,
               edgecolors='black', linewidths=2, zorder=5)

# Draw the intersecting convex hulls
ax2.plot([0, 1], [0, 1], 'b--', linewidth=2, alpha=0.5, label='conv(Class 0)')
ax2.plot([1, 0], [0, 1], 'r--', linewidth=2, alpha=0.5, label='conv(Class 1)')
ax2.plot(0.5, 0.5, 'k*', markersize=15, zorder=10)

ax2.set_title('4 points: This dichotomy is NOT realizable (XOR)\n'
              'No line can separate blue circles from red squares',
              fontsize=12, color='red')
ax2.set_xlim(-0.3, 1.3)
ax2.set_ylim(-0.3, 1.3)
ax2.set_aspect('equal')
ax2.grid(True, alpha=0.3)
ax2.legend(fontsize=11)

plt.suptitle('VCdim(perceptron in $\\mathbb{R}^2$) = 3, NOT 4\n'
             '3 points can be shattered, but 4 points cannot', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
[Figure: Left: a realizable dichotomy of 4 points. Right: the XOR dichotomy with intersecting class hulls; no line can separate it.]

6. Higher-Dimensional Embeddings#

One of the most powerful ideas in machine learning is that data which is not linearly separable in its original space may become separable after mapping it to a higher-dimensional feature space.

XOR in \(\mathbb{R}^3\)#

Consider the feature map \(\varphi: \mathbb{R}^2 \to \mathbb{R}^3\) defined by:

\[\varphi(x_1, x_2) = (x_1, x_2, x_1 x_2)\]

This adds a new dimension: the product of the two inputs. Under this mapping:

| \((x_1, x_2)\) | \(\varphi(x_1, x_2) = (x_1, x_2, x_1 x_2)\) | XOR |
|---|---|---|
| \((0, 0)\) | \((0, 0, 0)\) | 0 |
| \((0, 1)\) | \((0, 1, 0)\) | 1 |
| \((1, 0)\) | \((1, 0, 0)\) | 1 |
| \((1, 1)\) | \((1, 1, 1)\) | 0 |

Claim: These 4 points are linearly separable in \(\mathbb{R}^3\).

Verification: We need \(w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + b\) to be positive for XOR=1 and negative for XOR=0.

Try \(w_1 = 1, w_2 = 1, w_3 = -2, b = -0.5\):

  • \((0,0,0)\): \(0 + 0 + 0 - 0.5 = -0.5 < 0\) (correct: class 0)

  • \((0,1,0)\): \(0 + 1 + 0 - 0.5 = 0.5 > 0\) (correct: class 1)

  • \((1,0,0)\): \(1 + 0 + 0 - 0.5 = 0.5 > 0\) (correct: class 1)

  • \((1,1,1)\): \(1 + 1 - 2 - 0.5 = -0.5 < 0\) (correct: class 0)

Connection to Kernel Methods#

This idea — mapping data to a higher-dimensional space where it becomes linearly separable — is the foundation of kernel methods, including Support Vector Machines (SVMs). The “kernel trick” allows performing computations in the high-dimensional feature space implicitly, without ever computing the mapping \(\varphi\) explicitly.
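
As a concrete instance, the degree-2 polynomial kernel \(k(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^2\) on \(\mathbb{R}^2\) equals an ordinary dot product in a 6-dimensional feature space. A minimal numeric sketch (the explicit map phi2 below is one standard choice of \(\varphi\), written out only to verify the identity; the kernel itself never needs it):

import numpy as np

def phi2(x):
    """Explicit feature map matching the kernel k(x, y) = (x.y + 1)^2 on R^2."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
kernel_value = (np.dot(x, y) + 1) ** 2       # computed entirely in R^2
lifted_value = np.dot(phi2(x), phi2(y))      # dot product in R^6
print(np.isclose(kernel_value, lifted_value))  # True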

Hide code cell source
# 3D visualization: XOR becomes separable after lifting

fig = plt.figure(figsize=(14, 6))

# Left: original 2D space
ax1 = fig.add_subplot(121)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

colors = ['blue' if yi == 0 else 'red' for yi in y]
markers = ['o' if yi == 0 else 's' for yi in y]
for i in range(4):
    ax1.scatter(X[i, 0], X[i, 1], c=colors[i], s=250, marker=markers[i],
               edgecolors='black', linewidths=2, zorder=5)
    ax1.annotate(f'XOR={y[i]}', (X[i, 0], X[i, 1]),
                xytext=(X[i, 0]+0.05, X[i, 1]+0.08), fontsize=11)

ax1.set_xlabel('$x_1$', fontsize=14)
ax1.set_ylabel('$x_2$', fontsize=14)
ax1.set_title('Original Space $\\mathbb{R}^2$: NOT separable', fontsize=13)
ax1.set_xlim(-0.3, 1.4)
ax1.set_ylim(-0.3, 1.4)
ax1.set_aspect('equal')
ax1.grid(True, alpha=0.3)

# Right: lifted 3D space
ax2 = fig.add_subplot(122, projection='3d')

# Lift to 3D: phi(x1, x2) = (x1, x2, x1*x2)
X_lifted = np.column_stack([X, X[:, 0] * X[:, 1]])

for i in range(4):
    ax2.scatter(X_lifted[i, 0], X_lifted[i, 1], X_lifted[i, 2],
               c=colors[i], s=200, marker=markers[i],
               edgecolors='black', linewidths=2, zorder=5)

# Draw the separating plane: x1 + x2 - 2*x1*x2 = 0.5
# => z3 = (x1 + x2 - 0.5) / 2  (where z3 = x1*x2)
xx, yy = np.meshgrid(np.linspace(-0.2, 1.2, 20), np.linspace(-0.2, 1.2, 20))
zz = (xx + yy - 0.5) / 2
ax2.plot_surface(xx, yy, zz, alpha=0.2, color='green')

ax2.set_xlabel('$x_1$', fontsize=12)
ax2.set_ylabel('$x_2$', fontsize=12)
ax2.set_zlabel('$x_1 x_2$', fontsize=12)
ax2.set_title('Lifted Space $\\mathbb{R}^3$: SEPARABLE!', fontsize=13)
ax2.view_init(elev=25, azim=45)

plt.suptitle('Feature Map $\\varphi(x_1, x_2) = (x_1, x_2, x_1 x_2)$: '
             'XOR Becomes Linearly Separable in 3D', fontsize=14, y=1.0)
plt.tight_layout()
plt.show()

# Verify the separation
w = np.array([1, 1, -2])
b = -0.5
print("Verification of linear separability in R^3:")
print(f"Separating plane: {w[0]}*x1 + {w[1]}*x2 + ({w[2]})*x1*x2 + ({b}) = 0")
print()
for i in range(4):
    val = np.dot(w, X_lifted[i]) + b
    pred = 1 if val > 0 else 0
    print(f"phi({X[i]}) = {X_lifted[i]} => w.phi + b = {val:.1f} => class {pred} "
          f"(true: {y[i]}) {'OK' if pred == y[i] else 'FAIL'}")
[Figure: Left: XOR in the original plane, not separable. Right: the lifted points in R^3 with the separating plane x1 + x2 - 2*x1*x2 = 0.5.]
Verification of linear separability in R^3:
Separating plane: 1*x1 + 1*x2 + (-2)*x1*x2 + (-0.5) = 0

phi([0 0]) = [0 0 0] => w.phi + b = -0.5 => class 0 (true: 0) OK
phi([0 1]) = [0 1 0] => w.phi + b = 0.5 => class 1 (true: 1) OK
phi([1 0]) = [1 0 0] => w.phi + b = 0.5 => class 1 (true: 1) OK
phi([1 1]) = [1 1 1] => w.phi + b = -0.5 => class 0 (true: 0) OK

7. Margin and Support Vectors#

When data is linearly separable, there are infinitely many separating hyperplanes. Which one is “best”? This question leads to the concept of margin and eventually to Support Vector Machines (SVMs).

Geometric Margin#

Given a separating hyperplane \(H: \mathbf{w} \cdot \mathbf{x} + b = 0\) (with \(\|\mathbf{w}\| = 1\)), the geometric margin is the minimum distance from any data point to \(H\):

\[\gamma = \min_i |\mathbf{w} \cdot \mathbf{x}_i + b|\]

The distance from a point \(\mathbf{x}_i\) to the hyperplane is:

\[d(\mathbf{x}_i, H) = \frac{|\mathbf{w} \cdot \mathbf{x}_i + b|}{\|\mathbf{w}\|}\]
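
A quick numeric check of the distance formula, with hypothetical values for \(\mathbf{w}\), \(b\), and \(\mathbf{x}\):

import numpy as np

w, b = np.array([3.0, 4.0]), -5.0                 # hyperplane 3*x1 + 4*x2 - 5 = 0
x = np.array([2.0, 1.0])
print(abs(np.dot(w, x) + b) / np.linalg.norm(w))  # |3*2 + 4*1 - 5| / 5 = 1.0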

Support Vectors#

The support vectors are the data points closest to the separating hyperplane — those achieving the minimum distance \(\gamma\). These are the “hardest” points to classify and the ones that determine the position of the optimal boundary.

Maximum Margin Classifier#

The maximum margin hyperplane is the one that maximizes \(\gamma\). This is the optimal linear classifier in a precise sense: it provides the greatest “safety buffer” between the two classes.

The optimization problem, with labels recoded as \(y_i \in \{-1, +1\}\), is:

\[\max_{\mathbf{w}, b} \frac{2}{\|\mathbf{w}\|} \quad \text{s.t.} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \; \forall i\]

This is equivalent to minimizing \(\|\mathbf{w}\|^2\) subject to the constraints — a quadratic program that can be solved efficiently. This is the foundation of the Support Vector Machine (SVM), developed by Cortes and Vapnik in the 1990s.
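
In practice the quadratic program is handled by standard solvers. Below is a minimal sketch using scikit-learn (an assumption: the library is installed; a very large C makes the soft-margin SVC approximate the hard-margin problem on separable data):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(12)
A = rng.standard_normal((15, 2)) * 0.6 + np.array([-1.5, 0.0])
B = rng.standard_normal((15, 2)) * 0.6 + np.array([1.5, 0.0])
X, y = np.vstack([A, B]), np.array([0] * 15 + [1] * 15)

svm = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin
w, b = svm.coef_[0], svm.intercept_[0]
print("w =", w, "b =", b)
print("geometric margin =", 1 / np.linalg.norm(w))  # gamma = 1/||w|| at the solution
print("number of support vectors:", len(svm.support_vectors_))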

Hide code cell source
# Visualization: margin and support vectors

np.random.seed(12)

# Generate linearly separable data
A = np.random.randn(15, 2) * 0.6 + np.array([-1.5, 0])
B = np.random.randn(15, 2) * 0.6 + np.array([1.5, 0])

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Left: many valid separating hyperplanes
ax1 = axes[0]
ax1.scatter(A[:, 0], A[:, 1], c='blue', s=80, edgecolors='black', linewidths=1, marker='o')
ax1.scatter(B[:, 0], B[:, 1], c='red', s=80, edgecolors='black', linewidths=1, marker='s')

# Draw several valid separating lines
x_plot = np.linspace(-3.5, 3.5, 100)
for slope, intercept, style in [(0, 0, '-'), (0.5, 0.2, '--'), (-0.3, -0.1, '-.'), (0.8, 0.3, ':')]:
    ax1.plot(x_plot, slope * x_plot + intercept, 'gray', linestyle=style, linewidth=1.5, alpha=0.6)

ax1.set_title('Many valid separating lines', fontsize=13)
ax1.set_xlim(-3.5, 3.5)
ax1.set_ylim(-3, 3)
ax1.set_xlabel('$x_1$', fontsize=12)
ax1.set_ylabel('$x_2$', fontsize=12)
ax1.grid(True, alpha=0.3)

# Right: maximum margin hyperplane
ax2 = axes[1]
ax2.scatter(A[:, 0], A[:, 1], c='blue', s=80, edgecolors='black', linewidths=1, marker='o')
ax2.scatter(B[:, 0], B[:, 1], c='red', s=80, edgecolors='black', linewidths=1, marker='s')

# Find approximate maximum margin (simple approach: midpoint of closest pair)
min_dist = np.inf
sv_a, sv_b = None, None
for a in A:
    for b in B:
        d = np.linalg.norm(a - b)
        if d < min_dist:
            min_dist = d
            sv_a, sv_b = a, b

# The optimal separating line is perpendicular to sv_b - sv_a at the midpoint
midpoint = (sv_a + sv_b) / 2
w_dir = sv_b - sv_a
w_dir = w_dir / np.linalg.norm(w_dir)

# Line perpendicular to w_dir through midpoint
perp = np.array([-w_dir[1], w_dir[0]])
t_vals = np.linspace(-3, 3, 100)
line_pts = midpoint + np.outer(t_vals, perp)
ax2.plot(line_pts[:, 0], line_pts[:, 1], 'g-', linewidth=3, label='Max-margin boundary')

# Draw margin lines
margin = min_dist / 2
for sign, style in [(1, '--'), (-1, '--')]:
    margin_pts = midpoint + sign * margin * w_dir + np.outer(t_vals, perp)
    ax2.plot(margin_pts[:, 0], margin_pts[:, 1], 'g', linestyle=style, linewidth=1.5, alpha=0.5)

# Highlight support vectors
ax2.scatter([sv_a[0]], [sv_a[1]], c='blue', s=250, edgecolors='green', linewidths=3,
            marker='o', zorder=6, label='Support vectors')
ax2.scatter([sv_b[0]], [sv_b[1]], c='red', s=250, edgecolors='green', linewidths=3,
            marker='s', zorder=6)

# Draw margin width
ax2.annotate('', xy=sv_b, xytext=sv_a,
             arrowprops=dict(arrowstyle='<->', color='purple', linewidth=2))
ax2.text(midpoint[0] + 0.1, midpoint[1] + 0.3, f'margin width $2\\gamma$ = {min_dist:.2f}',
         fontsize=12, color='purple', fontweight='bold')

ax2.set_title('Maximum margin hyperplane', fontsize=13)
ax2.set_xlim(-3.5, 3.5)
ax2.set_ylim(-3, 3)
ax2.set_xlabel('$x_1$', fontsize=12)
ax2.set_ylabel('$x_2$', fontsize=12)
ax2.legend(fontsize=10, loc='upper left')
ax2.grid(True, alpha=0.3)

plt.suptitle('From Separability to Optimal Separability: The Margin Idea', fontsize=14, y=1.01)
plt.tight_layout()
plt.show()
[Figure: Left: many valid separating lines. Right: the maximum-margin boundary with margin lines and highlighted support vectors.]

8. Exercises#

Exercise 10.1: Separability by Convex Hull#

For each of the 5 random 2D datasets below, determine whether the two classes are linearly separable by computing and visualizing the convex hulls. Use the convex_hull_2d and check_hull_intersection functions defined earlier.

Hide code cell source
# Exercise 10.1: Generate 5 random datasets and check separability

np.random.seed(2024)

exercise_datasets = []
for i in range(5):
    # Randomly generate two clusters
    center_A = np.random.randn(2) * 2
    center_B = np.random.randn(2) * 2
    spread = np.random.uniform(0.3, 1.5)
    n_points = np.random.randint(5, 12)
    A = np.random.randn(n_points, 2) * spread + center_A
    B = np.random.randn(n_points, 2) * spread + center_B
    exercise_datasets.append((f'Dataset {i+1}', A, B))

print("Exercise 10.1: For each dataset, determine separability.")
print("Use convex_hull_2d() and check_hull_intersection().")
print()

for name, A, B in exercise_datasets:
    hull_A = convex_hull_2d(A)
    hull_B = convex_hull_2d(B)
    intersects = check_hull_intersection(hull_A, hull_B)
    status = "NOT separable" if intersects else "SEPARABLE"
    print(f"{name}: {len(A)} vs {len(B)} points => {status}")
Exercise 10.1: For each dataset, determine separability.
Use convex_hull_2d() and check_hull_intersection().

Dataset 1: 7 vs 7 points => SEPARABLE
Dataset 2: 8 vs 8 points => SEPARABLE
Dataset 3: 9 vs 9 points => SEPARABLE
Dataset 4: 8 vs 8 points => NOT separable
Dataset 5: 9 vs 9 points => SEPARABLE

Exercise 10.2: VC Dimension of Circles#

Consider the hypothesis class of circles in \(\mathbb{R}^2\): a point is classified as positive if it lies inside the circle, and negative if outside.

\[\begin{split}h_{c,r}(\mathbf{x}) = \begin{cases} 1 & \text{if } \|\mathbf{x} - \mathbf{c}\| \leq r \\ 0 & \text{if } \|\mathbf{x} - \mathbf{c}\| > r \end{cases}\end{split}\]

Task: Show that the VC dimension of this class is 3.

  1. Show that 3 points can be shattered.

  2. Show that no set of 4 points can be shattered.

Hint for (1): Place 3 points on a circle. Show all 8 labelings can be achieved.

Hint for (2): Consider 4 points. If one is inside the convex hull of the other three, use that to find a labeling that fails.

Exercise 10.3: Feature Map for Concentric Circles#

Consider a dataset where class 0 points form a ring at distance \(\approx 2\) from the origin, and class 1 points form a cluster near the origin.

Find a feature map \(\varphi: \mathbb{R}^2 \to \mathbb{R}^k\) (for some \(k\)) that makes this dataset linearly separable.

Hint: What quantity naturally separates points near the origin from points far from it?

Hide code cell source
# Exercise 10.3: Concentric circles dataset

np.random.seed(42)

# Generate concentric circles data
n_inner = 50
n_outer = 80

# Inner circle (class 1)
theta_inner = np.random.uniform(0, 2*np.pi, n_inner)
r_inner = np.random.uniform(0, 0.8, n_inner)
X_inner = np.column_stack([r_inner * np.cos(theta_inner), r_inner * np.sin(theta_inner)])

# Outer ring (class 0)
theta_outer = np.random.uniform(0, 2*np.pi, n_outer)
r_outer = np.random.uniform(1.5, 2.5, n_outer)
X_outer = np.column_stack([r_outer * np.cos(theta_outer), r_outer * np.sin(theta_outer)])

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Original space
ax1 = axes[0]
ax1.scatter(X_inner[:, 0], X_inner[:, 1], c='red', s=30, label='Class 1 (inner)', alpha=0.7)
ax1.scatter(X_outer[:, 0], X_outer[:, 1], c='blue', s=30, label='Class 0 (outer)', alpha=0.7)
ax1.set_title('Original Space: NOT linearly separable', fontsize=13)
ax1.set_xlabel('$x_1$', fontsize=12)
ax1.set_ylabel('$x_2$', fontsize=12)
ax1.legend(fontsize=10)
ax1.set_aspect('equal')
ax1.grid(True, alpha=0.3)

# Feature space: phi(x1, x2) = x1^2 + x2^2 (just the radius squared)
r2_inner = X_inner[:, 0]**2 + X_inner[:, 1]**2
r2_outer = X_outer[:, 0]**2 + X_outer[:, 1]**2

ax2 = axes[1]
ax2.scatter(r2_inner, np.zeros_like(r2_inner) + np.random.randn(n_inner)*0.05,
            c='red', s=30, label='Class 1 (inner)', alpha=0.7)
ax2.scatter(r2_outer, np.zeros_like(r2_outer) + np.random.randn(n_outer)*0.05,
            c='blue', s=30, label='Class 0 (outer)', alpha=0.7)
ax2.axvline(x=1.0, color='green', linewidth=3, linestyle='--',
            label='Threshold at $r^2 = 1.0$')
ax2.set_title('Feature Space $\\varphi(x_1, x_2) = x_1^2 + x_2^2$: SEPARABLE!', fontsize=13)
ax2.set_xlabel('$r^2 = x_1^2 + x_2^2$', fontsize=12)
ax2.set_ylabel('(jittered for visibility)', fontsize=10)
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.suptitle('Exercise 10.3: The radius-squared feature makes concentric circles separable',
             fontsize=14, y=1.01)
plt.tight_layout()
plt.show()

print("The feature map phi(x1, x2) = x1^2 + x2^2 maps each point to its squared distance from the origin.")
print("In this 1D feature space, a simple threshold separates the two classes.")
print(f"\nInner class r^2 range: [{r2_inner.min():.3f}, {r2_inner.max():.3f}]")
print(f"Outer class r^2 range: [{r2_outer.min():.3f}, {r2_outer.max():.3f}]")
[Figure: Left: concentric-circles data, not linearly separable. Right: the 1D feature r^2 = x1^2 + x2^2 with a threshold at r^2 = 1 separating the classes.]
The feature map phi(x1, x2) = x1^2 + x2^2 maps each point to its squared distance from the origin.
In this 1D feature space, a simple threshold separates the two classes.

Inner class r^2 range: [0.000, 0.623]
Outer class r^2 range: [2.265, 6.116]