Chapter 7: What Perceptrons Can Compute#

7.1 Introduction#

Having established the perceptron model (Chapter 4) and its learning algorithm (Chapter 5), we now ask a fundamental question: what functions can a single perceptron compute?

Since a perceptron is a binary classifier that partitions its input space with a hyperplane, this question reduces to: which Boolean functions are linearly separable?
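
To fix notation, recall the decision rule from Chapter 4, which every code cell in this chapter uses: the perceptron outputs 1 exactly when \(\mathbf{w} \cdot \mathbf{x} + b \geq 0\). A minimal sketch (the helper name `perceptron` is ours, for illustration):

import numpy as np

def perceptron(x, w, b):
    """Threshold unit: output 1 iff w @ x + b >= 0."""
    return int(np.dot(w, x) + b >= 0)

# Example: AND is computed by w = (1, 1), b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), np.array([1, 1]), -1.5))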

This chapter provides a systematic exploration:

  1. We enumerate all 16 two-input Boolean functions and determine which are perceptron-computable.

  2. We visualize the decision boundaries for all separable functions and demonstrate the impossibility for the non-separable ones.

  3. We count the linearly separable functions of \(n\) inputs and show how their fraction collapses as \(n\) grows.

  4. We introduce the convex hull criterion for linear separability.

  5. We define the VC dimension of the perceptron.

These results set the stage for the Minsky-Papert critique and the motivation for multi-layer networks.

7.2 All 16 Two-Input Boolean Functions#

A Boolean function of two variables \(f: \{0,1\}^2 \to \{0,1\}\) is completely determined by its values on the four inputs \((0,0), (0,1), (1,0), (1,1)\). Since each output can be 0 or 1, there are \(2^4 = 16\) possible functions.

We can list them systematically. If we write the outputs in the order \((f(0,0), f(0,1), f(1,0), f(1,1))\) as a 4-bit binary number, we get the functions numbered 0 through 15:
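
The separability column in the table below can be checked by brute force. The following sketch (our own check, not one of the chapter's code cells) searches a small grid of weights and biases; this particular grid happens to cover every separable two-input case, as the weight column in the figure further down confirms:

import itertools

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [-1, 0, 1]                         # candidate w1, w2 values
biases = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]   # candidate b values

for f in range(16):
    # Bit k of f (most significant first) is the output on inputs[k]
    outputs = [(f >> (3 - k)) & 1 for k in range(4)]
    separable = any(
        all(int(w1 * x1 + w2 * x2 + b >= 0) == o
            for (x1, x2), o in zip(inputs, outputs))
        for w1, w2, b in itertools.product(weights, weights, biases)
    )
    print(f"{f:>2d}: outputs={outputs}  separable={separable}")

Only functions 6 (XOR) and 9 (XNOR) come back as not separable.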

| # | \((0,0)\) | \((0,1)\) | \((1,0)\) | \((1,1)\) | Name | Linearly Separable? |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | FALSE (contradiction) | Yes (trivially) |
| 1 | 0 | 0 | 0 | 1 | AND (\(x_1 \wedge x_2\)) | Yes |
| 2 | 0 | 0 | 1 | 0 | \(x_1 \wedge \neg x_2\) | Yes |
| 3 | 0 | 0 | 1 | 1 | \(x_1\) (projection) | Yes |
| 4 | 0 | 1 | 0 | 0 | \(\neg x_1 \wedge x_2\) | Yes |
| 5 | 0 | 1 | 0 | 1 | \(x_2\) (projection) | Yes |
| 6 | 0 | 1 | 1 | 0 | XOR (\(x_1 \oplus x_2\)) | No |
| 7 | 0 | 1 | 1 | 1 | OR (\(x_1 \vee x_2\)) | Yes |
| 8 | 1 | 0 | 0 | 0 | NOR (\(\neg(x_1 \vee x_2)\)) | Yes |
| 9 | 1 | 0 | 0 | 1 | XNOR (\(\neg(x_1 \oplus x_2)\)) | No |
| 10 | 1 | 0 | 1 | 0 | \(\neg x_2\) | Yes |
| 11 | 1 | 0 | 1 | 1 | \(x_1 \vee \neg x_2\) | Yes |
| 12 | 1 | 1 | 0 | 0 | \(\neg x_1\) | Yes |
| 13 | 1 | 1 | 0 | 1 | \(\neg x_1 \vee x_2\) | Yes |
| 14 | 1 | 1 | 1 | 0 | NAND (\(\neg(x_1 \wedge x_2)\)) | Yes |
| 15 | 1 | 1 | 1 | 1 | TRUE (tautology) | Yes (trivially) |

Danger

Most Boolean functions are NOT linearly separable! While 14 out of 16 two-input functions are separable (a fraction of 87.5%), this is deceptively optimistic. The fraction drops super-exponentially with the number of inputs:

  • \(n = 2\): 14 / 16 = 87.5% separable

  • \(n = 3\): 104 / 256 = 40.6% separable

  • \(n = 4\): 1,882 / 65,536 = 2.87% separable

  • \(n = 5\): 94,572 / 4,294,967,296 = 0.0022% separable

For even moderate \(n\), virtually NO Boolean function can be computed by a single perceptron. This is the fundamental limitation that Minsky and Papert exposed in 1969.

Note that XNOR is the complement of XOR, so their non-separability is related: if XOR were separable, flipping all labels would give XNOR, which would also be separable (just negate the weights and bias).
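
This complementation trick is easy to check numerically. The sketch below (ours) negates the AND parameters from the table to obtain NAND; it relies on no input lying exactly on the decision boundary, which the half-integer bias guarantees here:

import numpy as np

w, b = np.array([1, 1]), -1.5                # AND parameters
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = w @ np.array(x) + b
    print(x, int(z >= 0), int(-z >= 0))      # AND, then its complement NAND

Negating gives \(\mathbf{w} = (-1, -1)\), \(b = 1.5\): exactly the NAND row of the table above.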

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from scipy.spatial import ConvexHull

plt.rcParams.update({
    'figure.figsize': (8, 6),
    'font.size': 12,
    'axes.grid': True,
    'grid.alpha': 0.3
})

try:
    plt.style.use('seaborn-v0_8-whitegrid')
except OSError:
    pass
Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Complete table of all 16 two-input Boolean functions with separability status
fig, ax = plt.subplots(figsize=(16, 10))
ax.axis('off')

# Headers
headers = ['#', '(0,0)', '(0,1)', '(1,0)', '(1,1)', 'Name', 'Separable?', 'Weights (w1,w2,b)']

# Data rows
rows = [
    ['0',  '0', '0', '0', '0', 'FALSE',                       'Yes (trivial)', 'b = -1'],
    ['1',  '0', '0', '0', '1', 'AND',                         'Yes',           'w=(1,1), b=-1.5'],
    ['2',  '0', '0', '1', '0', '$x_1 \\wedge \\neg x_2$',     'Yes',           'w=(1,-1), b=-0.5'],
    ['3',  '0', '0', '1', '1', '$x_1$',                       'Yes',           'w=(1,0), b=-0.5'],
    ['4',  '0', '1', '0', '0', '$\\neg x_1 \\wedge x_2$',     'Yes',           'w=(-1,1), b=-0.5'],
    ['5',  '0', '1', '0', '1', '$x_2$',                       'Yes',           'w=(0,1), b=-0.5'],
    ['6',  '0', '1', '1', '0', 'XOR',                         'NO',            '---'],
    ['7',  '0', '1', '1', '1', 'OR',                          'Yes',           'w=(1,1), b=-0.5'],
    ['8',  '1', '0', '0', '0', 'NOR',                         'Yes',           'w=(-1,-1), b=0.5'],
    ['9',  '1', '0', '0', '1', 'XNOR',                        'NO',            '---'],
    ['10', '1', '0', '1', '0', '$\\neg x_2$',                  'Yes',           'w=(0,-1), b=0.5'],
    ['11', '1', '0', '1', '1', '$x_1 \\vee \\neg x_2$',       'Yes',           'w=(1,-1), b=0.5'],
    ['12', '1', '1', '0', '0', '$\\neg x_1$',                  'Yes',           'w=(-1,0), b=0.5'],
    ['13', '1', '1', '0', '1', '$\\neg x_1 \\vee x_2$',       'Yes',           'w=(-1,1), b=0.5'],
    ['14', '1', '1', '1', '0', 'NAND',                        'Yes',           'w=(-1,-1), b=1.5'],
    ['15', '1', '1', '1', '1', 'TRUE',                        'Yes (trivial)', 'b = 1'],
]

# Create table
table = ax.table(
    cellText=rows,
    colLabels=headers,
    cellLoc='center',
    loc='center',
    colWidths=[0.04, 0.06, 0.06, 0.06, 0.06, 0.18, 0.12, 0.18]
)

table.auto_set_font_size(False)
table.set_fontsize(9)
table.scale(1.0, 1.8)

# Style headers
for j in range(len(headers)):
    cell = table[0, j]
    cell.set_facecolor('#2C3E50')
    cell.set_text_props(color='white', fontweight='bold', fontsize=10)

# Style rows
for i in range(1, len(rows) + 1):
    row_data = rows[i-1]
    is_nonsep = row_data[6] == 'NO'
    
    for j in range(len(headers)):
        cell = table[i, j]
        if is_nonsep:
            cell.set_facecolor('#FADBD8')
            cell.set_text_props(fontweight='bold', color='#C0392B')
        else:
            color = '#EBF5FB' if i % 2 == 1 else '#FDFEFE'
            cell.set_facecolor(color)

ax.set_title('Complete Table of All 16 Two-Input Boolean Functions\n'
             'with Linear Separability Status and Perceptron Weights',
             fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()
[Figure: the 16-function table rendered as a matplotlib table, with the non-separable rows (XOR, XNOR) highlighted]
Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Define all 16 two-input Boolean functions
# Input: (0,0), (0,1), (1,0), (1,1)
inputs_2 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

boolean_functions = {
    0:  {'name': 'FALSE',                'outputs': [0,0,0,0], 'separable': True},
    1:  {'name': 'AND',                  'outputs': [0,0,0,1], 'separable': True},
    2:  {'name': '$x_1 \\wedge \\neg x_2$', 'outputs': [0,0,1,0], 'separable': True},
    3:  {'name': '$x_1$',                'outputs': [0,0,1,1], 'separable': True},
    4:  {'name': '$\\neg x_1 \\wedge x_2$', 'outputs': [0,1,0,0], 'separable': True},
    5:  {'name': '$x_2$',                'outputs': [0,1,0,1], 'separable': True},
    6:  {'name': 'XOR',                  'outputs': [0,1,1,0], 'separable': False},
    7:  {'name': 'OR',                   'outputs': [0,1,1,1], 'separable': True},
    8:  {'name': 'NOR',                  'outputs': [1,0,0,0], 'separable': True},
    9:  {'name': 'XNOR',                 'outputs': [1,0,0,1], 'separable': False},
    10: {'name': '$\\neg x_2$',           'outputs': [1,0,1,0], 'separable': True},
    11: {'name': '$x_1 \\vee \\neg x_2$',  'outputs': [1,0,1,1], 'separable': True},
    12: {'name': '$\\neg x_1$',           'outputs': [1,1,0,0], 'separable': True},
    13: {'name': '$\\neg x_1 \\vee x_2$',  'outputs': [1,1,0,1], 'separable': True},
    14: {'name': 'NAND',                 'outputs': [1,1,1,0], 'separable': True},
    15: {'name': 'TRUE',                 'outputs': [1,1,1,1], 'separable': True},
}

# Weights and biases for the separable functions (found analytically or by perceptron)
# For the constant functions (FALSE, TRUE), we use a degenerate boundary
separating_params = {
    0:  (np.array([0, 0]), -1),       # FALSE: always 0 (bias < 0)
    1:  (np.array([1, 1]), -1.5),     # AND
    2:  (np.array([1, -1]), -0.5),    # x1 AND NOT x2
    3:  (np.array([1, 0]), -0.5),     # x1
    4:  (np.array([-1, 1]), -0.5),    # NOT x1 AND x2
    5:  (np.array([0, 1]), -0.5),     # x2
    7:  (np.array([1, 1]), -0.5),     # OR
    8:  (np.array([-1, -1]), 0.5),    # NOR
    10: (np.array([0, -1]), 0.5),     # NOT x2
    11: (np.array([1, -1]), 0.5),     # x1 OR NOT x2
    12: (np.array([-1, 0]), 0.5),     # NOT x1
    13: (np.array([-1, 1]), 0.5),     # NOT x1 OR x2
    14: (np.array([-1, -1]), 1.5),    # NAND
    15: (np.array([0, 0]), 1),        # TRUE: always 1 (bias > 0)
}

# Verify all separable functions
print("Verification of perceptron weights for all separable functions:")
print("=" * 65)
all_correct = True
for idx, info in boolean_functions.items():
    if info['separable'] and idx in separating_params:
        w, b = separating_params[idx]
        outputs = info['outputs']
        predictions = [(inputs_2[i] @ w + b >= 0).astype(int) for i in range(4)]
        correct = all(p == o for p, o in zip(predictions, outputs))
        if not correct:
            all_correct = False
        status = 'OK' if correct else 'FAIL'
        print(f"  {idx:>2d}. {info['name']:>25s}: w={w}, b={b:+.1f} [{status}]")

print(f"\nAll correct: {all_correct}")
Verification of perceptron weights for all separable functions:
=================================================================
   0.                     FALSE: w=[0 0], b=-1.0 [OK]
   1.                       AND: w=[1 1], b=-1.5 [OK]
   2.     $x_1 \wedge \neg x_2$: w=[ 1 -1], b=-0.5 [OK]
   3.                     $x_1$: w=[1 0], b=-0.5 [OK]
   4.     $\neg x_1 \wedge x_2$: w=[-1  1], b=-0.5 [OK]
   5.                     $x_2$: w=[0 1], b=-0.5 [OK]
   7.                        OR: w=[1 1], b=-0.5 [OK]
   8.                       NOR: w=[-1 -1], b=+0.5 [OK]
  10.                $\neg x_2$: w=[ 0 -1], b=+0.5 [OK]
  11.       $x_1 \vee \neg x_2$: w=[ 1 -1], b=+0.5 [OK]
  12.                $\neg x_1$: w=[-1  0], b=+0.5 [OK]
  13.       $\neg x_1 \vee x_2$: w=[-1  1], b=+0.5 [OK]
  14.                      NAND: w=[-1 -1], b=+1.5 [OK]
  15.                      TRUE: w=[0 0], b=+1.0 [OK]

All correct: True

7.4 Counting Linearly Separable Functions#

As the number of inputs \(n\) increases, the total number of Boolean functions \(2^{2^n}\) grows super-exponentially. How many of these are linearly separable?

This is a deep combinatorial question. Boolean functions computable by a single perceptron are called threshold functions; let \(T(n)\) denote the number of such functions of \(n\) variables, i.e., the number of linearly separable Boolean functions.

Theorem (Counting Linearly Separable Boolean Functions)

Let \(T(n)\) denote the number of linearly separable Boolean functions of \(n\) variables. The known values are:

| \(n\) | Total functions \(2^{2^n}\) | Linearly separable \(T(n)\) | Fraction |
|---|---|---|---|
| 1 | 4 | 4 | 1.000 |
| 2 | 16 | 14 | 0.875 |
| 3 | 256 | 104 | 0.406 |
| 4 | 65,536 | 1,882 | 0.0287 |
| 5 | 4,294,967,296 | 94,572 | \(\sim 2.2 \times 10^{-5}\) |
| 6 | \(\sim 1.8 \times 10^{19}\) | 15,028,134 | \(\sim 8.1 \times 10^{-13}\) |

The fraction \(T(n) / 2^{2^n}\) goes to zero super-exponentially as \(n\) grows.

Tip

The Combinatorial Explosion of Non-Separable Functions

The total number of Boolean functions grows as \(2^{2^n}\) – a double exponential. This is incomprehensibly fast:

  • At \(n = 5\), there are about 4.3 billion functions.

  • At \(n = 6\), there are about \(1.8 \times 10^{19}\) functions – more than the number of grains of sand on Earth.

  • At \(n = 10\), there are about \(10^{308}\) functions – vastly more than the number of atoms in the observable universe (\(\sim 10^{80}\)).

The number of linearly separable functions \(T(n)\) also grows, but at a much slower rate. As a result, the fraction of functions that a single perceptron can compute shrinks to effectively zero. This is the mathematical underpinning of the perceptron’s limitation: it can only express a vanishingly small fraction of all possible input-output mappings.
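
Python's arbitrary-precision integers make these magnitudes easy to verify exactly (a quick sketch of ours):

for n in [5, 6, 10]:
    total = 2 ** (2 ** n)
    print(f"n = {n:>2d}: 2^(2^{n}) has {len(str(total))} digits")

For \(n = 10\) this prints 309 digits, i.e., about \(10^{308}\).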

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Table and plot of linearly separable function counts
n_values = [1, 2, 3, 4, 5, 6]
total_functions = [2**(2**n) for n in n_values]
lin_sep_counts = [4, 14, 104, 1882, 94572, 15028134]
fractions = [ls / tot for ls, tot in zip(lin_sep_counts, total_functions)]
total_functions_float = [float(x) for x in total_functions]  # convert for matplotlib (2**64 overflows int64)

print("Linearly Separable Boolean Functions")
print("=" * 70)
print(f"{'n':>4} | {'Total 2^(2^n)':>18} | {'Lin. Sep. T(n)':>16} | {'Fraction':>14}")
print("-" * 70)
for n, tot, ls, frac in zip(n_values, total_functions, lin_sep_counts, fractions):
    print(f"{n:>4d} | {tot:>18,} | {ls:>16,} | {frac:>14.6e}")

# Plot
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Panel 1: Log scale of counts
ax = axes[0]
ax.semilogy(n_values, total_functions_float, 'ro-', linewidth=2, markersize=8,
            label=r'Total: $2^{2^n}$')
ax.semilogy(n_values, lin_sep_counts, 'bs-', linewidth=2, markersize=8,
            label='Linearly separable: $T(n)$')

# Shade the gap
ax.fill_between(n_values, lin_sep_counts, total_functions_float, alpha=0.15, color='red',
                label='Non-separable functions')

ax.set_xlabel('Number of inputs $n$', fontsize=13)
ax.set_ylabel('Count (log scale)', fontsize=13)
ax.set_title('Boolean Functions vs. Linearly Separable Ones',
             fontsize=13, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, which='both')
ax.set_xticks(n_values)

# Panel 2: Fraction (log scale)
ax = axes[1]
ax.semilogy(n_values, fractions, '^-', color='darkgreen',
            linewidth=2, markersize=10)
ax.set_xlabel('Number of inputs $n$', fontsize=13)
ax.set_ylabel('Fraction (log scale)', fontsize=13)
ax.set_title('Fraction of Boolean Functions That Are\nLinearly Separable',
             fontsize=13, fontweight='bold')
ax.grid(True, alpha=0.3, which='both')
ax.set_xticks(n_values)

# Annotate
for i, (n, frac) in enumerate(zip(n_values, fractions)):
    ax.annotate(f'{frac:.2e}', (n, frac), textcoords='offset points',
                xytext=(10, 5), fontsize=9)

plt.tight_layout()
plt.show()

print("\nConclusion: As n grows, the fraction of linearly separable")
print("functions goes to zero super-exponentially. Most Boolean")
print("functions are NOT computable by a single perceptron.")
Linearly Separable Boolean Functions
======================================================================
   n |      Total 2^(2^n) |   Lin. Sep. T(n) |       Fraction
----------------------------------------------------------------------
   1 |                  4 |                4 |   1.000000e+00
   2 |                 16 |               14 |   8.750000e-01
   3 |                256 |              104 |   4.062500e-01
   4 |             65,536 |            1,882 |   2.871704e-02
   5 |      4,294,967,296 |           94,572 |   2.201926e-05
   6 | 18,446,744,073,709,551,616 |       15,028,134 |   8.146768e-13
[Figure: left panel, total vs. linearly separable function counts on a log scale; right panel, the separable fraction on a log scale]
Conclusion: As n grows, the fraction of linearly separable
functions goes to zero super-exponentially. Most Boolean
functions are NOT computable by a single perceptron.

Pie Charts: Separable vs. Non-Separable#

The following visualization shows the proportion of linearly separable functions for \(n = 2, 3, 4, 5\) inputs using pie charts.

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Pie chart showing separable vs non-separable for n=2,3,4,5
n_vals_pie = [2, 3, 4, 5]
total_fns = [2**(2**n) for n in n_vals_pie]
sep_counts = [14, 104, 1882, 94572]
nonsep_counts = [t - s for t, s in zip(total_fns, sep_counts)]

fig, axes = plt.subplots(1, 4, figsize=(20, 5))

colors_pie = ['#2ecc71', '#e74c3c']

for i, (n, total, sep, nonsep) in enumerate(zip(n_vals_pie, total_fns, sep_counts, nonsep_counts)):
    ax = axes[i]
    frac_sep = sep / total
    frac_nonsep = nonsep / total
    
    sizes = [frac_sep, frac_nonsep]
    
    # For n>=4, the separable slice is too thin -- use explode
    explode = (0.05, 0) if frac_sep > 0.01 else (0.15, 0)
    
    # autopct=None returns only (wedges, texts), not 3 values
    wedges, texts = ax.pie(
        sizes, labels=None, colors=colors_pie,
        explode=explode,
        startangle=90, wedgeprops={'edgecolor': 'black', 'linewidth': 1.5}
    )
    
    ax.set_title(f'$n = {n}$\n$2^{{2^{n}}} = {total:,}$ total',
                 fontsize=12, fontweight='bold')
    
    # Add text below
    ax.text(0, -1.4, f'Separable: {sep:,} ({frac_sep:.2%})',
            ha='center', fontsize=10, color='#27ae60', fontweight='bold')
    ax.text(0, -1.65, f'Non-sep: {nonsep:,} ({frac_nonsep:.2%})',
            ha='center', fontsize=10, color='#c0392b', fontweight='bold')

# Add a shared legend
fig.legend(['Linearly Separable', 'Non-Separable'],
           loc='upper center', ncol=2, fontsize=12,
           bbox_to_anchor=(0.5, 1.05),
           facecolor='white', edgecolor='black')

fig.suptitle('Proportion of Linearly Separable Boolean Functions',
             fontsize=15, fontweight='bold', y=1.12)
plt.tight_layout()
plt.show()
[Figure: pie charts of separable vs. non-separable proportions for n = 2, 3, 4, 5]

7.5 The Convex Hull Criterion#

We now present a beautiful geometric characterization of linear separability.

Theorem (Convex Hull Separability)#

Two finite sets \(S_0, S_1 \subset \mathbb{R}^n\) are linearly separable if and only if their convex hulls are disjoint:

\[\text{conv}(S_0) \cap \text{conv}(S_1) = \emptyset\]

Proof Sketch#

\((\Rightarrow)\) If separable, then convex hulls are disjoint.

Suppose \(\mathbf{w} \cdot \mathbf{x} + b > 0\) for all \(\mathbf{x} \in S_1\) and \(\mathbf{w} \cdot \mathbf{x} + b < 0\) for all \(\mathbf{x} \in S_0\). Let \(\mathbf{p} \in \text{conv}(S_1)\). Then \(\mathbf{p} = \sum_i \lambda_i \mathbf{x}_i\) with \(\mathbf{x}_i \in S_1\), \(\lambda_i \geq 0\), \(\sum \lambda_i = 1\). Hence:

\[\mathbf{w} \cdot \mathbf{p} + b = \sum_i \lambda_i(\mathbf{w} \cdot \mathbf{x}_i + b) > 0\]

Similarly, for any \(\mathbf{q} \in \text{conv}(S_0)\) we get \(\mathbf{w} \cdot \mathbf{q} + b < 0\), hence \(\mathbf{p} \neq \mathbf{q}\), and the convex hulls are disjoint.

\((\Leftarrow)\) If convex hulls are disjoint, then separable.

This follows from the Separating Hyperplane Theorem (in finite dimensions, a consequence of the Hahn-Banach theorem): any two disjoint, compact, convex sets in \(\mathbb{R}^n\) can be strictly separated by a hyperplane. Since \(\text{conv}(S_0)\) and \(\text{conv}(S_1)\) are convex and compact (being convex hulls of finite sets), a separating hyperplane exists. \(\blacksquare\)

Why This Matters#

The convex hull criterion gives us an intuitive way to determine linear separability:

  • AND: The class-1 convex hull is the single point \(\{(1,1)\}\). The class-0 convex hull is the triangle with vertices \((0,0), (0,1), (1,0)\). These are disjoint, so AND is separable.

  • XOR: Class-1 points are \(\{(0,1), (1,0)\}\) and class-0 points are \(\{(0,0), (1,1)\}\). The convex hull of class 1 is the line segment from \((0,1)\) to \((1,0)\), and the convex hull of class 0 is the line segment from \((0,0)\) to \((1,1)\). These two segments intersect at \((0.5, 0.5)\), so XOR is NOT separable.
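
Beyond eyeballing hulls, linear separability of finite point sets can be tested algorithmically. One standard approach – sketched below with names of our choosing, not part of this chapter's code cells – phrases it as a linear-programming feasibility problem: find \((\mathbf{w}, b)\) with \(s_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1\), where \(s_i = \pm 1\) encodes the class. A margin of 1 is achievable whenever any strict separator exists, simply by rescaling \((\mathbf{w}, b)\).

import numpy as np
from scipy.optimize import linprog

def linearly_separable(X, y):
    """LP feasibility test: does some (w, b) satisfy s_i * (w @ x_i + b) >= 1?"""
    X = np.asarray(X, dtype=float)
    s = 2 * np.asarray(y) - 1                      # map labels {0,1} -> {-1,+1}
    A_ub = -s[:, None] * np.hstack([X, np.ones((len(X), 1))])
    n_vars = X.shape[1] + 1                        # the entries of w, plus b
    res = linprog(np.zeros(n_vars), A_ub=A_ub, b_ub=-np.ones(len(X)),
                  bounds=[(None, None)] * n_vars, method='highs')
    return res.success

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(linearly_separable(X, [0, 0, 0, 1]))   # AND -> True
print(linearly_separable(X, [0, 1, 1, 0]))   # XOR -> False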

7.6 Convex Hull Visualization#

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from scipy.spatial import ConvexHull

def plot_convex_hulls(X, y, ax, title, function_name):
    """Plot data points and convex hulls for each class.
    
    Parameters
    ----------
    X : np.ndarray of shape (n, 2)
    y : np.ndarray of shape (n,)
    ax : matplotlib axes
    title : str
    function_name : str
    """
    class_0 = X[y == 0]
    class_1 = X[y == 1]
    
    # Plot convex hulls
    for cls_data, color, label in [(class_0, 'red', 'Class 0'),
                                    (class_1, 'blue', 'Class 1')]:
        if len(cls_data) >= 3:
            hull = ConvexHull(cls_data)
            hull_vertices = cls_data[hull.vertices]
            hull_vertices = np.vstack([hull_vertices, hull_vertices[0]])
            polygon = Polygon(hull_vertices, alpha=0.2, facecolor=color,
                              edgecolor=color, linewidth=2)
            ax.add_patch(polygon)
        elif len(cls_data) == 2:
            # Line segment
            ax.plot(cls_data[:, 0], cls_data[:, 1], '-', color=color,
                    linewidth=3, alpha=0.4)
        # Single point: just the scatter will show it
    
    # Plot data points
    if len(class_0) > 0:
        ax.scatter(class_0[:, 0], class_0[:, 1], c='red', marker='o',
                   s=200, edgecolors='black', zorder=5, linewidths=2,
                   label='Class 0')
    if len(class_1) > 0:
        ax.scatter(class_1[:, 0], class_1[:, 1], c='blue', marker='s',
                   s=200, edgecolors='black', zorder=5, linewidths=2,
                   label='Class 1')
    
    ax.set_xlim(-0.3, 1.3)
    ax.set_ylim(-0.3, 1.3)
    ax.set_aspect('equal')
    ax.set_title(title, fontsize=13, fontweight='bold')
    ax.legend(fontsize=10, loc='upper left')
    ax.grid(True, alpha=0.3)


fig, axes = plt.subplots(2, 2, figsize=(14, 14))

# AND: separable (convex hulls disjoint)
plot_convex_hulls(inputs_2, np.array([0,0,0,1]), axes[0,0],
                  'AND: Convex Hulls are DISJOINT\n(Linearly Separable)',
                  'AND')
axes[0,0].title.set_color('green')

# OR: separable
plot_convex_hulls(inputs_2, np.array([0,1,1,1]), axes[0,1],
                  'OR: Convex Hulls are DISJOINT\n(Linearly Separable)',
                  'OR')
axes[0,1].title.set_color('green')

# XOR: NOT separable (convex hulls intersect)
plot_convex_hulls(inputs_2, np.array([0,1,1,0]), axes[1,0],
                  'XOR: Convex Hulls INTERSECT\n(NOT Linearly Separable)',
                  'XOR')
axes[1,0].title.set_color('red')
# Mark intersection point
axes[1,0].plot(0.5, 0.5, 'kx', markersize=15, markeredgewidth=3, zorder=10)
axes[1,0].annotate('Intersection\n(0.5, 0.5)', xy=(0.5, 0.5),
                    xytext=(0.75, 0.15), fontsize=11,
                    arrowprops=dict(arrowstyle='->', color='black'),
                    fontweight='bold')

# XNOR: NOT separable (convex hulls intersect)
plot_convex_hulls(inputs_2, np.array([1,0,0,1]), axes[1,1],
                  'XNOR: Convex Hulls INTERSECT\n(NOT Linearly Separable)',
                  'XNOR')
axes[1,1].title.set_color('red')
axes[1,1].plot(0.5, 0.5, 'kx', markersize=15, markeredgewidth=3, zorder=10)
axes[1,1].annotate('Intersection\n(0.5, 0.5)', xy=(0.5, 0.5),
                    xytext=(0.75, 0.15), fontsize=11,
                    arrowprops=dict(arrowstyle='->', color='black'),
                    fontweight='bold')

fig.suptitle('Convex Hull Criterion for Linear Separability',
             fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
[Figure: convex hulls for AND, OR, XOR, and XNOR; the XOR and XNOR hulls intersect at (0.5, 0.5)]

In the XOR and XNOR plots, the convex hulls of the two classes (line segments connecting the respective corners of the unit square) cross at the point \((0.5, 0.5)\). This intersection proves that no hyperplane (line) can separate the two classes.

7.7 VC Dimension#

The Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the capacity of a class of binary classifiers. It tells us the largest number of points that the classifier can “shatter” – that is, classify correctly for every possible labeling.

Definition: Shattering#

A set of points \(S = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\} \subset \mathbb{R}^n\) is shattered by a class of classifiers \(\mathcal{H}\) if, for every possible labeling \((y_1, \ldots, y_m) \in \{0, 1\}^m\), there exists some \(h \in \mathcal{H}\) that correctly classifies all points:

\[h(\mathbf{x}_i) = y_i \quad \text{for all } i = 1, \ldots, m\]

In other words, \(\mathcal{H}\) shatters \(S\) if it can realize all \(2^m\) possible dichotomies of \(S\).

Definition: VC Dimension#

The VC dimension of a hypothesis class \(\mathcal{H}\), denoted \(\text{VCdim}(\mathcal{H})\), is the size of the largest set that \(\mathcal{H}\) can shatter:

\[\text{VCdim}(\mathcal{H}) = \max\{m : \exists S \text{ of size } m \text{ that } \mathcal{H} \text{ shatters}\}\]

Theorem: VC Dimension of the Perceptron#

The VC dimension of the perceptron (half-spaces) in \(\mathbb{R}^n\) is exactly \(n + 1\).

This means:

  • In \(\mathbb{R}^2\) (a line classifier): VCdim = 3. Any 3 points in general position (i.e., not collinear) can be shattered, but no set of 4 points can be shattered.

  • In \(\mathbb{R}^n\): VCdim = \(n+1\).

Example: 3 Points in \(\mathbb{R}^2\)#

Consider three non-collinear points in \(\mathbb{R}^2\). There are \(2^3 = 8\) possible labelings. We must show that for each labeling, there exists a line that separates the two classes.

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Demonstrate shattering of 3 points in R^2
# Three non-collinear points
points_3 = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])

# All 8 possible labelings
all_labelings = []
for i in range(8):
    labeling = [(i >> bit) & 1 for bit in range(3)]
    all_labelings.append(labeling)

def find_separating_line(X, y):
    """Find separating weights for a small dataset using the perceptron."""
    y_arr = np.array(y)
    
    # Handle trivial cases (all same label)
    if np.all(y_arr == 0):
        return np.array([0.0, 0.0]), -1.0
    if np.all(y_arr == 1):
        return np.array([0.0, 0.0]), 1.0
    
    # Run perceptron
    w = np.zeros(2)
    b = 0.0
    for epoch in range(1000):
        errors = 0
        for i in range(len(X)):
            y_hat = int(w @ X[i] + b >= 0)
            if y_hat != y_arr[i]:
                update = y_arr[i] - y_hat
                w = w + update * X[i]
                b = b + update
                errors += 1
        if errors == 0:
            return w, b
    return w, b  # May not have converged


fig, axes = plt.subplots(2, 4, figsize=(20, 10))

for idx, labeling in enumerate(all_labelings):
    row, col = divmod(idx, 4)
    ax = axes[row, col]
    
    y_label = np.array(labeling)
    
    # Find separating line
    w, b = find_separating_line(points_3, y_label)
    
    # Plot decision regions
    xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 200),
                         np.linspace(-0.5, 1.5, 200))
    Z = xx * w[0] + yy * w[1] + b
    
    ax.contourf(xx, yy, Z, levels=[-1e10, 0, 1e10],
                colors=['#FFCCCC', '#CCCCFF'], alpha=0.4)
    if np.linalg.norm(w) > 0:
        ax.contour(xx, yy, Z, levels=[0], colors='black', linewidths=2)
    
    # Plot points
    for i in range(3):
        color = 'blue' if y_label[i] == 1 else 'red'
        marker = 's' if y_label[i] == 1 else 'o'
        ax.scatter(points_3[i, 0], points_3[i, 1], c=color, marker=marker,
                   s=200, edgecolors='black', zorder=5, linewidths=2)
    
    label_str = ''.join(str(l) for l in labeling)
    ax.set_title(f'Labeling: ({label_str})', fontsize=11, fontweight='bold')
    ax.set_xlim(-0.5, 1.5)
    ax.set_ylim(-0.5, 1.5)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

fig.suptitle('Shattering 3 Points in $\\mathbb{R}^2$: All $2^3 = 8$ Labelings\n'
             '(Each has a separating line $\\Rightarrow$ VCdim $\\geq$ 3)',
             fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
[Figure: all 8 labelings of 3 non-collinear points, each with a separating line]
Hide code cell source
import numpy as np

# Show that 4 points in general position in R^2 CANNOT be shattered
# Take 4 points: the vertices of a square
points_4 = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

# There are 2^4 = 16 labelings. We need to find one that is NOT separable.
# The XOR labeling (0,1,1,0) is not separable, as we've seen.

# Check all 16 labelings
unseparable_count = 0
unseparable_examples = []

for i in range(16):
    labeling = np.array([(i >> bit) & 1 for bit in range(4)])
    
    # Try to find separating line using perceptron (many epochs)
    w = np.zeros(2)
    b = 0.0
    converged = False
    
    # Handle trivial cases
    if np.all(labeling == 0) or np.all(labeling == 1):
        converged = True
    else:
        for epoch in range(500):
            errors = 0
            for j in range(4):
                y_hat = int(w @ points_4[j] + b >= 0)
                if y_hat != labeling[j]:
                    update = labeling[j] - y_hat
                    w = w + update * points_4[j]
                    b = b + update
                    errors += 1
            if errors == 0:
                converged = True
                break
    
    if not converged:
        unseparable_count += 1
        unseparable_examples.append(labeling)

print(f"Out of 16 labelings of 4 points, {unseparable_count} are NOT linearly separable.")
print(f"\nUnseparable labelings:")
for lab in unseparable_examples:
    label_str = ''.join(str(l) for l in lab)
    print(f"  ({label_str})")

print(f"\nSince not all {2**4} labelings can be realized,")
print(f"4 points in R^2 CANNOT be shattered.")
print(f"Therefore VCdim(perceptron in R^2) = 3 = n+1 (with n=2).")
Out of 16 labelings of 4 points, 2 are NOT linearly separable.

Unseparable labelings:
  (0110)
  (1001)

Since not all 16 labelings can be realized,
4 points in R^2 CANNOT be shattered.
Therefore VCdim(perceptron in R^2) = 3 = n+1 (with n=2).
Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Visualize the unseparable labelings of 4 points
n_unsep = len(unseparable_examples)
fig, axes = plt.subplots(1, n_unsep, figsize=(6 * n_unsep, 5))
if n_unsep == 1:
    axes = [axes]

for ax, labeling in zip(axes, unseparable_examples):
    for i in range(4):
        color = 'blue' if labeling[i] == 1 else 'red'
        marker = 's' if labeling[i] == 1 else 'o'
        ax.scatter(points_4[i, 0], points_4[i, 1], c=color, marker=marker,
                   s=200, edgecolors='black', zorder=5, linewidths=2)
    
    # Illustrate that no line works: draw several failed separating-line
    # attempts through the center of the square as gray dashed lines
    x_line = np.linspace(-0.5, 1.5, 100)
    for angle in [30, 60, 90, 120, 150]:
        rad = np.radians(angle)
        slope = -np.cos(rad) / (np.sin(rad) + 1e-10)
        y_line = slope * (x_line - 0.5) + 0.5
        ax.plot(x_line, y_line, '--', color='gray', alpha=0.3, linewidth=1)
    
    label_str = ''.join(str(l) for l in labeling)
    ax.set_title(f'Labeling ({label_str}): NOT separable',
                 fontsize=12, fontweight='bold', color='red')
    ax.set_xlim(-0.3, 1.3)
    ax.set_ylim(-0.3, 1.3)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

fig.suptitle('4 Points in $\\mathbb{R}^2$: Labelings That Cannot Be Separated\n'
             '(Proving VCdim < 4)',
             fontsize=14, fontweight='bold', y=1.05)
plt.tight_layout()
plt.show()
[Figure: the two non-separable labelings of the square's corners, with failed line attempts shown dashed]

Proof that VCdim = \(n+1\) for Perceptrons in \(\mathbb{R}^n\)#

VCdim \(\geq n+1\) (Lower bound):

Consider the \(n+1\) points in \(\mathbb{R}^n\) given by the origin and the \(n\) standard basis vectors: \(\{\mathbf{0}, \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}\). These \(n+1\) points can be shattered by half-spaces. For any labeling \(y_0, y_1, \ldots, y_n \in \{0, 1\}\), one can construct a weight vector \(\mathbf{w}\) and bias \(b\) that achieves the labeling: take \(w_i = 2y_i - 1\) and \(b = y_0 - \tfrac{1}{2}\). Then \(\mathbf{w} \cdot \mathbf{0} + b = y_0 - \tfrac{1}{2}\) is non-negative exactly when \(y_0 = 1\), and \(\mathbf{w} \cdot \mathbf{e}_i + b = 2y_i + y_0 - \tfrac{3}{2}\) is non-negative exactly when \(y_i = 1\).
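
A quick numerical check of this construction for \(n = 3\) (our sketch):

import numpy as np
from itertools import product

n = 3
points = np.vstack([np.zeros(n), np.eye(n)])   # the origin and e_1, ..., e_n

ok = True
for labels in product([0, 1], repeat=n + 1):
    y = np.array(labels)
    w = 2 * y[1:] - 1            # +1 where y_i = 1, -1 where y_i = 0
    b = y[0] - 0.5               # sign matches the origin's label
    preds = (points @ w + b >= 0).astype(int)
    ok = ok and np.array_equal(preds, y)

print(f"All {2 ** (n + 1)} labelings of the {n + 1} points realized: {ok}")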

VCdim \(\leq n+1\) (Upper bound, via Radon’s theorem):

Radon’s theorem states that any set of \(n+2\) points in \(\mathbb{R}^n\) can be partitioned into two subsets whose convex hulls intersect. But if the convex hulls of the two classes intersect, the labeling is not linearly separable (by the convex hull criterion). Therefore, no set of \(n+2\) points can be shattered. For the four corners of the unit square in \(\mathbb{R}^2\), the Radon partition is the pair of diagonals – exactly the XOR labeling we failed to separate above.

Combining: VCdim = \(n+1\). \(\blacksquare\)

7.8 Exercises#

Exercise 7.1: Symmetric Truth Tables#

A Boolean function \(f(x_1, x_2)\) has a symmetric truth table if swapping the inputs does not change the output: \(f(x_1, x_2) = f(x_2, x_1)\) for all inputs.

  1. Which of the 16 two-input Boolean functions have symmetric truth tables? (List them.)

  2. Among the symmetric ones, which are linearly separable?

  3. Verify that AND, OR, NAND, NOR are the only symmetric, linearly separable, non-trivial gates.

Exercise 7.2: The 3-Input Majority Function#

The majority function of 3 inputs outputs 1 if and only if at least 2 of the 3 inputs are 1:

\[\begin{split}\text{MAJ}(x_1, x_2, x_3) = \begin{cases} 1 & \text{if } x_1 + x_2 + x_3 \geq 2 \\ 0 & \text{otherwise} \end{cases}\end{split}\]
  1. Write out the complete truth table.

  2. Find perceptron weights \(\mathbf{w}\) and bias \(b\) that compute this function.

  3. Verify your answer. Is the majority function linearly separable?

Exercise 7.3: Unshattering#

Find a set of 3 points in \(\mathbb{R}^2\) that cannot be shattered by any line. Why does this not contradict the fact that VCdim = 3?

Hide code cell source
import numpy as np

# Exercise 7.1 verification
print("=" * 60)
print("Exercise 7.1: Symmetric Boolean Functions")
print("=" * 60)
print()
print("A function f is symmetric if f(x1,x2) = f(x2,x1),")
print("i.e., f(0,1) = f(1,0).")
print()

symmetric_functions = []
for idx, info in boolean_functions.items():
    outputs = info['outputs']
    # Check if f(0,1) == f(1,0), i.e., outputs[1] == outputs[2]
    if outputs[1] == outputs[2]:
        symmetric_functions.append(idx)
        sep_str = 'Yes' if info['separable'] else 'No'
        trivial = '(trivial)' if idx in [0, 15] else ''
        print(f"  #{idx:>2d}: {info['name']:>25s} | Separable: {sep_str} {trivial}")

print(f"\nTotal symmetric functions: {len(symmetric_functions)}")
print("Non-trivial symmetric and separable: AND, OR, NOR, NAND")
============================================================
Exercise 7.1: Symmetric Boolean Functions
============================================================

A function f is symmetric if f(x1,x2) = f(x2,x1),
i.e., f(0,1) = f(1,0).

  # 0:                     FALSE | Separable: Yes (trivial)
  # 1:                       AND | Separable: Yes 
  # 6:                       XOR | Separable: No 
  # 7:                        OR | Separable: Yes 
  # 8:                       NOR | Separable: Yes 
  # 9:                      XNOR | Separable: No 
  #14:                      NAND | Separable: Yes 
  #15:                      TRUE | Separable: Yes (trivial)

Total symmetric functions: 8
Non-trivial symmetric and separable: AND, OR, NOR, NAND
Hide code cell source
import numpy as np

# Exercise 7.2: 3-input majority function
print("=" * 60)
print("Exercise 7.2: 3-Input Majority Function")
print("=" * 60)
print()

# Truth table
inputs_3 = np.array([[0,0,0], [0,0,1], [0,1,0], [0,1,1],
                      [1,0,0], [1,0,1], [1,1,0], [1,1,1]])
y_majority = (inputs_3.sum(axis=1) >= 2).astype(int)

print("Truth table:")
print(f"  {'x1':>3} {'x2':>3} {'x3':>3} | {'MAJ':>4}")
print("  " + "-" * 20)
for x, y in zip(inputs_3, y_majority):
    print(f"  {x[0]:>3} {x[1]:>3} {x[2]:>3} | {y:>4}")

# Find weights: w = (1,1,1), b = -1.5
w_maj = np.array([1, 1, 1])
b_maj = -1.5

print(f"\nProposed weights: w = {w_maj}, b = {b_maj}")
print("\nVerification:")
for x, y in zip(inputs_3, y_majority):
    z = w_maj @ x + b_maj
    y_hat = int(z >= 0)
    status = 'OK' if y_hat == y else 'FAIL'
    print(f"  x={x}, z={z:+.1f}, y_hat={y_hat}, y={y} [{status}]")

print("\nThe 3-input majority function IS linearly separable.")
============================================================
Exercise 7.2: 3-Input Majority Function
============================================================

Truth table:
   x1  x2  x3 |  MAJ
  --------------------
    0   0   0 |    0
    0   0   1 |    0
    0   1   0 |    0
    0   1   1 |    1
    1   0   0 |    0
    1   0   1 |    1
    1   1   0 |    1
    1   1   1 |    1

Proposed weights: w = [1 1 1], b = -1.5

Verification:
  x=[0 0 0], z=-1.5, y_hat=0, y=0 [OK]
  x=[0 0 1], z=-0.5, y_hat=0, y=0 [OK]
  x=[0 1 0], z=-0.5, y_hat=0, y=0 [OK]
  x=[0 1 1], z=+0.5, y_hat=1, y=1 [OK]
  x=[1 0 0], z=-0.5, y_hat=0, y=0 [OK]
  x=[1 0 1], z=+0.5, y_hat=1, y=1 [OK]
  x=[1 1 0], z=+0.5, y_hat=1, y=1 [OK]
  x=[1 1 1], z=+1.5, y_hat=1, y=1 [OK]

The 3-input majority function IS linearly separable.
Hide code cell source
import numpy as np

# Exercise 7.3: Three collinear points cannot be shattered
print("=" * 60)
print("Exercise 7.3: Three Collinear Points")
print("=" * 60)
print()

# Three collinear points
collinear_points = np.array([[0, 0], [1, 1], [2, 2]])

print("Points: (0,0), (1,1), (2,2) -- all on the line y = x")
print()

# Try all 8 labelings
print("Trying all 8 labelings:")
fail_count = 0

for i in range(8):
    labeling = np.array([(i >> bit) & 1 for bit in range(3)])
    label_str = ''.join(str(l) for l in labeling)
    
    # Try perceptron
    w = np.zeros(2)
    b = 0.0
    converged = False
    
    if np.all(labeling == 0) or np.all(labeling == 1):
        converged = True
    else:
        for epoch in range(500):
            errors = 0
            for j in range(3):
                y_hat = int(w @ collinear_points[j] + b >= 0)
                if y_hat != labeling[j]:
                    update = labeling[j] - y_hat
                    w = w + update * collinear_points[j]
                    b = b + update
                    errors += 1
            if errors == 0:
                converged = True
                break
    
    status = 'separable' if converged else 'NOT separable'
    if not converged:
        fail_count += 1
    print(f"  ({label_str}): {status}")

print(f"\n{fail_count} labeling(s) are not separable.")
print(f"Therefore, these 3 collinear points CANNOT be shattered.")
print()
print("This does NOT contradict VCdim = 3, because the VC dimension")
print("only requires that SOME set of 3 points can be shattered,")
print("not ALL sets. Collinear points are in 'degenerate position.'")
============================================================
Exercise 7.3: Three Collinear Points
============================================================

Points: (0,0), (1,1), (2,2) -- all on the line y = x

Trying all 8 labelings:
  (000): separable
  (100): separable
  (010): NOT separable
  (110): separable
  (001): separable
  (101): NOT separable
  (011): separable
  (111): separable

2 labeling(s) are not separable.
Therefore, these 3 collinear points CANNOT be shattered.

This does NOT contradict VCdim = 3, because the VC dimension
only requires that SOME set of 3 points can be shattered,
not ALL sets. Collinear points are in 'degenerate position.'