Chapter 8: The XOR Problem#
The Most Famous Impossibility Result in Neural Network Theory#
The XOR (exclusive-or) problem stands as one of the most important results in the history of artificial intelligence. It is the simplest Boolean function that a single-layer perceptron cannot compute. This seemingly modest observation had enormous consequences: it helped trigger the first “AI winter,” a period of reduced funding and interest in neural network research that lasted from roughly 1969 to 1986.
In this chapter, we provide two complete proofs that XOR is not linearly separable, visualize the geometric obstruction, watch a perceptron fail to learn XOR, and then show how adding a single hidden layer resolves the problem entirely.
Why XOR Matters#
XOR is not just an abstract curiosity. It is:
- The simplest non-linearly-separable Boolean function (only 2 inputs, 4 data points).
- Functionally complete when combined with AND and a constant 1: {XOR, AND, 1} can generate any Boolean function via algebraic normal form.
- The 2-bit case of the PARITY function, which generalizes to \(n\) bits.
- A pedagogical gem: the failure is visible in a 2D plot with just 4 points.
Tip
Why XOR is the simplest non-linearly-separable function.
There are exactly \(2^{2^2} = 16\) Boolean functions of two variables. Of these 16, exactly 14 are linearly separable (AND, OR, NAND, NOR, constant 0/1, identity, negation, implications, etc.). Only 2 are not: XOR and XNOR (its complement). XOR uses the absolute minimum — 2 inputs, 4 data points — to demonstrate the failure of linear separability. You literally cannot construct a simpler counterexample.
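This claim is easy to check by brute force. The sketch below enumerates every two-input Boolean function that a single threshold unit can realize (the weight grid is an arbitrary choice; any grid containing half-integer steps suffices for this tiny case) and finds exactly 14 of the 16, with XOR and XNOR the two exceptions:

```python
import itertools
import numpy as np

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
grid = np.linspace(-2, 2, 9)  # candidate weights/biases: -2.0, -1.5, ..., 2.0

realizable = set()
for w1, w2, b in itertools.product(grid, repeat=3):
    # Record the truth table of step(w1*x1 + w2*x2 + b) on all four inputs
    realizable.add(tuple(int(w1 * x1 + w2 * x2 + b >= 0) for x1, x2 in inputs))

all_functions = set(itertools.product((0, 1), repeat=4))
print(len(realizable))                     # 14
print(sorted(all_functions - realizable))  # [(0, 1, 1, 0), (1, 0, 0, 1)]: XOR and XNOR
```

In general a finite grid only lower-bounds the set of realizable threshold functions, but finding 14 here, combined with the impossibility proofs below for the remaining 2, pins down the count exactly.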
Danger
THE critical limitation — XOR impossibility killed perceptron research and triggered the first AI Winter. This is the single most consequential result in early neural network history.
When Minsky and Papert published their proof in 1969, funding agencies interpreted it as evidence that neural networks were a dead end. DARPA and other agencies redirected funding to symbolic AI. Researchers who had devoted their careers to neural networks found themselves unable to publish or obtain grants. The field went into hibernation for nearly 17 years, until the backpropagation revolution of 1986.
Note
The Minsky-Rosenblatt Controversy
The story of the XOR problem is inseparable from the rivalry between Marvin Minsky and Frank Rosenblatt. Both were at Cornell in the late 1950s. Rosenblatt was the optimistic champion of the perceptron, making bold claims about its potential (some justified, some not). Minsky was the skeptic, who saw fundamental limitations that Rosenblatt’s enthusiasm glossed over.
Minsky’s 1969 book Perceptrons, co-authored with Seymour Papert, was in part a response to what Minsky perceived as overblown hype. The XOR impossibility proof was the book’s centerpiece. Rosenblatt died in a boating accident in 1971, at age 43, before he could respond to or outlive the criticism. The field he pioneered would not recover until long after his death.
The tragedy is that Minsky and Papert were mathematically correct but strategically devastating. Their theorems applied only to single-layer perceptrons, yet the broader community interpreted them as condemning all neural networks.
Show code cell source
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
# Set consistent style
plt.rcParams.update({
    'figure.figsize': (8, 6),
    'font.size': 12,
    'axes.grid': True,
    'grid.alpha': 0.3
})
1. XOR Definition#
The exclusive-or function returns 1 when exactly one of its inputs is 1, and 0 otherwise.
Truth Table#
| \(x_1\) | \(x_2\) | \(x_1 \oplus x_2\) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
Algebraic Identities#
XOR can be expressed algebraically in several equivalent ways:

\[
x_1 \oplus x_2 = x_1 + x_2 - 2 x_1 x_2
\]

This is because:

- When both are 0: \(0 + 0 - 0 = 0\)
- When one is 1: \(1 + 0 - 0 = 1\) or \(0 + 1 - 0 = 1\)
- When both are 1: \(1 + 1 - 2 = 0\)

Equivalently:

\[
x_1 \oplus x_2 = (x_1 - x_2)^2
\]

since \((x_1 - x_2)^2 = x_1^2 - 2x_1 x_2 + x_2^2 = x_1 - 2x_1 x_2 + x_2\) for binary inputs (where \(x^2 = x\)).
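Both identities can be checked exhaustively in a couple of lines (a quick sketch):

```python
import itertools

for x1, x2 in itertools.product((0, 1), repeat=2):
    xor = x1 ^ x2
    assert xor == x1 + x2 - 2 * x1 * x2  # first identity
    assert xor == (x1 - x2) ** 2         # second identity
    print(x1, x2, xor)
```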
2. Proof 1: Algebraic Proof of Non-Separability#
Theorem (XOR Impossibility)
No single perceptron (with step activation) can compute XOR. That is, there exist no weights \(w_1, w_2\) and bias \(b\) such that

\[
f(x_1, x_2) = \text{step}(w_1 x_1 + w_2 x_2 + b)
\]

computes the XOR function.

Proof. Suppose for contradiction that there exist weights \(w_1, w_2\) and bias \(b\) such that

\[
f(x_1, x_2) = \text{step}(w_1 x_1 + w_2 x_2 + b)
\]

computes XOR, where \(\text{step}(z) = 1\) if \(z \geq 0\) and \(\text{step}(z) = 0\) if \(z < 0\).
From the truth table, we require:
| Input \(\to\) Output | Condition | Simplified |
|---|---|---|
| \((0,0) \to 0\) | \(w_1(0) + w_2(0) + b < 0\) | \(b < 0\) … (i) |
| \((0,1) \to 1\) | \(w_1(0) + w_2(1) + b \geq 0\) | \(w_2 + b \geq 0\) … (ii) |
| \((1,0) \to 1\) | \(w_1(1) + w_2(0) + b \geq 0\) | \(w_1 + b \geq 0\) … (iii) |
| \((1,1) \to 0\) | \(w_1(1) + w_2(1) + b < 0\) | \(w_1 + w_2 + b < 0\) … (iv) |
Adding inequalities (ii) and (iii):

\[
(w_1 + b) + (w_2 + b) \geq 0 \quad\Longrightarrow\quad w_1 + w_2 \geq -2b \quad \text{(v)}
\]

From inequality (iv):

\[
w_1 + w_2 < -b \quad \text{(vi)}
\]

Combining (v) and (vi):

\[
-2b \leq w_1 + w_2 < -b
\]

This requires \(-2b < -b\), which simplifies to:

\[
b > 0
\]

But this contradicts inequality (i), which requires \(b < 0\). Hence no such weights exist. \(\blacksquare\)
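The algebra above settles the question, but a numerical sanity check is still instructive: random search over parameter space never finds a working triple \((w_1, w_2, b)\). This is only corroboration, of course (sampling cannot prove impossibility), and the sample count and weight range below are arbitrary choices:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

rng = np.random.default_rng(0)
hits = 0
for _ in range(100_000):
    w1, w2, b = rng.uniform(-10, 10, size=3)
    preds = (w1 * X[:, 0] + w2 * X[:, 1] + b >= 0).astype(int)
    hits += int(np.array_equal(preds, y))
print(f"Parameter triples that compute XOR: {hits} / 100000")  # 0
```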
3. Proof 2: Geometric Proof of Non-Separability#
Proof (Geometric Argument)
Theorem. The XOR dataset is not linearly separable.
Proof. Consider the two classes:
- Class 0 (output 0): \(C_0 = \{(0,0), (1,1)\}\)
- Class 1 (output 1): \(C_1 = \{(0,1), (1,0)\}\)
The convex hull of \(C_0\) is the line segment from \((0,0)\) to \((1,1)\):

\[
\text{conv}(C_0) = \{ (t, t) : t \in [0, 1] \}
\]

The convex hull of \(C_1\) is the line segment from \((0,1)\) to \((1,0)\):

\[
\text{conv}(C_1) = \{ (s, 1 - s) : s \in [0, 1] \}
\]

These two segments intersect at \((0.5, 0.5)\) (set \(t = 0.5\) and \(s = 0.5\)).
By the Separating Hyperplane Theorem, two finite point sets are linearly separable if and only if their convex hulls are disjoint. Since \(\text{conv}(C_0) \cap \text{conv}(C_1) = \{(0.5, 0.5)\} \neq \emptyset\), the XOR dataset is not linearly separable. \(\blacksquare\)
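The intersection can also be found analytically rather than by inspection: a common point must satisfy \((t, t) = (s, 1 - s)\), i.e., \(t - s = 0\) and \(t + s = 1\). A two-line linear solve confirms it (a minimal sketch):

```python
import numpy as np

# Solve t - s = 0, t + s = 1 for the parameters of the two segments
A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
rhs = np.array([0.0, 1.0])
t, s = np.linalg.solve(A, rhs)
print(t, s)  # 0.5 0.5 -> the hulls meet at (0.5, 0.5)
```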
Show code cell source
# Geometric proof: Visualize XOR points and convex hulls
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
# XOR data points
class0_x = np.array([[0, 0], [1, 1]]) # output 0
class1_x = np.array([[0, 1], [1, 0]]) # output 1
# Draw convex hulls (line segments in this case)
# Class 0 hull: line from (0,0) to (1,1) - diagonal
t = np.linspace(0, 1, 100)
ax.plot(t, t, 'b-', linewidth=3, alpha=0.4, label='conv(Class 0)')
ax.fill_between(t, t - 0.02, t + 0.02, alpha=0.15, color='blue')
# Class 1 hull: line from (0,1) to (1,0) - anti-diagonal
ax.plot(t, 1 - t, 'r-', linewidth=3, alpha=0.4, label='conv(Class 1)')
ax.fill_between(t, 1 - t - 0.02, 1 - t + 0.02, alpha=0.15, color='red')
# Plot the intersection point
ax.plot(0.5, 0.5, 'k*', markersize=20, zorder=10, label='Intersection (0.5, 0.5)')
# Plot data points
ax.scatter(class0_x[:, 0], class0_x[:, 1], c='blue', s=200, zorder=5,
edgecolors='black', linewidths=2, marker='o', label='Class 0: XOR=0')
ax.scatter(class1_x[:, 0], class1_x[:, 1], c='red', s=200, zorder=5,
edgecolors='black', linewidths=2, marker='s', label='Class 1: XOR=1')
# Annotate points
offsets = {(0, 0): (-0.15, -0.1), (1, 1): (0.05, 0.05),
           (0, 1): (-0.15, 0.05), (1, 0): (0.05, -0.1)}
for pt, label in [((0, 0), '(0,0)\u21920'), ((1, 1), '(1,1)\u21920'),
                  ((0, 1), '(0,1)\u21921'), ((1, 0), '(1,0)\u21921')]:
    ox, oy = offsets[pt]
    ax.annotate(label, pt, xytext=(pt[0]+ox, pt[1]+oy), fontsize=13, fontweight='bold')
ax.set_xlim(-0.3, 1.3)
ax.set_ylim(-0.3, 1.3)
ax.set_xlabel('$x_1$', fontsize=14)
ax.set_ylabel('$x_2$', fontsize=14)
ax.set_title('XOR: Convex Hulls Intersect \u2192 Not Linearly Separable', fontsize=14)
ax.legend(fontsize=11, loc='upper right')
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
3a. AND vs XOR: Side-by-Side Decision Boundary Comparison#
To build intuition for why XOR fails where AND succeeds, let us place the two problems side by side. For AND, a single line cleanly separates the classes. For XOR, no line can do the job.
Show code cell source
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])
for ax, y_vals, title, subtitle in [
    (axes[0], y_and, 'AND Function', 'Linearly Separable'),
    (axes[1], y_xor, 'XOR Function', 'NOT Linearly Separable')
]:
    # Decision region background
    xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 300), np.linspace(-0.5, 1.5, 300))
    if 'AND' in title:
        # AND: w1=1, w2=1, b=-1.5 => separating line: x1 + x2 = 1.5
        Z = (xx + yy >= 1.5).astype(float)
        ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], colors=['#BBDDFF', '#FFBBBB'], alpha=0.4)
        ax.contour(xx, yy, Z, levels=[0.5], colors=['green'], linewidths=3)
        ax.text(0.2, 1.35, 'Separating line: $x_1 + x_2 = 1.5$', fontsize=11, color='green', fontweight='bold')
    else:
        # XOR: show several failed attempts
        x_line = np.linspace(-0.5, 1.5, 100)
        for slope, intercept, ls in [(-1, 0.5, '--'), (-1, 1.5, '-.'), (-0.5, 0.8, ':')]:
            ax.plot(x_line, slope * x_line + intercept, color='gray', linestyle=ls,
                    linewidth=2, alpha=0.5)
        ax.text(0.05, 1.35, 'No single line works!', fontsize=12, color='red', fontweight='bold')
    # Plot data points
    for i in range(4):
        color = 'red' if y_vals[i] == 1 else 'blue'
        marker = 's' if y_vals[i] == 1 else 'o'
        ax.scatter(X[i, 0], X[i, 1], c=color, s=300, marker=marker,
                   edgecolors='black', linewidths=2, zorder=5)
        ax.annotate(f'({X[i,0]},{X[i,1]})\u2192{y_vals[i]}',
                    (X[i, 0], X[i, 1]),
                    xytext=(X[i, 0] + 0.08, X[i, 1] - 0.12),
                    fontsize=11, fontweight='bold')
    ax.set_xlim(-0.5, 1.5)
    ax.set_ylim(-0.5, 1.5)
    ax.set_xlabel('$x_1$', fontsize=14)
    ax.set_ylabel('$x_2$', fontsize=14)
    ax.set_title(f'{title}\n({subtitle})', fontsize=14)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
plt.suptitle('AND vs XOR: Why One Line Is Enough for AND but Not for XOR',
fontsize=15, y=1.02)
plt.tight_layout()
plt.show()
3b. Convex Hull Intersection: Why XOR Fails#
The convex hull criterion provides the definitive geometric test. Below we show the convex hulls for AND (disjoint) and XOR (intersecting) side by side, making the obstruction visually obvious.
Show code cell source
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon as MplPolygon
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])
for ax, y_vals, title in [
    (axes[0], y_and, 'AND: Convex Hulls Are DISJOINT'),
    (axes[1], y_xor, 'XOR: Convex Hulls INTERSECT')
]:
    class0 = X[y_vals == 0]
    class1 = X[y_vals == 1]
    # Draw convex hulls
    if len(class0) >= 3:
        # Order hull points by angle around the centroid to form a polygon
        centroid = class0.mean(axis=0)
        angles = np.arctan2(class0[:, 1] - centroid[1], class0[:, 0] - centroid[0])
        order = np.argsort(angles)
        hull_pts = class0[order]
        poly = MplPolygon(hull_pts, alpha=0.2, facecolor='blue', edgecolor='blue', linewidth=2)
        ax.add_patch(poly)
    elif len(class0) == 2:
        ax.plot(class0[:, 0], class0[:, 1], 'b-', linewidth=3, alpha=0.4)
    if len(class1) >= 3:
        centroid = class1.mean(axis=0)
        angles = np.arctan2(class1[:, 1] - centroid[1], class1[:, 0] - centroid[0])
        order = np.argsort(angles)
        hull_pts = class1[order]
        poly = MplPolygon(hull_pts, alpha=0.2, facecolor='red', edgecolor='red', linewidth=2)
        ax.add_patch(poly)
    elif len(class1) == 2:
        ax.plot(class1[:, 0], class1[:, 1], 'r-', linewidth=3, alpha=0.4)
    elif len(class1) == 1:
        pass  # Single point: the hull is just the point itself
    # Mark the intersection in the XOR case
    if 'XOR' in title:
        ax.plot(0.5, 0.5, 'k*', markersize=20, zorder=10, label='Intersection')
        ax.annotate('Hulls intersect\nat (0.5, 0.5)!', (0.5, 0.5),
                    xytext=(0.6, 0.2), fontsize=12, color='black', fontweight='bold',
                    arrowprops=dict(arrowstyle='->', color='black', linewidth=2))
    else:
        ax.annotate('Hulls are\ndisjoint!', (0.5, 0.5),
                    xytext=(0.4, 0.5), fontsize=12, color='green', fontweight='bold')
    # Plot data points
    ax.scatter(class0[:, 0], class0[:, 1], c='blue', s=250, zorder=5,
               edgecolors='black', linewidths=2, marker='o', label='Class 0')
    ax.scatter(class1[:, 0], class1[:, 1], c='red', s=250, zorder=5,
               edgecolors='black', linewidths=2, marker='s', label='Class 1')
    # Annotate points
    for i in range(4):
        ax.annotate(f'({X[i,0]},{X[i,1]})\u2192{y_vals[i]}',
                    (X[i, 0], X[i, 1]),
                    xytext=(X[i, 0] + 0.07, X[i, 1] + 0.07),
                    fontsize=10, fontweight='bold')
    ax.set_xlim(-0.3, 1.4)
    ax.set_ylim(-0.3, 1.4)
    ax.set_xlabel('$x_1$', fontsize=14)
    ax.set_ylabel('$x_2$', fontsize=14)
    sep_status = 'Separable' if 'AND' in title else 'NOT Separable'
    ax.set_title(f'{title}\n\u2192 {sep_status}', fontsize=13)
    ax.legend(fontsize=10, loc='upper right')
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)
plt.suptitle('Convex Hull Criterion: Disjoint Hulls \u21D4 Linearly Separable',
fontsize=15, y=1.02)
plt.tight_layout()
plt.show()
4. The Perceptron Trying to Learn XOR#
Let us now watch a perceptron attempt to learn XOR using the classical perceptron learning rule. We know from the algebraic proof that convergence is impossible. The perceptron convergence theorem guarantees convergence only when the data is linearly separable. For XOR, the weights will cycle endlessly, and the number of errors per epoch will oscillate without ever reaching zero.
Show code cell source
# Perceptron trying (and failing) to learn XOR
# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
def step(z):
    return (z >= 0).astype(int)
# Perceptron learning
np.random.seed(42)
w = np.random.randn(2) * 0.5
b = 0.0
lr = 0.1
n_epochs = 100
errors_per_epoch = []
weight_history = [(w.copy(), b)]
for epoch in range(n_epochs):
    errors = 0
    for i in range(len(X)):
        z = np.dot(w, X[i]) + b
        y_pred = step(z)
        error = y[i] - y_pred
        if error != 0:
            w = w + lr * error * X[i]
            b = b + lr * error
            errors += 1
    errors_per_epoch.append(errors)
    weight_history.append((w.copy(), b))
print(f"Final weights: w = {w}, b = {b:.4f}")
print(f"Errors in last 10 epochs: {errors_per_epoch[-10:]}")
print(f"Total epochs with zero errors: {sum(1 for e in errors_per_epoch if e == 0)}")
Final weights: w = [-0.05164292 -0.06913215], b = 0.0000
Errors in last 10 epochs: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
Total epochs with zero errors: 0
Show code cell source
# Visualize the perceptron's failure
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Plot 1: Errors per epoch (oscillating)
ax1 = axes[0]
ax1.plot(range(1, n_epochs + 1), errors_per_epoch, 'r-o', markersize=3, linewidth=1)
ax1.set_xlabel('Epoch', fontsize=12)
ax1.set_ylabel('Number of Errors', fontsize=12)
ax1.set_title('Perceptron on XOR: Errors Never Reach Zero', fontsize=13)
ax1.set_ylim(-0.5, max(errors_per_epoch) + 0.5)
ax1.axhline(y=0, color='green', linestyle='--', alpha=0.7, label='Target: 0 errors')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)
# Plot 2: Weight trajectory
ax2 = axes[1]
w1_hist = [wh[0][0] for wh in weight_history]
w2_hist = [wh[0][1] for wh in weight_history]
b_hist = [wh[1] for wh in weight_history]
ax2.plot(w1_hist, w2_hist, 'b-', alpha=0.5, linewidth=0.8)
ax2.plot(w1_hist[0], w2_hist[0], 'go', markersize=12, label='Start', zorder=5)
ax2.plot(w1_hist[-1], w2_hist[-1], 'rs', markersize=12, label='End', zorder=5)
ax2.set_xlabel('$w_1$', fontsize=14)
ax2.set_ylabel('$w_2$', fontsize=14)
ax2.set_title('Weight Trajectory: Cycling, Never Settling', fontsize=13)
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
As predicted by the theory, the perceptron never converges. The per-epoch error count never reaches zero (in this run it settles into a steady cycle of 4 errors per epoch), and the weight vector loops through a repeating sequence of configurations without settling. This is not a failure of the learning rate or initialization; it is a fundamental impossibility.
5. Why XOR Matters#
The XOR problem is far more than an academic exercise. It occupies a central place in the theory and history of neural networks for several interconnected reasons:
Simplest Non-Linearly-Separable Function#
Among all Boolean functions of two variables, XOR is the simplest that cannot be computed by a single perceptron. The 16 Boolean functions of 2 inputs include AND, OR, NAND, NOR (all linearly separable) and XOR, XNOR (not separable). XOR uses the minimum number of inputs and data points to exhibit the limitation.
Functional Completeness#
The set \(\{\text{XOR}, \text{AND}, 1\}\) is functionally complete: with a constant 1 available, XOR and AND generate every Boolean function via algebraic normal form. This means that the inability to compute XOR is not a minor gap — it locks out an entire class of computations.
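To make this concrete, here is a sketch of how NOT and OR fall out of \(\{\text{XOR}, \text{AND}, 1\}\) (over GF(2), XOR is addition and AND is multiplication; the helper names below are just for illustration):

```python
# ANF building blocks: XOR = addition mod 2, AND = multiplication
def NOT(a):
    return a ^ 1             # a XOR 1 flips the bit

def OR(a, b):
    return a ^ b ^ (a & b)   # a + b + ab over GF(2)

print([(a, b, OR(a, b)) for a in (0, 1) for b in (0, 1)])
# [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
```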
Parity Generalization#
XOR is \(\text{PARITY}(2)\): it outputs 1 when an odd number of inputs are 1. The \(n\)-bit parity function

\[
\text{PARITY}(x_1, \ldots, x_n) = x_1 \oplus x_2 \oplus \cdots \oplus x_n
\]

is equally impossible for a single perceptron, and Minsky and Papert proved that computing it requires examining all \(n\) input bits simultaneously (it has order \(n\)).
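Computationally, \(n\)-bit parity is just the sum of the bits mod 2, as a quick sketch shows for \(n = 3\):

```python
import itertools

def parity(bits):
    return sum(bits) % 2  # 1 iff an odd number of bits are set

for bits in itertools.product((0, 1), repeat=3):
    print(bits, parity(bits))
```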
Pedagogical Clarity#
With only 4 data points in 2 dimensions, the XOR problem can be fully visualized. The geometric intuition — two classes whose convex hulls intersect — is immediate and memorable.
8. Alternative Construction: The “Band” Approach#
There is a second elegant way to solve XOR with 2 hidden neurons, using a different geometric idea.
Construction 2: Dual Threshold#
- Hidden neuron 1: \(h_1 = \text{step}(x_1 + x_2 - 0.5)\)
  - Fires when \(x_1 + x_2 \geq 0.5\) (for binary inputs: when at least one input is 1, i.e., the sum is at least 1)
- Hidden neuron 2: \(h_2 = \text{step}(x_1 + x_2 - 1.5)\)
  - Fires when \(x_1 + x_2 \geq 1.5\) (i.e., when both inputs are 1)
- Output neuron: \(y = \text{step}(h_1 - h_2 - 0.5)\)
  - Fires when \(h_1 = 1\) and \(h_2 = 0\) (exactly in the “band” between the two thresholds)
Geometric Interpretation#
The two hidden neurons define two parallel lines in input space:
- Line 1: \(x_1 + x_2 = 0.5\)
- Line 2: \(x_1 + x_2 = 1.5\)
The output neuron selects the band between these two lines: it fires for points where \(0.5 \leq x_1 + x_2 < 1.5\). This band contains exactly the XOR=1 points.
Show code cell source
# Alternative Construction: Band approach
print("Construction 2: Dual Threshold (Band)")
print("=" * 60)
print(f"{'x1':>3} {'x2':>3} | {'sum':>4} {'h1(>=0.5)':>10} {'h2(>=1.5)':>10} | {'y':>3} {'target':>7}")
print("-" * 60)
# X and y (the XOR inputs and targets) were defined in the Section 4 cell above
for i in range(4):
    x1, x2 = X[i]
    s = x1 + x2
    h1 = int(s >= 0.5)
    h2 = int(s >= 1.5)
    y_pred = int(h1 - h2 - 0.5 >= 0)
    match = 'OK' if y_pred == y[i] else 'FAIL'
    print(f"{x1:>3.0f} {x2:>3.0f} | {s:>4.1f} {h1:>10} {h2:>10} | {y_pred:>3} {y[i]:>7} {match}")
print("\nAll outputs match!")
Construction 2: Dual Threshold (Band)
============================================================
x1 x2 | sum h1(>=0.5) h2(>=1.5) | y target
------------------------------------------------------------
0 0 | 0.0 0 0 | 0 0 OK
0 1 | 1.0 1 0 | 1 1 OK
1 0 | 1.0 1 0 | 1 1 OK
1 1 | 2.0 1 1 | 0 0 OK
All outputs match!
Show code cell source
# Visualize the "band" construction
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
# Draw the two parallel decision boundaries
x_range = np.linspace(-0.3, 1.5, 200)
# Line 1: x1 + x2 = 0.5
ax.plot(x_range, 0.5 - x_range, 'g-', linewidth=2.5, label='$x_1 + x_2 = 0.5$')
# Line 2: x1 + x2 = 1.5
ax.plot(x_range, 1.5 - x_range, 'purple', linewidth=2.5, linestyle='-', label='$x_1 + x_2 = 1.5$')
# Shade the band between the two lines
x_fill = np.linspace(-0.3, 1.5, 200)
y_lower = np.clip(0.5 - x_fill, -0.5, 1.5)
y_upper = np.clip(1.5 - x_fill, -0.5, 1.5)
ax.fill_between(x_fill, y_lower, y_upper, alpha=0.15, color='orange', label='Band: XOR = 1 region')
# Plot data points
X_class0 = X[y == 0]
X_class1 = X[y == 1]
ax.scatter(X_class0[:, 0], X_class0[:, 1], c='blue', s=250, zorder=5,
edgecolors='black', linewidths=2, marker='o', label='XOR = 0')
ax.scatter(X_class1[:, 0], X_class1[:, 1], c='red', s=250, zorder=5,
edgecolors='black', linewidths=2, marker='s', label='XOR = 1')
# Annotations
ax.annotate('sum < 0.5', (0, 0), xytext=(-0.25, -0.2), fontsize=12, color='blue')
ax.annotate('0.5 \u2264 sum < 1.5', (0.5, 0.5), xytext=(0.15, 0.3), fontsize=12, color='red')
ax.annotate('sum \u2265 1.5', (1, 1), xytext=(1.05, 1.1), fontsize=12, color='blue')
ax.set_xlim(-0.5, 1.5)
ax.set_ylim(-0.5, 1.5)
ax.set_xlabel('$x_1$', fontsize=14)
ax.set_ylabel('$x_2$', fontsize=14)
ax.set_title('XOR via Band: Two Parallel Decision Boundaries', fontsize=14)
ax.legend(fontsize=11, loc='upper right')
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
9. Exercises#
Exercise 8.1: XNOR with a 2-Layer Network#
The XNOR function is the complement of XOR:
| \(x_1\) | \(x_2\) | XNOR |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Design a 2-layer network (2 hidden neurons + 1 output neuron) that computes XNOR. Give the exact weights and biases, and verify with a truth table.
Hint: XNOR = NOT(XOR). How can you negate the output of the XOR network?
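A starter skeleton in the style of Exercise 8.3's below may help; the function name and layout here are only suggestions, not part of the exercise statement:

```python
import numpy as np

X2 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xnor = np.array([1, 0, 0, 1])

def xnor_network(x):
    """Compute XNOR with 2 hidden neurons + 1 output neuron.

    TODO: fill in the weights and biases.
    Hint: XNOR = NOT(XOR), so one option is to reuse the XOR network
    and flip the sign of the output neuron's pre-activation.
    """
    x1, x2 = x
    # h1 = step(???)
    # h2 = step(???)
    # y  = step(???)
    pass  # Replace with implementation

# Verify once implemented:
# for i in range(4):
#     assert xnor_network(X2[i]) == y_xnor[i]
```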
Exercise 8.3: 3-Bit Parity Network#
Design a network that computes \(\text{PARITY}(x_1, x_2, x_3) = x_1 \oplus x_2 \oplus x_3\).
The truth table is:
| \(x_1\) | \(x_2\) | \(x_3\) | Parity |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |
How many hidden neurons do you need? Can you organize them into layers?
Hint: Decompose as \(\text{XOR}(\text{XOR}(x_1, x_2), x_3)\).
Show code cell source
# Skeleton for Exercise 8.3: 3-bit parity
# Students should fill in the weights
X3 = np.array([[0,0,0], [0,0,1], [0,1,0], [0,1,1],
[1,0,0], [1,0,1], [1,1,0], [1,1,1]])
y3 = np.array([0, 1, 1, 0, 1, 0, 0, 1])
def parity_network(x):
    """Compute 3-bit parity using a multi-layer network.

    TODO: Fill in the weights and biases.
    Hint: First compute XOR(x1, x2), then XOR(result, x3).
    """
    x1, x2, x3 = x
    # Layer 1: compute intermediate values for XOR(x1, x2)
    # h1 = step(???)
    # h2 = step(???)
    # Layer 2: combine to get XOR(x1, x2), then prepare for XOR with x3
    # ...
    # Output layer
    # y = step(???)
    pass  # Replace with implementation
print("Exercise 8.3: Implement the parity_network function above.")
print("Then verify:")
print("for i in range(8):")
print(" assert parity_network(X3[i]) == y3[i]")
Exercise 8.3: Implement the parity_network function above.
Then verify:
for i in range(8):
assert parity_network(X3[i]) == y3[i]