Novikoff's Theorem (1962): If the training data is linearly separable with margin $\gamma$, meaning some unit-norm $\mathbf{w}^*$ satisfies $y_i(\mathbf{w}^* \cdot \mathbf{x}_i) \ge \gamma$ for every example, and all points lie within a ball of radius $R$, then the perceptron algorithm converges after at most $R^2 / \gamma^2$ updates. The proof uses a squeeze argument: a lower bound on $\mathbf{w}^* \cdot \mathbf{w}_k$ grows linearly as $k\gamma$, while an upper bound on $\|\mathbf{w}_k\|^2$ grows linearly as $kR^2$. Since Cauchy-Schwarz forces $(\mathbf{w}^* \cdot \mathbf{w}_k)^2 \le \|\mathbf{w}_k\|^2$ for unit-norm $\mathbf{w}^*$, the quadratic growth $k^2\gamma^2$ can stay below the linear cap $kR^2$ only while $k \le R^2/\gamma^2$.
[Interactive demo. Panels: "Squeeze Visualization", plotting the lower bound $k^2\gamma^2$, the upper bound $kR^2$, and the actual $\|\mathbf{w}_k\|^2$; "Data & Boundary"; and "Parameters", a slider for $R^2/\gamma^2$ with a readout of the maximum-update bound $k \le R^2/\gamma^2$. The demo generates random linearly separable data with the chosen $R$ and $\gamma$, trains a perceptron, and compares the actual number of updates to the bound.]
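A minimal sketch of what such a demo computes, assuming NumPy; the names here (`perceptron`, `w_star`, the rejection-sampling scheme) are illustrative, not the page's actual source. It samples points inside a radius-$R$ ball that clear a margin $\gamma$ around a random unit-norm $\mathbf{w}^*$, trains until a mistake-free pass, and compares the update count to $R^2/\gamma^2$:

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Classic perceptron: update w on every mistake; return (w, #updates)."""
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:      # mistake (or exactly on the boundary)
                w += yi * xi
                updates += 1
                mistakes += 1
        if mistakes == 0:               # a full clean pass => converged
            break
    return w, updates

rng = np.random.default_rng(0)
d, R, gamma = 2, 4.0, 1.0               # R^2 / gamma^2 = 16

# Random unit-norm separator w*, then rejection-sample points that lie
# in the radius-R ball AND clear the margin gamma around w*.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
pts = []
while len(pts) < 200:
    x = rng.uniform(-R, R, size=d)
    if np.linalg.norm(x) <= R and abs(w_star @ x) >= gamma:
        pts.append(x)
X = np.array(pts)
y = np.sign(X @ w_star)                 # labels consistent with w*

w, k = perceptron(X, y)
print(f"actual updates: {k}, bound R^2/gamma^2 = {R**2 / gamma**2:.0f}")
```

With $R = 4$ and $\gamma = 1$ the bound is $16$; the observed update count is typically well below it, which is exactly the gap the demo visualizes.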

Lower bound: $\mathbf{w}^* \cdot \mathbf{w}_k \ge k\gamma$, hence $(\mathbf{w}^* \cdot \mathbf{w}_k)^2 \ge k^2\gamma^2$

Upper bound: $\|\mathbf{w}_k\|^2 \le kR^2$

Cauchy-Schwarz: $k^2\gamma^2 \le (\mathbf{w}^* \cdot \mathbf{w}_k)^2 \le \|\mathbf{w}^*\|^2 \, \|\mathbf{w}_k\|^2 \le \|\mathbf{w}^*\|^2 \cdot kR^2$

With $\|\mathbf{w}^*\| = 1$: $k \le R^2/\gamma^2$
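Both inequalities can be checked numerically during training. The following continuation (appended to the sketch above, reusing its illustrative `X`, `y`, `w_star`, `d`, `R`, `gamma`) asserts them after every update:

```python
# Verify the squeeze: starting from w_0 = 0, after the k-th update
# we must have  w* . w_k >= k*gamma  and  ||w_k||^2 <= k*R^2.
w, k = np.zeros(d), 0
done = False
while not done:
    done = True
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:                      # mistake: one update
            w += yi * xi
            k += 1
            done = False
            assert w_star @ w >= k * gamma - 1e-9   # lower bound holds
            assert w @ w <= k * R**2 + 1e-9         # upper bound holds
print(f"both bounds held across all {k} updates")
```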
