Click any weight to see how backpropagation computes its gradient, step by step.
How it works: Choose a topology, then click any connection in the diagram.
The applet highlights the selected weight,
traces the gradient paths to the output,
and derives the gradient using BP3, BP2, and BP1.
Vectors and matrices are shown in bold; scalars in regular type.
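For reference, here is one common statement of those three backpropagation identities in the component notation defined in the guide below; the mapping of the applet's BP1, BP2, and BP3 labels onto these lines is an assumption of this sketch.
\[
\begin{aligned}
\err{\delta_j^{(L)}} &= \frac{\partial\mathcal{L}}{\partial \act{a_j^{(L)}}}\;\sigma'\!\big(\act{z_j^{(L)}}\big) && \text{(error at the output layer)}\\
\err{\delta_j^{(l)}} &= \Big(\sum_{m=1}^{n_{l+1}} \wt{w_{m,j}^{(l+1)}}\,\err{\delta_m^{(l+1)}}\Big)\,\sigma'\!\big(\act{z_j^{(l)}}\big) && \text{(error propagated back one layer)}\\
\frac{\partial\mathcal{L}}{\partial \wt{w_{j,k}^{(l)}}} &= \err{\delta_j^{(l)}}\,\act{a_k^{(l-1)}} && \text{(gradient of a single weight)}
\end{aligned}
\]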
Notation Guide
Indices & layers
\(\wt{w_{j,k}^{(l)}}\)
Weight from neuron \(k\) in layer \(l{-}1\) to neuron \(j\) in layer \(l\). First subscript = destination row, second = source column.
\(\act{a_j^{(l)}}\)
Activation (post-\(\sigma\)) of neuron \(j\) in layer \(l\).
\(\act{z_j^{(l)}}\)
Pre-activation (weighted sum + bias) of neuron \(j\) in layer \(l\).
\(\err{\delta_j^{(l)}}\)
Error signal: \(\err{\delta_j^{(l)}} = \partial\mathcal{L}/\partial z_j^{(l)}\). Measures how the loss changes with the pre-activation.
\(L\)
Index of the last layer (output). For \([3,2,1]\), \(L = 2\).
\(n_l\)
Number of neurons in layer \(l\).
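As a concrete instance of this indexing, take the \([3,2,1]\) topology mentioned above and its single output neuron in layer \(L = 2\). Writing the bias of neuron \(j\) in layer \(l\) as \(b_j^{(l)}\) (the guide leaves the bias symbol implicit), the notation unpacks to
\[
\act{z_1^{(2)}} = \wt{w_{1,1}^{(2)}}\,\act{a_1^{(1)}} + \wt{w_{1,2}^{(2)}}\,\act{a_2^{(1)}} + b_1^{(2)},
\qquad
\act{a_1^{(2)}} = \sigma\!\big(\act{z_1^{(2)}}\big),
\qquad
\err{\delta_1^{(2)}} = \frac{\partial\mathcal{L}}{\partial \act{z_1^{(2)}}}.
\]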
Component chain conventions
\(\displaystyle\sum_{m=1}^{n}\)
Explicit notation (default): summation variable \(m\) ranges over neurons in an intermediate layer. Each term is one gradient path.
\(\wt{w_{\cdot,\,k}^{(l)}}\)
Einstein notation (toggle): the dot (\(\cdot\)) marks a summed-over index. Repeated dots across factors imply summation. Column \(k\) of \(\wt{\mathbf{W}^{(l)}}\).
\(\wt{w_{\cdot,\,\cdot}^{(l)}}\)
Both indices summed — the full matrix \(\wt{\mathbf{W}^{(l)}}\) participates.
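For example, the gradient of a weight in any hidden layer \(l < L\) can be written in both conventions. In the explicit form, each summand is one gradient path through the next layer:
\[
\frac{\partial\mathcal{L}}{\partial \wt{w_{j,k}^{(l)}}}
= \sum_{m=1}^{n_{l+1}} \err{\delta_m^{(l+1)}}\,\wt{w_{m,j}^{(l+1)}}\;\sigma'\!\big(\act{z_j^{(l)}}\big)\,\act{a_k^{(l-1)}}.
\]
In the dot form, the repeated index \(m\) is replaced by dots, giving something like \(\err{\delta_{\cdot}^{(l+1)}}\,\wt{w_{\cdot,\,j}^{(l+1)}}\,\sigma'\!\big(\act{z_j^{(l)}}\big)\,\act{a_k^{(l-1)}}\); the exact rendering inside the applet may differ slightly.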
Vector & matrix notation
\(\wt{\mathbf{W}^{(l)}}\)
Bold = matrix or vector. \(\wt{\mathbf{W}^{(l)}} \in \mathbb{R}^{n_l \times n_{l-1}}\).
\(\err{\boldsymbol{\delta}^{(l)}}\)
Error vector for layer \(l\) (all \(n_l\) signals stacked).
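In this vector form, the backward recursion and the weight gradient from the component equations above collapse into single matrix expressions. Here \(\odot\) denotes the elementwise (Hadamard) product and \(\sigma'\) is applied elementwise; both symbols are conventions assumed by this sketch rather than defined in the guide.
\[
\err{\boldsymbol{\delta}^{(l)}} = \Big[\big(\wt{\mathbf{W}^{(l+1)}}\big)^{\!\top}\err{\boldsymbol{\delta}^{(l+1)}}\Big] \odot \sigma'\!\big(\mathbf{z}^{(l)}\big),
\qquad
\frac{\partial\mathcal{L}}{\partial \wt{\mathbf{W}^{(l)}}} = \err{\boldsymbol{\delta}^{(l)}}\,\big(\mathbf{a}^{(l-1)}\big)^{\!\top} \in \mathbb{R}^{n_l \times n_{l-1}}.
\]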