Chapter 23: The AES (Rijndael) Design

Chapter 23: The AES (Rijndael) Design#

The Advanced Encryption Standard (AES) is the most widely deployed symmetric cipher in the world. In this chapter we build a complete AES-128 implementation from first principles using NumPy, derive the S-box algebraically over \(\mathrm{GF}(2^8)\), and study the cipher’s security properties through visualization and experiment.

23.1 Historical Context: The NIST Competition#

By the mid-1990s the Data Encryption Standard (DES), adopted in 1977, was showing its age. Its 56-bit key was vulnerable to brute-force search (the EFF Deep Crack machine broke DES in 22 hours in 1998), and triple-DES, while secure, was slow.

In January 1997 the U.S. National Institute of Standards and Technology (NIST) issued a public call for proposals for a new block cipher standard to replace DES. The requirements were:

128-bit block size
Support for 128-, 192-, and 256-bit keys
Security, efficiency, and simplicity

Fifteen candidates were submitted. After two rounds of public evaluation the field was narrowed to five finalists: MARS, RC6, Rijndael, Serpent, and Twofish.

In October 2000 NIST selected Rijndael, designed by the Belgian cryptographers Joan Daemen and Vincent Rijmen, as the AES. The standard was published as FIPS 197 in November 2001.

Why Rijndael?

Rijndael combined strong security margins with exceptional performance across a wide range of platforms, from 8-bit smart cards to 64-bit workstations. Its mathematical elegance (based on operations in \(\mathrm{GF}(2^8)\)) and the wide trail strategy for provable resistance to differential and linear cryptanalysis were key factors.

The Wide Trail Strategy#

Daemen and Rijmen’s design philosophy, the wide trail strategy, ensures that after a small number of rounds any non-trivial differential or linear trail must activate many S-boxes. This is achieved through careful choice of the linear layer (MixColumns combined with ShiftRows) so that the branch number of the linear map is maximal.

23.2 Overview of AES#

AES operates on a \(4 \times 4\) array of bytes called the state. For AES-128 (the variant we implement here) there are 10 rounds. Each round (except the last) applies four transformations in sequence:

Step	Transformation	Purpose
1	SubBytes	Non-linear substitution via S-box
2	ShiftRows	Cyclic row shifts for inter-column diffusion
3	MixColumns	Matrix multiplication over \(\mathrm{GF}(2^8)\) for intra-column diffusion
4	AddRoundKey	XOR with the round key

The final (10th) round omits MixColumns. Before the first round an initial AddRoundKey is applied.

State layout

The 16 plaintext bytes \(b_0, b_1, \ldots, b_{15}\) are arranged column-major:

\[\begin{split} \text{state} = \begin{pmatrix} b_0 & b_4 & b_8 & b_{12} \\ b_1 & b_5 & b_9 & b_{13} \\ b_2 & b_6 & b_{10} & b_{14} \\ b_3 & b_7 & b_{11} & b_{15} \end{pmatrix}\end{split}\]

23.3 Arithmetic in \(\mathrm{GF}(2^8)\)#

All AES operations work over the finite field \(\mathrm{GF}(2^8) \cong \mathbb{F}_2[x]/(x^8+x^4+x^3+x+1)\).

We represent each field element as a byte (integer 0–255). Addition is bitwise XOR. Multiplication uses the irreducible polynomial \(m(x) = x^8 + x^4 + x^3 + x + 1\) (hex 0x11B).

Below we implement the core \(\mathrm{GF}(2^8)\) routines.

import numpy as np

# --- GF(2^8) arithmetic with irreducible polynomial x^8+x^4+x^3+x+1 ---

GF_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiply two elements in GF(2^8)."""
    a, b = int(a), int(b)
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_MODULUS
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise element a to power n in GF(2^8)."""
    result = 1
    base = int(a)
    n = int(n)
    while n > 0:
        if n & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        n >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8). inv(0) = 0 by AES convention."""
    if a == 0:
        return 0
    # a^{-1} = a^{254} in GF(2^8) since the multiplicative group has order 255
    return gf_pow(a, 254)

# Pre-compute full multiplication table for speed
GF_MUL_TABLE = np.zeros((256, 256), dtype=np.uint8)
for i in range(256):
    for j in range(256):
        GF_MUL_TABLE[i, j] = gf_mul(i, j)

# Pre-compute inverse table
GF_INV_TABLE = np.array([gf_inv(i) for i in range(256)], dtype=np.uint8)

# Quick verification
print("gf_mul(0x57, 0x83) =", hex(gf_mul(0x57, 0x83)))
print("gf_inv(0x53)       =", hex(gf_inv(0x53)))
print("check: gf_mul(0x53, gf_inv(0x53)) =", hex(gf_mul(0x53, gf_inv(0x53))))
print("Inverse table computed for all 256 elements.")

gf_mul(0x57, 0x83) = 0xc1
gf_inv(0x53)       = 0xca
check: gf_mul(0x53, gf_inv(0x53)) = 0x1
Inverse table computed for all 256 elements.

23.4 Building the AES S-Box from First Principles#

The AES S-box is the only non-linear component in the cipher. It is constructed in two steps:

Multiplicative inversion in \(\mathrm{GF}(2^8)\): \(b \mapsto b^{-1}\) (with \(0 \mapsto 0\)).
Affine transformation over \(\mathrm{GF}(2)\):

\[ b'_i = b_i \oplus b_{(i+4)\bmod 8} \oplus b_{(i+5)\bmod 8} \oplus b_{(i+6)\bmod 8} \oplus b_{(i+7)\bmod 8} \oplus c_i\]

where \(c = \texttt{0x63}\) (binary 01100011).

This construction guarantees high algebraic degree, high non-linearity, and low differential uniformity.

import numpy as np

def build_aes_sbox():
    """Construct the AES S-box from multiplicative inverse + affine transform."""
    # Affine matrix (circulant matrix from the vector [1,0,0,0,1,1,1,1])
    affine_matrix = np.zeros((8, 8), dtype=np.uint8)
    row = [1, 0, 0, 0, 1, 1, 1, 1]
    for i in range(8):
        for j in range(8):
            affine_matrix[i, j] = row[(j - i) % 8]

    # Affine constant 0x63 = 01100011 in bits (LSB first)
    c_bits = np.array([1, 1, 0, 0, 0, 1, 1, 0], dtype=np.uint8)

    sbox = np.zeros(256, dtype=np.uint8)
    for val in range(256):
        # Step 1: multiplicative inverse in GF(2^8)
        inv = GF_INV_TABLE[val]
        # Convert to bit vector (LSB first)
        bits = np.array([(inv >> k) & 1 for k in range(8)], dtype=np.uint8)
        # Step 2: affine transformation
        result_bits = (affine_matrix @ bits + c_bits) % 2
        # Convert back to byte
        sbox[val] = sum(int(result_bits[k]) << k for k in range(8))

    return sbox

def build_aes_inv_sbox(sbox):
    """Build the inverse S-box from the forward S-box."""
    inv_sbox = np.zeros(256, dtype=np.uint8)
    for i in range(256):
        inv_sbox[sbox[i]] = i
    return inv_sbox

SBOX = build_aes_sbox()
INV_SBOX = build_aes_inv_sbox(SBOX)

# Verify against known values
print("S-box[0x00] =", hex(SBOX[0x00]), "(expected 0x63)")
print("S-box[0x01] =", hex(SBOX[0x01]), "(expected 0x7c)")
print("S-box[0x53] =", hex(SBOX[0x53]), "(expected 0xed)")
print("S-box[0xff] =", hex(SBOX[0xFF]), "(expected 0x16)")
print("\nInverse check: InvSBox[SBox[0x53]] =", hex(INV_SBOX[SBOX[0x53]]))

S-box[0x00] = 0x63 (expected 0x63)
S-box[0x01] = 0x7c (expected 0x7c)
S-box[0x53] = 0xed (expected 0xed)
S-box[0xff] = 0x16 (expected 0x16)

Inverse check: InvSBox[SBox[0x53]] = 0x53

Visualising the S-Box#

The S-box is traditionally displayed as a \(16 \times 16\) lookup table indexed by the high and low nibbles of the input byte.

../_images/a09ee24518453eb4c8870eaab20e4d5d2079fdd1f81dadfeae68dd8863a4c5af.png

23.5 The Four AES Transformations#

We now implement each transformation individually before assembling the full cipher.

23.5.1 SubBytes#

SubBytes applies the S-box independently to each of the 16 bytes of the state. It is the only non-linear transformation in AES.

import numpy as np

def sub_bytes(state):
    """Apply the AES S-box to every byte of the 4x4 state."""
    return SBOX[state]

def inv_sub_bytes(state):
    """Apply the inverse AES S-box to every byte of the 4x4 state."""
    return INV_SBOX[state]

# Demonstration
demo_state = np.array([
    [0x19, 0xA0, 0x9A, 0xE9],
    [0x3D, 0xF4, 0xC6, 0xF8],
    [0xE3, 0xE2, 0x8D, 0x48],
    [0xBE, 0x2B, 0x2A, 0x08]
], dtype=np.uint8)

after_sub = sub_bytes(demo_state)
print("Before SubBytes:")
for row in demo_state:
    print(' '.join(f'{b:02X}' for b in row))
print("\nAfter SubBytes:")
for row in after_sub:
    print(' '.join(f'{b:02X}' for b in row))

# Verify round-trip
assert np.array_equal(inv_sub_bytes(after_sub), demo_state)
print("\nRound-trip verified: InvSubBytes(SubBytes(state)) == state")

Before SubBytes:
19 A0 9A E9
3D F4 C6 F8
E3 E2 8D 48
BE 2B 2A 08

After SubBytes:
D4 E0 B8 1E
27 BF B4 41
11 98 5D 52
AE F1 E5 30

Round-trip verified: InvSubBytes(SubBytes(state)) == state

23.5.2 ShiftRows#

ShiftRows cyclically shifts each row of the state to the left:

Row 0: no shift
Row 1: shift left by 1
Row 2: shift left by 2
Row 3: shift left by 3

This provides inter-column diffusion: after ShiftRows, each column of the state contains bytes from four different original columns.

import numpy as np

def shift_rows(state):
    """Cyclically shift rows 1,2,3 to the left by 1,2,3 positions."""
    out = state.copy()
    out[1] = np.roll(state[1], -1)
    out[2] = np.roll(state[2], -2)
    out[3] = np.roll(state[3], -3)
    return out

def inv_shift_rows(state):
    """Inverse ShiftRows: shift rows right."""
    out = state.copy()
    out[1] = np.roll(state[1], 1)
    out[2] = np.roll(state[2], 2)
    out[3] = np.roll(state[3], 3)
    return out

# Demonstration
demo = np.array([
    [0xD4, 0xE0, 0xB8, 0x1E],
    [0x27, 0xBF, 0xB4, 0x41],
    [0x11, 0x98, 0x5D, 0x52],
    [0xAE, 0xF1, 0xE5, 0x30]
], dtype=np.uint8)

shifted = shift_rows(demo)
print("Before ShiftRows:")
for row in demo:
    print(' '.join(f'{b:02X}' for b in row))
print("\nAfter ShiftRows:")
for row in shifted:
    print(' '.join(f'{b:02X}' for b in row))

assert np.array_equal(inv_shift_rows(shifted), demo)
print("\nRound-trip verified.")

Before ShiftRows:
D4 E0 B8 1E
27 BF B4 41
11 98 5D 52
AE F1 E5 30

After ShiftRows:
D4 E0 B8 1E
BF B4 41 27
5D 52 11 98
30 AE F1 E5

Round-trip verified.

23.5.3 MixColumns#

MixColumns multiplies each column of the state by a fixed \(4 \times 4\) matrix over \(\mathrm{GF}(2^8)\):

\[\begin{split} \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \cdot \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}\end{split}\]

This corresponds to multiplication by the polynomial \(c(x) = 03 \cdot x^3 + 01 \cdot x^2 + 01 \cdot x + 02\) modulo \(x^4 + 1\).

The inverse uses the matrix with entries {0E, 0B, 0D, 09}.

import numpy as np

# MixColumns fixed matrix and its inverse
MIX_MATRIX = np.array([
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02]
], dtype=np.uint8)

INV_MIX_MATRIX = np.array([
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E]
], dtype=np.uint8)

def gf_matrix_mul(mat, col):
    """Multiply a 4x4 GF(2^8) matrix by a 4-element column vector."""
    result = np.zeros(4, dtype=np.uint8)
    for i in range(4):
        val = 0
        for j in range(4):
            val ^= GF_MUL_TABLE[mat[i, j], col[j]]
        result[i] = val
    return result

def mix_columns(state):
    """Apply MixColumns to each column of the 4x4 state."""
    out = state.copy()
    for j in range(4):
        out[:, j] = gf_matrix_mul(MIX_MATRIX, state[:, j])
    return out

def inv_mix_columns(state):
    """Inverse MixColumns."""
    out = state.copy()
    for j in range(4):
        out[:, j] = gf_matrix_mul(INV_MIX_MATRIX, state[:, j])
    return out

# Demonstration
demo = np.array([
    [0xD4, 0xE0, 0xB8, 0x1E],
    [0xBF, 0xB4, 0x41, 0x27],
    [0x5D, 0x52, 0x11, 0x98],
    [0x30, 0xAE, 0xF1, 0xE5]
], dtype=np.uint8)

mixed = mix_columns(demo)
print("Before MixColumns:")
for row in demo:
    print(' '.join(f'{b:02X}' for b in row))
print("\nAfter MixColumns:")
for row in mixed:
    print(' '.join(f'{b:02X}' for b in row))

assert np.array_equal(inv_mix_columns(mixed), demo)
print("\nRound-trip verified.")

Before MixColumns:
D4 E0 B8 1E
BF B4 41 27
5D 52 11 98
30 AE F1 E5

After MixColumns:
04 E0 48 28
66 CB F8 06
81 19 D3 26
E5 9A 7A 4C

Round-trip verified.

23.5.4 AddRoundKey#

AddRoundKey simply XORs the state with the 128-bit round key. Since XOR is its own inverse, the same function serves for both encryption and decryption.

import numpy as np

def add_round_key(state, round_key):
    """XOR the state with the round key (both 4x4 uint8 arrays)."""
    return state ^ round_key

# Quick demonstration
s = np.array([
    [0x32, 0x88, 0x31, 0xE0],
    [0x43, 0x5A, 0x31, 0x37],
    [0xF6, 0x30, 0x98, 0x07],
    [0xA8, 0x8D, 0xA2, 0x34]
], dtype=np.uint8)

k = np.array([
    [0x2B, 0x28, 0xAB, 0x09],
    [0x7E, 0xAE, 0xF7, 0xCF],
    [0x15, 0xD2, 0x15, 0x4F],
    [0x16, 0xA6, 0x88, 0x3C]
], dtype=np.uint8)

result = add_round_key(s, k)
print("State XOR Key:")
for row in result:
    print(' '.join(f'{b:02X}' for b in row))

# XOR is self-inverse
assert np.array_equal(add_round_key(result, k), s)
print("\nSelf-inverse verified: (state ^ key) ^ key == state")

State XOR Key:
19 A0 9A E9
3D F4 C6 F8
E3 E2 8D 48
BE 2B 2A 08

Self-inverse verified: (state ^ key) ^ key == state

23.6 AES Key Expansion#

AES-128 expands the 16-byte (128-bit) cipher key into 11 round keys (one for the initial AddRoundKey plus one for each of the 10 rounds), giving 44 four-byte words total.

The key schedule uses:

RotWord: cyclic left rotation of a 4-byte word
SubWord: apply the S-box to each byte
Rcon: round constants \(\text{Rcon}[i] = (\texttt{rc}_i, 0, 0, 0)\) where \(\texttt{rc}_1 = 1\), \(\texttt{rc}_i = 2 \cdot \texttt{rc}_{i-1}\) in \(\mathrm{GF}(2^8)\)

import numpy as np

# Round constants for AES-128 (10 rounds)
RCON = np.array([
    [0x01, 0x00, 0x00, 0x00],
    [0x02, 0x00, 0x00, 0x00],
    [0x04, 0x00, 0x00, 0x00],
    [0x08, 0x00, 0x00, 0x00],
    [0x10, 0x00, 0x00, 0x00],
    [0x20, 0x00, 0x00, 0x00],
    [0x40, 0x00, 0x00, 0x00],
    [0x80, 0x00, 0x00, 0x00],
    [0x1B, 0x00, 0x00, 0x00],
    [0x36, 0x00, 0x00, 0x00],
], dtype=np.uint8)

def key_expansion(key_bytes):
    """
    AES-128 key expansion.
    
    Parameters
    ----------
    key_bytes : array-like of 16 uint8
        The 128-bit cipher key.
    
    Returns
    -------
    round_keys : ndarray of shape (11, 4, 4) uint8
        The 11 round keys as 4x4 state arrays (column-major).
    """
    key = np.array(key_bytes, dtype=np.uint8)
    Nk = 4   # Number of 32-bit words in the key
    Nr = 10  # Number of rounds
    
    # W holds 44 words (each word is 4 bytes)
    W = np.zeros((4 * (Nr + 1), 4), dtype=np.uint8)
    
    # First Nk words are the key itself
    for i in range(Nk):
        W[i] = key[4*i : 4*i + 4]
    
    for i in range(Nk, 4 * (Nr + 1)):
        temp = W[i - 1].copy()
        if i % Nk == 0:
            # RotWord
            temp = np.roll(temp, -1)
            # SubWord
            temp = SBOX[temp]
            # XOR with Rcon
            temp = temp ^ RCON[i // Nk - 1]
        W[i] = W[i - Nk] ^ temp
    
    # Reshape into 11 round keys, each as a 4x4 state (column-major)
    round_keys = np.zeros((Nr + 1, 4, 4), dtype=np.uint8)
    for r in range(Nr + 1):
        # Each round key is 4 consecutive words, arranged as columns
        for col in range(4):
            round_keys[r, :, col] = W[4 * r + col]
    
    return round_keys

# NIST test key
test_key = [0x2B, 0x7E, 0x15, 0x16,
            0x28, 0xAE, 0xD2, 0xA6,
            0xAB, 0xF7, 0x15, 0x88,
            0x09, 0xCF, 0x4F, 0x3C]

round_keys = key_expansion(test_key)

print("Round key 0 (original key):")
for row in round_keys[0]:
    print(' '.join(f'{b:02X}' for b in row))

print("\nRound key 1:")
for row in round_keys[1]:
    print(' '.join(f'{b:02X}' for b in row))

print("\nRound key 10 (last):")
for row in round_keys[10]:
    print(' '.join(f'{b:02X}' for b in row))

Round key 0 (original key):
2B 28 AB 09
7E AE F7 CF
15 D2 15 4F
16 A6 88 3C

Round key 1:
A0 88 23 2A
FA 54 A3 6C
FE 2C 39 76
17 B1 39 05

Round key 10 (last):
D0 C9 E1 B6
14 EE 3F 63
F9 25 0C 0C
A8 89 C8 A6

23.7 The Complete AES Class#

We now assemble all components into a single AES class that provides encrypt_block and decrypt_block methods.

State representation

The state is a \(4 \times 4\) NumPy array of uint8. Bytes are arranged column-major to match the FIPS 197 specification: the 16-byte input \(b_0, b_1, \ldots, b_{15}\) fills the state column by column.

import numpy as np

class AES:
    """AES-128 block cipher (encrypt and decrypt) using NumPy."""
    
    def __init__(self, key):
        """
        Parameters
        ----------
        key : list or array of 16 integers (bytes)
        """
        self.key = np.array(key, dtype=np.uint8)
        assert len(self.key) == 16, "AES-128 requires a 16-byte key."
        self.round_keys = key_expansion(self.key)
    
    @staticmethod
    def bytes_to_state(b):
        """Convert 16 bytes to a 4x4 state array (column-major)."""
        return np.array(b, dtype=np.uint8).reshape(4, 4).T
    
    @staticmethod
    def state_to_bytes(state):
        """Convert 4x4 state array back to 16 bytes (column-major)."""
        return state.T.flatten().tolist()
    
    def sub_bytes(self, state):
        return SBOX[state]
    
    def inv_sub_bytes(self, state):
        return INV_SBOX[state]
    
    def shift_rows(self, state):
        out = state.copy()
        out[1] = np.roll(state[1], -1)
        out[2] = np.roll(state[2], -2)
        out[3] = np.roll(state[3], -3)
        return out
    
    def inv_shift_rows(self, state):
        out = state.copy()
        out[1] = np.roll(state[1], 1)
        out[2] = np.roll(state[2], 2)
        out[3] = np.roll(state[3], 3)
        return out
    
    def mix_columns(self, state):
        out = state.copy()
        for j in range(4):
            out[:, j] = gf_matrix_mul(MIX_MATRIX, state[:, j])
        return out
    
    def inv_mix_columns(self, state):
        out = state.copy()
        for j in range(4):
            out[:, j] = gf_matrix_mul(INV_MIX_MATRIX, state[:, j])
        return out
    
    def add_round_key(self, state, round_idx):
        return state ^ self.round_keys[round_idx]
    
    def encrypt_block(self, plaintext, verbose=False):
        """
        Encrypt a 16-byte plaintext block.
        
        Parameters
        ----------
        plaintext : list or array of 16 integers
        verbose : bool, if True return list of intermediate states
        
        Returns
        -------
        ciphertext : list of 16 integers
        history : list of (label, state) if verbose=True
        """
        state = self.bytes_to_state(plaintext)
        history = []
        if verbose:
            history.append(('Input', state.copy()))
        
        # Initial round key addition
        state = self.add_round_key(state, 0)
        if verbose:
            history.append(('After AddRoundKey (initial)', state.copy()))
        
        # Rounds 1 through 9
        for rnd in range(1, 10):
            state = self.sub_bytes(state)
            if verbose:
                history.append((f'Round {rnd}: After SubBytes', state.copy()))
            state = self.shift_rows(state)
            if verbose:
                history.append((f'Round {rnd}: After ShiftRows', state.copy()))
            state = self.mix_columns(state)
            if verbose:
                history.append((f'Round {rnd}: After MixColumns', state.copy()))
            state = self.add_round_key(state, rnd)
            if verbose:
                history.append((f'Round {rnd}: After AddRoundKey', state.copy()))
        
        # Final round (no MixColumns)
        state = self.sub_bytes(state)
        if verbose:
            history.append(('Round 10: After SubBytes', state.copy()))
        state = self.shift_rows(state)
        if verbose:
            history.append(('Round 10: After ShiftRows', state.copy()))
        state = self.add_round_key(state, 10)
        if verbose:
            history.append(('Output', state.copy()))
        
        if verbose:
            return self.state_to_bytes(state), history
        return self.state_to_bytes(state)
    
    def decrypt_block(self, ciphertext):
        """
        Decrypt a 16-byte ciphertext block.
        
        Parameters
        ----------
        ciphertext : list or array of 16 integers
        
        Returns
        -------
        plaintext : list of 16 integers
        """
        state = self.bytes_to_state(ciphertext)
        
        # Undo final round
        state = self.add_round_key(state, 10)
        state = self.inv_shift_rows(state)
        state = self.inv_sub_bytes(state)
        
        # Undo rounds 9 down to 1
        for rnd in range(9, 0, -1):
            state = self.add_round_key(state, rnd)
            state = self.inv_mix_columns(state)
            state = self.inv_shift_rows(state)
            state = self.inv_sub_bytes(state)
        
        # Undo initial round key
        state = self.add_round_key(state, 0)
        
        return self.state_to_bytes(state)

print("AES class defined successfully.")

AES class defined successfully.

NIST FIPS 197 Test Vector#

We verify our implementation against the official NIST test vector from Appendix B of FIPS 197.

import numpy as np

# NIST FIPS 197 Appendix B test vector
nist_key = [0x2B, 0x7E, 0x15, 0x16, 0x28, 0xAE, 0xD2, 0xA6,
            0xAB, 0xF7, 0x15, 0x88, 0x09, 0xCF, 0x4F, 0x3C]

nist_plaintext = [0x32, 0x43, 0xF6, 0xA8, 0x88, 0x5A, 0x30, 0x8D,
                  0x31, 0x31, 0x98, 0xA2, 0xE0, 0x37, 0x07, 0x34]

nist_expected = [0x39, 0x25, 0x84, 0x1D, 0x02, 0xDC, 0x09, 0xFB,
                 0xDC, 0x11, 0x85, 0x97, 0x19, 0x6A, 0x0B, 0x32]

cipher = AES(nist_key)
ciphertext = cipher.encrypt_block(nist_plaintext)

print("Plaintext: ", ' '.join(f'{b:02X}' for b in nist_plaintext))
print("Key:       ", ' '.join(f'{b:02X}' for b in nist_key))
print("Ciphertext:", ' '.join(f'{b:02X}' for b in ciphertext))
print("Expected:  ", ' '.join(f'{b:02X}' for b in nist_expected))
print("Match:", ciphertext == nist_expected)

# Decrypt and verify round-trip
decrypted = cipher.decrypt_block(ciphertext)
print("\nDecrypted: ", ' '.join(f'{b:02X}' for b in decrypted))
print("Round-trip:", decrypted == nist_plaintext)

Plaintext:  32 43 F6 A8 88 5A 30 8D 31 31 98 A2 E0 37 07 34
Key:        2B 7E 15 16 28 AE D2 A6 AB F7 15 88 09 CF 4F 3C
Ciphertext: 39 25 84 1D 02 DC 09 FB DC 11 85 97 19 6A 0B 32
Expected:   39 25 84 1D 02 DC 09 FB DC 11 85 97 19 6A 0B 32
Match: True

Decrypted:  32 43 F6 A8 88 5A 30 8D 31 31 98 A2 E0 37 07 34
Round-trip: True

23.8 Experiments#

23.8.1 Round-by-Round State Visualisation#

Let us watch how the plaintext state is transformed through all 10 rounds of AES encryption.

../_images/b2f6303ca5dfb34f849a0b5b80499ad46f984b200a6b936b83270eff63b946c9.png

23.8.2 Effect of Each Transformation Within a Single Round#

Let us look more closely at what happens inside a single round. We visualise the state after each of the four transformations in round 1.

../_images/bb1b412e8850bda094b7450af1b626b516d606f46f2c5bca6573697ebc96300a.png

23.8.3 Avalanche Effect in AES#

The avalanche effect states that flipping a single input bit should change approximately half of the output bits. We measure this for AES-128 by:

Encrypting a random plaintext \(P\) to get ciphertext \(C\).
Flipping each of the 128 bits of \(P\) one at a time to get \(P'\).
Encrypting \(P'\) to get \(C'\).
Counting the Hamming distance \(d(C, C')\).

The ideal avalanche means \(d(C, C') \approx 64\) for each single-bit flip.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

def hamming_distance_bytes(a, b):
    """Hamming distance (in bits) between two byte sequences."""
    dist = 0
    for x, y in zip(a, b):
        dist += bin(x ^ y).count('1')
    return dist

rng = np.random.default_rng(42)
plaintext = rng.integers(0, 256, size=16, dtype=np.uint8).tolist()
key = rng.integers(0, 256, size=16, dtype=np.uint8).tolist()

cipher = AES(key)
ct_original = cipher.encrypt_block(plaintext)

# Flip each bit of the plaintext and measure avalanche
distances = []
for byte_idx in range(16):
    for bit_idx in range(8):
        pt_flipped = plaintext.copy()
        pt_flipped[byte_idx] ^= (1 << bit_idx)
        ct_flipped = cipher.encrypt_block(pt_flipped)
        distances.append(hamming_distance_bytes(ct_original, ct_flipped))

distances = np.array(distances)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart: distance for each flipped bit
ax1.bar(range(128), distances, color='steelblue', alpha=0.8, width=1.0)
ax1.axhline(y=64, color='red', linestyle='--', linewidth=1.5, label='Ideal (64 bits)')
ax1.set_xlabel('Flipped plaintext bit index', fontsize=11)
ax1.set_ylabel('Hamming distance (bits)', fontsize=11)
ax1.set_title('AES-128 Avalanche: Single-Bit Flips', fontsize=12)
ax1.legend(fontsize=10)
ax1.set_xlim(-1, 128)
ax1.set_ylim(0, 128)

# Histogram of distances
ax2.hist(distances, bins=range(40, 95), color='darkorange', edgecolor='black', alpha=0.8)
ax2.axvline(x=64, color='red', linestyle='--', linewidth=1.5, label='Ideal mean')
ax2.axvline(x=distances.mean(), color='blue', linestyle='-', linewidth=1.5,
            label=f'Observed mean = {float(distances.mean()):.1f}')
ax2.set_xlabel('Hamming distance', fontsize=11)
ax2.set_ylabel('Frequency', fontsize=11)
ax2.set_title('Distribution of Avalanche Distances', fontsize=12)
ax2.legend(fontsize=10)

plt.tight_layout()
plt.savefig('aes_avalanche.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Mean Hamming distance:   {float(distances.mean()):.2f} / 128 bits")
print(f"Std dev:                 {float(distances.std()):.2f}")
print(f"Min / Max:               {distances.min()} / {distances.max()}")

../_images/328bb78214452a3967a141719c25e527d0964b845583f3df8ae62d101e8e0f46.png

Mean Hamming distance:   63.56 / 128 bits
Std dev:                 5.48
Min / Max:               48 / 83

23.8.4 Round-by-Round Avalanche Build-up#

How quickly does AES achieve full diffusion? We measure the avalanche effect after each round by performing partial encryption.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

def encrypt_n_rounds(aes_obj, plaintext, n_rounds):
    """Encrypt using only the first n_rounds of AES (for avalanche analysis)."""
    state = aes_obj.bytes_to_state(plaintext)
    state = state ^ aes_obj.round_keys[0]  # initial AddRoundKey
    
    for rnd in range(1, min(n_rounds + 1, 10)):
        state = SBOX[state]
        out = state.copy()
        out[1] = np.roll(state[1], -1)
        out[2] = np.roll(state[2], -2)
        out[3] = np.roll(state[3], -3)
        state = out
        if rnd < 10:
            out2 = state.copy()
            for j in range(4):
                out2[:, j] = gf_matrix_mul(MIX_MATRIX, state[:, j])
            state = out2
        state = state ^ aes_obj.round_keys[rnd]
    
    if n_rounds >= 10:
        # Final round (no MixColumns)
        state = SBOX[state]
        out = state.copy()
        out[1] = np.roll(state[1], -1)
        out[2] = np.roll(state[2], -2)
        out[3] = np.roll(state[3], -3)
        state = out
        state = state ^ aes_obj.round_keys[10]
    
    return aes_obj.state_to_bytes(state)

rng = np.random.default_rng(123)
pt = rng.integers(0, 256, size=16, dtype=np.uint8).tolist()
k = rng.integers(0, 256, size=16, dtype=np.uint8).tolist()
aes_obj = AES(k)

# Flip bit 0 of byte 0
pt_flipped = pt.copy()
pt_flipped[0] ^= 1

round_distances = []
for n in range(1, 11):
    ct1 = encrypt_n_rounds(aes_obj, pt, n)
    ct2 = encrypt_n_rounds(aes_obj, pt_flipped, n)
    d = hamming_distance_bytes(ct1, ct2)
    round_distances.append(d)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(range(1, 11), round_distances, 'o-', color='darkgreen', linewidth=2,
        markersize=8, label='Hamming distance')
ax.axhline(y=64, color='red', linestyle='--', linewidth=1.5, label='Ideal (64 bits)')
ax.set_xlabel('Number of rounds', fontsize=12)
ax.set_ylabel('Hamming distance (bits)', fontsize=12)
ax.set_title('Avalanche Build-up: Single Bit Flip Through AES Rounds', fontsize=13)
ax.set_xticks(range(1, 11))
ax.set_ylim(0, 130)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('aes_avalanche_rounds.png', dpi=150, bbox_inches='tight')
plt.show()

for i, d in enumerate(round_distances, 1):
    print(f"After round {int(i):2d}: {int(d):3d} / 128 bits changed ({float(100*d/128):.1f}%)")

../_images/d548c3bc6c806236345e980a37f465cea557e8ad9cdc7fc87abc5b4c9c344f77.png

After round  1:  14 / 128 bits changed (10.9%)
After round  2:  63 / 128 bits changed (49.2%)
After round  3:  65 / 128 bits changed (50.8%)
After round  4:  64 / 128 bits changed (50.0%)
After round  5:  65 / 128 bits changed (50.8%)
After round  6:  65 / 128 bits changed (50.8%)
After round  7:  71 / 128 bits changed (55.5%)
After round  8:  65 / 128 bits changed (50.8%)
After round  9:  64 / 128 bits changed (50.0%)
After round 10:  61 / 128 bits changed (47.7%)

Observation

After just 2 rounds, the avalanche effect is already close to 50% of bits. By round 4-5, full diffusion is essentially achieved. This rapid diffusion is a direct consequence of the wide trail strategy built into the ShiftRows + MixColumns combination.

23.8.5 S-Box Nonlinearity Analysis#

The nonlinearity of a Boolean function \(f: \mathbb{F}_2^n \to \mathbb{F}_2\) is the minimum Hamming distance between \(f\) and all affine functions. It can be computed via the Walsh-Hadamard transform (WHT).

For the AES S-box (which maps 8 bits to 8 bits), we analyse each of the 8 component Boolean functions and compute their nonlinearity.

The maximum possible nonlinearity for an 8-bit function is \(2^7 - 2^3 = 112\). The AES S-box achieves this maximum, making it optimally resistant to linear cryptanalysis.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

def walsh_hadamard_transform(f):
    """
    Compute the Walsh-Hadamard transform of a Boolean function.
    f: array of length 2^n with values in {0, 1}
    Returns: WHT coefficients.
    """
    n = len(f)
    # Convert {0,1} -> {1,-1}
    g = 1 - 2 * f.astype(np.float64)
    # Fast Walsh-Hadamard
    h = g.copy()
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(step):
                u = h[i + j]
                v = h[i + j + step]
                h[i + j] = u + v
                h[i + j + step] = u - v
        step *= 2
    return h

def nonlinearity(f):
    """Compute the nonlinearity of a Boolean function."""
    wht = walsh_hadamard_transform(f)
    return int((len(f) - np.max(np.abs(wht))) / 2)

# Analyse each component function of the AES S-box
n_bits = 8
nonlinearities = []
wht_spectra = []

for bit in range(n_bits):
    # Component function: output bit 'bit' of SBox
    f = np.array([(SBOX[x] >> bit) & 1 for x in range(256)], dtype=np.int32)
    nl = nonlinearity(f)
    nonlinearities.append(nl)
    wht = walsh_hadamard_transform(f)
    wht_spectra.append(wht)

print("Component function nonlinearities:")
for bit, nl in enumerate(nonlinearities):
    print(f"  Bit {bit}: NL = {nl}  (max possible = 112)")

# Plot WHT spectrum of component 0
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.bar(range(n_bits), nonlinearities, color='teal', edgecolor='black')
ax1.axhline(y=112, color='red', linestyle='--', label='Maximum (112)')
ax1.set_xlabel('Component function (output bit)', fontsize=11)
ax1.set_ylabel('Nonlinearity', fontsize=11)
ax1.set_title('AES S-Box Component Nonlinearities', fontsize=12)
ax1.set_ylim(0, 130)
ax1.legend()

# WHT spectrum histogram (all components combined)
all_wht = np.concatenate(wht_spectra)
ax2.hist(all_wht, bins=50, color='salmon', edgecolor='black', alpha=0.8)
ax2.set_xlabel('Walsh-Hadamard coefficient', fontsize=11)
ax2.set_ylabel('Frequency', fontsize=11)
ax2.set_title('WHT Spectrum of AES S-Box (all components)', fontsize=12)

plt.tight_layout()
plt.savefig('aes_sbox_nonlinearity.png', dpi=150, bbox_inches='tight')
plt.show()

Component function nonlinearities:
  Bit 0: NL = 112  (max possible = 112)
  Bit 1: NL = 112  (max possible = 112)
  Bit 2: NL = 112  (max possible = 112)
  Bit 3: NL = 112  (max possible = 112)
  Bit 4: NL = 112  (max possible = 112)
  Bit 5: NL = 112  (max possible = 112)
  Bit 6: NL = 112  (max possible = 112)
  Bit 7: NL = 112  (max possible = 112)

../_images/14d720dd3747390d744eac6fe2126f76c640acab07fbfbd2d5e89e4c91ef8d28.png

23.8.6 Differential Uniformity of the S-Box#

The differential uniformity \(\delta\) of a function \(S\) is:

\[ \delta = \max_{\Delta x \neq 0, \Delta y} \#\{ x : S(x \oplus \Delta x) \oplus S(x) = \Delta y \}\]

For the AES S-box, \(\delta = 4\), which is the theoretical minimum for any power mapping over \(\mathrm{GF}(2^8)\). This means no differential characteristic through a single S-box can have probability greater than \(4/256 = 1/64\).

Differential uniformity of AES S-box: delta = 4
Maximum differential probability: 4/256 = 1/64

../_images/ddf178cfec7547bd8ba3ec4ff79723ed453ad96f6df79e5ba127e4e2624cc18c.png

Distribution of DDT entries (excluding row dx=0):
  Value 0: appears 32895 times
  Value 2: appears 32130 times
  Value 4: appears 255 times

Security implications

The AES S-box has nonlinearity 112 (the maximum) and differential uniformity 4 (the minimum for power mappings). These two properties together make AES provably resistant to both linear cryptanalysis and differential cryptanalysis after a sufficient number of rounds.

23.8.7 Comparing AES and DES Avalanche#

We compare the avalanche effect of our AES implementation with a simple Feistel-based DES-like construction to highlight AES’s superior diffusion speed.

../_images/2cd77f87d14a9cbd55e8d447f1fd3fa9564a8c4ee4b9a822a539d7b23a8d6d56.png

Mean avalanche per round:
  Round  1: 16.1 +/- 3.8 bits  (12.6%)
  Round  2: 64.7 +/- 7.8 bits  (50.5%)
  Round  3: 63.6 +/- 5.9 bits  (49.7%)
  Round  4: 63.7 +/- 6.2 bits  (49.8%)
  Round  5: 64.3 +/- 5.4 bits  (50.3%)
  Round  6: 64.0 +/- 5.9 bits  (50.0%)
  Round  7: 63.5 +/- 5.6 bits  (49.6%)
  Round  8: 64.5 +/- 6.0 bits  (50.4%)
  Round  9: 64.4 +/- 5.4 bits  (50.3%)
  Round 10: 63.8 +/- 5.6 bits  (49.9%)

23.9 Exercises#

Exercise 23.1: Verify the S-Box

Compute the AES S-box entry for input 0xAB by hand (or step by step in code):

Find the multiplicative inverse of 0xAB in \(\mathrm{GF}(2^8)\).
Apply the affine transformation.
Verify that your result matches SBOX[0xAB].

Hint

Use gf_inv(0xAB) to find the inverse, then apply the affine matrix and XOR with 0x63. Alternatively, compute \(\texttt{0xAB}^{254}\) in \(\mathrm{GF}(2^8)\) using repeated squaring.

Exercise 23.2: MixColumns by Hand

For the column vector \([\texttt{0xDB}, \texttt{0x13}, \texttt{0x53}, \texttt{0x45}]^T\), compute the result of MixColumns by hand. Show each GF(2^8) multiplication explicitly.

The expected output is \([\texttt{0x8E}, \texttt{0x4D}, \texttt{0xA1}, \texttt{0xBC}]^T\).

Hint

Recall that multiplication by 0x02 is a left shift followed by conditional XOR with 0x1B, and multiplication by 0x03 is multiplication by 0x02 XOR the original value. Work out each of the 16 products in the matrix-vector multiplication.

Exercise 23.3: Key Schedule Properties

Using our key_expansion function, verify that the AES key schedule is not invertible from a single intermediate round key:

Show that knowing round key \(K_5\) (say) does not trivially give you \(K_0\).
However, show that knowing any round key \(K_i\) allows you to recover all round keys (and hence the master key) by running the key schedule both forward and backward.

Hint

To invert the key schedule: from \(W[i]\) you can recover \(W[i-N_k]\) using \(W[i-N_k] = W[i] \oplus T_i\) where \(T_i\) is the transformation applied in the forward direction. The S-box is invertible, so RotWord and SubWord can be undone.

Exercise 23.4: Avalanche After Reduced Rounds

Modify the encryption function to perform only 1, 2, 3, or 4 rounds. For each variant, measure the avalanche effect over 500 random plaintext pairs (single-bit difference). At how many rounds does the mean Hamming distance first exceed 60 bits?

Hint

Use the encrypt_n_rounds function defined earlier. Loop over \(n \in \{1, 2, 3, 4\}\), generate 500 random plaintexts, flip one random bit, encrypt both, and compute the mean Hamming distance.

Exercise 23.5: Fixed Points of the S-Box

A fixed point of the S-box is a value \(x\) such that \(S(x) = x\).

How many fixed points does the AES S-box have?
How many values \(x\) satisfy \(S(x) = \overline{x}\) (complement)?
Compare with the expected number for a random permutation of 256 elements.

Hint

Simply iterate over all 256 values and check. For a random permutation of \(n\) elements, the expected number of fixed points is 1 (by the derangement formula: the probability of at least one fixed point approaches \(1 - 1/e \approx 0.632\)).

import numpy as np

# Starter code for Exercise 23.5
# Find fixed points and complement fixed points of the AES S-box

fixed_points = [x for x in range(256) if SBOX[x] == x]
complement_fixed = [x for x in range(256) if SBOX[x] == (x ^ 0xFF)]

print(f"Number of fixed points S(x) = x:       {len(fixed_points)}")
if fixed_points:
    print(f"  Values: {[hex(x) for x in fixed_points]}")

print(f"Number of complement fixed S(x) = ~x:  {len(complement_fixed)}")
if complement_fixed:
    print(f"  Values: {[hex(x) for x in complement_fixed]}")

print(f"\nExpected fixed points for random permutation of 256: ~1")

Number of fixed points S(x) = x:       0
Number of complement fixed S(x) = ~x:  0

Expected fixed points for random permutation of 256: ~1

23.10 Summary#

In this chapter we built a complete AES-128 implementation from scratch using NumPy:

GF(2^8) arithmetic – multiplication, inversion, and lookup tables using the irreducible polynomial \(x^8 + x^4 + x^3 + x + 1\).
S-box construction from first principles: multiplicative inversion followed by an affine transformation over \(\mathrm{GF}(2)\). We verified that the S-box achieves maximum nonlinearity (112) and minimum differential uniformity (4).
Four transformations – SubBytes, ShiftRows, MixColumns, AddRoundKey – and their inverses, enabling both encryption and decryption.
Key expansion for AES-128, producing 11 round keys from a 128-bit master key.
Experimental analysis showing:
- Excellent avalanche properties (mean ~64 out of 128 bits change)
- Rapid diffusion (full avalanche by round 4–5)
- Optimal S-box nonlinearity and differential uniformity

Key Takeaways#

AES is a substitution-permutation network (SPN), fundamentally different from the Feistel structure of DES.
The algebraic structure over \(\mathrm{GF}(2^8)\) enables both elegant analysis and efficient implementation.
The wide trail strategy guarantees provable resistance to differential and linear cryptanalysis.
AES remains unbroken after more than two decades; the best known attacks are only marginally better than brute force.

References#

Daemen, J., & Rijmen, V. (2002). The Design of Rijndael: AES – The Advanced Encryption Standard. Springer-Verlag.
NIST (2001). FIPS 197: Advanced Encryption Standard (AES). https://csrc.nist.gov/publications/detail/fips/197/final
Daemen, J., & Rijmen, V. (1999). AES Proposal: Rijndael. NIST AES submission.
Heys, H. M. (2002). A Tutorial on Linear and Differential Cryptanalysis. Cryptologia, 26(3), 189–221.
Stinson, D. R., & Paterson, M. (2018). Cryptography: Theory and Practice, 4th edition. CRC Press. Chapter 4.

Chapter 23: The AES (Rijndael) Design

Contents

Chapter 23: The AES (Rijndael) Design#

23.1 Historical Context: The NIST Competition#

The Wide Trail Strategy#

23.2 Overview of AES#

23.3 Arithmetic in \(\mathrm{GF}(2^8)\)#

23.4 Building the AES S-Box from First Principles#

Visualising the S-Box#

23.5 The Four AES Transformations#

23.5.1 SubBytes#

23.5.2 ShiftRows#

23.5.3 MixColumns#

23.5.4 AddRoundKey#

23.6 AES Key Expansion#

23.7 The Complete AES Class#

NIST FIPS 197 Test Vector#

23.8 Experiments#

23.8.1 Round-by-Round State Visualisation#

23.8.2 Effect of Each Transformation Within a Single Round#

23.8.3 Avalanche Effect in AES#

23.8.4 Round-by-Round Avalanche Build-up#

23.8.5 S-Box Nonlinearity Analysis#

23.8.6 Differential Uniformity of the S-Box#

23.8.7 Comparing AES and DES Avalanche#

23.9 Exercises#

23.10 Summary#

Key Takeaways#

References#