Chapter 16: Foundations of Linear Cryptanalysis

Chapter 16: Foundations of Linear Cryptanalysis#

16.1 Historical Context: Matsui’s Breakthrough#

In 1993, at the EUROCRYPT conference in Lofthus, Norway, Mitsuru Matsui presented a paper that would fundamentally change the landscape of block cipher analysis. Titled “Linear Cryptanalysis Method for DES Cipher,” this work introduced a novel attack paradigm that exploited linear statistical relationships between plaintext bits, ciphertext bits, and key bits. Within a year, Matsui refined the method and demonstrated a practical key-recovery attack on the full 16-round Data Encryption Standard (DES) using approximately \(2^{43}\) known plaintext-ciphertext pairs – the first publicly known attack on full DES that was faster than brute-force search over the \(2^{56}\)-bit key space.

The significance of this achievement cannot be overstated. DES had been the de facto standard for symmetric encryption since 1977. Despite decades of intensive cryptanalytic scrutiny, no public attack had broken the full cipher faster than exhaustive key search. Differential cryptanalysis, introduced by Biham and Shamir in 1990, required \(2^{47}\) chosen plaintexts – a chosen-plaintext attack that was theoretically powerful but practically difficult to mount. Matsui’s linear cryptanalysis, by contrast, required only known plaintexts, making it significantly more practical in real-world scenarios where an attacker might passively collect plaintext-ciphertext pairs.

“The purpose of this paper is to present a new method, called linear cryptanalysis, for cryptanalysis of DES cipher and to show that DES is breakable with \(2^{43}\) known plaintext-ciphertext pairs.”

– Mitsuru Matsui, Linear Cryptanalysis Method for DES Cipher, EUROCRYPT 1993

The core idea behind linear cryptanalysis is deceptively simple. Consider a block cipher as a complex nonlinear function mapping plaintext \(P\) and key \(K\) to ciphertext \(C\). Although this function is designed to be highly nonlinear (to resist algebraic attacks), it is never perfectly nonlinear. For certain choices of bit positions – specified by input masks \(\alpha\) and output masks \(\beta\) – the XOR of selected plaintext bits and selected ciphertext bits correlates with the XOR of certain key bits:

\[ P[\alpha_1] \oplus P[\alpha_2] \oplus \cdots \oplus C[\beta_1] \oplus C[\beta_2] \oplus \cdots = K[\gamma_1] \oplus K[\gamma_2] \oplus \cdots\]

This equation holds not with certainty, but with probability \(p = \frac{1}{2} + \varepsilon\), where \(\varepsilon\) is the bias. If \(\varepsilon\) is non-negligible, collecting enough known plaintext-ciphertext pairs allows the attacker to determine the value of the key-bit XOR with high confidence. By constructing multiple such linear approximations and combining them, the attacker can recover the full key.

Important

Linear cryptanalysis is a known-plaintext attack. Unlike differential cryptanalysis, which requires the attacker to choose specific plaintext pairs, linear cryptanalysis only requires the attacker to observe plaintext-ciphertext pairs encrypted under the same key. This makes it significantly more practical in many real-world scenarios.

Matsui’s work had profound consequences for cipher design. Every modern block cipher – including AES (Rijndael), Serpent, Twofish, and Camellia – is now explicitly designed to resist linear cryptanalysis. The maximum bias of the S-box linear approximation table became a primary design criterion, and the wide trail strategy used in AES was specifically engineered to ensure that any linear approximation spanning multiple rounds accumulates negligibly small bias.

16.2 Formal Definitions#

We now develop the mathematical foundations of linear cryptanalysis with full rigor. Throughout this section, we work over \(\mathbb{F}_2 = \{0, 1\}\), the field with two elements, where addition is XOR (\(\oplus\)).

Boolean Functions and Inner Products#

For a vector \(x = (x_1, x_2, \ldots, x_n) \in \mathbb{F}_2^n\) and a mask \(\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \mathbb{F}_2^n\), the inner product is:

\[ \langle \alpha, x \rangle = \alpha_1 x_1 \oplus \alpha_2 x_2 \oplus \cdots \oplus \alpha_n x_n = \bigoplus_{i=1}^{n} \alpha_i x_i\]

This selects and XORs exactly those bits of \(x\) where the corresponding bit of \(\alpha\) is 1.

Definition 16.1 (Linear Approximation of a Boolean Function)

Let \(S: \mathbb{F}_2^n \to \mathbb{F}_2^m\) be a vectorial Boolean function (such as an S-box). A linear approximation of \(S\) with input mask \(\alpha \in \mathbb{F}_2^n\) and output mask \(\beta \in \mathbb{F}_2^m\) is the Boolean equation:

\[ \langle \alpha, x \rangle = \langle \beta, S(x) \rangle\]

or equivalently:

\[ \bigoplus_{i=1}^{n} \alpha_i x_i = \bigoplus_{j=1}^{m} \beta_j S(x)_j\]

The probability of this approximation is:

\[ p(\alpha, \beta) = \Pr_{x \leftarrow \mathbb{F}_2^n}\!\big[\langle \alpha, x \rangle = \langle \beta, S(x) \rangle\big] = \frac{\#\{x \in \mathbb{F}_2^n : \langle \alpha, x \rangle = \langle \beta, S(x) \rangle\}}{2^n}\]

For a random function, we would expect \(p(\alpha, \beta) = 1/2\) for all non-trivial masks (i.e., \((\alpha, \beta) \neq (0, 0)\)). The degree to which an S-box deviates from this ideal is captured by the bias.

Definition 16.2 (Bias)

The bias of a linear approximation \((\alpha, \beta)\) for a function \(S\) is:

\[ \varepsilon(\alpha, \beta) = p(\alpha, \beta) - \frac{1}{2} = \frac{\#\{x : \langle \alpha, x \rangle = \langle \beta, S(x) \rangle\} - 2^{n-1}}{2^n}\]

Equivalently, if we define the random variable \(X = \langle \alpha, x \rangle \oplus \langle \beta, S(x) \rangle\), then:

\[ \varepsilon = \Pr[X = 0] - \frac{1}{2}\]

The bias satisfies \(-\frac{1}{2} \leq \varepsilon \leq \frac{1}{2}\), with \(\varepsilon = 0\) indicating a perfectly balanced (ideal) approximation.

Definition 16.3 (Linear Approximation Table)

The Linear Approximation Table (LAT) of an S-box \(S: \mathbb{F}_2^n \to \mathbb{F}_2^m\) is the \((2^n) \times (2^m)\) matrix \(\Lambda\) where:

\[ \Lambda[\alpha][\beta] = \#\{x \in \mathbb{F}_2^n : \langle \alpha, x \rangle = \langle \beta, S(x) \rangle\} - 2^{n-1}\]

Each entry \(\Lambda[\alpha][\beta]\) is related to the bias by \(\varepsilon(\alpha, \beta) = \Lambda[\alpha][\beta] / 2^n\).

The LAT has the following properties:

\(\Lambda[0][0] = 2^{n-1}\) (the trivial approximation always holds for all inputs, so the count is \(2^n\) and the entry is \(2^n - 2^{n-1} = 2^{n-1}\)).
\(\Lambda[\alpha][0] = 0\) for \(\alpha \neq 0\) (any non-trivial input mask with trivial output mask is balanced for a permutation).
All entries are integers with the same parity.

16.3 The Piling-Up Lemma#

The piling-up lemma is the theoretical engine that powers linear cryptanalysis. It tells us how to combine biases of individual round approximations to estimate the bias of a multi-round linear approximation.

Theorem 16.1 (Piling-Up Lemma, Matsui 1993)

Let \(X_1, X_2, \ldots, X_n\) be independent binary random variables with biases \(\varepsilon_i = \Pr[X_i = 0] - 1/2\) for \(i = 1, \ldots, n\). Then the bias of their XOR is:

\[ \varepsilon(X_1 \oplus X_2 \oplus \cdots \oplus X_n) = 2^{n-1} \prod_{i=1}^{n} \varepsilon_i\]

That is, if \(Y = X_1 \oplus X_2 \oplus \cdots \oplus X_n\), then:

\[ \Pr[Y = 0] = \frac{1}{2} + 2^{n-1} \prod_{i=1}^{n} \varepsilon_i\]

Proof of the Piling-Up Lemma

Base case (\(n = 2\)). Let \(X_1, X_2\) be independent binary random variables with \(\Pr[X_i = 0] = 1/2 + \varepsilon_i\), so \(\Pr[X_i = 1] = 1/2 - \varepsilon_i\).

We compute \(\Pr[X_1 \oplus X_2 = 0]\). Since \(X_1 \oplus X_2 = 0\) iff both are 0 or both are 1:

\[ \Pr[X_1 \oplus X_2 = 0] = \Pr[X_1 = 0]\Pr[X_2 = 0] + \Pr[X_1 = 1]\Pr[X_2 = 1]\]

(by independence). Substituting:

\[ = \left(\frac{1}{2} + \varepsilon_1\right)\left(\frac{1}{2} + \varepsilon_2\right) + \left(\frac{1}{2} - \varepsilon_1\right)\left(\frac{1}{2} - \varepsilon_2\right)\]

Expanding the first product:

\[ \left(\frac{1}{2} + \varepsilon_1\right)\left(\frac{1}{2} + \varepsilon_2\right) = \frac{1}{4} + \frac{\varepsilon_2}{2} + \frac{\varepsilon_1}{2} + \varepsilon_1\varepsilon_2\]

Expanding the second product:

\[ \left(\frac{1}{2} - \varepsilon_1\right)\left(\frac{1}{2} - \varepsilon_2\right) = \frac{1}{4} - \frac{\varepsilon_2}{2} - \frac{\varepsilon_1}{2} + \varepsilon_1\varepsilon_2\]

Adding:

\[ \Pr[X_1 \oplus X_2 = 0] = \frac{1}{2} + 2\varepsilon_1\varepsilon_2\]

So the bias of \(X_1 \oplus X_2\) is \(2\varepsilon_1\varepsilon_2 = 2^{2-1}\varepsilon_1\varepsilon_2\). This confirms the formula for \(n = 2\).

Inductive step. Assume the result holds for \(n-1\) variables: the bias of \(Y_{n-1} = X_1 \oplus \cdots \oplus X_{n-1}\) is \(\varepsilon_{Y_{n-1}} = 2^{n-2}\prod_{i=1}^{n-1}\varepsilon_i\).

Now \(Y_n = Y_{n-1} \oplus X_n\). Since \(Y_{n-1}\) depends only on \(X_1, \ldots, X_{n-1}\) and \(X_n\) is independent of all of these, \(Y_{n-1}\) and \(X_n\) are independent. Applying the base case:

\[ \varepsilon_{Y_n} = 2 \cdot \varepsilon_{Y_{n-1}} \cdot \varepsilon_n = 2 \cdot 2^{n-2}\prod_{i=1}^{n-1}\varepsilon_i \cdot \varepsilon_n = 2^{n-1}\prod_{i=1}^{n}\varepsilon_i\]

This completes the induction. \(\square\)

Warning

The independence assumption is crucial. The piling-up lemma assumes that the random variables \(X_1, \ldots, X_n\) are independent. In the context of a block cipher, these correspond to the linear approximations of individual S-boxes or individual rounds. In practice, the approximations are not perfectly independent because they share intermediate state bits. However, the heuristic assumption of independence (sometimes called the linear hull hypothesis) works remarkably well in practice, and the piling-up lemma gives good predictions for the bias of multi-round approximations.

16.4 Python Implementation: The `LinearAnalyzer` Class#

We now implement the core tools for linear cryptanalysis. The LinearAnalyzer class computes the full linear approximation table (LAT) of any S-box, extracts biases, and identifies the most exploitable linear approximations.

import numpy as np
import matplotlib.pyplot as plt


class LinearAnalyzer:
    """
    Linear cryptanalysis toolkit for S-box analysis.
    
    Given an S-box S: F_2^n -> F_2^m (represented as a lookup table),
    this class computes the Linear Approximation Table (LAT), extracts
    biases for specific mask pairs, and identifies the strongest linear
    approximations.
    
    Parameters
    ----------
    sbox : list or np.ndarray
        The S-box lookup table. sbox[x] = S(x).
        For a 4-bit S-box, this is a list of 16 integers in {0,...,15}.
    
    Attributes
    ----------
    n_input : int
        Number of input bits (log2 of input size).
    n_output : int
        Number of output bits (log2 of max output value + 1).
    lat : np.ndarray
        The Linear Approximation Table (counts minus 2^{n-1}).
    """
    
    def __init__(self, sbox):
        self.sbox = np.array(sbox, dtype=int)
        self.size = len(sbox)
        self.n_input = int(np.log2(self.size))
        self.n_output = int(np.log2(max(sbox) + 1))
        # Verify power-of-2 size
        assert self.size == 2**self.n_input, "S-box size must be a power of 2"
        self.lat = self.compute_lat(self.sbox)
    
    @staticmethod
    def _inner_product(mask, value):
        """Compute the inner product <mask, value> over F_2."""
        # XOR of bits where mask is 1 = parity of (mask AND value)
        return bin(mask & value).count('1') % 2
    
    @staticmethod
    def compute_lat(sbox):
        """
        Compute the Linear Approximation Table of an S-box.
        
        For input mask alpha and output mask beta, compute:
            LAT[alpha][beta] = #{x : <alpha,x> = <beta,S(x)>} - 2^{n-1}
        
        Parameters
        ----------
        sbox : array-like
            The S-box lookup table.
        
        Returns
        -------
        lat : np.ndarray
            The LAT matrix with shape (2^n, 2^m).
        """
        sbox = np.array(sbox, dtype=int)
        n = len(sbox)
        n_bits_in = int(np.log2(n))
        n_bits_out = int(np.log2(max(sbox) + 1))
        n_out = 2**n_bits_out
        half = n // 2
        
        lat = np.zeros((n, n_out), dtype=int)
        
        for alpha in range(n):
            for beta in range(n_out):
                count = 0
                for x in range(n):
                    # Compute <alpha, x> XOR <beta, S(x)>
                    lhs = bin(alpha & x).count('1') % 2
                    rhs = bin(beta & sbox[x]).count('1') % 2
                    if lhs == rhs:
                        count += 1
                lat[alpha][beta] = count - half
        
        return lat
    
    def bias(self, input_mask, output_mask):
        """
        Compute the bias for a given (input_mask, output_mask) pair.
        
        Parameters
        ----------
        input_mask : int
            The input mask alpha.
        output_mask : int
            The output mask beta.
        
        Returns
        -------
        float
            The bias epsilon = LAT[alpha][beta] / 2^n.
        """
        return self.lat[input_mask][output_mask] / self.size
    
    def best_approximations(self, n=10):
        """
        Find the top-n linear approximations with largest absolute bias.
        
        Parameters
        ----------
        n : int
            Number of top approximations to return.
        
        Returns
        -------
        list of tuples
            Each tuple is (input_mask, output_mask, bias_value).
            Sorted by |bias| in descending order.
            Excludes the trivial approximation (0, 0).
        """
        results = []
        n_out = self.lat.shape[1]
        for alpha in range(self.size):
            for beta in range(n_out):
                if alpha == 0 and beta == 0:
                    continue  # skip trivial
                b = self.lat[alpha][beta] / self.size
                results.append((alpha, beta, b))
        
        # Sort by absolute bias, descending
        results.sort(key=lambda t: abs(t[2]), reverse=True)
        return results[:n]
    
    def print_lat(self):
        """Print the LAT in a formatted table."""
        n_out = self.lat.shape[1]
        header = "     " + "".join(f"{b:4X}" for b in range(n_out))
        print(header)
        print("    " + "-" * (4 * n_out + 1))
        for alpha in range(self.size):
            row = f"{alpha:2X} | " + "".join(f"{int(self.lat[alpha][b]):4d}" for b in range(n_out))
            print(row)


# === Define the Heys S-box (from Heys' tutorial) ===
HEYS_SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
             0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

print("Heys S-box:", [hex(x) for x in HEYS_SBOX])
print()

# Create the analyzer
analyzer = LinearAnalyzer(HEYS_SBOX)
print(f"S-box size: {analyzer.size} ({analyzer.n_input}-bit input, {analyzer.n_output}-bit output)")
print()

# Print the full LAT
print("Linear Approximation Table (LAT):")
print("Rows = input mask (alpha), Columns = output mask (beta)")
print("Entries = count - 2^{n-1} (so bias = entry / 2^n)")
print()
analyzer.print_lat()

Heys S-box: ['0xe', '0x4', '0xd', '0x1', '0x2', '0xf', '0xb', '0x8', '0x3', '0xa', '0x6', '0xc', '0x5', '0x9', '0x0', '0x7']

S-box size: 16 (4-bit input, 4-bit output)

Linear Approximation Table (LAT):
Rows = input mask (alpha), Columns = output mask (beta)
Entries = count - 2^{n-1} (so bias = entry / 2^n)

        0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
    -----------------------------------------------------------------
 0 |    8   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 1 |    0   0  -2  -2   0   0  -2   6   2   2   0   0   2   2   0   0
 2 |    0   0  -2  -2   0   0  -2  -2   0   0   2   2   0   0  -6   2
 3 |    0   0   0   0   0   0   0   0   2  -6  -2  -2   2   2  -2  -2
 4 |    0   2   0  -2  -2  -4  -2   0   0  -2   0   2   2  -4   2   0
 5 |    0  -2  -2   0  -2   0   4   2  -2   0  -4   2   0  -2  -2   0
 6 |    0   2  -2   4   2   0   0   2   0  -2   2   4  -2   0   0  -2
 7 |    0  -2   0   2   2  -4   2   0  -2   0   2   0   4   2   0   2
 8 |    0   0   0   0   0   0   0   0  -2   2   2  -2   2  -2  -2  -6
 9 |    0   0  -2  -2   0   0  -2  -2  -4   0  -2   2   0   4   2  -2
 A |    0   4  -2   2  -4   0   2  -2   2   2   0   0   2   2   0   0
 B |    0   4   0  -4   4   0   4   0   0   0   0   0   0   0   0   0
 C |    0  -2   4  -2  -2   0   2   0   2   0   2   4   0   2   0  -2
 D |    0   2   2   0  -2   4   0   2  -4  -2   2   0   2   0   0   2
 E |    0   2   2   0  -2  -4   0   2  -2   0   0  -2  -4   2  -2   0
 F |    0  -2  -4  -2  -2   0   2   0   0  -2   4  -2  -2   0   2   0

Identifying the Strongest Approximations#

The most useful linear approximations for an attacker are those with the largest absolute bias. Let us extract and examine them.

import numpy as np
import matplotlib.pyplot as plt

# Recreate the analyzer
HEYS_SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
             0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

# Redefine helper for standalone execution
def compute_lat(sbox):
    sbox = np.array(sbox, dtype=int)
    n = len(sbox)
    n_out = 2**int(np.log2(max(sbox) + 1))
    half = n // 2
    lat = np.zeros((n, n_out), dtype=int)
    for alpha in range(n):
        for beta in range(n_out):
            count = 0
            for x in range(n):
                lhs = bin(alpha & x).count('1') % 2
                rhs = bin(beta & sbox[x]).count('1') % 2
                if lhs == rhs:
                    count += 1
            lat[alpha][beta] = count - half
    return lat

lat = compute_lat(HEYS_SBOX)
n = len(HEYS_SBOX)

# Find best approximations
results = []
for alpha in range(n):
    for beta in range(n):
        if alpha == 0 and beta == 0:
            continue
        bias = lat[alpha][beta] / n
        results.append((alpha, beta, bias))

results.sort(key=lambda t: abs(t[2]), reverse=True)

print("Top 15 Linear Approximations for Heys S-box")
print("=" * 62)
print(f"{'Rank':>4s}  {'alpha':>6s}  {'beta':>6s}  {'LAT entry':>10s}  {'Bias':>10s}  {'|Bias|':>8s}")
print("-" * 62)
for rank, (a, b, bias) in enumerate(results[:15], 1):
    print(f"{int(rank):4d}  0x{a:01X} ({a:04b})  0x{b:01X} ({b:04b})  {int(lat[a][b]):10d}  {float(bias):10.4f}  {float(abs(bias)):8.4f}")

print()
print("Key observations:")
max_abs_bias = max(abs(r[2]) for r in results)
count_max = sum(1 for r in results if abs(r[2]) == max_abs_bias)
print(f"  - Maximum absolute bias: {float(max_abs_bias):.4f}")
print(f"  - Number of approximations achieving max |bias|: {count_max}")
print(f"  - Bias = {float(max_abs_bias):.4f} means probability = {float(0.5 + max_abs_bias):.4f}")
print(f"  - These are the most exploitable approximations for linear cryptanalysis")

Top 15 Linear Approximations for Heys S-box
==============================================================
Rank   alpha    beta   LAT entry        Bias    |Bias|
--------------------------------------------------------------
   1  0x1 (0001)  0x7 (0111)           6      0.3750    0.3750
   2  0x2 (0010)  0xE (1110)          -6     -0.3750    0.3750
   3  0x3 (0011)  0x9 (1001)          -6     -0.3750    0.3750
   4  0x8 (1000)  0xF (1111)          -6     -0.3750    0.3750
   5  0x4 (0100)  0x5 (0101)          -4     -0.2500    0.2500
   6  0x4 (0100)  0xD (1101)          -4     -0.2500    0.2500
   7  0x5 (0101)  0x6 (0110)           4      0.2500    0.2500
   8  0x5 (0101)  0xA (1010)          -4     -0.2500    0.2500
   9  0x6 (0110)  0x3 (0011)           4      0.2500    0.2500
  10  0x6 (0110)  0xB (1011)           4      0.2500    0.2500
  11  0x7 (0111)  0x5 (0101)          -4     -0.2500    0.2500
  12  0x7 (0111)  0xC (1100)           4      0.2500    0.2500
  13  0x9 (1001)  0x8 (1000)          -4     -0.2500    0.2500
  14  0x9 (1001)  0xD (1101)           4      0.2500    0.2500
  15  0xA (1010)  0x1 (0001)           4      0.2500    0.2500

Key observations:
  - Maximum absolute bias: 0.3750
  - Number of approximations achieving max |bias|: 4
  - Bias = 0.3750 means probability = 0.8750
  - These are the most exploitable approximations for linear cryptanalysis

16.5 Visualizing the Linear Approximation Table#

The LAT reveals the linear structure hidden within a nonlinear S-box. A heatmap visualization makes it easy to spot strong approximations (dark red or dark blue) and assess the overall linearity profile.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

HEYS_SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
             0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def compute_lat(sbox):
    sbox = np.array(sbox, dtype=int)
    n = len(sbox)
    n_out = 2**int(np.log2(max(sbox) + 1))
    half = n // 2
    lat = np.zeros((n, n_out), dtype=int)
    for alpha in range(n):
        for beta in range(n_out):
            count = 0
            for x in range(n):
                lhs = bin(alpha & x).count('1') % 2
                rhs = bin(beta & sbox[x]).count('1') % 2
                if lhs == rhs:
                    count += 1
            lat[alpha][beta] = count - half
    return lat

lat = compute_lat(HEYS_SBOX)
n = len(HEYS_SBOX)

# Convert to bias values
bias_table = lat / n

fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Left: Raw LAT counts
im1 = axes[0].imshow(lat, cmap='RdBu_r', aspect='equal', interpolation='nearest',
                      vmin=-8, vmax=8)
axes[0].set_title('LAT: Raw Counts $\\Lambda[\\alpha][\\beta]$', fontsize=13, fontweight='bold')
axes[0].set_xlabel('Output mask $\\beta$', fontsize=12)
axes[0].set_ylabel('Input mask $\\alpha$', fontsize=12)
axes[0].set_xticks(range(16))
axes[0].set_yticks(range(16))
axes[0].set_xticklabels([f'{i:X}' for i in range(16)], fontsize=9)
axes[0].set_yticklabels([f'{i:X}' for i in range(16)], fontsize=9)
# Add text annotations
for i in range(16):
    for j in range(16):
        color = 'white' if abs(lat[i][j]) >= 5 else 'black'
        axes[0].text(j, i, f'{int(lat[i][j]):d}', ha='center', va='center',
                     fontsize=7, color=color, fontweight='bold')
fig.colorbar(im1, ax=axes[0], shrink=0.8, label='Count $-$ $2^{n-1}$')

# Right: Bias values
im2 = axes[1].imshow(bias_table, cmap='RdBu_r', aspect='equal', interpolation='nearest',
                      vmin=-0.5, vmax=0.5)
axes[1].set_title('LAT: Bias Values $\\varepsilon(\\alpha, \\beta)$', fontsize=13, fontweight='bold')
axes[1].set_xlabel('Output mask $\\beta$', fontsize=12)
axes[1].set_ylabel('Input mask $\\alpha$', fontsize=12)
axes[1].set_xticks(range(16))
axes[1].set_yticks(range(16))
axes[1].set_xticklabels([f'{i:X}' for i in range(16)], fontsize=9)
axes[1].set_yticklabels([f'{i:X}' for i in range(16)], fontsize=9)
for i in range(16):
    for j in range(16):
        val = bias_table[i][j]
        color = 'white' if abs(val) >= 0.3 else 'black'
        axes[1].text(j, i, f'{val:+.2f}', ha='center', va='center',
                     fontsize=6, color=color)
fig.colorbar(im2, ax=axes[1], shrink=0.8, label='Bias $\\varepsilon$')

fig.suptitle('Figure 16.1: Linear Approximation Table of the Heys S-box',
             fontsize=14, fontweight='bold', y=1.02)

plt.tight_layout()
plt.savefig('fig_16_1_heys_lat_heatmap.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

../_images/294f6dd1902932d713f731aa5329ecf71a07bc2b3c8bddaa3c73618898c190fc.png

16.6 Experimental Verification of the Piling-Up Lemma#

The piling-up lemma predicts how biases compose when independent random variables are XORed. Let us verify this experimentally by generating biased random variables and comparing the observed bias of their XOR against the theoretical prediction.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

def generate_biased_bits(n_samples, bias):
    """
    Generate n_samples binary random variables with given bias.
    P(X=0) = 0.5 + bias, P(X=1) = 0.5 - bias
    """
    prob_zero = 0.5 + bias
    return (np.random.random(n_samples) >= prob_zero).astype(int)

def observed_bias(bits):
    """Compute observed bias: P(X=0) - 0.5"""
    return np.mean(bits == 0) - 0.5

# Experiment: XOR of n biased variables
n_samples = 500_000
individual_biases = [0.1, -0.15, 0.2, 0.05, -0.12]

print("Piling-Up Lemma Verification")
print("=" * 70)
print(f"Number of samples: {n_samples:,}")
print()

# Generate individual biased bits
bits_list = [generate_biased_bits(n_samples, eps) for eps in individual_biases]

# Verify individual biases
print("Individual variable biases:")
for i, (eps_true, bits) in enumerate(zip(individual_biases, bits_list)):
    eps_obs = observed_bias(bits)
    print(f"  X_{i+1}: true bias = {eps_true:+.4f}, observed bias = {eps_obs:+.4f}")
print()

# XOR progressively more variables and check
xor_bits = bits_list[0].copy()
theoretical_biases = [individual_biases[0]]
observed_biases = [observed_bias(xor_bits)]
n_vars_list = [1]

print("Progressive XOR results:")
print(f"{'n vars':>7s}  {'Theoretical bias':>18s}  {'Observed bias':>16s}  {'Relative error':>16s}")
print("-" * 65)

# For n=1, the theoretical bias is just eps_1
theo = individual_biases[0]
obs = observed_bias(xor_bits)
rel_err = abs(theo - obs) / abs(theo) if theo != 0 else 0
print(f"{int(1):7d}  {theo:+18.8f}  {obs:+16.8f}  {float(rel_err):16.4%}")

for k in range(1, len(individual_biases)):
    xor_bits = xor_bits ^ bits_list[k]
    
    # Piling-up lemma: bias = 2^{n-1} * product of individual biases
    n_combined = k + 1
    theo = (2**(n_combined - 1)) * np.prod(individual_biases[:n_combined])
    obs = observed_bias(xor_bits)
    rel_err = abs(theo - obs) / abs(theo) if theo != 0 else 0
    
    theoretical_biases.append(theo)
    observed_biases.append(obs)
    n_vars_list.append(n_combined)
    
    print(f"{int(n_combined):7d}  {theo:+18.8f}  {obs:+16.8f}  {float(rel_err):16.4%}")

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5.5))

# Left: Theoretical vs Observed bias
x_pos = np.arange(len(n_vars_list))
width = 0.35
bars1 = axes[0].bar(x_pos - width/2, theoretical_biases, width, label='Theoretical (Piling-Up)',
                     color='#1565C0', edgecolor='black', linewidth=0.8)
bars2 = axes[0].bar(x_pos + width/2, observed_biases, width, label='Observed (Monte Carlo)',
                     color='#E65100', edgecolor='black', linewidth=0.8, alpha=0.85)
axes[0].set_xlabel('Number of XORed Variables', fontsize=12)
axes[0].set_ylabel('Bias $\\varepsilon$', fontsize=12)
axes[0].set_title('Piling-Up Lemma: Predicted vs Observed', fontsize=13, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(n_vars_list)
axes[0].legend(fontsize=10)
axes[0].axhline(y=0, color='gray', linestyle='-', linewidth=0.5)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)

# Right: |Bias| on log scale to show exponential decay
abs_theo = [abs(b) for b in theoretical_biases]
abs_obs = [abs(b) for b in observed_biases]
axes[1].semilogy(n_vars_list, abs_theo, 'o-', color='#1565C0', linewidth=2,
                  markersize=8, label='|Theoretical bias|')
axes[1].semilogy(n_vars_list, abs_obs, 's--', color='#E65100', linewidth=2,
                  markersize=8, label='|Observed bias|', alpha=0.85)
axes[1].set_xlabel('Number of XORed Variables', fontsize=12)
axes[1].set_ylabel('$|\\varepsilon|$ (log scale)', fontsize=12)
axes[1].set_title('Bias Decay (Logarithmic Scale)', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)

fig.suptitle('Figure 16.2: Experimental Verification of the Piling-Up Lemma',
             fontsize=14, fontweight='bold', y=1.02)

plt.tight_layout()
plt.savefig('fig_16_2_piling_up_verification.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

Piling-Up Lemma Verification
======================================================================
Number of samples: 500,000

Individual variable biases:
  X_1: true bias = +0.1000, observed bias = +0.0992
  X_2: true bias = -0.1500, observed bias = -0.1504
  X_3: true bias = +0.2000, observed bias = +0.2017
  X_4: true bias = +0.0500, observed bias = +0.0509
  X_5: true bias = -0.1200, observed bias = -0.1199

Progressive XOR results:
 n vars    Theoretical bias     Observed bias    Relative error
-----------------------------------------------------------------
      1         +0.10000000       +0.09923600           0.7640%
      2         -0.03000000       -0.02925000           2.5000%
      3         -0.01200000       -0.01323800          10.3167%
      4         -0.00120000       -0.00230000          91.6667%
      5         +0.00028800       +0.00089200         209.7222%

../_images/5be367164df0c8f4cd4c1f90f1a2119b21394e50e0452b2fb62f334f068b67c4.png

16.7 Comparing S-box Linearity: Heys vs PRESENT vs Random#

A well-designed S-box should minimize the maximum absolute bias in its LAT (excluding the trivial entry). Let us compare three S-boxes:

Heys S-box – the 4-bit educational S-box from Heys’ tutorial
PRESENT S-box – a well-designed 4-bit S-box from the PRESENT lightweight cipher, which is known to achieve optimal nonlinearity
Random permutation – a randomly generated 4-bit permutation

For an ideal \(n\)-bit S-box, the maximum absolute bias is bounded below by \(2^{-n/2}\) (the “bent function” bound for \(n\) even). For \(n = 4\), this gives \(|\varepsilon|_{\max} \geq 1/4\), so no 4-bit S-box can achieve maximum bias below \(1/4\).

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

def compute_lat(sbox):
    sbox = np.array(sbox, dtype=int)
    n = len(sbox)
    n_out = 2**int(np.log2(max(sbox) + 1))
    half = n // 2
    lat = np.zeros((n, n_out), dtype=int)
    for alpha in range(n):
        for beta in range(n_out):
            count = 0
            for x in range(n):
                lhs = bin(alpha & x).count('1') % 2
                rhs = bin(beta & sbox[x]).count('1') % 2
                if lhs == rhs:
                    count += 1
            lat[alpha][beta] = count - half
    return lat

def max_absolute_bias(lat, n):
    """Max absolute bias excluding trivial (0,0) entry."""
    best = 0
    for a in range(lat.shape[0]):
        for b in range(lat.shape[1]):
            if a == 0 and b == 0:
                continue
            best = max(best, abs(lat[a][b]))
    return best / n

# Define the three S-boxes
HEYS_SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
             0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

# PRESENT cipher S-box (a well-designed 4-bit S-box)
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Random permutation S-box
np.random.seed(2023)
RANDOM_SBOX = list(np.random.permutation(16))

sboxes = {
    'Heys': HEYS_SBOX,
    'PRESENT': PRESENT_SBOX,
    'Random': RANDOM_SBOX
}

lats = {}
for name, sbox in sboxes.items():
    lats[name] = compute_lat(sbox)

# Print comparison
print("S-box Linearity Comparison")
print("=" * 55)
for name, sbox in sboxes.items():
    lat = lats[name]
    mb = max_absolute_bias(lat, len(sbox))
    print(f"  {name:10s} S-box: max |bias| = {float(mb):.4f}  (ideal >= 0.2500)")
print()

# Visualize all three LATs side by side
fig, axes = plt.subplots(1, 3, figsize=(18, 5.5))

for idx, (name, lat) in enumerate(lats.items()):
    n = lat.shape[0]
    bias_table = lat / n
    im = axes[idx].imshow(bias_table, cmap='RdBu_r', aspect='equal',
                           interpolation='nearest', vmin=-0.5, vmax=0.5)
    axes[idx].set_title(f'{name} S-box', fontsize=13, fontweight='bold')
    axes[idx].set_xlabel('Output mask $\\beta$', fontsize=11)
    if idx == 0:
        axes[idx].set_ylabel('Input mask $\\alpha$', fontsize=11)
    axes[idx].set_xticks(range(16))
    axes[idx].set_yticks(range(16))
    axes[idx].set_xticklabels([f'{i:X}' for i in range(16)], fontsize=8)
    axes[idx].set_yticklabels([f'{i:X}' for i in range(16)], fontsize=8)
    
    mb = max_absolute_bias(lat, n)
    axes[idx].text(0.5, -0.12, f'max |bias| = {float(mb):.4f}',
                   transform=axes[idx].transAxes, fontsize=10,
                   ha='center', style='italic')

fig.colorbar(im, ax=axes.tolist(), shrink=0.8, label='Bias $\\varepsilon$', pad=0.02)

fig.suptitle('Figure 16.3: LAT Comparison -- Heys vs PRESENT vs Random S-box',
             fontsize=14, fontweight='bold', y=1.02)

plt.tight_layout()
plt.savefig('fig_16_3_sbox_comparison.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

S-box Linearity Comparison
=======================================================
  Heys       S-box: max |bias| = 0.3750  (ideal >= 0.2500)
  PRESENT    S-box: max |bias| = 0.2500  (ideal >= 0.2500)
  Random     S-box: max |bias| = 0.3750  (ideal >= 0.2500)

../_images/90ca35ade0d35a3e032bc1109f0ab37788987c1ae25dace42013ba5553f6783c.png

16.8 Distribution of Bias Values#

For a “good” S-box (one that resists linear cryptanalysis), the bias values should be concentrated near zero, with only a few entries having large absolute bias. Let us examine the distribution of bias values across all three S-boxes and compare them to the theoretical distribution for a random function.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

def compute_lat(sbox):
    sbox = np.array(sbox, dtype=int)
    n = len(sbox)
    n_out = 2**int(np.log2(max(sbox) + 1))
    half = n // 2
    lat = np.zeros((n, n_out), dtype=int)
    for alpha in range(n):
        for beta in range(n_out):
            count = 0
            for x in range(n):
                lhs = bin(alpha & x).count('1') % 2
                rhs = bin(beta & sbox[x]).count('1') % 2
                if lhs == rhs:
                    count += 1
            lat[alpha][beta] = count - half
    return lat

HEYS_SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
             0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
np.random.seed(2023)
RANDOM_SBOX = list(np.random.permutation(16))

sboxes = {'Heys': HEYS_SBOX, 'PRESENT': PRESENT_SBOX, 'Random': RANDOM_SBOX}
colors = {'Heys': '#1565C0', 'PRESENT': '#2E7D32', 'Random': '#E65100'}

fig, axes = plt.subplots(1, 3, figsize=(17, 5))

for idx, (name, sbox) in enumerate(sboxes.items()):
    lat = compute_lat(sbox)
    n = len(sbox)
    
    # Collect all bias values (excluding trivial (0,0) entry)
    biases = []
    for a in range(n):
        for b in range(n):
            if a == 0 and b == 0:
                continue
            biases.append(lat[a][b] / n)
    
    biases = np.array(biases)
    
    # Histogram
    bins = np.linspace(-0.55, 0.55, 23)
    axes[idx].hist(biases, bins=bins, color=colors[name], edgecolor='black',
                   linewidth=0.8, alpha=0.85, density=False)
    axes[idx].set_title(f'{name} S-box', fontsize=13, fontweight='bold')
    axes[idx].set_xlabel('Bias $\\varepsilon$', fontsize=11)
    if idx == 0:
        axes[idx].set_ylabel('Frequency (count)', fontsize=11)
    axes[idx].axvline(x=0, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
    
    # Statistics
    axes[idx].text(0.98, 0.95,
                   f'Mean: {np.mean(biases):+.4f}\n'
                   f'Std: {float(np.std(biases)):.4f}\n'
                   f'Max |bias|: {float(np.max(np.abs(biases))):.4f}',
                   transform=axes[idx].transAxes, fontsize=9,
                   verticalalignment='top', horizontalalignment='right',
                   bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
    
    axes[idx].spines['top'].set_visible(False)
    axes[idx].spines['right'].set_visible(False)

fig.suptitle('Figure 16.4: Distribution of Bias Values Across S-boxes',
             fontsize=14, fontweight='bold', y=1.02)

plt.tight_layout()
plt.savefig('fig_16_4_bias_distribution.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

../_images/611af51f528319368cf2c2c2405d7a0ae5e8620687cf1b773fb14b37514a524a.png

16.9 Nonlinearity and the Walsh-Hadamard Transform#

The LAT is intimately related to the Walsh-Hadamard transform, which provides an efficient way to compute all entries simultaneously. The nonlinearity of an S-box – a key design metric – is defined in terms of the maximum absolute value in the LAT.

Definition 16.4 (Nonlinearity of an S-box)

The nonlinearity of an S-box \(S: \mathbb{F}_2^n \to \mathbb{F}_2^n\) is:

\[\begin{split} \mathcal{NL}(S) = 2^{n-1} - \max_{\substack{\alpha \in \mathbb{F}_2^n, \beta \in \mathbb{F}_2^n \\ (\alpha, \beta) \neq (0,0)}} |\Lambda[\alpha][\beta]|\end{split}\]

Higher nonlinearity means better resistance to linear cryptanalysis. For a 4-bit S-box (\(n = 4\)), the maximum achievable nonlinearity is \(2^{n-1} - 2^{n/2-1} = 8 - 2 = 4\) (in the LAT-count metric, this means \(\max |\Lambda| = 4\)).

Let us compute the nonlinearity of our three S-boxes and verify with the Walsh-Hadamard approach.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

def compute_lat(sbox):
    sbox = np.array(sbox, dtype=int)
    n = len(sbox)
    n_out = 2**int(np.log2(max(sbox) + 1))
    half = n // 2
    lat = np.zeros((n, n_out), dtype=int)
    for alpha in range(n):
        for beta in range(n_out):
            count = 0
            for x in range(n):
                lhs = bin(alpha & x).count('1') % 2
                rhs = bin(beta & sbox[x]).count('1') % 2
                if lhs == rhs:
                    count += 1
            lat[alpha][beta] = count - half
    return lat

def walsh_hadamard_lat(sbox):
    """
    Compute the LAT using the Walsh-Hadamard transform.
    
    For each output mask beta, define the Boolean function
    f_beta(x) = <beta, S(x)>. The Walsh-Hadamard coefficient is:
        W_f(alpha) = sum_x (-1)^{<alpha,x> XOR f_beta(x)}
    
    Then LAT[alpha][beta] = W_f(alpha) / 2
    """
    sbox = np.array(sbox, dtype=int)
    n = len(sbox)
    n_bits = int(np.log2(n))
    n_out = 2**int(np.log2(max(sbox) + 1))
    lat = np.zeros((n, n_out), dtype=int)
    
    for beta in range(n_out):
        # Build truth table of f_beta(x) = <beta, S(x)>
        f = np.array([bin(beta & sbox[x]).count('1') % 2 for x in range(n)])
        
        for alpha in range(n):
            # Compute Walsh coefficient
            walsh = 0
            for x in range(n):
                ax = bin(alpha & x).count('1') % 2
                walsh += (-1)**(ax ^ f[x])
            lat[alpha][beta] = walsh // 2
    
    return lat

HEYS_SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
             0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
np.random.seed(2023)
RANDOM_SBOX = list(np.random.permutation(16))

sboxes = {'Heys': HEYS_SBOX, 'PRESENT': PRESENT_SBOX, 'Random': RANDOM_SBOX}

print("Nonlinearity Analysis")
print("=" * 60)
print()

for name, sbox in sboxes.items():
    lat_direct = compute_lat(sbox)
    lat_walsh = walsh_hadamard_lat(sbox)
    
    # Verify both methods agree
    assert np.array_equal(lat_direct, lat_walsh), f"Mismatch for {name}!"
    
    n = len(sbox)
    # Max absolute LAT entry (excluding (0,0))
    max_lat = 0
    for a in range(n):
        for b in range(n):
            if a == 0 and b == 0:
                continue
            max_lat = max(max_lat, abs(lat_direct[a][b]))
    
    nonlinearity = (n // 2) - max_lat
    max_bias = max_lat / n
    
    print(f"  {name:10s} S-box:")
    print(f"    Max |LAT entry|      = {max_lat}")
    print(f"    Nonlinearity NL(S)   = {n//2} - {max_lat} = {nonlinearity}")
    print(f"    Max |bias|           = {float(max_bias):.4f}")
    print(f"    Walsh methods agree  = True")
    print()

print("Verification: direct LAT computation and Walsh-Hadamard give identical results.")
print()

# Visualize nonlinearity comparison as a bar chart
names = list(sboxes.keys())
nonlinearities = []
max_biases = []
for name, sbox in sboxes.items():
    lat = compute_lat(sbox)
    n = len(sbox)
    max_lat = max(abs(lat[a][b]) for a in range(n) for b in range(n)
                  if not (a == 0 and b == 0))
    nonlinearities.append((n // 2) - max_lat)
    max_biases.append(max_lat / n)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

colors = ['#1565C0', '#2E7D32', '#E65100']

# Nonlinearity
bars1 = axes[0].bar(names, nonlinearities, color=colors, edgecolor='black', linewidth=1)
axes[0].set_ylabel('Nonlinearity $\\mathcal{NL}(S)$', fontsize=12)
axes[0].set_title('S-box Nonlinearity', fontsize=13, fontweight='bold')
axes[0].axhline(y=4, color='red', linestyle='--', alpha=0.6, label='Optimal (NL=4)')
axes[0].legend(fontsize=10)
for bar, nl in zip(bars1, nonlinearities):
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                 str(nl), ha='center', fontweight='bold', fontsize=12)
axes[0].set_ylim(0, 5.5)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)

# Max absolute bias
bars2 = axes[1].bar(names, max_biases, color=colors, edgecolor='black', linewidth=1)
axes[1].set_ylabel('Max $|\\varepsilon|$', fontsize=12)
axes[1].set_title('Maximum Absolute Bias', fontsize=13, fontweight='bold')
axes[1].axhline(y=0.25, color='red', linestyle='--', alpha=0.6, label='Optimal bound (1/4)')
axes[1].legend(fontsize=10)
for bar, mb in zip(bars2, max_biases):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005,
                 f'{float(mb):.3f}', ha='center', fontweight='bold', fontsize=11)
axes[1].set_ylim(0, 0.55)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)

fig.suptitle('Figure 16.5: S-box Quality Metrics for Linear Cryptanalysis Resistance',
             fontsize=14, fontweight='bold', y=1.02)

plt.tight_layout()
plt.savefig('fig_16_5_nonlinearity_comparison.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

Nonlinearity Analysis
============================================================

  Heys       S-box:
    Max |LAT entry|      = 6
    Nonlinearity NL(S)   = 8 - 6 = 2
    Max |bias|           = 0.3750
    Walsh methods agree  = True

  PRESENT    S-box:
    Max |LAT entry|      = 4
    Nonlinearity NL(S)   = 8 - 4 = 4
    Max |bias|           = 0.2500
    Walsh methods agree  = True

  Random     S-box:
    Max |LAT entry|      = 6
    Nonlinearity NL(S)   = 8 - 6 = 2
    Max |bias|           = 0.3750
    Walsh methods agree  = True

Verification: direct LAT computation and Walsh-Hadamard give identical results.

../_images/11c02f9f2fd4dcfea9cee4b3b775650789ea3dde4060142f727905b9d5f2c5cd.png

16.10 Data Complexity of Linear Cryptanalysis#

A crucial question for the practitioner is: how many known plaintexts are needed to exploit a linear approximation with bias \(\varepsilon\)? The answer comes from a statistical argument based on the Central Limit Theorem.

Theorem 16.2 (Data Complexity)

To distinguish a linear approximation with bias \(\varepsilon\) from a random guess (bias 0) with success probability \(\geq 97.7\%\), the number of known plaintext-ciphertext pairs required is approximately:

\[ N \approx \frac{1}{\varepsilon^2}\]

More precisely, using a sign-determination test, the success probability after \(N\) samples is:

\[ P_{\text{success}} = \Phi\!\left(2|\varepsilon|\sqrt{N}\right)\]

where \(\Phi\) is the standard normal CDF. Setting \(P_{\text{success}} = 0.977\) (corresponding to \(2\sigma\)) gives \(N \approx 1/\varepsilon^2\).

This inverse-square relationship means that halving the bias requires four times as many plaintexts. For Matsui’s attack on DES, the best linear approximation across 14 rounds had bias \(|\varepsilon| \approx 2^{-21.2}\), yielding a data requirement of approximately \(2^{42.4} \approx 2^{43}\) known plaintexts.

../_images/d559bfb2a0bb8c1fb3570193e11a9483b0999e65d0f030006496013e0052e55a.png

Monte Carlo Validation of the Data Complexity Bound#

Let us verify the \(N \approx 1/\varepsilon^2\) bound experimentally. We simulate a linear approximation with known bias and measure the success rate of the sign-determination test as a function of the sample size.

Show code cell source Hide code cell source

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

def run_sign_test(true_bias, n_samples, n_trials=5000):
    """
    Simulate the sign-determination test.
    
    For each trial:
    1. Generate n_samples biased bits (P(X=0) = 0.5 + true_bias)
    2. Estimate the bias from the sample
    3. Check if the sign of the estimate matches the true sign
    
    Returns the fraction of trials where the sign is correct.
    """
    correct = 0
    for _ in range(n_trials):
        bits = (np.random.random(n_samples) >= (0.5 + true_bias)).astype(int)
        estimated_bias = np.mean(bits == 0) - 0.5
        if np.sign(estimated_bias) == np.sign(true_bias):
            correct += 1
    return correct / n_trials

# Test with several bias values
test_biases = [0.1, 0.05, 0.02, 0.01]
sample_sizes = [10, 30, 100, 300, 1000, 3000, 10000]

fig, ax = plt.subplots(figsize=(11, 6))

colors = ['#1565C0', '#2E7D32', '#E65100', '#7B1FA2']

for i, eps in enumerate(test_biases):
    success_rates = []
    for n_s in sample_sizes:
        sr = run_sign_test(eps, n_s, n_trials=3000)
        success_rates.append(sr)
    
    # Theoretical prediction
    N_theory = 1.0 / eps**2
    
    ax.semilogx(sample_sizes, success_rates, 'o-', color=colors[i], linewidth=2,
                markersize=7, label=f'$\\varepsilon = {eps}$ ($N^* = {int(N_theory)}$)')
    
    # Mark theoretical threshold
    ax.axvline(x=N_theory, color=colors[i], linestyle=':', alpha=0.4, linewidth=1.5)

ax.axhline(y=0.977, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
ax.text(12, 0.985, '97.7% threshold', fontsize=10, color='red')

ax.set_xlabel('Number of Samples $N$', fontsize=12)
ax.set_ylabel('Success Rate of Sign Test', fontsize=12)
ax.set_title('Figure 16.7: Monte Carlo Validation of Data Complexity',
             fontsize=13, fontweight='bold')
ax.legend(fontsize=10, loc='lower right')
ax.set_ylim(0.45, 1.02)
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('fig_16_7_monte_carlo_validation.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

../_images/465c29defe6d575e7189704458465e88c2e460e83a203d3b08a08401d5ed4c2f.png

16.11 Statistical Properties of the LAT#

The entries of the LAT are not arbitrary integers – they obey specific structural constraints. Understanding these constraints is important both for S-box design and for assessing the quality of linear approximations.

Theorem 16.3 (Parseval’s Relation for the LAT)

For any S-box \(S: \mathbb{F}_2^n \to \mathbb{F}_2^n\) (a permutation), and for any fixed output mask \(\beta \neq 0\):

\[ \sum_{\alpha=0}^{2^n - 1} \Lambda[\alpha][\beta]^2 = 2^{2(n-1)}\]

This is the Parseval relation, and it implies that the “energy” in each column of the LAT is fixed. If one entry is large, others must be correspondingly smaller.

Let us verify Parseval’s relation numerically for our S-boxes.

Parseval's Relation Verification for the Heys S-box
=======================================================

Expected: sum_alpha LAT[alpha][beta]^2 = 2^{2(n-1)} = 2^6 = 64

  beta    Sum of squares    Expected   Match
----------------------------------------------
0x1                   64          64     Yes
0x2                   64          64     Yes
0x3                   64          64     Yes
0x4                   64          64     Yes
0x5                   64          64     Yes
0x6                   64          64     Yes
0x7                   64          64     Yes
0x8                   64          64     Yes
0x9                   64          64     Yes
0xA                   64          64     Yes
0xB                   64          64     Yes
0xC                   64          64     Yes
0xD                   64          64     Yes
0xE                   64          64     Yes
0xF                   64          64     Yes

All columns satisfy Parseval's relation: True

Row-wise check (fixed alpha, sum over beta):
 alpha    Sum of squares
----------------------------
0x1                   64
0x2                   64
0x3                   64
0x4                   64
0x5                   64
0x6                   64
0x7                   64
0x8                   64
0x9                   64
0xA                   64
0xB                   64
0xC                   64
0xD                   64
0xE                   64
0xF                   64

Note: Row sums also equal 64 = 2^6 for this permutation S-box.
This is because Parseval's relation holds both row-wise and column-wise
for bijective S-boxes.

../_images/348fde74c15bd9477e94245e6264dd953fb367b6ca7d599dbdf4de6548750ed5.png

16.12 Exercises#

Exercise 16.1 (Computing the LAT by Hand). Consider the 3-bit S-box \(S: \mathbb{F}_2^3 \to \mathbb{F}_2^3\) defined by:

\(x\)	0	1	2	3	4	5	6	7
\(S(x)\)	6	4	0	3	7	2	1	5

(a) Compute the full \(8 \times 8\) LAT of this S-box by hand (or by systematic calculation). Show your work for at least three non-trivial entries.

(b) Identify the linear approximation with the largest absolute bias. What are \(\alpha\), \(\beta\), and \(\varepsilon\)?

(c) If this S-box were used in a 4-round SPN cipher with no other sources of diffusion, what would the piling-up lemma predict for the bias of a 4-round linear approximation built from copies of your best single-round approximation?

Hint

For each entry \(\Lambda[\alpha][\beta]\), iterate over all 8 inputs \(x \in \{0, \ldots, 7\}\) and count how many satisfy \(\langle \alpha, x \rangle = \langle \beta, S(x) \rangle\). Then subtract \(2^{n-1} = 4\).

For part (c), if the best single-round bias is \(\varepsilon_1\), then by the piling-up lemma the 4-round bias is \(\varepsilon_4 = 2^3 \cdot \varepsilon_1^4\).

Exercise 16.2 (Piling-Up Lemma: Edge Cases).

(a) Prove that if any one of the \(n\) independent binary random variables \(X_i\) is perfectly balanced (i.e., \(\varepsilon_i = 0\)), then the bias of \(X_1 \oplus \cdots \oplus X_n\) is exactly zero, regardless of the other biases. Interpret this result in the context of cipher design.

(b) Prove that if all \(n\) variables have the same bias \(\varepsilon\), then the combined bias is \(\varepsilon_{\text{total}} = 2^{n-1} \varepsilon^n\). For what value of \(n\) does \(|\varepsilon_{\text{total}}|\) begin to decrease as a function of \(n\) (assuming \(|\varepsilon| < 1/2\))?

(c) Matsui’s attack on DES uses a 14-round linear approximation. If the average per-round bias is \(|\varepsilon_r| \approx 2^{-2}\), estimate the total bias and the required number of known plaintexts. Compare this to the reported \(2^{43}\).

Hint

For (a), if \(\varepsilon_i = 0\) for any \(i\), the product \(\prod \varepsilon_j\) contains a zero factor.

For (b), differentiate \(f(n) = 2^{n-1} \varepsilon^n\) with respect to \(n\) (treating \(n\) as continuous) and set \(f'(n) = 0\). Alternatively, note that \(f(n) = \frac{1}{2}(2\varepsilon)^n\), so \(|f(n)|\) decreases when \(|2\varepsilon| < 1\), i.e., \(|\varepsilon| < 1/2\) – which means it decreases for all \(n \geq 1\).

For (c), be careful: the 14-round approximation is not simply 14 copies of the same one-round approximation. The actual construction uses a chain of different S-box approximations.

Exercise 16.3 (S-box Design Criteria).

Design a 4-bit S-box \(S: \mathbb{F}_2^4 \to \mathbb{F}_2^4\) (a permutation of \(\{0, \ldots, 15\}\)) that achieves optimal nonlinearity for linear cryptanalysis resistance (i.e., \(\mathcal{NL}(S) = 4\), meaning the maximum absolute LAT entry is 4).

(a) Verify your S-box by computing its full LAT and confirming that no entry (excluding \(\Lambda[0][0]\)) exceeds 4 in absolute value.

(b) Is your S-box also optimal with respect to differential cryptanalysis resistance? (You may use tools from a later chapter, or simply check whether the maximum entry in the Difference Distribution Table exceeds 4.)

(c) Explain why it is impossible to have all non-trivial LAT entries equal to \(\pm 2\) for a 4-bit S-box. (Hint: use Parseval’s relation.)

Hint

The PRESENT S-box [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2] is known to achieve optimal nonlinearity.

For (c), Parseval says \(\sum_\alpha \Lambda[\alpha][\beta]^2 = 64\) for each non-zero \(\beta\). If all 16 entries (including \(\alpha = 0\)) were \(\pm 2\), the sum would be \(16 \times 4 = 64\). But \(\Lambda[0][\beta] = 0\) for any non-zero \(\beta\) when \(S\) is a permutation, so you only have 15 non-zero-alpha entries, giving \(15 \times 4 = 60 \neq 64\). This is a contradiction.

Exercise 16.4 (Linear Approximations of Composed Functions).

Let \(S_1: \mathbb{F}_2^4 \to \mathbb{F}_2^4\) and \(S_2: \mathbb{F}_2^4 \to \mathbb{F}_2^4\) be two S-boxes, and define \(T = S_2 \circ S_1\).

(a) Write a Python function that computes the LAT of \(T\) directly from the LATs of \(S_1\) and \(S_2\) using the composition formula:

\[ \Lambda_T[\alpha][\beta] = \frac{1}{2^{n-1}} \sum_{\gamma} \Lambda_{S_1}[\alpha][\gamma] \cdot \Lambda_{S_2}[\gamma][\beta]\]

(b) Verify your formula by comparing the result to the LAT computed directly from the composed S-box lookup table.

(c) Explain the connection between this composition formula and the piling-up lemma. Under what conditions does the formula give exact results vs. approximate results?

Hint

The composition formula is a matrix multiplication (up to a scaling factor). It is exact when the intermediate values are uniformly distributed (i.e., when \(S_1\) is a permutation). The piling-up lemma is the special case where you approximate the product of two biases as \(2 \varepsilon_1 \varepsilon_2\).

In matrix form: \(\Lambda_T = \frac{1}{2^{n-1}} \Lambda_{S_1} \cdot \Lambda_{S_2}\).

Exercise 16.5 (Challenge: From LAT to Key Recovery).

Consider a simplified 2-round SPN cipher with:

Block size: 4 bits
S-box: the Heys S-box [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8, 0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
No permutation layer (identity permutation)
Three round keys \(K_0, K_1, K_2 \in \mathbb{F}_2^4\) XORed before/between/after the S-box layers

The encryption is: \(C = S(S(P \oplus K_0) \oplus K_1) \oplus K_2\)

(a) Using the LAT, find the best linear approximation across the 2-round cipher (one approximation per S-box application, combined via the piling-up lemma).

(b) Write a Python program that:

Encrypts 1000 random plaintexts under a random key \((K_0, K_1, K_2)\)
For each candidate value of \(K_2\), partially decrypts the ciphertext through the last round
Counts the bias of the linear approximation for each candidate \(K_2\)
Recovers \(K_2\) as the candidate with the largest absolute bias

(c) How many known plaintexts are needed to recover \(K_2\) with \(> 95\%\) success rate? Verify experimentally.

Hint

The best approximation for the Heys S-box has bias \(\pm 1/4\). Using the piling-up lemma for 2 rounds: \(\varepsilon_{2r} = 2 \cdot (1/4) \cdot (1/4) = 1/8\). So \(N \approx 1/(1/8)^2 = 64\) plaintexts should suffice.

For the key recovery: partially decrypt with each of the 16 candidate values of \(K_2\), apply the inverse S-box, and check which candidate gives the strongest bias for the linear approximation. The correct \(K_2\) will produce a distinctly larger bias than incorrect candidates.

16.13 Summary#

This chapter has laid the mathematical and computational foundations of linear cryptanalysis, one of the two most important general-purpose attacks on block ciphers (the other being differential cryptanalysis, which we study in Chapter 17).

Key concepts introduced:

Linear approximation – a probabilistic linear equation relating input bits, output bits, and key bits of a cipher component. Characterized by input mask \(\alpha\), output mask \(\beta\), and bias \(\varepsilon\).
Bias – the deviation \(\varepsilon = \Pr[\langle \alpha, x \rangle = \langle \beta, S(x) \rangle] - 1/2\) that measures how far a linear approximation deviates from random. Large \(|\varepsilon|\) means the S-box is “more linear” along that approximation.
Linear Approximation Table (LAT) – the complete catalog of all biases for all mask pairs. The LAT is to linear cryptanalysis what the DDT (Difference Distribution Table) is to differential cryptanalysis.
Piling-up lemma – the composition rule \(\varepsilon_{1 \oplus \cdots \oplus n} = 2^{n-1}\prod \varepsilon_i\) that allows biases of individual round approximations to be combined into a multi-round approximation.
Nonlinearity – the key S-box design metric \(\mathcal{NL}(S) = 2^{n-1} - \max|\Lambda[\alpha][\beta]|\) that quantifies resistance to linear cryptanalysis.
Data complexity – the number of known plaintexts \(N \approx 1/\varepsilon^2\) required to exploit a linear approximation.
Parseval’s relation – the constraint \(\sum_\alpha \Lambda[\alpha][\beta]^2 = 2^{2(n-1)}\) that limits how “nonlinear” an S-box can be.

Tip

Looking ahead. In Chapter 17, we will apply these foundations to mount a complete linear attack on a simplified SPN cipher, following Heys’ tutorial. We will construct multi-round linear approximations, implement the key-recovery phase, and analyze the success probability as a function of the number of known plaintexts.

16.14 References#

Mitsuru Matsui, “Linear Cryptanalysis Method for DES Cipher,” Advances in Cryptology – EUROCRYPT ‘93, Lecture Notes in Computer Science, vol. 765, pp. 386–397, Springer, 1994. The foundational paper introducing linear cryptanalysis. Matsui presents the piling-up lemma, constructs linear approximations for DES, and outlines Algorithm 1 (basic key recovery) and Algorithm 2 (improved key recovery).
Mitsuru Matsui, “The First Experimental Cryptanalysis of the Data Encryption Standard,” Advances in Cryptology – CRYPTO ‘94, Lecture Notes in Computer Science, vol. 839, pp. 1–11, Springer, 1994. Matsui’s follow-up paper reporting the first successful experimental linear cryptanalysis of full DES, recovering the 56-bit key using \(2^{43}\) known plaintexts on a set of 12 HP 9735 workstations.
Howard M. Heys, “A Tutorial on Linear and Differential Cryptanalysis,” Cryptologia, 26(3), pp. 189–221, 2002. An excellent pedagogical introduction to both linear and differential cryptanalysis, using a simplified SPN cipher as the running example. The Heys S-box used throughout this chapter comes from this tutorial. Highly recommended as a companion to this chapter.
Eli Biham and Adi Shamir, “Differential Cryptanalysis of DES-like Cryptosystems,” Journal of Cryptology, 4(1), pp. 3–72, 1991. While focused on differential cryptanalysis, this paper provides essential context for understanding why linear cryptanalysis was such an important development – it was the second general attack methodology for Feistel ciphers.
Joan Daemen and Vincent Rijmen, The Design of Rijndael: AES – The Advanced Encryption Standard, Springer, 2002. The designers of AES explain the wide trail strategy, which was explicitly designed to provide provable bounds against both linear and differential cryptanalysis. Chapter 9 discusses the relationship between S-box properties (including the LAT) and cipher security.
Kaisa Nyberg, “Linear Approximation of Block Ciphers,” Advances in Cryptology – EUROCRYPT ‘94, Lecture Notes in Computer Science, vol. 950, pp. 439–444, Springer, 1995. Nyberg establishes theoretical bounds on the nonlinearity of S-boxes and introduces the concept of “almost bent” functions, which achieve optimal resistance to linear cryptanalysis.

Chapter 16: Foundations of Linear Cryptanalysis

Contents

Chapter 16: Foundations of Linear Cryptanalysis#

16.1 Historical Context: Matsui’s Breakthrough#

16.2 Formal Definitions#

Boolean Functions and Inner Products#

16.3 The Piling-Up Lemma#

16.4 Python Implementation: The LinearAnalyzer Class#

Identifying the Strongest Approximations#

16.5 Visualizing the Linear Approximation Table#

16.6 Experimental Verification of the Piling-Up Lemma#

16.7 Comparing S-box Linearity: Heys vs PRESENT vs Random#

16.8 Distribution of Bias Values#

16.9 Nonlinearity and the Walsh-Hadamard Transform#

16.10 Data Complexity of Linear Cryptanalysis#

Monte Carlo Validation of the Data Complexity Bound#

16.11 Statistical Properties of the LAT#

16.12 Exercises#

16.13 Summary#

16.14 References#

16.4 Python Implementation: The `LinearAnalyzer` Class#