Elements of Cryptanalysis

Parts IV--V

Enigma & Information Theory

Bartosz Naskręcki

Chapters 10--15

Part IV

The Enigma Machine

Chapters 10--12

Chapter 10

Enigma -- Design and Electromechanical Operation

Historical Context

  • 1918: Arthur Scherbius patents the electromechanical rotor cipher
  • 1926: German Reichswehr adopts Military Enigma
  • Key differences from commercial model:
    • Plugboard (Steckerbrett) adds \(\approx 1.5 \times 10^{14}\) factor
    • Rotor wirings classified as secret
    • Daily key sheets (Kenngruppenbuch)

Year | Change | Effect
1930 | Plugboard added | Key space × \(1.5 \times 10^{14}\)
1938 | Rotor set 3 → 5 | Rotor orders 6 → 60
1942 | Naval M4 (4 rotors) | Key space × \(26 \times 2\)

Mathematical Structure

Definition 10.4 -- Enigma Permutation
For three rotors \(R_1, R_2, R_3\), reflector \(U\), and plugboard \(P\): $$E(x) = P \circ R_1^{-1} \circ R_2^{-1} \circ R_3^{-1} \circ U \circ R_3 \circ R_2 \circ R_1 \circ P(x)$$

Signal path:

Keyboard ⟶ \(P\) ⟶ \(R_1\) ⟶ \(R_2\) ⟶ \(R_3\) ⟶ \(U\) ⟶ \(R_3^{-1}\) ⟶ \(R_2^{-1}\) ⟶ \(R_1^{-1}\) ⟶ \(P\) ⟶ Lampboard

Definition 10.1 -- Rotor at Position \(p\)
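$$\sigma_p(x) = \sigma\bigl(x + (p - r)\bigr) - (p - r) \pmod{26}$$ where \(\sigma\) is the rotor's internal wiring, \(p\) its current position and \(r\) its ring setting (Ringstellung), with letters identified with \(0, \ldots, 25\).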

Rotors step before each character is encrypted -- the permutation changes with every keypress.
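The stepping can be sketched in Python. This is a minimal sketch that assumes the usual three-rotor Wehrmacht mechanism, which the slide does not spell out. The right rotor advances on every keypress. A rotor standing at its notch carries its left neighbour. The middle rotor "double-steps" when it reaches its own notch. The notch letters Q, E, V for rotors I, II, III are the commonly published values and are assumed here. `step` is a hypothetical helper.

```python
import string

ALPHA = string.ascii_uppercase

# Assumed turnover notches for rotors I, II, III (commonly published values).
NOTCH = {"I": "Q", "II": "E", "III": "V"}


def step(window, order):
    """Advance the rotor window (left, middle, right) by one keypress.

    window: the three letters visible in the windows, e.g. "ADU".
    order:  rotor names from left to right, e.g. ("I", "II", "III").
    """
    left, mid, right = (ALPHA.index(ch) for ch in window)
    notch_mid = ALPHA.index(NOTCH[order[1]])
    notch_right = ALPHA.index(NOTCH[order[2]])
    if mid == notch_mid:
        # Middle rotor at its notch: it steps again and carries the left rotor.
        mid = (mid + 1) % 26
        left = (left + 1) % 26
    elif right == notch_right:
        # Right rotor at its notch carries the middle rotor.
        mid = (mid + 1) % 26
    right = (right + 1) % 26  # the right rotor steps on every keypress
    return ALPHA[left] + ALPHA[mid] + ALPHA[right]


window = "ADU"
for _ in range(4):
    window = step(window, ("I", "II", "III"))
    print(window)
```

Starting from ADU, the loop prints ADV, AEW, BFX, BFY. The middle rotor moves on two consecutive keypresses (D to E, then E to F), which is the so-called double step.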

Key Theorems

Theorem 10.1 -- Self-Reciprocal Property
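At any fixed rotor state, the Enigma permutation is an involution: \(E^2 = \mathrm{id}\).
Proof: Write \(E = F^{-1} \circ U \circ F\) with \(F = R_3 \circ R_2 \circ R_1 \circ P\) (using \(P^{-1} = P\)). Since the reflector is an involution (\(U^2 = \mathrm{id}\)), \(E^2 = F^{-1} U (F F^{-1}) U F = F^{-1} U^2 F = \mathrm{id}\).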
Theorem 10.2 -- No Fixed Points
If the reflector \(U\) is fixed-point-free, then \(E(x) \neq x\) for all \(x\).
Proof: If \(E(x) = x\), applying \(F\) to both sides gives \(U(F(x)) = F(x)\). Then \(F(x)\) is a fixed point of \(U\), which is a contradiction.
Cryptanalytic consequence: No letter ever encrypts to itself. This was exploited in crib-based attacks at Bletchley Park to eliminate invalid crib positions.
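Definitions 10.1 and 10.4, and both theorems, can be checked at a fixed rotor state. This is a minimal sketch that assumes the commonly published wirings of rotors I, II, III and reflector B. The theorems hold for any rotor wirings, as long as the reflector is a fixed-point-free involution. Rotors are listed in signal order, so \(R_1\) is the rightmost rotor. `rotor_at`, `plugboard` and `enigma_perm` are illustrative names, and stepping is not modelled.

```python
import string

ALPHA = string.ascii_uppercase
# Commonly published wirings (assumed for illustration).
ROTOR_I = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"
ROTOR_II = "AJDKSIRUXBLHWTMCQGZNPYFVOE"
ROTOR_III = "BDFHJLCPRTXVZNYEIWGAKMUSQO"
REFLECTOR_B = "YRUHQSLDPXNGOKMIEBFZCWVJAT"


def to_perm(wiring):
    return [ALPHA.index(ch) for ch in wiring]


def inverse(perm):
    inv = [0] * len(perm)
    for x, y in enumerate(perm):
        inv[y] = x
    return inv


def rotor_at(sigma, p, r):
    """Definition 10.1: sigma_p(x) = sigma(x + (p - r)) - (p - r) mod 26."""
    s = p - r
    return [(sigma[(x + s) % 26] - s) % 26 for x in range(26)]


def plugboard(pairs):
    """Involution P swapping each given letter pair, e.g. ["AQ", "BJ"]."""
    perm = list(range(26))
    for a, b in pairs:
        i, j = ALPHA.index(a), ALPHA.index(b)
        perm[i], perm[j] = j, i
    return perm


def enigma_perm(rotors, positions, rings, reflector, plug):
    """Definition 10.4 at one rotor state; lists are in signal order R1, R2, R3."""
    fwd = [rotor_at(s, p, r) for s, p, r in zip(rotors, positions, rings)]
    back = [inverse(R) for R in reversed(fwd)]  # R3^-1, R2^-1, R1^-1

    def E(x):
        y = plug[x]
        for R in fwd:          # R1, R2, R3
            y = R[y]
        y = reflector[y]       # U
        for R_inv in back:     # R3^-1, R2^-1, R1^-1
            y = R_inv[y]
        return plug[y]

    return [E(x) for x in range(26)]


# R1 = rightmost rotor (III), R2 = middle (II), R3 = leftmost (I).
rotors = [to_perm(w) for w in (ROTOR_III, ROTOR_II, ROTOR_I)]
U = to_perm(REFLECTOR_B)
P = plugboard(["AQ", "BJ", "CX"])

for pos in [(0, 0, 0), (5, 17, 3), (25, 12, 9)]:
    E = enigma_perm(rotors, pos, (0, 0, 0), U, P)
    assert all(E[E[x]] == x for x in range(26))  # Theorem 10.1: involution
    assert all(E[x] != x for x in range(26))     # Theorem 10.2: no fixed points
print("Theorems 10.1 and 10.2 hold at all tested states")
```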

Keyspace Analysis

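Component | Choices | Count
Rotor selection (3 from 5, ordered) | \(5 \times 4 \times 3\) | 60
Rotor positions | \(26^3\) | 17,576
Ring settings | \(26^2\) effective (leftmost ring is redundant) | 676
Plugboard (10 pairs) | \(\frac{26!}{6!\,2^{10}\,10!}\) | 150,738,274,937,250

Total keyspace (rotor order × positions × plugboard): \(\approx 1.59 \times 10^{20}\) ≈ 67 bits. Including the 676 effective ring settings gives \(\approx 1.07 \times 10^{23}\) ≈ 76.5 bits.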

The plugboard alone contributes \(\approx 47\) bits -- yet its involutory structure was exploitable. The key insight: mathematical structure, not raw key size, determines security.
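The keyspace arithmetic can be reproduced exactly with Python integers. This is a short sketch that follows the common convention: the headline figure multiplies rotor orders, positions and plugboard settings, and the \(26^2\) effective ring settings are applied separately.

```python
from math import comb, factorial, log2

rotor_orders = 5 * 4 * 3   # ordered choice of 3 rotors out of 5
positions = 26 ** 3        # starting positions (Grundstellung)
rings = 26 ** 2            # effective ring settings (leftmost ring is redundant)

# Plugboard with 10 cables: 6 letters stay unplugged, 20 are paired up.
plugboard = factorial(26) // (factorial(6) * 2 ** 10 * factorial(10))
# Same count built step by step: choose the 20 plugged letters, then pair them.
assert plugboard == comb(26, 20) * factorial(20) // (2 ** 10 * factorial(10))

core = rotor_orders * positions * plugboard
with_rings = core * rings
print(f"plugboard     {plugboard:,} ({log2(plugboard):.1f} bits)")
print(f"core keyspace {core:.3e} ({log2(core):.1f} bits)")
print(f"with rings    {with_rings:.3e} ({log2(with_rings):.1f} bits)")
```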

Chapter 11

The Polish Codebreakers: Rejewski's Mathematical Attack

The Doubled Message Key (Pre-1938)

  1. Daily key sets rotor order, rings, plugboard, and Grundstellung
  2. Operator chooses random 3-letter message key, e.g. ABC
  3. Set rotors to Grundstellung, type key twice: ABCABC
  4. Transmit the 6-letter encrypted indicator, e.g. DMQVBN
  5. Set rotors to message key and encrypt the message
Definition 11.1 -- Message Indicator
$$X_1 X_2 X_3 X_4 X_5 X_6 = E_1(m_1)\, E_2(m_2)\, E_3(m_3)\, E_4(m_1)\, E_5(m_2)\, E_6(m_3)$$ Positions 1 & 4 encrypt the same letter \(m_1\), but at different rotor states.

Characteristic Permutations

Definition 11.2 -- Permutations \(A\), \(B\), \(C\)
From many intercepted indicators on one day, define:
  • \(A\): maps \(X_1 \to X_4\)   ⇒   \(A = E_4 \circ E_1^{-1}\)
  • \(B\): maps \(X_2 \to X_5\)   ⇒   \(B = E_5 \circ E_2^{-1}\)
  • \(C\): maps \(X_3 \to X_6\)   ⇒   \(C = E_6 \circ E_3^{-1}\)
Theorem 11.1 -- Independence from Message Keys
\(A\), \(B\), \(C\) depend only on the daily key (rotor order, ring settings, plugboard, Grundstellung) -- not on individual operators' message key choices.
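Definitions 11.1 and 11.2 can be simulated without modelling the rotors. This is a minimal sketch built on two assumptions:

  • The six states \(E_1, \ldots, E_6\) of one hypothetical daily key are random fixed-point-free involutions. Theorems 10.1 and 10.2 guarantee that real Enigma states have this form.
  • Operators pick their message keys at random.

All function names are illustrative.

```python
import random
import string

ALPHA = string.ascii_uppercase


def random_involution(rng):
    """Random fixed-point-free involution: a stand-in for one Enigma state E_i."""
    letters = rng.sample(range(26), 26)
    perm = [0] * 26
    for a, b in zip(letters[0::2], letters[1::2]):
        perm[a], perm[b] = b, a
    return perm


def encrypt_indicator(E, key):
    """Definition 11.1: encipher m1 m2 m3 m1 m2 m3 at states E_1, ..., E_6."""
    return "".join(ALPHA[E[i][ALPHA.index(ch)]] for i, ch in enumerate(key + key))


def characteristic_maps(indicators):
    """Collect the observed parts of A: X1 -> X4, B: X2 -> X5, C: X3 -> X6."""
    maps = [{}, {}, {}]
    for ind in indicators:
        for i in range(3):
            maps[i][ind[i]] = ind[i + 3]
    return maps


rng = random.Random(2024)
E = [random_involution(rng) for _ in range(6)]  # one hypothetical daily key
keys = ["".join(rng.choice(ALPHA) for _ in range(3)) for _ in range(120)]
A, B, C = characteristic_maps(encrypt_indicator(E, k) for k in keys)

# Each observed pair matches A = E_4 o E_1 (E_1 is its own inverse),
# whatever message keys the operators chose.
for x, y in A.items():
    assert ALPHA[E[3][E[0][ALPHA.index(x)]]] == y
print(f"A is known on {len(A)} of 26 letters")
```

Whatever keys the simulated operators choose, every observed pair agrees with \(A = E_4 \circ E_1\), as Theorem 11.1 predicts.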

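Key insight: With ~60--100 intercepted messages per day, all three permutations could be fully determined. This is a coupon collector problem on 26 letters. With uniformly random indicators, about \(26 \sum_{k=1}^{26} 1/k \approx 100\) messages are needed on average to see every letter in a given position.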
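The coupon-collector figure can be checked both exactly and by simulation. This sketch assumes the indicator letters in a given position are uniformly random. Real operator keys were not uniformly random.

```python
import random
from fractions import Fraction


def expected_draws(n):
    """Coupon collector: expected draws to see all n letters is n * H_n."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


def simulate(n, trials, rng):
    """Average number of uniform draws until all n letters have appeared."""
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(rng.randrange(n))
            draws += 1
        total += draws
    return total / trials


print(float(expected_draws(26)))            # ≈ 100.2
print(simulate(26, 2000, random.Random(0)))
```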

The Cycle Structure Attack

Rejewski's method:

  1. Extract \(A\), \(B\), \(C\) from intercepts
  2. Decompose each into cycles
  3. Record cycle length multisets as fingerprints
  4. Look up in pre-computed catalog
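Steps 2 and 3 reduce to computing a cycle type. Step 4 works because conjugating by the plugboard leaves that cycle type unchanged. In this sketch, the rotor-only part of \(A\) is modelled as a random permutation and the plugboard as a random 10-pair involution. Both are assumptions for illustration.

```python
import random


def cycle_type(perm):
    """Cycle lengths of a permutation (list of images), largest first."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:
            seen[x] = True
            x = perm[x]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def compose(f, g):
    """(f o g)(x) = f(g(x))."""
    return [f[g[x]] for x in range(len(g))]


def random_plugboard(rng, pairs=10):
    """Random involution swapping `pairs` disjoint letter pairs."""
    letters = rng.sample(range(26), 2 * pairs)
    perm = list(range(26))
    for a, b in zip(letters[0::2], letters[1::2]):
        perm[a], perm[b] = b, a
    return perm


rng = random.Random(7)
core = rng.sample(range(26), 26)   # stand-in for the rotor-only part of A
P = random_plugboard(rng)
A = compose(P, compose(core, P))   # conjugation by the plugboard (P = P^-1)

# The fingerprint is unchanged, so a catalogue computed without the plugboard
# can still be matched against permutations observed through it.
assert cycle_type(A) == cycle_type(core)
print(cycle_type(A))
```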
Theorem 11.2
A random \(\sigma \in S_{26}\) has expected number of cycles: $$\mathbb{E}[\#\text{cycles}] = H_{26} = \sum_{k=1}^{26}\frac{1}{k} \approx 3.85$$

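The number of distinct cycle structures in \(S_{26}\) is \(p(26) = 2{,}436\) (partitions of 26). However, \(A\), \(B\) and \(C\) are each a product of two fixed-point-free involutions, so their cycles come in pairs of equal length. Each fingerprint is therefore effectively a partition of 13, with \(p(13) = 101\) possibilities. Using the triple \((A, B, C)\) together yields near-unique identification of the Grundstellung among the \(26^3 = 17{,}576\) possibilities (per rotor order).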
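The numbers above can be verified, together with the pairing property. \(A\), \(B\) and \(C\) are products of two fixed-point-free involutions, so every cycle length occurs an even number of times. In this sketch, the involutions are random stand-ins rather than real Enigma states.

```python
import random
from collections import Counter
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def partitions(n):
    """Number of integer partitions p(n), by dynamic programming over part sizes."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


def cycle_lengths(perm):
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            x, n = start, 0
            while x not in seen:
                seen.add(x)
                x = perm[x]
                n += 1
            lengths.append(n)
    return lengths


def random_fpf_involution(rng):
    """Random fixed-point-free involution on 26 letters."""
    letters = rng.sample(range(26), 26)
    perm = [0] * 26
    for a, b in zip(letters[0::2], letters[1::2]):
        perm[a], perm[b] = b, a
    return perm


print(float(harmonic(26)))             # ≈ 3.854
print(partitions(26), partitions(13))  # 2436 101

rng = random.Random(3)
for _ in range(1000):
    e1, e4 = random_fpf_involution(rng), random_fpf_involution(rng)
    A = [e4[e1[x]] for x in range(26)]  # A = E_4 o E_1
    # Every cycle length must occur an even number of times.
    assert all(c % 2 == 0 for c in Counter(cycle_lengths(A)).values())
```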

Rejewski's Legacy

Concept | Key Point
Doubled message key | Created observable pairs \((X_1,X_4)\), \((X_2,X_5)\), \((X_3,X_6)\)
Characteristic permutations | \(A = E_4 \circ E_1^{-1}\) -- observable from intercepts alone
Cycle structure | Fingerprint preserved under conjugation by the plugboard
Catalog method | Pre-compute for all \(26^3\) positions; match fingerprint
Attack type | Ciphertext-only -- no known plaintext needed
"Pure mathematics unwrapped the riddle" -- the attack relied on abstract algebra (permutation groups), not on guessed plaintext.

Chapter 12

Bletchley Park and the Bombe

Cribs and the Menu Graph

Definition 12.1 -- Crib
A crib \((p, c)\) is a known plaintext-ciphertext pair of length \(L\), where each \(p_i \mapsto c_i\) under the Enigma at rotor state \(\sigma_i\).
Definition 12.2 -- Menu
The menu is an undirected multigraph \(G = (V, E)\) where:
  • Vertices: distinct letters in \(p\) and \(c\)
  • Edges: for each position \(i\), an edge \((p_i, c_i)\) labeled with offset \(i\)
Cycles in this graph create systems of simultaneous constraints.
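A menu and its number of independent cycles can be computed with a union-find pass. The number of independent cycles is edges − vertices + connected components. In this sketch, the crib is a stock weather phrase and the ciphertext is made up for illustration. No position in the example has a letter mapping to itself.

```python
def build_menu(plain, cipher):
    """Definition 12.2: one edge (p_i, c_i, i) per crib position."""
    assert len(plain) == len(cipher)
    return [(p, c, i) for i, (p, c) in enumerate(zip(plain, cipher))]


def independent_cycles(menu):
    """Edges - vertices + components, counted with a union-find pass."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cycles = 0
    for u, v, _ in menu:
        ru, rv = find(u), find(v)
        if ru == rv:
            cycles += 1   # this edge closes a loop
        else:
            parent[ru] = rv
    return cycles


# Hypothetical alignment: the ciphertext is made up for illustration.
menu = build_menu("WETTERVORHERSAGE", "TRREQVXNBZAEKUMD")
print(independent_cycles(menu))  # 2
```

For this alignment the menu has 16 edges, 19 letters and 5 components, giving 16 − 19 + 5 = 2 independent cycles:

  • T, R, E (positions 1, 2, 3)
  • the double edge between E and R (positions 1 and 11)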

Crib sources: weather reports (Wetterbericht), routine phrases ("keine besonderen Ereignisse"), re-encipherments.

The Bombe: Contradiction-Based Elimination

Definition 12.4 -- Bombe Logic
For each of the \(26^3 = 17{,}576\) rotor positions (fixed rotor order):
  1. Assume a plugboard mapping for one letter, e.g. \(\pi(A) = A\)
  2. Propagate through the menu: deduce implied mappings via rotor permutations
  3. Check for contradictions: if \(\pi(X) = Y\) and \(\pi(X) = Z\) with \(Y \neq Z\), reject
  4. No contradiction ⇒ stop (candidate for correct key)
The Bombe does not "decrypt." It eliminates impossible rotor settings. It reduces \(\sim 10^{23}\) possible keys to a handful of candidate positions for manual verification.
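Steps 1 to 4 at a single rotor position amount to a graph search over the menu. This minimal sketch assumes each Enigma state factors as \(E_i = P \circ S_i \circ P\) with a rotor-only scrambler \(S_i\). Every menu edge then implies \(\pi(c_i) = S_i(\pi(p_i))\). Further assumptions and simplifications:

  • The scramblers, plugboard and crib are random stand-ins.
  • Only the contradiction test of step 3 is implemented (no diagonal board).
  • All 26 hypotheses for one letter are tried in turn.

```python
import random


def propagate(menu, scramblers, start, guess):
    """Steps 1-3 at one rotor position: test the hypothesis pi(start) = guess.

    menu: edges (p_i, c_i, i) with letters as integers 0..25.
    scramblers: scramblers[i] is the rotor-only involution S_i at offset i.
    Returns the implied plugboard values, or None on a contradiction.
    """
    adj = {}
    for u, v, i in menu:
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    pi = {start: guess}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, i in adj[u]:
            implied = scramblers[i][pi[u]]  # pi(c_i) = S_i(pi(p_i)), and vice versa
            if v not in pi:
                pi[v] = implied
                stack.append(v)
            elif pi[v] != implied:
                return None  # pi(v) = Y and pi(v) = Z with Y != Z: reject
    return pi


def random_involution(rng, pairs=13):
    """Random involution swapping `pairs` disjoint letter pairs."""
    letters = rng.sample(range(26), 2 * pairs)
    perm = list(range(26))
    for a, b in zip(letters[0::2], letters[1::2]):
        perm[a], perm[b] = b, a
    return perm


rng = random.Random(5)
L = 20
S = [random_involution(rng) for _ in range(L)]  # hypothetical scrambler states
P = random_involution(rng, pairs=10)            # hypothetical plugboard
plain = [rng.randrange(26) for _ in range(L)]
cipher = [P[S[i][P[x]]] for i, x in enumerate(plain)]  # E_i = P o S_i o P
menu = list(zip(plain, cipher, range(L)))

start = plain[0]
survivors = [g for g in range(26) if propagate(menu, S, start, g) is not None]
print("surviving guesses for pi(start):", survivors)  # step 4: a stop if non-empty
```

The true value \(\pi(\text{start})\) always survives. How many wrong values also survive depends on how many cycles the random menu contains.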

Filtering Power & the Diagonal Board

Theorem 12.2 -- Cycle Filtering Power
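Each independent cycle (closure) in the menu gives one consistency check. A wrong plugboard hypothesis passes that check with probability \(\approx 1/26\), whatever the cycle's length. A menu with \(k\) independent cycles therefore leaves \(\approx 26 \cdot 26^{-k} = 26^{1-k}\) surviving hypotheses per rotor position. This filters 17,576 positions to \(\approx 17{,}576 / 26^{k-1}\) expected stops.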
Theorem 12.4 -- Probability of Loops
$$P(\text{at least one cycle}) \approx 1 - \exp\!\left(-\frac{L(L-1)}{4 \cdot 26}\right)$$ For \(L = 13\): \(P \approx 0.78\). For \(L = 20\): \(P \approx 0.97\).
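The approximation in Theorem 12.4 can be tabulated for a few crib lengths. This sketch only evaluates the stated formula; it does not simulate real menus.

```python
from math import exp


def p_loop(L, n=26):
    """Theorem 12.4 approximation of P(menu has at least one cycle) for crib length L."""
    return 1 - exp(-L * (L - 1) / (4 * n))


for L in (8, 13, 20, 30):
    print(f"L = {L:2d}: P ≈ {p_loop(L):.3f}")
```

The formula gives about 0.78 for \(L = 13\) and about 0.97 for \(L = 20\).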
Welchman's Diagonal Board
Exploits plugboard symmetry: \(\pi(A) = B \Rightarrow \pi(B) = A\). Doubles the number of implications, reducing false stops by a factor of 2--4.

Bletchley Park: Putting It All Together

No-self-encryption filter: Since \(E(x) \neq x\), any crib position where a letter encrypts to itself is immediately rejected. This eliminates 30--50% of candidate placements before the Bombe runs.
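The no-self-encryption filter is a single comparison per offset. The ciphertext in this sketch is made up. `survival_fraction` assumes independent, uniformly random letters; under that assumption a placement of length \(L\) survives with probability \((25/26)^L\).

```python
def valid_placements(crib, ciphertext):
    """Offsets at which no crib letter sits over the same ciphertext letter (Theorem 10.2)."""
    return [
        k
        for k in range(len(ciphertext) - len(crib) + 1)
        if all(p != c for p, c in zip(crib, ciphertext[k:k + len(crib)]))
    ]


def survival_fraction(L, n=26):
    """Fraction of placements expected to survive if letters were independent and uniform."""
    return ((n - 1) / n) ** L


print(valid_placements("WETTER", "XWETZQRNPE"))   # [2, 3, 4]
print(round(survival_fraction(13), 2), round(survival_fraction(20), 2))
```

Offset 0 is rejected because T sits over T, and offset 1 because W sits over W.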
Date | Event
Jul 1939 | Pyry conference: Polish handover to Britain and France
Mar 1940 | First Bombe ("Victory") operational
Aug 1940 | Welchman's diagonal board added
Jun 1944 | Ultra intelligence supports D-Day
1945 | 200+ Bombes, decoding 4,000+ messages/day

Bombe complexity: \(17{,}576 \times 26 \times 60 \approx 2.7 \times 10^7\) tests. Intelligence from Enigma decrypts was codenamed Ultra.

Part V

Information Theory & Block Ciphers

Chapters 13--15

Chapter 13

Shannon's Theory of Secrecy Systems

Perfect Secrecy

Definition 13.1 -- Cryptosystem
A five-tuple \((\mathcal{P}, \mathcal{C}, \mathcal{K}, E, D)\) with encryption \(E_k : \mathcal{P} \to \mathcal{C}\), decryption \(D_k : \mathcal{C} \to \mathcal{P}\), satisfying \(D_k(E_k(p)) = p\) for all \(k, p\).
Definition 13.2 -- Perfect Secrecy
A cryptosystem has perfect secrecy if for all \(p \in \mathcal{P}\) and \(c \in \mathcal{C}\) with \(\Pr[C = c] > 0\): $$\Pr[P = p \mid C = c] = \Pr[P = p]$$ Equivalently: \(P\) and \(C\) are statistically independent. The ciphertext reveals zero information about the plaintext.

Entropy & Mutual Information

Definition 13.3 -- Shannon Entropy
$$H(X) = -\sum_{x} p(x) \log_2 p(x) \quad \text{(bits)}$$

Conditional entropy:

$$H(X \mid Y) = -\sum_{x,y} p(x,y) \log_2 p(x \mid y)$$

Mutual information:

$$I(X; Y) = H(X) - H(X \mid Y)$$

Key equivalence
Perfect secrecy ⇔ \(I(P; C) = 0\) ⇔ ciphertext carries zero bits of information about the plaintext.

Examples: Fair coin \(H = 1\) bit. Uniform over 26 letters: \(H = \log_2 26 \approx 4.70\) bits.
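Definition 13.3 and the key equivalence can be checked numerically. This sketch computes \(I(X;Y)\) as \(H(X) + H(Y) - H(X,Y)\), which equals \(H(X) - H(X \mid Y)\). The skewed plaintext distribution is a made-up example. The cipher is a single-letter shift with a uniformly random key.

```python
from collections import defaultdict
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution {(x, y): prob}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())


# Hypothetical skewed plaintext distribution over 26 letters.
weights = [5 if x < 5 else 1 for x in range(26)]
p_plain = [w / sum(weights) for w in weights]

# Single-letter shift cipher with a uniformly random key: joint law of (P, C).
joint = defaultdict(float)
for p in range(26):
    for k in range(26):
        joint[(p, (p + k) % 26)] += p_plain[p] / 26

print(entropy([0.5, 0.5]))                  # 1.0
print(round(entropy([1 / 26] * 26), 2))     # 4.7
print(mutual_information(joint))            # ~0 up to rounding: perfect secrecy
```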

Shannon's Fundamental Theorems

Theorem 13.1 -- Perfect Secrecy Requires Large Keys
If a cryptosystem has perfect secrecy, then \(|\mathcal{K}| \geq |\mathcal{P}|\).
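Proof idea: Fix a ciphertext \(c\) with \(\Pr[C = c] > 0\). It decrypts to at most \(|\mathcal{K}|\) distinct plaintexts \(D_k(c)\). If \(|\mathcal{K}| < |\mathcal{P}|\), some plaintext \(p^*\) is never a decryption of \(c\). Then \(\Pr[P = p^* \mid C = c] = 0 \neq \Pr[P = p^*]\) whenever \(\Pr[P = p^*] > 0\).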
Theorem 13.2 -- One-Time Pad Achieves Perfect Secrecy
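Encryption \(E_k(p) = (p + k) \bmod n\) is applied letter by letter to a message of length \(L\) over an alphabet of size \(n\), with a uniformly random key of the same length \(L\). Then $$\Pr[C = c \mid P = p] = \frac{1}{n^L}$$ is the same for all \(p\), so \(\Pr[P = p \mid C = c] = \Pr[P = p]\).   \(\blacksquare\)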
Reusing a one-time pad key destroys perfect secrecy. If the same key encrypts \(p_1\) and \(p_2\), then \(c_1 \oplus c_2 = p_1 \oplus p_2\), so the XOR of the plaintexts leaks. (This was exploited in the VENONA project against Soviet key reuse.)
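Key reuse is easy to demonstrate with byte-wise XOR, the bitwise form of the pad. In this sketch the two messages are made up, and `secrets.token_bytes` supplies the uniform key.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


p1 = b"ATTACK AT DAWN"
p2 = b"RETREAT AT TEN"
key = secrets.token_bytes(len(p1))  # uniform key, as long as the message

c1 = xor(p1, key)
c2 = xor(p2, key)                   # same key reused: no longer a one-time pad

assert xor(c1, key) == p1           # normal decryption
assert xor(c1, c2) == xor(p1, p2)   # the key cancels out
# Anyone who guesses p1 (e.g. by crib dragging) recovers p2 without the key:
print(xor(xor(c1, c2), p1))         # b'RETREAT AT TEN'
```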

Unicity Distance

Definition 13.4 -- Unicity Distance
$$n_0 \approx \frac{H(K)}{D}$$ where \(H(K) = \log_2|\mathcal{K}|\) is the key entropy and \(D = R - r_L\) is the language redundancy.

English redundancy (per character):

  • Absolute rate: \(R = \log_2 26 \approx 4.70\) bits/char
  • True entropy: \(r_L \approx 1.0\text{--}1.5\) bits/char
  • Redundancy: \(D \approx 3.2\text{--}3.7\) bits/char
Cipher | \(H(K)\) (bits) | \(n_0\) (characters, \(D \approx 3.4\))
Shift | 4.7 | 1.4
Vigenère (m=10) | 47.0 | 13.8
Substitution | 88.4 | 26.0
Enigma | 67.1 | 19.7

Spurious keys decay exponentially: after \(n\) ciphertext characters, the expected number of spurious keys is \(\approx |\mathcal{K}| \cdot 2^{-nD} - 1\). Even substitution ciphers (\(26!\) keys) are, in principle, uniquely breakable from ~26 characters.
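The table can be reproduced from \(n_0 \approx H(K)/D\). This sketch assumes \(r_L = 1.3\) bits/char, so \(D = \log_2 26 - 1.3 \approx 3.4\), the value the table's numbers correspond to. The Enigma entry reuses the 67.1-bit figure from Chapter 10.

```python
from math import factorial, log2

R = log2(26)   # absolute rate of a 26-letter alphabet, bits/char
r_L = 1.3      # assumed true entropy of English, bits/char
D = R - r_L    # redundancy, about 3.4 bits/char

ciphers = {
    "Shift": log2(26),
    "Vigenere (m=10)": 10 * log2(26),
    "Substitution": log2(factorial(26)),
    "Enigma": 67.1,  # key entropy quoted in Chapter 10
}

for name, h_k in ciphers.items():
    print(f"{name:16s} H(K) = {h_k:5.1f} bits   n0 = {h_k / D:5.1f} chars")


def spurious_keys(h_k, n, d=D):
    """Expected spurious keys after n characters: about 2^(H(K) - n*d) - 1, floored at 0."""
    return max(2 ** (h_k - n * d) - 1, 0.0)
```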

Shannon's Framework: Summary

Concept | Key Point
Perfect secrecy | \(\Pr[P=p \mid C=c] = \Pr[P=p]\); ciphertext reveals nothing
Shannon entropy | \(H(X) = -\sum p(x)\log_2 p(x)\); average uncertainty in bits
OTP theorem | Achieves perfect secrecy with \(\lvert\mathcal{K}\rvert = \lvert\mathcal{P}\rvert\)
Lower bound | Perfect secrecy requires \(\lvert\mathcal{K}\rvert \geq \lvert\mathcal{P}\rvert\)
Unicity distance | \(n_0 \approx H(K)/D\); ciphertext length needed to determine the key
Fundamental tension | Unconditional security requires keys as long as the message
This tension motivated the shift from information-theoretic security to computational security -- the foundation of modern cryptography.

Chapter 14

Block Cipher Design Principles

Shannon's Design Principles

Definition 14.2 -- Confusion
Each ciphertext bit depends on the key in a complex, nonlinear way. The mapping \(k \mapsto E_k(m)\) is highly nonlinear for every fixed \(m\).
Achieved through S-boxes (substitution boxes).
Definition 14.3 -- Diffusion
Each plaintext bit influences many (ideally all) ciphertext bits. Flipping one input bit causes each output bit to change with probability \(\frac{1}{2}\) (avalanche criterion).
Achieved through P-boxes (permutation layers) and linear mixing.
Shannon (1949): A product cipher alternating substitution (confusion) and transposition (diffusion) can achieve both simultaneously. This insight is the ancestor of every modern block cipher.

Substitution-Permutation Network (SPN)

Definition 14.6 -- SPN
For \(R\) rounds, each round applies:
  1. Key mixing: XOR state with round key \(K_r\)
  2. Substitution: Apply S-box \(S : \{0,1\}^m \to \{0,1\}^m\) to each \(m\)-bit sub-block
  3. Permutation: Apply P-box \(\pi\) to rearrange bit positions
Final round: key mixing and substitution only (no permutation layer), followed by a final key mixing with \(K_{R+1}\).
S-box (Def 14.4)
Nonlinear function \(S : \{0,1\}^m \to \{0,1\}^n\).
Only source of nonlinearity. Quality measured by nonlinearity (Walsh spectrum), differential uniformity, algebraic degree.
P-box (Def 14.5)
Bit permutation \(\pi : \{0,\ldots,n{-}1\} \to \{0,\ldots,n{-}1\}\).
Linear operation. Ensures outputs of one S-box feed inputs of different S-boxes in the next round.

S-box Design & the Avalanche Effect

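Heys tutorial S-box (4-bit permutation, hexadecimal):
In | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F
Out | E | 4 | D | 1 | 2 | F | B | 8 | 3 | A | 6 | C | 5 | 9 | 0 | 7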

Strict Avalanche Criterion:

Flipping any single input bit should flip each output bit with probability \(\frac{1}{2}\).

A single 4-bit S-box need not satisfy SAC exactly, and the Heys S-box does not. After multiple SPN rounds, the full cipher approximately achieves it.
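For a 4-bit S-box, the SAC can be measured exactly by trying all 16 inputs. This sketch uses the S-box in the table above. Numbering bits from the least significant bit is a convention chosen here.

```python
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def sac_matrix(sbox, bits=4):
    """sac[i][j] = fraction of inputs x for which flipping input bit i flips output bit j.

    Bits are numbered from the least significant bit (bit 0).
    """
    n = len(sbox)
    return [
        [sum((sbox[x] ^ sbox[x ^ (1 << i)]) >> j & 1 for x in range(n)) / n
         for j in range(bits)]
        for i in range(bits)
    ]


for i, row in enumerate(sac_matrix(SBOX)):
    print(f"flip input bit {i}:", " ".join(f"{p:.3f}" for p in row))
```

For instance, flipping input bit 0 flips output bit 3 for 12 of the 16 inputs (0.75), so this S-box on its own does not satisfy the SAC.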

Theorem 14.1 (Shannon, 1949)
The XOR cipher (one-time pad) with a uniformly random key as long as the message achieves perfect secrecy. Conversely, any perfectly secret cipher needs \(|\mathcal{K}| \geq |\mathcal{P}|\).
Block ciphers trade perfect secrecy for computational security with short, reusable keys.

Key Schedule & Design Summary

Key Schedule
Derives round keys \(K_1, \ldots, K_{R+1}\) from the master key. The Heys tutorial uses a sliding window: for \(i = 1, \ldots, 5\) (\(R = 4\)), \(K_i\) = bits \([4(i-1), 4(i-1)+16)\) of a 32-bit master key. Weakness: consecutive round keys share 12 of their 16 bits (75%).
Design Element | Purpose | Achieves
S-box (substitution) | Nonlinear mapping | Confusion
P-box (permutation) | Bit rearrangement | Diffusion
Key mixing (XOR) | Key dependency | Key sensitivity
Multiple rounds | Iterated composition | Full avalanche
Key schedule | Round key derivation | Key bit spreading

Heys P-box: \(\pi(i) = 4(i \bmod 4) + \lfloor i/4 \rfloor\) -- matrix transpose ensures each S-box feeds into four different S-boxes in the next round.
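The pieces of this chapter combine into a toy 16-bit SPN. This sketch assumes:

  • \(R = 4\) rounds
  • the S-box and P-box above
  • the sliding-window key schedule
  • MSB-first bit numbering

The function names are illustrative and not taken from the Heys tutorial. `avalanche` averages over random keys, plaintexts and flipped bits, so its output shifts slightly with the seed.

```python
import random

SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
SBOX_INV = [SBOX.index(y) for y in range(16)]
PBOX = [4 * (i % 4) + i // 4 for i in range(16)]  # transpose: its own inverse


def round_keys(master, rounds=4):
    """K_1..K_{R+1}: 16-bit windows of a 32-bit key, each shifted 4 bits further.

    Bit 0 is taken to be the most significant bit (an assumed convention).
    Supports rounds <= 4, since five windows exhaust the 32-bit key.
    """
    return [(master >> (16 - 4 * r)) & 0xFFFF for r in range(rounds + 1)]


def substitute(state, box):
    """Apply the 4-bit S-box to each of the four nibbles of a 16-bit state."""
    out = 0
    for shift in (12, 8, 4, 0):
        out |= box[(state >> shift) & 0xF] << shift
    return out


def permute(state):
    """Move bit i (counted from the MSB) to position PBOX[i]."""
    out = 0
    for i in range(16):
        bit = (state >> (15 - i)) & 1
        out |= bit << (15 - PBOX[i])
    return out


def encrypt(block, master, rounds=4):
    ks = round_keys(master, rounds)
    state = block
    for r in range(rounds - 1):                       # mix, substitute, permute
        state = permute(substitute(state ^ ks[r], SBOX))
    state = substitute(state ^ ks[rounds - 1], SBOX)  # last round: no P-box
    return state ^ ks[rounds]                         # final mixing with K_{R+1}


def decrypt(block, master, rounds=4):
    ks = round_keys(master, rounds)
    state = substitute(block ^ ks[rounds], SBOX_INV) ^ ks[rounds - 1]
    for r in range(rounds - 2, -1, -1):
        state = substitute(permute(state), SBOX_INV) ^ ks[r]
    return state


def avalanche(rounds, trials=2000, seed=0):
    """Average number of ciphertext bits that change when one plaintext bit flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        key, x, bit = rng.getrandbits(32), rng.getrandbits(16), rng.randrange(16)
        diff = encrypt(x, key, rounds) ^ encrypt(x ^ (1 << bit), key, rounds)
        total += bin(diff).count("1")
    return total / trials


for r in range(1, 5):
    print(f"{r} round(s): {avalanche(r):.2f} of 16 bits change on average")
```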

Chapter 15

The Data Encryption Standard (DES)

The Feistel Network

Definition 15.1 -- Feistel Network
Split input into halves \((L_0, R_0)\). For \(r\) rounds: $$L_i = R_{i-1}, \quad R_i = L_{i-1} \oplus F(R_{i-1}, K_i)$$ where \(F\) is the round function and \(K_i\) is the \(i\)-th round key.
Theorem 15.1 -- Feistel Invertibility
A Feistel cipher is always invertible regardless of whether \(F\) is invertible.
Proof: Given \((L_i, R_i)\), recover: \(R_{i-1} = L_i\),   \(L_{i-1} = R_i \oplus F(L_i, K_i)\).
Only \(F\) in the forward direction is needed -- never \(F^{-1}\).   \(\square\)

This is the crucial advantage of the Feistel structure: the round function \(F\) can be any function at all (not necessarily invertible), and the overall cipher remains a bijection. DES uses 16 Feistel rounds.
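Theorem 15.1 can be seen in code: decryption only ever evaluates \(F\) forwards. This sketch works on 64-bit blocks with 16 rounds. The round function (a truncated SHA-256 of the half-block and round key) and the round keys are hypothetical stand-ins. DES's initial and final permutations and its final half-swap are omitted.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def F(half, key):
    """Hypothetical round function: first 32 bits of SHA-256(half || key).
    No inverse is needed or used anywhere below."""
    data = half.to_bytes(4, "big") + key.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def feistel_encrypt(block, keys):
    left, right = block >> 32, block & MASK32
    for k in keys:
        # L_i = R_{i-1},  R_i = L_{i-1} xor F(R_{i-1}, K_i)
        left, right = right, left ^ F(right, k)
    return (left << 32) | right


def feistel_decrypt(block, keys):
    left, right = block >> 32, block & MASK32
    for k in reversed(keys):
        # R_{i-1} = L_i,  L_{i-1} = R_i xor F(L_i, K_i)
        left, right = right ^ F(left, k), left
    return (left << 32) | right


keys = list(range(1, 17))  # 16 hypothetical round keys
msg = 0x0123456789ABCDEF
ct = feistel_encrypt(msg, keys)
assert feistel_decrypt(ct, keys) == msg
print(hex(ct))
```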

DES Round Function

Definition 15.2 -- DES Round Function \(F(R, K)\)
  1. Expansion \(E\): 32 bits → 48 bits (bit duplication)
  2. Key mixing: XOR with 48-bit round key
  3. S-box substitution: eight \(6 \to 4\)-bit S-boxes
  4. Permutation \(P\): 32-bit straight permutation
$$F(R, K) = P\bigl(S_1(B_1) \| S_2(B_2) \| \cdots \| S_8(B_8)\bigr)$$ where \(B_1 \| \cdots \| B_8 = E(R) \oplus K\).
Key schedule:
  • PC-1: 64 → 56 bits (drop parity)
  • Split into C, D halves (28 bits each)
  • Left-rotate each half by 1 or 2 positions per round (28 positions in total over the 16 rounds)
  • PC-2: 56 → 48-bit round key
Parameters:
  • Block size: 64 bits
  • Key: 56 effective bits (+8 parity)
  • Rounds: 16
  • S-boxes: 8 × (6→4 bit)

Security Properties

Complementation Property
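$$E_K(P) = \overline{E_{\bar{K}}(\bar{P})}$$ This halves the brute-force search. Given \(E_K(P)\) and \(E_K(\bar{P})\) for a chosen \(P\), each trial key \(K'\) also rules \(\bar{K'}\) in or out, so only \(2^{55}\) encryptions are needed (not \(2^{56}\)).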
Weak keys: 4 keys where all 16 round keys are identical. Double encryption returns the original: \(E_K(E_K(P)) = P\).
Avalanche effect:
  • Round 1: ~1 bit differs
  • Round 5: ~30 bits differ
  • Round 16: ~32 bits (ideal 50%)
Year | Attack | Complexity
1993 | Matsui's linear cryptanalysis | \(2^{43}\) known plaintexts
1998 | EFF Deep Crack (brute force) | 56 hours, $250K hardware
1999 | Deep Crack + distributed.net | 22 hours

Diffusion Across Rounds

The diffusion heatmap visualizes which output bits are affected by flipping each input bit, for different numbers of DES rounds:

  • Round 1: Feistel structure → only the left half affected. Sparse, block-diagonal pattern.
  • Round 3: Cross-half diffusion begins. Each input bit influences ~20 output bits.
  • Round 5: Near-uniform influence. Most entries approach probability 0.5.
  • Round 16: Full diffusion achieved. Every input bit affects every output bit with \(P \approx 0.5\).
Full Avalanche
After 16 rounds, a single-bit plaintext change produces an average of \(\approx 32\) differing ciphertext bits out of 64, matching the ideal 50% threshold. This is the diffusion property Shannon identified as essential.

DES: Summary

Concept | Key Point
Structure | 16-round Feistel network, 64-bit blocks
Key length | 56 bits effective (too small by modern standards)
Round function | Expansion → XOR → 8 S-boxes → P-permutation
Feistel property | Always invertible, regardless of \(F\)
Weak keys | 4 keys where \(E_K(E_K(P)) = P\)
Complementation | \(E_K(P) = \overline{E_{\bar{K}}(\bar{P})}\) halves search
Avalanche | Full diffusion by round 5--6
DES is now obsolete (replaced by AES in 2001), but its Feistel structure influenced many subsequent block ciphers. Understanding DES is essential for understanding linear and differential cryptanalysis.

Summary: Parts IV--V

Part IV: Enigma

  • Ch 10: Enigma as composition of permutations; self-reciprocal, no fixed points
  • Ch 11: Rejewski's cycle-structure attack on the doubled message key
  • Ch 12: Turing's Bombe and crib-based contradiction elimination

Part V: Information Theory

  • Ch 13: Shannon's perfect secrecy, entropy, unicity distance
  • Ch 14: Confusion + diffusion; SPN design with S-boxes and P-boxes
  • Ch 15: DES: 16-round Feistel network, avalanche, weak keys
