Parts I–III:
Classical Ciphers & Cryptanalysis

Bartosz Naskręcki

Elements of Cryptanalysis — Chapters 1–9
Foundations • Polyalphabetic Ciphers • Polygraphic Ciphers

Part I: Foundations

Chapters 1–3

Formal definitions • Substitution ciphers • Frequency analysis

Chapter 1

Introduction to Cryptanalysis

Historical Milestones

  • ~850 AD: Al-Kindi invents frequency analysis in Baghdad, breaking monoalphabetic substitution ciphers
  • 1883: Kerckhoffs formulates six design principles for military ciphers, including "security must reside in the key, not the algorithm"
  • 1949: Shannon publishes Communication Theory of Secrecy Systems, formalizing perfect secrecy and the one-time pad
"The enemy knows the system being used."
— Claude Shannon (1949)

Formal Definition: Cryptosystem

Definition 1.1 — Cryptosystem
A cryptosystem is a 5-tuple \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) where:
  • \(\mathcal{P}\) — plaintext space,   \(\mathcal{C}\) — ciphertext space,   \(\mathcal{K}\) — key space
  • \(\mathcal{E} = \{E_k : \mathcal{P} \to \mathcal{C}\}\) — encryption functions
  • \(\mathcal{D} = \{D_k : \mathcal{C} \to \mathcal{P}\}\) — decryption functions
Correctness:   \(D_k(E_k(m)) = m\)   for all \(k \in \mathcal{K},\; m \in \mathcal{P}\).
Kerckhoffs's Principle
Security must hold even when \((\mathcal{P}, \mathcal{C}, \mathcal{K}, \mathcal{E}, \mathcal{D})\) are public. The only secret is the key \(k \in \mathcal{K}\).
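
Definition 1.1 can be made concrete with a small sketch. The shift cipher serves as the example cryptosystem here; the names `ALPHABET`, `KEYS`, `encrypt` and `decrypt` are illustrative choices, not notation from the text.

```python
import string

ALPHABET = string.ascii_uppercase   # P = C = strings over A..Z
KEYS = range(26)                    # K = {0, ..., 25}

def encrypt(k: int, m: str) -> str:
    """E_k: shift every letter forward by k positions."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in m)

def decrypt(k: int, c: str) -> str:
    """D_k: shift every letter back by k positions."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch in c)

# Correctness: D_k(E_k(m)) = m for every key k.
message = "ATTACKATDAWN"
assert all(decrypt(k, encrypt(k, message)) == message for k in KEYS)
print(encrypt(3, message))  # DWWDFNDWGDZQ
```

Everything in this sketch is public in the sense of Kerckhoffs's principle; only the integer `k` would be kept secret.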

Attack Taxonomy

Attack Model | Abbr. | Adversary's Capabilities
Ciphertext-only | COA | Observes ciphertexts \(c_1, c_2, \ldots\)
Known-plaintext | KPA | Knows some pairs \((m_i, c_i)\)
Chosen-plaintext | CPA | Can choose \(m_i\), obtains \(c_i = E_k(m_i)\)
Chosen-ciphertext | CCA | Can choose \(c_i\), obtains \(m_i = D_k(c_i)\)
Related-key | RKA | Encryptions under keys related to \(k\)

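COA \(\subset\) KPA \(\subset\) CPA \(\subset\) ACPA   (increasing adversary power; ACPA denotes the adaptive chosen-plaintext attack, in which each query may depend on earlier answers)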

Chapter 2

Permutations and Substitution Ciphers

Permutations as Cipher Keys

Definition 2.1 — Permutation
A permutation of \(\mathcal{A}\) with \(|\mathcal{A}| = n\) is a bijection \(\sigma: \mathcal{A} \to \mathcal{A}\).
Definition 2.2 — Symmetric Group \(S_n\)
The set of all permutations of \(\{0, 1, \ldots, n{-}1\}\) under composition forms the symmetric group \(S_n\) with \(|S_n| = n!\).
Shift Cipher (Caesar)
\(\sigma_k(x) = (x + k) \bmod n\)
Key space: \(n = 26\) keys
General Substitution
Key \(\sigma \in S_n\), arbitrary permutation
Key space: \(26! \approx 4 \times 10^{26}\)

Key Theorems on Substitution Ciphers

Theorem 2.1 — Keyspace Size
The number of distinct substitution ciphers over an alphabet of size \(n\) is \(n!\).
Theorem 2.2 — Shift Ciphers Form a Cyclic Subgroup
\(\{\sigma_0, \sigma_1, \ldots, \sigma_{n-1}\}\) is a cyclic subgroup of \(S_n\) of order \(n\), isomorphic to \(\mathbb{Z}_n\).

Proof: \(\sigma_j \circ \sigma_k = \sigma_{j+k \bmod n}\),  identity \(\sigma_0\),  inverse \(\sigma_k^{-1} = \sigma_{n-k}\),  generator \(\sigma_1\).

Theorem 2.3 — Involutions
A substitution cipher is self-invertible iff \(\sigma^2 = \mathrm{id}\) (involution).
For \(n = 26\): only \(k = 0\) and \(k = 13\) (ROT13) are self-invertible shifts.
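
Theorems 2.2 and 2.3 can be checked exhaustively for \(n = 26\). The following is a minimal sketch; representing a permutation as a tuple, and the helper names `sigma` and `compose`, are assumptions made for illustration.

```python
n = 26

def sigma(k):
    """Shift permutation sigma_k as a tuple: entry x holds (x + k) mod n."""
    return tuple((x + k) % n for x in range(n))

def compose(s, t):
    """Composition (s o t)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(n))

identity = sigma(0)

# Theorem 2.2: sigma_j o sigma_k = sigma_{(j+k) mod n}, inverse is sigma_{n-k}.
assert all(compose(sigma(j), sigma(k)) == sigma((j + k) % n)
           for j in range(n) for k in range(n))
assert all(compose(sigma(k), sigma((n - k) % n)) == identity for k in range(n))

# Theorem 2.3: shifts that are their own inverse.
involutions = [k for k in range(n) if compose(sigma(k), sigma(k)) == identity]
print(involutions)  # [0, 13]
```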

The Fundamental Weakness

Critical Vulnerability
Keyspace size \(\neq\) security.

Although \(26! \approx 4 \times 10^{26}\) keys make brute force infeasible, substitution ciphers preserve letter frequencies:

\(\sigma\) relabels letters but does not alter their frequency distribution.

Al-Kindi's insight (~850 AD): compare ciphertext frequencies to known language frequencies to recover \(\sigma\).

This motivates Chapter 3: formal frequency analysis.

Chapter 3

Frequency Analysis

Letter Frequencies & Distributions

Definition — Frequency Distribution
For text \(T\) of length \(N\), the frequency distribution is \[ \mathbf{F}(T) = \bigl(f(a_1), f(a_2), \ldots, f(a_n)\bigr) \in \Delta^{n-1} \] where \(f(c) = \mathrm{count}(c, T) / N\) and \(\Delta^{n-1}\) is the probability simplex.
English "ETAOINSHRDLU" frequencies:
Letter | E | T | A | O | I | N | S | H | R | D | L | U
Frequency | 12.7% | 9.1% | 8.2% | 7.5% | 7.0% | 6.8% | 6.3% | 6.1% | 6.0% | 4.3% | 4.0% | 2.8%
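
The frequency distribution \(\mathbf{F}(T)\) is straightforward to compute. A minimal sketch, assuming a 26-letter alphabet and that non-letters are simply ignored (the function name is illustrative):

```python
from collections import Counter
import string

def frequency_distribution(text: str) -> list[float]:
    """Return F(T) as 26 relative frequencies for A..Z, ignoring non-letters."""
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)   # N; the text must contain at least one letter
    return [counts[ch] / total for ch in string.ascii_uppercase]

freqs = frequency_distribution("Hello, World")
print(freqs[ord("L") - ord("A")])  # 0.3  (3 of the 10 letters are L)
```

The result sums to 1, so it is a point of the probability simplex \(\Delta^{25}\).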

Breaking the Shift Cipher

Theorem — Shift Invariance of Frequencies
If \(C = \mathrm{Shift}_k(T)\), then \[ \mathbf{F}(C) = (f_{(0-k) \bmod n},\; f_{(1-k) \bmod n},\; \ldots,\; f_{(n-1-k) \bmod n}) \] The ciphertext frequency distribution is a cyclic shift of the plaintext distribution.
Key Recovery via Chi-Squared
\[ \hat{k} = \arg\min_{s \in \{0,\ldots,25\}} \chi^2\!\left(\mathrm{Shift}_{-s}(\mathbf{F}(C)),\; \mathbf{F}_{\mathrm{eng}}\right) \] where \(\displaystyle\chi^2(\mathbf{p}, \mathbf{q}) = \sum_{i=0}^{25} \frac{(p_i - q_i)^2}{q_i}\).

Only 26 candidate keys to test: the search is \(O(n)\) in the alphabet size, and the number of candidates does not grow with message length.
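
The chi-squared key recovery can be sketched as follows. The full 26-letter table `ENG` is an assumed set of approximate English frequencies (close to the ETAOIN values quoted earlier), and the sample plaintext is invented for the demonstration.

```python
import string

A = string.ascii_uppercase
# Assumed reference table: approximate English letter frequencies (percent, A..Z).
ENG = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03,
       2.41, 6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15,
       1.97, 0.07]
ENG = [p / 100 for p in ENG]

def shift(text, k):
    """Shift_k: move every letter k places forward (k may be negative)."""
    return "".join(A[(A.index(ch) + k) % 26] for ch in text)

def chi_squared(p, q):
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def recover_shift(ciphertext):
    """Try all 26 shifts; return the one whose decryption best fits ENG."""
    def score(s):
        candidate = shift(ciphertext, -s)
        observed = [candidate.count(ch) / len(candidate) for ch in A]
        return chi_squared(observed, ENG)
    return min(range(26), key=score)

text = ("THE ENEMY KNOWS THE SYSTEM BEING USED SO THE SECURITY OF A CIPHER "
        "MUST REST ON THE SECRECY OF THE KEY ALONE AND NOT ON THE SECRECY "
        "OF THE ALGORITHM THAT TRANSFORMS THE MESSAGE INTO CIPHERTEXT")
plaintext = text.replace(" ", "")
recovered = recover_shift(shift(plaintext, 7))
print(recovered)
```

Decrypting each candidate and comparing its frequencies with `ENG` is equivalent to cyclically shifting \(\mathbf{F}(C)\), which is what the formula above expresses.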

Text Length and Reliability

  • < 50 characters: frequency analysis unreliable (insufficient data)
  • ~100 characters: success rate reaches ~80%
  • 200+ characters: success rate > 95%
  • 500+ characters: nearly guaranteed correct key recovery
Limitations
Frequency analysis is powerful against monoalphabetic ciphers but is defeated by polyalphabetic ciphers (Vigenère), which flatten the frequency distribution by using multiple substitution alphabets.

Part II: Classical Polyalphabetic Ciphers

Chapters 4–6

Monoalphabetic cryptanalysis • Vigenère & Kasiski • Index of Coincidence

Chapter 4

Monoalphabetic Cryptanalysis

Digram Analysis

Definition — Digrams & Digram Frequency
A digram is a consecutive pair \(t_i t_{i+1}\) in a text \(t = t_1 t_2 \cdots t_\ell\) of length \(\ell\). The digram frequency is: \[ f_{ab}(t) = \frac{\#\{i : t_i = a,\; t_{i+1} = b\}}{\ell - 1} \]
Theorem 4.1 — Preservation of Digram Structure
For a monoalphabetic cipher with key \(\sigma\): \[ f_{\sigma(a)\sigma(b)}(c) = f_{ab}(m) \] The digram frequency matrix of the ciphertext is a permuted version of the plaintext digram matrix. This extends to \(n\)-grams of any order.

Top English digrams: TH (3.6%), HE (3.1%), IN (2.4%), ER (2.1%), AN (2.0%)
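
Theorem 4.1 can be checked numerically on a short example. In this sketch the random key `sigma` (seeded only for reproducibility) and the sample message are assumptions made for illustration.

```python
from collections import Counter
import random
import string

def digram_frequencies(text):
    """Relative frequency of each consecutive pair in text."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    return {pair: n / len(pairs) for pair, n in Counter(pairs).items()}

# A random monoalphabetic key sigma.
letters = list(string.ascii_uppercase)
shuffled = letters[:]
random.Random(0).shuffle(shuffled)
sigma = dict(zip(letters, shuffled))

m = "THETHEORYOFTHECIPHER"
c = "".join(sigma[ch] for ch in m)

fm, fc = digram_frequencies(m), digram_frequencies(c)
# Theorem 4.1: f_{sigma(a)sigma(b)}(c) = f_{ab}(m) for every digram ab.
assert all(fc[sigma[a] + sigma[b]] == fm[a + b] for a, b in fm)
print(fm["TH"])  # 3 of the 19 digrams are TH
```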

The Frequency Matching Attack

Algorithm 4.1 — Key Recovery
  1. Single-letter matching: Sort ciphertext letters by frequency; map to sorted English frequencies (E, T, A, O, I, ...)
  2. Digram refinement: Check the most frequent ciphertext digram — it should map to TH or HE. Resolve ambiguities using the TH/HT asymmetry.
  3. Greedy swap improvement: Try all pairwise swaps in the candidate key; keep swaps that improve a scoring metric.

Effective above:
200–500 characters (letter matching alone)
100–200 characters (with digram refinement)

Key insight:
Despite \(26! \approx 4 \times 10^{26}\) keys, statistical structure reduces the effective search to manageable size.

Chapter 5

The Vigenère Cipher & Kasiski Examination

The Vigenère Cipher

Definition 5.1 — Vigenère Cipher
Key: \(\mathbf{k} = (k_0, k_1, \ldots, k_{L-1}) \in \mathbb{Z}_{26}^L\) of length \(L\). \[ c_i = (m_i + k_{i \bmod L}) \bmod 26 \] \[ m_i = (c_i - k_{i \bmod L}) \bmod 26 \]
  • Polyalphabetic: same plaintext letter maps to different ciphertext letters depending on position
  • Flattens frequencies: the ciphertext distribution approaches uniformity, defeating single-letter frequency analysis
  • Considered "le chiffre indéchiffrable" for ~300 years (1553–1863)
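
Definition 5.1 translates almost directly into code. A minimal sketch, assuming uppercase A..Z input and a key written as letters (A = 0, ..., Z = 25); the function name and the `decrypt` flag are illustrative.

```python
import string

A = string.ascii_uppercase

def vigenere(text, key, decrypt=False):
    """Apply c_i = (m_i + k_{i mod L}) mod 26, or its inverse when decrypting."""
    sign = -1 if decrypt else 1
    return "".join(
        A[(A.index(ch) + sign * A.index(key[i % len(key)])) % 26]
        for i, ch in enumerate(text)
    )

c = vigenere("ATTACKATDAWN", "LEMON")
print(c)  # LXFOPVEFRNHR
assert vigenere(c, "LEMON", decrypt=True) == "ATTACKATDAWN"
```

In the output the two plaintext T's at positions 1 and 2 become X and F, while the A's become L, O, E and N: the same letter encrypts differently depending on its position.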

The Kasiski Examination

Theorem 5.1 — Kasiski's Observation
If a plaintext trigram repeats at positions \(i\) and \(j\) with \(L \mid (j - i)\), then the ciphertext trigrams are identical: \[ c_i c_{i+1} c_{i+2} = c_j c_{j+1} c_{j+2} \]
Algorithm — Kasiski Examination
  1. Find all repeated trigrams in the ciphertext
  2. Compute the distances between their occurrences
  3. Take the GCD of the distances (discarding outliers caused by accidental repeats) → estimate of key length \(L\), which divides every genuine distance

First discovered by Babbage (~1846, unpublished) and independently by Kasiski (1863).
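
The three steps of the Kasiski examination can be sketched as below. The sample ciphertext is the plaintext CRYPTOISSHORTFORCRYPTOGRAPHY encrypted under the 4-letter key ABCD; the function name and return shape are illustrative assumptions.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def kasiski(ciphertext, n=3):
    """Return (GCD of repeat distances, list of distances) for n-grams."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    # Distance between consecutive occurrences of each repeated n-gram.
    distances = [b - a
                 for pos in positions.values()
                 for a, b in zip(pos, pos[1:])]
    return reduce(gcd, distances, 0), distances

estimate, distances = kasiski("CSASTPKVSIQUTGQUCSASTPIUAQJB")
print(estimate, distances)  # 16 [16, 16, 16, 16]
```

Here the GCD is 16 while the true key length is 4: the key length is a divisor of the GCD, so in practice the divisors of the result are the candidates to test.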

Complete Vigenère Attack Pipeline

Two-Stage Ciphertext-Only Attack

Stage 1: Key length \(L\)

  • Kasiski: GCD of repeated trigram distances
  • Or: IoC column method (Chapter 6)

Stage 2: Key recovery

  • Split ciphertext into \(L\) streams
  • Stream \(j\): \(c_j, c_{j+L}, c_{j+2L}, \ldots\)
  • Each stream is a Caesar cipher!
  • Apply \(\chi^2\) to recover each \(k_j\)
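
Stage 2 can be sketched as follows, assuming the key length \(L\) is already known from Stage 1. The table `ENG` is an assumed set of approximate English letter frequencies, and the sample text and key are invented for the demonstration.

```python
import string

A = string.ascii_uppercase
# Assumed reference table: approximate English letter frequencies (percent, A..Z).
ENG = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03,
       2.41, 6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15,
       1.97, 0.07]
ENG = [p / 100 for p in ENG]

def vigenere_encrypt(m, key):
    return "".join(A[(A.index(ch) + A.index(key[i % len(key)])) % 26]
                   for i, ch in enumerate(m))

def recover_key(ciphertext, L):
    """Treat each of the L streams as a Caesar cipher and solve it by chi-squared."""
    key = ""
    for j in range(L):
        stream = ciphertext[j::L]          # c_j, c_{j+L}, c_{j+2L}, ...
        def score(s):
            counts = [0] * 26
            for ch in stream:
                counts[(A.index(ch) - s) % 26] += 1
            return sum((c / len(stream) - q) ** 2 / q
                       for c, q in zip(counts, ENG))
        key += A[min(range(26), key=score)]
    return key

text = ("THE ENEMY KNOWS THE SYSTEM BEING USED SO THE SECURITY OF A CIPHER "
        "MUST REST ON THE SECRECY OF THE KEY ALONE AND NOT ON THE SECRECY "
        "OF THE ALGORITHM THAT TRANSFORMS THE MESSAGE INTO CIPHERTEXT "
        "A LONGER KEY SPREADS THE LETTERS OVER MORE ALPHABETS BUT EACH "
        "STREAM STILL BEHAVES LIKE A SIMPLE SHIFT OF ORDINARY ENGLISH TEXT")
plaintext = text.replace(" ", "")
recovered = recover_key(vigenere_encrypt(plaintext, "KEY"), 3)
print(recovered)
```

Each stream holds only about a third of the letters, which is why the attack needs more ciphertext as \(L\) grows.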
Limitation
Longer keys produce fewer repeated trigrams. As key length \(L \to N\), the cipher approaches a one-time pad, which achieves perfect secrecy only if the key is uniformly random and never reused.

Chapter 6

Index of Coincidence

The Index of Coincidence

Definition 6.1 — Index of Coincidence (Friedman, 1922)
For text \(x\) of length \(N\) with letter counts \(f_0, \ldots, f_{25}\): \[ \mathrm{IC}(x) = \frac{\displaystyle\sum_{i=0}^{25} f_i(f_i - 1)}{N(N-1)} \] This is the probability that two letters drawn at random from distinct positions of \(x\) are identical.
English plaintext
\[\kappa_p = \sum_{i=0}^{25} p_i^2 \approx 0.0667\]
Random text (uniform)
\[\kappa_r = \frac{1}{26} \approx 0.0385\]

The gap \(\kappa_p - \kappa_r \approx 0.028\) is the engine of the IoC method.
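
Definition 6.1 in code, as a minimal sketch (the function name is illustrative, and the text is assumed to have at least two characters):

```python
from collections import Counter

def index_of_coincidence(text):
    """IC(x) = sum_i f_i (f_i - 1) / (N (N - 1)) over the letter counts f_i."""
    n = len(text)
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(index_of_coincidence("AABB"))  # 4 / 12 = 0.333...
print(index_of_coincidence("ABCD"))  # 0.0, no letter repeats
```

For AABB there are 12 ordered pairs of distinct positions, and 4 of them hold equal letters.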

Friedman's Theorem & Key Length Formula

Theorem 6.1 — Friedman (1922)
For a polyalphabetic cipher with key length \(L\): \[ \mathrm{IC}_{\text{expected}} \approx \frac{\kappa_p - \kappa_r}{L} + \kappa_r = \frac{1}{L}\,\kappa_p + \frac{L-1}{L}\,\kappa_r \]

Proof sketch: With probability \(1/L\), two positions share the same key letter (coincidence rate \(\kappa_p\)); with probability \((L{-}1)/L\), different key letters (coincidence rate \(\kappa_r\)).

Corollary — Key Length Estimation
\[ L \approx \frac{\kappa_p - \kappa_r}{\mathrm{IC}_{\text{obs}} - \kappa_r} = \frac{0.0282}{\mathrm{IC}_{\text{obs}} - 0.0385} \]
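
A short numerical sketch of Theorem 6.1 and its corollary, using the constants quoted above. The function names and the sample value \(\mathrm{IC}_{\text{obs}} = 0.045\) are illustrative assumptions.

```python
KAPPA_P = 0.0667      # English plaintext, as quoted above
KAPPA_R = 1 / 26      # uniform random text

def expected_ic(L):
    """Friedman's approximation for key length L."""
    return (KAPPA_P - KAPPA_R) / L + KAPPA_R

def estimate_key_length(ic_obs):
    """Invert the approximation; only meaningful when ic_obs > KAPPA_R."""
    return (KAPPA_P - KAPPA_R) / (ic_obs - KAPPA_R)

print(round(estimate_key_length(0.045), 1))   # 4.3
```

The estimate is a real number, not an integer, and it becomes unstable as the observed IC approaches \(\kappa_r\), so it is best read as a rough guide for which lengths to try.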

IoC: Key Properties & Comparisons

Invariance
The IoC is invariant under monoalphabetic substitution: any permutation of the alphabet merely relabels the counts \(\{f_i\}\), and the IoC formula depends only on the multiset of counts.

Kasiski Examination

  • Finds repeated \(n\)-grams
  • Exact divisor information
  • Needs longer ciphertext
  • Discrete (GCD-based)

Index of Coincidence

  • Global statistical measure
  • Works on shorter texts
  • No repeated \(n\)-grams needed
  • Continuous (numerical score)

In practice, the two methods complement each other.

Part III: Polygraphic Ciphers

Chapters 7–9

Hill cipher • Playfair cipher • Automated cryptanalysis

Chapter 7

The Hill Cipher

Matrix Encryption over \(\mathbb{Z}_{26}\)

Definition 7.1 — Hill Cipher
Block size \(n\), key: invertible \(K \in \mathrm{GL}(n, \mathbb{Z}_{26})\). \[ \mathbf{c} = K \cdot \mathbf{m} \pmod{26}, \qquad \mathbf{m} = K^{-1} \cdot \mathbf{c} \pmod{26} \]
Invertibility Criterion
\(K\) is invertible mod 26 \(\iff\) \(\gcd(\det(K), 26) = 1\).

Since \(26 = 2 \times 13\), the determinant must be coprime to both 2 and 13.
Valid values: \(\det(K) \bmod 26 \in \{1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25\}\) — 12 out of 26.

Modular Inverse
\(K^{-1} = (\det K)^{-1} \cdot \mathrm{adj}(K) \pmod{26}\)   via the extended Euclidean algorithm.
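
The invertibility criterion and the adjugate formula can be sketched for the \(2 \times 2\) case. The function name and the sample key are illustrative; `pow(det, -1, 26)` is Python's built-in modular inverse (Python 3.8+), used here in place of a hand-written extended Euclidean algorithm.

```python
from math import gcd

def inverse_2x2_mod26(K):
    """Inverse of a 2x2 matrix mod 26 via K^{-1} = det^{-1} * adj(K)."""
    (a, b), (c, d) = K
    det = (a * d - b * c) % 26
    if gcd(det, 26) != 1:
        raise ValueError("matrix is not invertible mod 26")
    det_inv = pow(det, -1, 26)
    adj = [[d, -b], [-c, a]]
    return [[(det_inv * x) % 26 for x in row] for row in adj]

K = [[3, 3], [2, 5]]                 # det = 9, and 9 * 3 = 27 = 1 (mod 26)
print(inverse_2x2_mod26(K))          # [[15, 17], [20, 9]]

valid_dets = [d for d in range(26) if gcd(d, 26) == 1]
print(len(valid_dets))               # 12
```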

Strengths and Fatal Weakness

Theorem 7.3 — Frequency Analysis Resistance
The Hill cipher with \(n \geq 2\) destroys single-letter frequencies: the IoC of ciphertext approaches \(\kappa_r \approx 0.0385\).
Theorem 7.2 — Known-Plaintext Attack
Given \(n\) known plaintext-ciphertext block pairs \((\mathbf{m}_i, \mathbf{c}_i)\), form \(M = [\mathbf{m}_1 | \cdots | \mathbf{m}_n]\) and \(C = [\mathbf{c}_1 | \cdots | \mathbf{c}_n]\). If \(M\) is invertible mod 26: \[ K = C \cdot M^{-1} \pmod{26} \]

Completely broken with just \(n\) known pairs (e.g., \(n = 2\) for a \(2 \times 2\) key).
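
The known-plaintext attack for a \(2 \times 2\) key can be sketched as follows. The secret key and the known plaintext HELP (blocks HE and LP) are assumptions chosen for illustration; blocks are stored as matrix columns.

```python
from math import gcd

def matmul(X, Y):
    """2x2 matrix product mod 26."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % 26 for j in range(2)]
            for i in range(2)]

def inverse(M):
    """2x2 inverse mod 26 via det^{-1} * adj(M)."""
    (a, b), (c, d) = M
    det = (a * d - b * c) % 26
    if gcd(det, 26) != 1:
        raise ValueError("M is not invertible mod 26; try other known blocks")
    det_inv = pow(det, -1, 26)
    return [[det_inv * d % 26, -det_inv * b % 26],
            [-det_inv * c % 26, det_inv * a % 26]]

K_secret = [[3, 3], [2, 5]]
M = [[7, 11], [4, 15]]               # columns: HE = (7, 4), LP = (11, 15)
C = matmul(K_secret, M)              # what the attacker observes
print(C)                             # [[7, 0], [8, 19]], i.e. HI and AT

K_recovered = matmul(C, inverse(M))  # K = C * M^{-1} mod 26
print(K_recovered)                   # [[3, 3], [2, 5]]
```

If the chosen plaintext blocks give a non-invertible \(M\), the attacker simply needs a different pair of known blocks.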

Design Lesson
Algebraic structure that makes a cipher elegant also makes it breakable.
Linearity enables efficient encryption and efficient cryptanalysis.
Modern ciphers (AES) combine linear and nonlinear operations.

Chapter 8

The Playfair Cipher

The 5×5 Grid and Encryption Rules

Definition 8.1 — Playfair Grid
A 5×5 matrix of 25 letters (I/J merged), constructed from a keyword.
Example with keyword MONARCHY:
M O N A R
C H Y B D
E F G I K
L P Q S T
U V W X Z
Three Encryption Rules (for digram \((a, b)\), \(a \neq b\))
  1. Same row: each letter shifts one position right (wrap around)
  2. Same column: each letter shifts one position down (wrap around)
  3. Rectangle: swap columns — each letter moves to the other corner of its row

Decryption reverses: shift left, shift up, same rectangle swap (self-inverse).
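
The grid construction and the three rules can be sketched as below. The grid is stored as a 25-character string in row-major order; the function names and the `step` parameter are illustrative, and the input is assumed to contain no J and no doubled letters within a digram.

```python
def build_grid(keyword):
    """5x5 Playfair grid as a 25-letter string, with J merged into I."""
    seen = []
    for ch in keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        ch = "I" if ch == "J" else ch
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def playfair_pair(grid, a, b, step=1):
    """Encrypt (step=1) or decrypt (step=-1) one digram with a != b."""
    ra, ca = divmod(grid.index(a), 5)
    rb, cb = divmod(grid.index(b), 5)
    if ra == rb:       # same row: shift right (or left when decrypting)
        return grid[ra * 5 + (ca + step) % 5] + grid[rb * 5 + (cb + step) % 5]
    if ca == cb:       # same column: shift down (or up when decrypting)
        return grid[((ra + step) % 5) * 5 + ca] + grid[((rb + step) % 5) * 5 + cb]
    return grid[ra * 5 + cb] + grid[rb * 5 + ca]   # rectangle: swap columns

grid = build_grid("MONARCHY")
print(grid)                              # MONARCHYBDEFGIKLPQSTUVWXZ
print(playfair_pair(grid, "A", "R"))     # RM  (same row, wraps around)
print(playfair_pair(grid, "M", "U"))     # CM  (same column, wraps around)
print(playfair_pair(grid, "H", "S"))     # BP  (rectangle)
```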

Cryptanalysis of Playfair

Theorem 8.1 — Digram Permutation
For a fixed grid \(G\), the Playfair encryption \(E_G\) is a permutation on the 600 ordered digrams \(\{(a, b) \in \mathcal{A}^2 : a \neq b\}\).

Consequence: the digram frequency distribution is preserved (merely relabelled).

Vulnerability

  • Most frequent ciphertext digrams correspond to TH, HE, IN, ER, AN
  • Key space: \(25! \approx 1.55 \times 10^{25}\) (but keyword construction reduces it)

Hill-Climbing Attack

  1. Start with random 5×5 grid
  2. Decrypt, score with quadgram log-probabilities
  3. Perturb grid (swap letters/rows/cols)
  4. Keep improvements; repeat
  5. Multiple restarts to escape local optima

Chapter 9

Automated Cryptanalysis:
Hill Climbing with N-grams

N-gram Scoring

Definition 9.1 — N-gram
An \(n\)-gram is a contiguous sequence of \(n\) characters: \(g_i = x_i x_{i+1} \cdots x_{i+n-1}\).
Definition 9.3 — Log-Likelihood Score
\[ \text{score}(x) = \sum_{i=1}^{N-n+1} \log_{10} F(x_i x_{i+1} \cdots x_{i+n-1}) \] where \(F(g)\) is the frequency of \(n\)-gram \(g\) in a reference corpus.
Floor probability for unseen \(n\)-grams: \(F_{\text{floor}} = 0.01 / (M - n + 1)\), where \(M\) is the length of the reference corpus, so \(M - n + 1\) is its total \(n\)-gram count.
Model | Possible \(n\)-grams | Entropy rate | Discrimination
Unigram | 26 | ~4.2 bits/char | Low
Bigram | 676 | ~3.6 bits/char | Medium
Trigram | 17,576 | ~3.1 bits/char | Good
Quadgram | 456,976 | ~2.8 bits/char | Excellent
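
The log-likelihood score can be sketched with a toy model. This example uses bigrams and a one-sentence corpus purely for illustration; a practical attack would use quadgram counts from a large corpus. The floor is taken as 0.01 divided by the corpus n-gram count, which is one reading of the formula above.

```python
from collections import Counter
from math import log10

def build_model(corpus, n):
    """Return (log10 frequencies of seen n-grams, log10 floor for unseen ones)."""
    total = len(corpus) - n + 1                  # number of n-grams in the corpus
    counts = Counter(corpus[i:i + n] for i in range(total))
    log_f = {g: log10(c / total) for g, c in counts.items()}
    floor = log10(0.01 / total)
    return log_f, floor

def score(text, log_f, floor, n):
    """Sum of log10 F(g) over all n-grams g of text."""
    return sum(log_f.get(text[i:i + n], floor)
               for i in range(len(text) - n + 1))

corpus = "THEREISNOTHINGEITHERGOODORBADBUTTHINKINGMAKESITSO"
log_f, floor = build_model(corpus, 2)

english_like = score("THING", log_f, floor, 2)
scrambled = score("GNIHT", log_f, floor, 2)
print(english_like > scrambled)   # True
```

Scores are negative, and a higher (less negative) score means the text looks more like the reference corpus.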

Hill Climbing Algorithm

Definition 9.4 — Hill Climbing for Cryptanalysis
  1. Initialize: Choose a random key \(k_0 \in \mathcal{K}\)
  2. Evaluate: Decrypt with \(k_0\), compute \(\text{score}(D_{k_0}(c))\)
  3. Perturb: Generate neighbor \(k'\) via small random modification
  4. Accept/Reject: If \(\text{score}(D_{k'}(c)) > \text{score}(D_{k_0}(c))\), set \(k_0 \leftarrow k'\)
  5. Repeat steps 3–4 for many iterations
  6. Restart from a new random key; keep the best result across restarts
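
The six steps above can be sketched generically. To keep the example self-contained, the scoring function here is a toy stand-in (agreement with a hidden permutation) rather than an n-gram score of a decryption; all names are illustrative.

```python
import random

def hill_climb(score, random_key, neighbor, iterations, restarts, rng):
    """Hill climbing with restarts; returns (best_score, best_key)."""
    best = None
    for _ in range(restarts):
        key = random_key(rng)                    # step 1: random start
        current = score(key)                     # step 2: evaluate
        for _ in range(iterations):
            candidate = neighbor(key, rng)       # step 3: perturb
            s = score(candidate)
            if s > current:                      # step 4: keep strict improvements
                key, current = candidate, s
        if best is None or current > best[0]:    # step 6: best across restarts
            best = (current, key)
    return best

# Toy score: number of positions agreeing with a hidden permutation.
# A real attack would compute score(D_k(c)) with n-gram statistics instead.
HIDDEN = list("QWERTYUIOPASDFGHJKLZXCVBNM")

def toy_score(key):
    return sum(a == b for a, b in zip(key, HIDDEN))

def random_key(rng):
    key = HIDDEN[:]
    rng.shuffle(key)
    return key

def swap_two(key, rng):
    i, j = rng.sample(range(26), 2)
    new = key[:]
    new[i], new[j] = new[j], new[i]
    return new

best_score, best_key = hill_climb(toy_score, random_key, swap_two,
                                  iterations=10000, restarts=2,
                                  rng=random.Random(1))
print(best_score)
```

The toy score has no local optima other than the hidden key, so restarts are not strictly needed here; with real n-gram scores they are what rescues runs that get stuck.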
Theorem 9.1 — Expected Restarts
If \(p\) = probability a single run finds the global optimum, then: \[ E[\text{restarts}] = \frac{1}{p}, \qquad P(\text{found in } k \text{ runs}) = 1 - (1-p)^k \]

For substitution ciphers with quadgram scoring on 200+ chars: \(p \approx 0.05\text{--}0.15\), so about 7–20 runs are needed on average.
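
The two formulas of Theorem 9.1 answer slightly different questions, which a short calculation makes visible. The function names and the 95% confidence target are illustrative assumptions.

```python
from math import ceil, log

def expected_runs(p):
    """Mean number of independent runs until the first success."""
    return 1 / p

def runs_for_confidence(p, confidence):
    """Smallest k with 1 - (1 - p)^k >= confidence."""
    return ceil(log(1 - confidence) / log(1 - p))

print(expected_runs(0.1), runs_for_confidence(0.1, 0.95))  # 10.0 29
```

With \(p = 0.1\), ten runs are enough on average, but about 29 are needed to succeed with probability 0.95.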

Generality of Automated Cryptanalysis

Substitution Cipher
  • Key: permutation of 26 letters
  • Perturbation: swap two letters
  • Effective for 200+ characters
Playfair Cipher
  • Key: 5×5 grid (25 letters)
  • Perturbation: swap grid entries
  • Effective for 200+ characters
Key Insight
The same framework attacks different cipher types by changing only the key representation and perturbation strategy. The n-gram scoring function remains unchanged — it measures "how English" the candidate decryption looks.
Extension: Simulated Annealing
Accept worse solutions (\(\Delta < 0\)) with probability \(e^{\Delta / T}\), where \(\Delta = \text{score}_{\text{new}} - \text{score}_{\text{old}}\) and temperature \(T\) decreases over time. This helps escape local optima without requiring restarts.
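
The acceptance rule can be sketched as a small function. Always accepting improvements is the usual convention and is assumed here; the function name and the injectable `rng` parameter are illustrative.

```python
import math
import random

def accept(delta, temperature, rng=random):
    """Accept improvements always; accept a worse move with probability exp(delta / T)."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)

rng = random.Random(0)
hot = sum(accept(-1.0, 10.0, rng) for _ in range(1000))    # exp(-0.1), about 0.90
cold = sum(accept(-1.0, 0.1, rng) for _ in range(1000))    # exp(-10), about 0.00005
print(hot > cold)
```

At high temperature most worse moves are accepted, and as \(T\) falls the rule behaves more and more like plain hill climbing.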

Parts I–III: Summary of Key Concepts

Chapter | Core Concept | Key Result
1 | Cryptosystem formalism | 5-tuple definition, Kerckhoffs's principle, attack taxonomy
2 | Substitution = permutation | \(\lvert S_{26} \rvert = 26! \approx 4 \times 10^{26}\), shifts form \(\mathbb{Z}_n \leq S_n\)
3 | Frequency analysis | Shift invariance of distributions, \(\chi^2\) key recovery
4 | Digram analysis | Theorem 4.1: monoalphabetic ciphers preserve digram structure
5 | Vigenère & Kasiski | Repeated trigrams reveal key length; streams are Caesar ciphers
6 | Index of Coincidence | Friedman: \(\mathrm{IC} \approx (\kappa_p - \kappa_r)/L + \kappa_r\)
7 | Hill cipher | Matrix encryption; broken by \(n\) known-plaintext pairs
8 | Playfair cipher | Digram permutation on 600 pairs; digram frequencies preserved
9 | Automated cryptanalysis | Quadgram scoring + hill climbing: general-purpose attack

Recurring Themes

  • Large keyspace ≠ security — statistical structure leaks through substitution (\(26!\) keys, but frequency analysis breaks it)
  • Polyalphabetic flattening — Vigenère defeats letter-frequency analysis, but period detection (Kasiski, IoC) reduces it to monoalphabetic subproblems
  • Polygraphic mixing — Hill and Playfair destroy single-letter statistics, but linearity (Hill) and digram preservation (Playfair) create new vulnerabilities
  • Linearity is dangerous — the Hill cipher's algebraic structure enables both efficient encryption and efficient cryptanalysis; modern ciphers require nonlinearity (S-boxes)
  • General-purpose attacks — n-gram scoring with hill climbing transcends cipher-specific methods, foreshadowing the computational approach to modern cryptanalysis

End of Parts I–III

Next: Parts IV–V — Enigma & Information Theory (Chapters 10–15)