Chapter 30: Index Calculus#

30.1 Historical Background: Adleman’s Subexponential Breakthrough#

In 1979, Leonard Adleman introduced the index calculus method for solving the discrete logarithm problem (DLP) in the multiplicative group \((\mathbb{Z}/p\mathbb{Z})^*\) of a prime field. The problem is: given a generator \(g\), a prime \(p\), and a target \(h \equiv g^x \pmod{p}\), find the unknown exponent \(x\).

Prior to Adleman’s work, the best generic algorithms for the DLP – such as baby-step/giant-step (Shanks, 1971) and Pollard’s rho (1978) – ran in \(O(\sqrt{p})\) time, which is fully exponential in the bit-length of \(p\). Adleman’s key insight was that the multiplicative group of a prime field has additional structure that generic group algorithms ignore: every element can be lifted to the integers and factored.

The index calculus strategy has three phases:

  1. Factor base selection. Choose a set \(\mathcal{B} = \{p_1, p_2, \ldots, p_k\}\) of small primes.

  2. Relation collection. Compute powers \(g^r \bmod p\) for random exponents \(r\) and keep those that are \(\mathcal{B}\)-smooth (factorable entirely over \(\mathcal{B}\)). Each such power yields a linear equation in the unknown discrete logarithms \(\log_g p_i \pmod{p-1}\).

  3. Individual logarithm. Express \(h \cdot g^s \bmod p\) as a \(\mathcal{B}\)-smooth number for some offset \(s\), then combine with the solved logarithms from phase 2.

This approach achieves subexponential running time, making it dramatically faster than any generic method for large primes. It is the foundational idea behind all modern attacks on the DLP in prime fields, including the number field sieve variants used in practice today.

30.2 Core Definitions#

30.3 Helper Functions: Smoothness Testing and Modular Arithmetic#

import numpy as np

def sieve_primes(B):
    """Return all primes up to B using the Sieve of Eratosthenes."""
    if B < 2:
        return np.array([], dtype=np.int64)
    is_prime = np.ones(B + 1, dtype=bool)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(np.sqrt(B)) + 1):
        if is_prime[i]:
            is_prime[i*i::i] = False
    return np.where(is_prime)[0].astype(np.int64)

# Quick test
primes_30 = sieve_primes(30)
print(f"Primes up to 30: {primes_30}")
print(f"Count: {len(primes_30)}")
Primes up to 30: [ 2  3  5  7 11 13 17 19 23 29]
Count: 10
import numpy as np

def is_smooth(n, factor_base):
    """
    Test whether n is smooth over the given factor base.
    
    Returns (True, exponents) if n factors completely over factor_base,
    or (False, None) otherwise.
    """
    if n <= 0:
        return False, None
    exponents = np.zeros(len(factor_base), dtype=np.int64)
    remaining = int(n)
    for i, p in enumerate(factor_base):
        p = int(p)
        while remaining % p == 0:
            exponents[i] += 1
            remaining //= p
    if remaining == 1:
        return True, exponents
    return False, None

# Test: 360 = 2^3 * 3^2 * 5
fb = sieve_primes(10)  # [2, 3, 5, 7]
smooth, exp = is_smooth(360, fb)
print(f"Factor base: {fb}")
print(f"360 is smooth: {smooth}, exponents: {exp}")
print(f"Verification: {np.prod(fb ** exp)} == 360")

# Test a non-smooth number: 77 = 7 * 11
smooth2, _ = is_smooth(77, fb)
print(f"77 is smooth over {{2,3,5,7}}: {smooth2}")
Factor base: [2 3 5 7]
360 is smooth: True, exponents: [3 2 1 0]
Verification: 360 == 360
77 is smooth over {2,3,5,7}: False
import numpy as np
import math

def extended_gcd(a, b):
    """Extended Euclidean algorithm. Returns (g, x, y) with a*x + b*y = g."""
    if a == 0:
        return b, 0, 1
    g, x1, y1 = extended_gcd(b % a, a)
    return g, y1 - (b // a) * x1, x1

def mod_inverse(a, m):
    """Compute modular inverse of a mod m, or None if it does not exist."""
    a = a % m
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        return None  # No inverse exists
    return x % m

def gauss_elim_mod(A, b, m):
    """
    Solve A @ x = b (mod m) using Gaussian elimination with modular arithmetic.
    
    Parameters
    ----------
    A : ndarray of shape (n_eq, n_var), integer entries
    b : ndarray of shape (n_eq,), integer entries
    m : modulus (typically p - 1)
    
    Returns
    -------
    x : ndarray of shape (n_var,) -- a solution mod m, or None if no solution.
    """
    n_eq, n_var = A.shape
    # Build augmented matrix [A | b]
    aug = np.zeros((n_eq, n_var + 1), dtype=np.int64)
    aug[:, :n_var] = A % m
    aug[:, n_var] = b % m
    
    pivot_row = 0
    pivot_cols = []
    
    for col in range(n_var):
        if pivot_row >= n_eq:
            break
        # Find a row with an invertible entry in this column
        found = False
        for row in range(pivot_row, n_eq):
            val = int(aug[row, col] % m)
            inv = mod_inverse(val, m)
            if inv is not None:
                # Swap rows
                aug[[pivot_row, row]] = aug[[row, pivot_row]]
                # Scale pivot row
                aug[pivot_row] = (aug[pivot_row] * inv) % m
                # Eliminate column in all other rows
                for r in range(n_eq):
                    if r != pivot_row and aug[r, col] != 0:
                        factor = int(aug[r, col])
                        aug[r] = (aug[r] - factor * aug[pivot_row]) % m
                pivot_cols.append(col)
                pivot_row += 1
                found = True
                break
        # If no invertible pivot found, skip this column
    
    # Extract solution for pivot variables
    x = np.zeros(n_var, dtype=np.int64)
    for i, col in enumerate(pivot_cols):
        x[col] = int(aug[i, n_var] % m)
    
    # Verify solution
    residual = (A @ x - b) % m
    if not np.all(residual == 0):
        return None
    return x

# Quick test: solve 3x = 6 (mod 10) => x = 2
A_test = np.array([[3]], dtype=np.int64)
b_test = np.array([6], dtype=np.int64)
sol = gauss_elim_mod(A_test, b_test, 10)
print(f"Solution of 3x = 6 (mod 10): x = {sol}")
print(f"Check: 3 * {sol[0]} mod 10 = {(3 * sol[0]) % 10}")
Solution of 3x = 6 (mod 10): x = [2]
Check: 3 * 2 mod 10 = 6

Implementation Note

The modular Gaussian elimination must handle the fact that \(p - 1\) is not prime in general. When the modulus is composite, not every nonzero element has a modular inverse. Our implementation skips columns where no invertible pivot is found, which means some variables may remain undetermined. In practice, collecting extra relations (more than the factor base size) greatly increases the chance of obtaining a full-rank system.

30.4 The Index#

Calculus Class

import numpy as np
import math

class IndexCalculus:
    """
    Index calculus algorithm for the discrete logarithm problem
    in (Z/pZ)*.
    
    Given g (generator), p (prime), and h = g^x mod p,
    find x.
    """
    
    def __init__(self, g, p, bound=None):
        """
        Parameters
        ----------
        g : int -- generator of (Z/pZ)*
        p : int -- prime modulus
        bound : int or None -- smoothness bound (auto-selected if None)
        """
        self.g = int(g)
        self.p = int(p)
        self.order = self.p - 1  # order of the group
        
        if bound is None:
            # Heuristic: L_p[1/2] ~ exp(sqrt(ln p * ln ln p))
            ln_p = math.log(self.p)
            bound = max(10, int(math.exp(0.5 * math.sqrt(ln_p * math.log(ln_p)))))
        self.bound = bound
        self.factor_base = self._sieve_primes(self.bound)
        self.k = len(self.factor_base)  # factor base size
        self.logs = None  # discrete logs of factor base elements
        self.relations = []  # collected relations
    
    @staticmethod
    def _sieve_primes(B):
        """Sieve of Eratosthenes up to B."""
        if B < 2:
            return np.array([], dtype=np.int64)
        is_prime = np.ones(B + 1, dtype=bool)
        is_prime[0] = is_prime[1] = False
        for i in range(2, int(np.sqrt(B)) + 1):
            if is_prime[i]:
                is_prime[i*i::i] = False
        return np.where(is_prime)[0].astype(np.int64)
    
    @staticmethod
    def _is_smooth(n, factor_base):
        """Test smoothness and return exponent vector."""
        if n <= 0:
            return False, None
        exponents = np.zeros(len(factor_base), dtype=np.int64)
        remaining = int(n)
        for i, p in enumerate(factor_base):
            p = int(p)
            while remaining % p == 0:
                exponents[i] += 1
                remaining //= p
        if remaining == 1:
            return True, exponents
        return False, None
    
    def collect_relations(self, max_tries=None, verbose=False):
        """
        Phase 1: Collect B-smooth relations g^r mod p.
        
        We need at least k relations (one per factor base element).
        We collect extra relations for robustness.
        """
        target = self.k + 10  # collect extra relations
        if max_tries is None:
            max_tries = target * 200
        
        self.relations = []
        rng = np.random.default_rng(42)
        
        for _ in range(max_tries):
            r = int(rng.integers(1, self.order))
            val = pow(self.g, r, self.p)
            smooth, exps = self._is_smooth(val, self.factor_base)
            if smooth:
                self.relations.append((r, exps))
                if verbose and len(self.relations) % 5 == 0:
                    print(f"  Collected {len(self.relations)}/{target} relations")
                if len(self.relations) >= target:
                    break
        
        if verbose:
            print(f"  Total relations collected: {len(self.relations)} "
                  f"(need >= {self.k})")
        return len(self.relations)
    
    def solve_system(self, verbose=False):
        """
        Phase 2: Solve the linear system mod (p-1) to find
        log_g(p_i) for each prime p_i in the factor base.
        """
        if len(self.relations) < self.k:
            raise ValueError(
                f"Not enough relations: have {len(self.relations)}, "
                f"need >= {self.k}")
        
        n_rel = len(self.relations)
        A = np.zeros((n_rel, self.k), dtype=np.int64)
        b = np.zeros(n_rel, dtype=np.int64)
        
        for i, (r, exps) in enumerate(self.relations):
            A[i, :] = exps
            b[i] = r
        
        self.relation_matrix = A  # store for visualization
        self.logs = gauss_elim_mod(A, b, self.order)
        
        if self.logs is None:
            if verbose:
                print("  System solve failed -- try more relations.")
            return False
        
        if verbose:
            for i in range(min(5, self.k)):
                pi = int(self.factor_base[i])
                li = int(self.logs[i])
                check = pow(self.g, li, self.p)
                print(f"  log_g({pi}) = {li}  "
                      f"(verify: g^{li} mod p = {check}, "
                      f"match: {check == pi})")
        return True
    
    def compute_log(self, h, max_tries=None, verbose=False):
        """
        Phase 3: Compute log_g(h) using the precomputed factor base logs.
        
        Find offset s such that h * g^s mod p is B-smooth, then
        log_g(h) = sum(e_i * log_g(p_i)) - s  (mod p-1).
        """
        if self.logs is None:
            raise ValueError("Must call solve_system() first.")
        
        if max_tries is None:
            max_tries = self.order
        
        h = int(h) % self.p
        
        for s in range(max_tries):
            val = (h * pow(self.g, s, self.p)) % self.p
            smooth, exps = self._is_smooth(val, self.factor_base)
            if smooth:
                # log_g(h) + s = sum(e_i * log_g(p_i)) mod (p-1)
                log_h = (int(np.dot(exps, self.logs)) - s) % self.order
                if verbose:
                    print(f"  Found smooth value at offset s = {s}")
                    print(f"  h * g^{s} mod p = {val}")
                    print(f"  Factorization exponents: {exps}")
                    print(f"  Computed log_g(h) = {log_h}")
                return int(log_h)
        
        raise RuntimeError(
            f"Failed to find smooth value in {max_tries} attempts.")

Algorithm Correctness

Each smooth relation \(g^r \equiv p_1^{e_1} \cdots p_k^{e_k} \pmod{p}\) translates into the linear congruence

\[ r \equiv e_1 \log_g p_1 + e_2 \log_g p_2 + \cdots + e_k \log_g p_k \pmod{p-1}.\]

With enough such relations (\(\geq k\)), the system is (over)determined and Gaussian elimination mod \(p-1\) recovers the individual logarithms. The correctness of Phase 3 follows from the identity \(\log_g(h \cdot g^s) = \log_g h + s\).

30.5 Experiment 1: Full Index Calculus on a ~20-bit Prime#

import numpy as np
import math

# Choose a ~20-bit prime and a generator
p = 1048583  # prime, approx 2^20
# Verify primality by trial division (sufficient for this size)
def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(math.isqrt(n)) + 1):
        if n % d == 0:
            return False
    return True

print(f"p = {p}, is_prime: {is_prime(p)}, bit-length: {p.bit_length()}")

# Find a primitive root (generator) by testing candidates
def find_generator(p):
    """Find a primitive root modulo p by checking order."""
    # Factor p-1
    n = p - 1
    factors = []
    temp = n
    for d in range(2, int(math.isqrt(temp)) + 1):
        if temp % d == 0:
            factors.append(d)
            while temp % d == 0:
                temp //= d
    if temp > 1:
        factors.append(temp)
    
    for g in range(2, p):
        is_gen = True
        for q in factors:
            if pow(g, n // q, p) == 1:
                is_gen = False
                break
        if is_gen:
            return g
    return None

g = find_generator(p)
print(f"Generator g = {g}")

# Set up the DLP instance
rng = np.random.default_rng(2024)
secret_x = int(rng.integers(2, p - 1))
h = pow(g, secret_x, p)
print(f"Secret x = {secret_x}")
print(f"h = g^x mod p = {h}")
p = 1048583, is_prime: True, bit-length: 21
Generator g = 5
Secret x = 253263
h = g^x mod p = 861121
import numpy as np

# Run the full index calculus attack
ic = IndexCalculus(g, p, bound=30)
print(f"Factor base (B={ic.bound}): {ic.factor_base}")
print(f"Factor base size k = {ic.k}")
print()

# Phase 1: Relation collection
print("=== Phase 1: Relation Collection ===")
n_rel = ic.collect_relations(verbose=True)
print(f"Collected {n_rel} relations.")
print()

# Phase 2: Solve for factor base logs
print("=== Phase 2: Linear Algebra ===")
success = ic.solve_system(verbose=True)
print(f"System solved: {success}")
print()

# Phase 3: Compute individual log
print("=== Phase 3: Individual Logarithm ===")
computed_x = ic.compute_log(h, verbose=True)
print()
print(f"Computed x = {computed_x}")
print(f"True x     = {secret_x}")
print(f"Correct: {computed_x == secret_x}")
print(f"Verification: g^(computed_x) mod p = {pow(g, computed_x, p)} == h = {h}")
Factor base (B=30): [ 2  3  5  7 11 13 17 19 23 29]
Factor base size k = 10

=== Phase 1: Relation Collection ===
  Collected 5/20 relations
  Collected 10/20 relations
  Collected 15/20 relations
  Collected 20/20 relations
  Total relations collected: 20 (need >= 10)
Collected 20 relations.

=== Phase 2: Linear Algebra ===
  log_g(2) = 810776  (verify: g^810776 mod p = 2, match: True)
  log_g(3) = 505750  (verify: g^505750 mod p = 3, match: True)
  log_g(5) = 1  (verify: g^1 mod p = 5, match: True)
  log_g(7) = 1011081  (verify: g^1011081 mod p = 7, match: True)
  log_g(11) = 920680  (verify: g^920680 mod p = 11, match: True)
System solved: True

=== Phase 3: Individual Logarithm ===
  Found smooth value at offset s = 126
  h * g^126 mod p = 1001832
  Factorization exponents: [3 1 0 0 0 3 0 1 0 0]
  Computed log_g(h) = 253263

Computed x = 253263
True x     = 253263
Correct: True
Verification: g^(computed_x) mod p = 861121 == h = 861121

Why It Works

The attack above solved a 20-bit DLP in well under a second. A brute-force search would examine up to \(\sim 10^6\) exponents; baby-step/giant-step would require \(\sim 10^3\) steps but \(O(\sqrt{p})\) memory. Index calculus needs only a modest number of smooth relations and a small linear system, making it particularly efficient as \(p\) grows.

30.6 Experiment 2: Relation Collection Analysis#

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Analyze how the number of random trials needed to collect k relations
# varies with factor base size.
p_test = 1048583
g_test = find_generator(p_test)

bounds = [10, 15, 20, 30, 40, 50, 70, 100]
fb_sizes = []
trials_needed = []
relations_found = []

for B in bounds:
    fb = sieve_primes(B)
    k = len(fb)
    fb_sizes.append(k)
    
    # Count how many trials to get k+5 relations
    target = k + 5
    count = 0
    found = 0
    rng = np.random.default_rng(123)
    
    for trial in range(200000):
        r = int(rng.integers(1, p_test - 1))
        val = pow(g_test, r, p_test)
        smooth, _ = is_smooth(val, fb)
        count += 1
        if smooth:
            found += 1
            if found >= target:
                break
    
    trials_needed.append(count)
    relations_found.append(found)
    print(f"B={int(B):3d}: |FB|={int(k):2d}, target={int(target):2d}, "
          f"trials={int(count):6d}, found={int(found):2d}")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.plot(bounds, trials_needed, 'bo-', linewidth=2, markersize=8)
ax1.set_xlabel('Smoothness bound B', fontsize=12)
ax1.set_ylabel('Trials to collect k+5 relations', fontsize=12)
ax1.set_title('Trials Needed vs. Smoothness Bound', fontsize=13)
ax1.set_yscale('log')
ax1.grid(True, alpha=0.3)

ax2.plot(fb_sizes, trials_needed, 'rs-', linewidth=2, markersize=8)
ax2.set_xlabel('Factor base size k', fontsize=12)
ax2.set_ylabel('Trials to collect k+5 relations', fontsize=12)
ax2.set_title('Trials Needed vs. Factor Base Size', fontsize=13)
ax2.set_yscale('log')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('relation_collection_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
print("\nAs B increases, more numbers are smooth, so fewer trials are needed.")
print("But k also increases, so the linear system grows. The optimal B balances both.")
B= 10: |FB|= 4, target= 9, trials=  5251, found= 9
B= 15: |FB|= 6, target=11, trials=  3688, found=11
B= 20: |FB|= 8, target=13, trials=  1518, found=13
B= 30: |FB|=10, target=15, trials=  1285, found=15
B= 40: |FB|=12, target=17, trials=  1214, found=17
B= 50: |FB|=15, target=20, trials=   801, found=20
B= 70: |FB|=19, target=24, trials=   557, found=24
B=100: |FB|=25, target=30, trials=   518, found=30
../_images/18b03c7a9ed0d6988376141ea243ad9f00737da3e92f4b44622cb67edc8a5e7b.png
As B increases, more numbers are smooth, so fewer trials are needed.
But k also increases, so the linear system grows. The optimal B balances both.

The Smoothness Trade-off

There is a fundamental tension in choosing the smoothness bound \(B\):

  • Larger \(B\): more numbers are \(B\)-smooth, so relation collection is faster. However, the factor base is larger, so the linear system has more unknowns and the linear algebra phase is slower.

  • Smaller \(B\): the linear system is compact, but smooth numbers are rarer, making relation collection the bottleneck.

The optimal balance yields the subexponential complexity \(L_p[1/2, c]\).

30.7 Experiment 3: Visualizing the Relation Matrix#

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt

# Use the relation matrix from our index calculus instance
A = ic.relation_matrix
n_rel, n_var = A.shape

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Left: sparsity pattern (nonzero entries)
spy_matrix = (A != 0).astype(int)
ax1.imshow(spy_matrix, cmap='Blues', aspect='auto', interpolation='nearest')
ax1.set_xlabel('Factor base index', fontsize=12)
ax1.set_ylabel('Relation index', fontsize=12)
ax1.set_title(f'Sparsity Pattern ({n_rel} x {n_var})\n'
              f'Nonzeros: {np.count_nonzero(A)} / {A.size} '
              f'({float(100*np.count_nonzero(A)/A.size):.1f}%)', fontsize=13)
ax1.set_xticks(range(n_var))
ax1.set_xticklabels([str(int(q)) for q in ic.factor_base], fontsize=9)

# Right: distribution of exponent values
nonzero_vals = A[A > 0]
if len(nonzero_vals) > 0:
    max_exp = int(nonzero_vals.max())
    bins = np.arange(1, max_exp + 2) - 0.5
    ax2.hist(nonzero_vals, bins=bins, color='steelblue',
             edgecolor='white', linewidth=0.8)
    ax2.set_xlabel('Exponent value', fontsize=12)
    ax2.set_ylabel('Frequency', fontsize=12)
    ax2.set_title('Distribution of Nonzero Exponents\nin Relation Matrix',
                  fontsize=13)
    ax2.set_xticks(range(1, max_exp + 1))
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('relation_matrix_sparsity.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Matrix dimensions: {n_rel} relations x {n_var} variables")
print(f"Sparsity: {float(100*(1 - np.count_nonzero(A)/A.size)):.1f}% zeros")
print(f"Small primes (2, 3) appear more frequently because smooth numbers")
print(f"are dominated by small prime factors.")
../_images/e6e51417d094bb273604f6d820bf7f458892a2513d8cbd3d40b4c1c2922d1a77.png
Matrix dimensions: 20 relations x 10 variables
Sparsity: 63.0% zeros
Small primes (2, 3) appear more frequently because smooth numbers
are dominated by small prime factors.

30.8 Experiment 4: BSGS vs. Index Calculus vs. Brute Force#

import numpy as np
import math

def brute_force_log(g, h, p):
    """Solve g^x = h mod p by exhaustive search."""
    power = 1
    for x in range(p - 1):
        if power == h % p:
            return x
        power = (power * g) % p
    return None

def bsgs_log(g, h, p):
    """
    Baby-step/giant-step algorithm for discrete log.
    Solves g^x = h mod p in O(sqrt(p)) time and space.
    """
    n = p - 1  # group order
    m = int(math.isqrt(n)) + 1
    
    # Baby step: compute g^j mod p for j = 0, ..., m-1
    table = {}
    power = 1
    for j in range(m):
        table[power] = j
        power = (power * g) % p
    
    # Giant step: g^(-m) mod p
    g_inv_m = pow(g, n - m, p)  # g^(-m) = g^(n-m) mod p
    
    gamma = h % p
    for i in range(m):
        if gamma in table:
            x = (i * m + table[gamma]) % n
            return x
        gamma = (gamma * g_inv_m) % p
    return None

def index_calculus_log(g, h, p, bound=30):
    """Solve DLP using index calculus."""
    ic_local = IndexCalculus(g, p, bound=bound)
    ic_local.collect_relations()
    if not ic_local.solve_system():
        return None
    return ic_local.compute_log(h)

# Quick verification on a small prime
p_small = 1009  # small prime for testing
g_small = find_generator(p_small)
x_true = 456
h_small = pow(g_small, x_true, p_small)

x_bf = brute_force_log(g_small, h_small, p_small)
x_bs = bsgs_log(g_small, h_small, p_small)

print(f"p = {p_small}, g = {g_small}, h = g^{x_true} = {h_small}")
print(f"Brute force:  x = {x_bf}")
print(f"BSGS:         x = {x_bs}")
print(f"All correct: {pow(g_small, x_bf, p_small) == h_small and pow(g_small, x_bs, p_small) == h_small}")
p = 1009, g = 11, h = g^456 = 185
Brute force:  x = 456
BSGS:         x = 456
All correct: True
Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
import math
import time

# Timing comparison across different prime sizes
test_primes = [
    (1009, "~10-bit"),
    (10007, "~14-bit"),
    (100003, "~17-bit"),
    (1048583, "~20-bit"),
]

results = {'prime': [], 'bits': [], 'brute': [], 'bsgs': [], 'ic': []}

for p_val, label in test_primes:
    g_val = find_generator(p_val)
    rng = np.random.default_rng(99)
    x_val = int(rng.integers(2, p_val - 1))
    h_val = pow(g_val, x_val, p_val)
    bits = p_val.bit_length()
    
    results['prime'].append(p_val)
    results['bits'].append(bits)
    
    # Brute force (skip if too large)
    if p_val <= 200000:
        t0 = time.perf_counter()
        x_bf = brute_force_log(g_val, h_val, p_val)
        t_bf = time.perf_counter() - t0
        results['brute'].append(t_bf)
    else:
        results['brute'].append(None)
    
    # BSGS
    t0 = time.perf_counter()
    x_bs = bsgs_log(g_val, h_val, p_val)
    t_bs = time.perf_counter() - t0
    results['bsgs'].append(t_bs)
    
    # Index calculus
    t0 = time.perf_counter()
    x_ic = index_calculus_log(g_val, h_val, p_val, bound=30)
    t_ic = time.perf_counter() - t0
    results['ic'].append(t_ic)
    
    print(f"{label} (p={p_val}): "
          f"BF={results['brute'][-1]}, "
          f"BSGS={float(t_bs):.4f}s, "
          f"IC={float(t_ic):.4f}s")

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
bits_arr = np.array(results['bits'])

# Brute force (only for primes where it ran)
bf_mask = [t is not None for t in results['brute']]
bf_bits = bits_arr[bf_mask]
bf_times = [t for t in results['brute'] if t is not None]
if bf_times:
    ax.semilogy(bf_bits, bf_times, 'ro-', linewidth=2, markersize=10,
                label='Brute force $O(p)$')

ax.semilogy(bits_arr, results['bsgs'], 'bs-', linewidth=2, markersize=10,
            label=r'BSGS $O(\sqrt{p})$')
ax.semilogy(bits_arr, results['ic'], 'g^-', linewidth=2, markersize=10,
            label=r'Index calculus $L_p[1/2,c]$')

ax.set_xlabel('Bit-length of p', fontsize=13)
ax.set_ylabel('Time (seconds, log scale)', fontsize=13)
ax.set_title('DLP Solver Comparison: Brute Force vs. BSGS vs. Index Calculus',
             fontsize=14)
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)
ax.set_xticks(bits_arr)

plt.tight_layout()
plt.savefig('dlp_timing_comparison.png', dpi=150, bbox_inches='tight')
plt.show()
~10-bit (p=1009): BF=6.720796227455139e-05, BSGS=0.0000s, IC=0.0012s
~14-bit (p=10007): BF=0.0006700829835608602, BSGS=0.0000s, IC=0.0012s
~17-bit (p=100003): BF=0.006460999953560531, BSGS=0.0001s, IC=0.0022s
~20-bit (p=1048583): BF=None, BSGS=0.0002s, IC=0.0080s
../_images/638edd3d03e7bee63785b293ad9e4e449162b7fdb9c603f87c26198f81a97386.png

Interpreting the Timing Results

For these small primes, BSGS is typically the fastest algorithm due to low constant factors and the \(O(\sqrt{p})\) complexity. Index calculus incurs overhead from relation collection and linear algebra, which only becomes worthwhile for significantly larger primes (hundreds of bits and beyond).

The advantage of index calculus is asymptotic: its subexponential scaling \(L_p[1/2, c]\) eventually dominates the \(O(p^{1/2})\) scaling of BSGS. For cryptographic-size primes (\(\geq 1024\) bits), index calculus-family algorithms (including the number field sieve) are the only feasible attacks.

30.9 Experiment 5: Smoothness Probability#

Hide code cell source
import numpy as np
import matplotlib.pyplot as plt
import math

def estimate_smoothness_prob(p, B, n_samples=50000):
    """Estimate the probability that a random element of (Z/pZ)* is B-smooth."""
    fb = sieve_primes(B)
    rng = np.random.default_rng(42)
    count = 0
    for _ in range(n_samples):
        val = int(rng.integers(1, p))
        smooth, _ = is_smooth(val, fb)
        if smooth:
            count += 1
    return count / n_samples

def dickman_approx(u):
    """
    Rough approximation of the Dickman rho function rho(u).
    For u >= 1, rho(u) ~ u^(-u) gives the probability that
    a random integer n is n^(1/u)-smooth.
    """
    if u <= 1:
        return 1.0
    return u ** (-u)

p_smooth = 1048583
bounds_test = [5, 10, 15, 20, 30, 50, 70, 100, 150, 200]
empirical_probs = []
theoretical_probs = []

for B in bounds_test:
    prob = estimate_smoothness_prob(p_smooth, B, n_samples=30000)
    empirical_probs.append(prob)
    
    # u = log(p) / log(B)
    u = math.log(p_smooth) / math.log(max(B, 2))
    theoretical_probs.append(dickman_approx(u))
    
    print(f"B={int(B):4d}: empirical prob = {float(prob):.5f}, "
          f"Dickman approx rho({float(u):.2f}) = {float(dickman_approx(u)):.5f}")

fig, ax = plt.subplots(figsize=(10, 6))
ax.semilogy(bounds_test, empirical_probs, 'bo-', linewidth=2,
            markersize=8, label='Empirical probability')
ax.semilogy(bounds_test, theoretical_probs, 'r--', linewidth=2,
            label=r'Dickman $\rho(u) \approx u^{-u}$')
ax.set_xlabel('Smoothness bound B', fontsize=13)
ax.set_ylabel('Probability of B-smoothness', fontsize=13)
ax.set_title(f'Smoothness Probability for p = {p_smooth} (~20 bits)',
             fontsize=14)
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('smoothness_probability.png', dpi=150, bbox_inches='tight')
plt.show()
B=   5: empirical prob = 0.00047, Dickman approx rho(8.61) = 0.00000
B=  10: empirical prob = 0.00140, Dickman approx rho(6.02) = 0.00002
B=  15: empirical prob = 0.00450, Dickman approx rho(5.12) = 0.00023
B=  20: empirical prob = 0.00900, Dickman approx rho(4.63) = 0.00083
B=  30: empirical prob = 0.01460, Dickman approx rho(4.08) = 0.00326
B=  50: empirical prob = 0.03293, Dickman approx rho(3.54) = 0.01130
B=  70: empirical prob = 0.04867, Dickman approx rho(3.26) = 0.02109
B= 100: empirical prob = 0.07217, Dickman approx rho(3.01) = 0.03624
B= 150: empirical prob = 0.11047, Dickman approx rho(2.77) = 0.05987
B= 200: empirical prob = 0.14580, Dickman approx rho(2.62) = 0.08073
../_images/f25412a6b20d02ebbb883ec8cddbb9077633008c6aa12e5295a22fcd19749b2a.png

The Dickman Function

The probability that a random integer \(n\) is \(n^{1/u}\)-smooth is given by the Dickman rho function \(\rho(u)\). For \(u = \ln p / \ln B\), the smoothness probability of random elements modulo \(p\) is approximately \(\rho(u)\). The crude approximation \(\rho(u) \approx u^{-u}\) already captures the qualitative behavior: smoothness probability decreases rapidly as the ratio \(\ln p / \ln B\) grows.

30.10 Exercises#


Exercise 30.1. Let \(p = 1009\) and \(g = 11\) (a generator of \((\mathbb{Z}/1009\mathbb{Z})^*\)). Use the factor base \(\mathcal{B} = \{2, 3, 5, 7\}\) and manually verify that \(g^1 = 11\), \(g^2 = 121\), \(g^3 = 1331 \bmod 1009 = 313\). Which of these powers are \(\mathcal{B}\)-smooth? Write out the corresponding linear relations modulo \(p - 1 = 1008\).

Exercise 30.2. Implement a function smoothness_ratio(p, B, N) that generates \(N\) random integers in \([1, p)\) and returns the fraction that are \(B\)-smooth. Plot the ratio as a function of \(B\) for \(p = 10^6 + 3\). At what value of \(B\) is approximately 10% of integers smooth?

Exercise 30.3. The modular Gaussian elimination in Section 30.3 can fail when the modulus \(m = p - 1\) is composite and a pivot element has no inverse modulo \(m\). Describe a concrete example where this happens (give specific values of \(p\), the factor base, and the relation matrix), and explain how collecting additional relations can help overcome this issue.

Exercise 30.4. Modify the IndexCalculus class to use sequential scanning instead of random exponents in the relation collection phase: test \(g^1, g^2, g^3, \ldots\) in order. Compare the number of smooth values found in the first 10000 powers using sequential vs. random scanning for \(p = 1048583\) and \(B = 30\). Which approach finds relations faster, and why?

Exercise 30.5. The subexponential complexity \(L_p[1/2, c]\) depends on the constant \(c\). For the basic index calculus with optimal smoothness bound, one has \(c = \sqrt{2}\). Compute \(L_p[1/2, \sqrt{2}]\) numerically for \(p\) of bit-lengths \(64, 128, 256, 512, 1024\). Plot these values on a log scale and compare with \(\sqrt{p}\) (the BSGS complexity) and \(p\) (brute-force complexity). At what bit-length does index calculus become faster than BSGS according to this estimate?

30.11 Summary and References#


In this chapter we studied the index calculus algorithm, the first subexponential method for the discrete logarithm problem in prime fields.

Key ideas:

  • The algorithm exploits the lifting of group elements to the integers, where factorization provides additional algebraic structure unavailable to generic group algorithms.

  • A factor base of small primes defines the notion of smoothness. Smooth powers of the generator yield linear relations among the discrete logarithms of the factor base elements.

  • Gaussian elimination modulo \(p - 1\) solves for the base logarithms; individual logarithms follow by expressing the target as a smooth product.

  • The smoothness bound \(B\) creates a trade-off: larger \(B\) makes smooth numbers more common but increases the linear system size. The optimal balance gives the celebrated \(L_p[1/2, c]\) complexity.

Historical significance:

  • Adleman’s 1979 algorithm was the first to break the \(O(\sqrt{p})\) barrier for DLP in prime fields.

  • It inspired a long line of improvements: the Coppersmith-Odlyzko-Schroeppel (COS) algorithm (1986), the Gaussian integer method, and ultimately the number field sieve for DLP, which achieves \(L_p[1/3, c]\) complexity.

  • These algorithms are the reason that Diffie-Hellman and ElGamal over \((\mathbb{Z}/p\mathbb{Z})^*\) require primes of at least 2048 bits for modern security.

References:

  1. L. M. Adleman, “A subexponential algorithm for the discrete logarithm problem with applications to cryptography,” Proceedings of the 20th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 55–60, 1979.

  2. A. M. Odlyzko, “Discrete logarithms: The past and the future,” Designs, Codes and Cryptography, vol. 19, pp. 129–145, 2000.

  3. D. Coppersmith, A. M. Odlyzko, and R. Schroeppel, “Discrete logarithms in \(GF(p)\),” Algorithmica, vol. 1, pp. 1–15, 1986.

  4. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996. Chapter 3: Number-Theoretic Reference Problems.