Table of Contents

1. Sieve of Eratosthenes & Variations 2. GCD, LCM & Extended Euclidean Algorithm 3. Modular Inverse (All Methods) 4. Binary Exponentiation 5. Fermat's Little Theorem 6. Euler's Totient Function & Theorem 7. Chinese Remainder Theorem 8. Combinatorics for CP 9. Inclusion-Exclusion Principle 10. Catalan Numbers 11. Matrix Exponentiation 12. Mobius Function & Inversion 13. FFT & Polynomial Multiplication 14. Practice Problems

1. Sieve of Eratosthenes & Variations

The sieve is the bread and butter of number theory in CP. If a problem mentions primes and n ≤ 107, you almost certainly need a sieve.

Classic Sieve of Eratosthenes

Idea: Mark all composite numbers by iterating through each prime and crossing off its multiples. We only need to check up to √n because any composite number ≤ n has a prime factor ≤ √n.

Time: O(n log log n)  |  Space: O(n)
C++
vector<bool> sieve(int n) {
    vector<bool> is_prime(n + 1, true);
    is_prime[0] = is_prime[1] = false;
    for (int i = 2; i * i <= n; i++) {
        if (is_prime[i]) {
            for (int j = i * i; j <= n; j += i)
                is_prime[j] = false;
        }
    }
    return is_prime;
}
Python
def sieve(n):
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return is_prime

Linear Sieve (O(n))

Why it matters: The classic sieve marks some composites multiple times (e.g., 12 is crossed off by both 2 and 3). The linear sieve ensures each composite is marked exactly once by its smallest prime factor. This also gives you the smallest prime factor (SPF) array for free -- extremely useful for fast factorization.

C++
const int MAXN = 10000001;
int spf[MAXN]; // smallest prime factor
vector<int> primes;

void linear_sieve() {
    for (int i = 2; i < MAXN; i++) {
        if (spf[i] == 0) { // i is prime
            spf[i] = i;
            primes.push_back(i);
        }
        for (int j = 0; j < primes.size() && primes[j] <= spf[i] && (long long)i * primes[j] < MAXN; j++) {
            spf[i * primes[j]] = primes[j];
        }
    }
}
Python
def linear_sieve(n):
    spf = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:
            spf[i] = i
            primes.append(i)
        for p in primes:
            if p > spf[i] or i * p > n:
                break
            spf[i * p] = p
    return spf, primes
Fast Factorization with SPF

Once you have the SPF array, you can factorize any number up to n in O(log n) by repeatedly dividing by its smallest prime factor. This is a game-changer when you need to factorize many numbers.

C++
// Factorize x using the SPF array -- O(log x)
map<int, int> factorize(int x) {
    map<int, int> factors;
    while (x > 1) {
        factors[spf[x]]++;
        x /= spf[x];
    }
    return factors;
}

Segmented Sieve

When to use: You need primes in a range [L, R] where R can be up to 1012, but R - L is small (say ≤ 106). You cannot allocate an array of size 1012, so you sieve only the segment [L, R].

Idea: First, sieve all primes up to √R using the normal sieve. Then, for each such prime p, mark all its multiples in the range [L, R].

C++
vector<bool> segmented_sieve(long long L, long long R) {
    int lim = sqrt((double)R) + 1;
    auto small_primes = sieve(lim); // basic sieve up to sqrt(R)

    vector<bool> is_prime(R - L + 1, true);
    if (L == 0) is_prime[0] = false;
    if (L <= 1) is_prime[1 - L] = false;

    for (int p = 2; p <= lim; p++) {
        if (!small_primes[p]) continue;
        // Find first multiple of p >= L
        long long start = max((long long)p * p, ((L + p - 1) / p) * p);
        for (long long j = start; j <= R; j += p)
            is_prime[j - L] = false;
    }
    return is_prime;
}
Watch Out

In the segmented sieve, be careful with L = 0 or L = 1. Also ensure your starting multiple calculation does not overflow -- use long long throughout.

2. GCD, LCM & Extended Euclidean Algorithm

Euclidean Algorithm

Intuition: gcd(a, b) = gcd(b, a mod b). Why? If d divides both a and b, it also divides a - kb = a mod b. So the set of common divisors stays the same, and we keep reducing until one side hits 0.

gcd(a, 0) = a   |   gcd(a, b) = gcd(b, a mod b)   |   Time: O(log(min(a, b)))
C++
// C++17 has __gcd and std::gcd, but know how to write it
int gcd(int a, int b) {
    while (b) {
        a %= b;
        swap(a, b);
    }
    return a;
}

long long lcm(long long a, long long b) {
    return a / gcd(a, b) * b; // divide first to avoid overflow
}

Extended Euclidean Algorithm

What it does: Given a and b, find integers x and y such that ax + by = gcd(a, b). This is called Bezout's identity, and it always has a solution.

Why it matters: This gives us the modular inverse. If gcd(a, m) = 1, then ax + my = 1, which means ax ≡ 1 (mod m), so x is the modular inverse of a mod m.

Walkthrough: extgcd(30, 21)

gcd(30, 21) → gcd(21, 9) → gcd(9, 3) → gcd(3, 0) = 3

Backtrack: 3 = 9 - 1*3 = 9 - 1*(21 - 2*9) = 3*9 - 1*21 = 3*(30 - 1*21) - 1*21 = 3*30 - 4*21

So x = 3, y = -4, and indeed 3*30 + (-4)*21 = 90 - 84 = 6... wait, that is not right. Let us be careful: 3*30 = 90, 4*21 = 84, 90 - 84 = 6? No, gcd(30,21) = 3. Recheck: 30 = 1*21 + 9, 21 = 2*9 + 3, 9 = 3*3 + 0. So gcd = 3. Back-substitute: 3 = 21 - 2*9 = 21 - 2*(30 - 1*21) = 3*21 - 2*30. So x = -2, y = 3: (-2)*30 + 3*21 = -60 + 63 = 3. Correct.

C++
int extgcd(int a, int b, int &x, int &y) {
    if (b == 0) {
        x = 1; y = 0;
        return a;
    }
    int x1, y1;
    int g = extgcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}
Python
def extgcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extgcd(b, a % b)
    return g, y1, x1 - (a // b) * y1
Time: O(log(min(a, b)))  |  Finds x, y such that ax + by = gcd(a, b)

3. Modular Inverse (All Methods)

Finding the modular inverse is one of the most common operations in CP. If you need to compute a/b mod p, you actually compute a * b-1 mod p. The inverse of b mod p exists if and only if gcd(b, p) = 1.

Method 1: Fermat's Little Theorem (p is prime)

If p is prime and gcd(a, p) = 1, then ap-1 ≡ 1 (mod p), so a-1 ≡ ap-2 (mod p). Use binary exponentiation to compute this in O(log p).

C++
long long power(long long base, long long exp, long long mod) {
    long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

long long mod_inverse_fermat(long long a, long long p) {
    return power(a, p - 2, p); // only works when p is prime
}

Method 2: Extended GCD (works for any modulus)

This works whenever gcd(a, m) = 1, even if m is not prime.

C++
long long mod_inverse_extgcd(long long a, long long m) {
    long long x, y;
    long long g = extgcd(a, m, x, y);
    if (g != 1) return -1; // no inverse
    return (x % m + m) % m; // ensure positive
}
Python
def mod_inverse_extgcd(a, m):
    g, x, _ = extgcd(a, m)
    if g != 1:
        return -1
    return x % m

Method 3: Precompute Inverses for 1..n

When to use: You need inverses of many consecutive numbers mod p (e.g., for computing many nCr values). Uses the recurrence: inv[i] = -(p // i) * inv[p % i] mod p.

inv[1] = 1  |  inv[i] = -(p / i) * inv[p % i] mod p  |  Time: O(n)
C++
const int MOD = 1e9 + 7;
vector<long long> inv(MAXN);

void precompute_inverses(int n) {
    inv[1] = 1;
    for (int i = 2; i <= n; i++)
        inv[i] = (MOD - MOD / i) * inv[MOD % i] % MOD;
}
Python
def precompute_inverses(n, mod):
    inv = [0] * (n + 1)
    inv[1] = 1
    for i in range(2, n + 1):
        inv[i] = (mod - mod // i) * inv[mod % i] % mod
    return inv
Python Shortcut

In Python 3.8+, pow(a, -1, p) computes the modular inverse directly. Under the hood it uses an efficient algorithm. For CP, this is the fastest way in Python.

Common Bug

When computing (a - b) % MOD, the result can be negative in C++. Always do ((a - b) % MOD + MOD) % MOD. Python handles negative mod correctly, but C++ does not.

4. Binary Exponentiation

The idea: To compute an, instead of multiplying a by itself n times (O(n)), decompose n into its binary representation and square repeatedly. This gives O(log n).

Walkthrough: 313

13 in binary = 1101. So 313 = 38 * 34 * 31

We compute: 31 = 3, 32 = 9, 34 = 81, 38 = 6561

Result: 3 * 81 * 6561 = 1594323

Only 4 squarings and 3 multiplications instead of 12 multiplications.

C++
long long binpow(long long a, long long b, long long mod) {
    a %= mod;
    long long res = 1;
    while (b > 0) {
        if (b & 1)
            res = res * a % mod;
        a = a * a % mod;
        b >>= 1;
    }
    return res;
}
Python
def binpow(a, b, mod):
    a %= mod
    res = 1
    while b > 0:
        if b & 1:
            res = res * a % mod
        a = a * a % mod
        b >>= 1
    return res

# Or just use: pow(a, b, mod) -- Python's built-in is optimized

Applications Beyond Simple Powers

Overflow Trap

In C++, a * a can overflow even with long long if a is near 1018. Use __int128 or reduce a modulo first. In most CP problems, MOD = 109+7 so a < 109 and a*a < 1018 which fits in long long.

5. Fermat's Little Theorem

If p is prime and gcd(a, p) = 1, then ap-1 ≡ 1 (mod p)

Intuition: Consider the set {1, 2, 3, ..., p-1}. Multiply every element by a (mod p). Since gcd(a, p) = 1, this is a bijection -- you get the same set back in a different order. So the product of all elements stays the same: ap-1 * (p-1)! ≡ (p-1)! (mod p). Cancel (p-1)! (which is coprime to p) to get ap-1 ≡ 1.

Key Consequences

Example: Reduce a huge exponent

Compute 21018 mod 109+7.

Since 109+7 is prime: 21018 mod (109+7) = 21018 mod (109+6) mod (109+7).

Compute 1018 mod (109+6) first, then use binary exponentiation.

When to Use

Any time you see "answer mod 109+7" and division is involved, Fermat's theorem gives you the modular inverse. It is the most common method in CP because the modulus is almost always prime.

6. Euler's Totient Function & Theorem

φ(n) counts how many integers from 1 to n are coprime to n. This is one of the most important functions in number theory.

φ(n) = n * ∏(1 - 1/p) for each prime p dividing n

Computing φ(n) for a Single Value

Factorize n and apply the formula. Time: O(√n).

C++
int euler_totient(int n) {
    int result = n;
    for (int p = 2; p * p <= n; p++) {
        if (n % p == 0) {
            while (n % p == 0) n /= p;
            result -= result / p;
        }
    }
    if (n > 1) result -= result / n;
    return result;
}
Python
def euler_totient(n):
    result = n
    p = 2
    temp = n
    while p * p <= temp:
        if temp % p == 0:
            while temp % p == 0:
                temp //= p
            result -= result // p
        p += 1
    if temp > 1:
        result -= result // temp
    return result

Sieve for φ(1) through φ(n)

C++
vector<int> totient_sieve(int n) {
    vector<int> phi(n + 1);
    iota(phi.begin(), phi.end(), 0); // phi[i] = i
    for (int i = 2; i <= n; i++) {
        if (phi[i] == i) { // i is prime
            for (int j = i; j <= n; j += i)
                phi[j] -= phi[j] / i;
        }
    }
    return phi;
}

Euler's Theorem (Generalization of Fermat's)

If gcd(a, m) = 1, then aφ(m) ≡ 1 (mod m)

Fermat's is the special case where m = p (prime), so φ(p) = p - 1. Euler's theorem works for any modulus, not just primes. This means a-1 ≡ aφ(m)-1 (mod m) when gcd(a, m) = 1.

Reducing Exponents with Composite Moduli

an mod m = an mod φ(m) mod m (when gcd(a, m) = 1). This is crucial for problems like "compute a^b^c mod m" where you need to reduce the tower of exponents.

Key Properties of φ

7. Chinese Remainder Theorem (CRT)

Problem: Given a system of congruences x ≡ a1 (mod m1), x ≡ a2 (mod m2), ..., find x.

If m1, m2, ..., mk are pairwise coprime, there exists a unique solution modulo M = m1 * m2 * ... * mk.

Intuition: Think of it as reconstructing a number from its "remainders" in different modular systems. Like how knowing the time is 3 o'clock on a 12-hour clock and that it is 15:00 on a 24-hour clock together tell you more than either alone.

Two-Equation CRT (The Workhorse)

In practice, you almost always combine equations two at a time. Given x ≡ a1 (mod m1) and x ≡ a2 (mod m2):

C++
// Returns {r, m} such that x ≡ r (mod m), or {-1, -1} if no solution
pair<long long, long long> crt(long long a1, long long m1, long long a2, long long m2) {
    long long x, y;
    long long g = extgcd(m1, m2, x, y);
    if ((a2 - a1) % g != 0) return {-1, -1}; // no solution
    long long lcm = m1 / g * m2;
    long long diff = (a2 - a1) / g;
    long long r = (a1 + m1 * (diff % (m2 / g) * (x % (m2 / g)) % (m2 / g))) % lcm;
    return {(r % lcm + lcm) % lcm, lcm};
}
Python
def crt(a1, m1, a2, m2):
    """Combine x ≡ a1 (mod m1) and x ≡ a2 (mod m2)"""
    g, p, _ = extgcd(m1, m2)
    if (a2 - a1) % g != 0:
        return -1, -1  # no solution
    lcm = m1 // g * m2
    diff = (a2 - a1) // g
    r = (a1 + m1 * (diff * p % (m2 // g))) % lcm
    return r % lcm, lcm

# Combine multiple congruences
def crt_system(remainders, moduli):
    cur_a, cur_m = remainders[0], moduli[0]
    for i in range(1, len(remainders)):
        cur_a, cur_m = crt(cur_a, cur_m, remainders[i], moduli[i])
        if cur_m == -1:
            return -1, -1
    return cur_a, cur_m
Classic CRT Example

x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)

M = 3 * 5 * 7 = 105. Solution: x = 23. Verify: 23 mod 3 = 2, 23 mod 5 = 3, 23 mod 7 = 2.

When CRT Appears in CP

1) Computing nCr mod a non-prime (factor it into prime powers, compute nCr mod each, combine with CRT). 2) Problems where constraints are given in different moduli. 3) Hash collision avoidance (use multiple mods).

8. Combinatorics for CP

Combinatorics is everywhere in CP. The core challenge is computing C(n, r) mod p efficiently.

Precomputing Factorials & Inverse Factorials

The standard approach: Precompute fact[i] = i! mod p and ifact[i] = (i!)-1 mod p. Then C(n, r) = fact[n] * ifact[r] * ifact[n-r] mod p. This is O(1) per query after O(n) precomputation.

C++
const int MOD = 1e9 + 7;
const int MAXN = 2000005;
long long fact[MAXN], ifact[MAXN];

void precompute() {
    fact[0] = 1;
    for (int i = 1; i < MAXN; i++)
        fact[i] = fact[i - 1] * i % MOD;
    ifact[MAXN - 1] = power(fact[MAXN - 1], MOD - 2, MOD);
    for (int i = MAXN - 2; i >= 0; i--)
        ifact[i] = ifact[i + 1] * (i + 1) % MOD;
}

long long C(int n, int r) {
    if (r < 0 || r > n) return 0;
    return fact[n] % MOD * ifact[r] % MOD * ifact[n - r] % MOD;
}
Python
MOD = 10**9 + 7

def precompute(n):
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % MOD
    ifact = [1] * (n + 1)
    ifact[n] = pow(fact[n], MOD - 2, MOD)
    for i in range(n - 1, -1, -1):
        ifact[i] = ifact[i + 1] * (i + 1) % MOD
    return fact, ifact

def C(n, r, fact, ifact):
    if r < 0 or r > n:
        return 0
    return fact[n] * ifact[r] % MOD * ifact[n - r] % MOD
The Inverse Factorial Trick

Instead of computing n inverse factorials with n modular exponentiations (O(n log p) total), compute only ifact[n] using one modular exponentiation, then get ifact[n-1] = ifact[n] * n, ifact[n-2] = ifact[n-1] * (n-1), etc. Total: O(n) + O(log p).

Pascal's Triangle (Small n)

If n is small (≤ 5000), build Pascal's triangle with DP. Works with any modulus, not just primes.

C++
int C[5001][5001];
void build_pascal(int n, int mod) {
    for (int i = 0; i <= n; i++) {
        C[i][0] = 1;
        for (int j = 1; j <= i; j++)
            C[i][j] = (C[i-1][j-1] + C[i-1][j]) % mod;
    }
}

Lucas' Theorem (n is huge, p is small prime)

C(n, r) mod p = ∏ C(ni, ri) mod p, where ni and ri are digits of n and r in base p.

When to use: n can be up to 1018 but p is a small prime (say ≤ 106). You need a table of C(a, b) for 0 ≤ a, b < p, then multiply digits.

C++
long long lucas(long long n, long long r, int p) {
    if (r == 0) return 1;
    return C(n % p, r % p) * lucas(n / p, r / p, p) % p;
}
Python
def lucas(n, r, p, fact, ifact):
    if r == 0:
        return 1
    ni, ri = n % p, r % p
    if ri > ni:
        return 0
    return C(ni, ri, fact, ifact) * lucas(n // p, r // p, p, fact, ifact) % p

Stars and Bars

Problem: Distribute n identical objects into k distinct bins.

Stars and Bars Example

Distribute 5 candies among 3 kids (each kid can get 0): C(5 + 3 - 1, 3 - 1) = C(7, 2) = 21 ways.

If each kid must get at least 1: C(5 - 1, 3 - 1) = C(4, 2) = 6 ways.

Useful Identities

9. Inclusion-Exclusion Principle

|A1 ∪ A2 ∪ ... ∪ An| = ∑|Ai| - ∑|Ai ∩ Aj| + ∑|Ai ∩ Aj ∩ Ak| - ... + (-1)n+1|A1 ∩ ... ∩ An|

Intuition: When you add up the sizes of individual sets, elements in multiple sets get counted multiple times. Subtract pairwise intersections -- but then elements in three sets got subtracted too much. Add triple intersections back, and so on. The alternating signs ensure every element is counted exactly once.

Template: Count Numbers 1..n NOT Divisible by Any Prime in a Set

C++
// Count numbers in [1, n] coprime to all primes in 'primes'
long long count_coprime(long long n, vector<long long> &primes) {
    int m = primes.size();
    long long result = 0;
    for (int mask = 1; mask < (1 << m); mask++) {
        long long prod = 1;
        int bits = 0;
        for (int i = 0; i < m; i++) {
            if (mask & (1 << i)) {
                prod *= primes[i];
                if (prod > n) break;
                bits++;
            }
        }
        if (prod > n) continue;
        if (bits % 2 == 1) result += n / prod;
        else result -= n / prod;
    }
    return n - result; // n minus those divisible by at least one prime
}
Python
def count_coprime(n, primes):
    m = len(primes)
    result = 0
    for mask in range(1, 1 << m):
        prod = 1
        bits = 0
        for i in range(m):
            if mask & (1 << i):
                prod *= primes[i]
                bits += 1
        if prod > n:
            continue
        if bits % 2 == 1:
            result += n // prod
        else:
            result -= n // prod
    return n - result

Derangements

A derangement is a permutation where no element is in its original position. The number of derangements D(n) can be derived using inclusion-exclusion.

D(n) = n! * ∑k=0n (-1)k / k!   ≈   n! / e (rounded to nearest integer)
C++
long long derangements(int n, long long mod) {
    // D(n) = (n-1) * (D(n-1) + D(n-2))
    if (n == 0) return 1;
    if (n == 1) return 0;
    long long prev2 = 1, prev1 = 0;
    for (int i = 2; i <= n; i++) {
        long long cur = (i - 1) * (prev1 + prev2) % mod;
        prev2 = prev1;
        prev1 = cur;
    }
    return prev1;
}

Deriving Euler's Totient via Inclusion-Exclusion

Let the prime factors of n be p1, ..., pk. Let Ai = numbers in [1, n] divisible by pi. Then φ(n) = n - |A1 ∪ ... ∪ Ak|. Apply inclusion-exclusion: φ(n) = n * (1 - 1/p1) * (1 - 1/p2) * ... * (1 - 1/pk). That is exactly the totient formula.

Complexity Note

Inclusion-exclusion over k elements requires iterating over 2k subsets. This is fine when k is small (number of distinct prime factors of n ≤ 15 for n ≤ 1018). For larger k, look for ways to avoid full enumeration.

10. Catalan Numbers

Cn = C(2n, n) / (n + 1) = C(2n, n) - C(2n, n + 1)

The first few: 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ...

What Catalan Numbers Count

Catalan numbers appear everywhere. Here is an incomplete list:

Computing Cn

C++
long long catalan(int n) {
    // Using precomputed fact[] and ifact[]
    return fact[2 * n] % MOD * ifact[n] % MOD * ifact[n + 1] % MOD;
}

// Alternative: use the recurrence C(n) = C(2n,n) - C(2n,n+1)
long long catalan_alt(int n) {
    return (C(2 * n, n) - C(2 * n, n + 1) + MOD) % MOD;
}
Python
def catalan(n, fact, ifact):
    return fact[2 * n] * ifact[n] % MOD * ifact[n + 1] % MOD

Recurrence

C0 = 1  |  Cn+1 = ∑i=0n Ci * Cn-i

This recurrence comes from choosing where to split. For valid parentheses: the first '(' matches some ')' at position 2k+1, giving Ck ways inside and Cn-1-k ways outside.

The Ballot / Reflection Argument

The formula C(2n,n) - C(2n,n+1) comes from the reflection principle. Total paths from (0,0) to (2n,0) with +1/-1 steps = C(2n,n). Bad paths (those that go below 0) biject to paths to (2n,-2) via reflection, which is C(2n,n+1). Subtract to get Catalan.

CP Pattern: Grid Paths That Avoid a Diagonal

Paths from (0,0) to (n,m) using right/up steps that never cross above the line y = x: if m ≤ n, the answer is C(n+m, m) - C(n+m, m-1). This generalizes Catalan (n = m gives Cn).

11. Matrix Exponentiation

The big idea: Any linear recurrence can be expressed as a matrix multiplication. Once in matrix form, use binary exponentiation on the matrix to compute the n-th term in O(k3 log n), where k is the matrix dimension.

Fibonacci in O(log n)

[F(n+1), F(n)] = [F(1), F(0)] * [[1, 1], [1, 0]]n
C++
typedef vector<vector<long long>> Matrix;

Matrix multiply(const Matrix &A, const Matrix &B, long long mod) {
    int n = A.size(), m = B[0].size(), k = B.size();
    Matrix C(n, vector<long long>(m, 0));
    for (int i = 0; i < n; i++)
        for (int j = 0; j < k; j++)
            if (A[i][j])
                for (int l = 0; l < m; l++)
                    C[i][l] = (C[i][l] + A[i][j] * B[j][l]) % mod;
    return C;
}

Matrix mat_pow(Matrix A, long long p, long long mod) {
    int n = A.size();
    Matrix result(n, vector<long long>(n, 0));
    for (int i = 0; i < n; i++) result[i][i] = 1; // identity
    while (p > 0) {
        if (p & 1) result = multiply(result, A, mod);
        A = multiply(A, A, mod);
        p >>= 1;
    }
    return result;
}

long long fibonacci(long long n, long long mod) {
    if (n <= 1) return n;
    Matrix M = {{1, 1}, {1, 0}};
    Matrix result = mat_pow(M, n - 1, mod);
    return result[0][0];
}
Python
def mat_mul(A, B, mod):
    n, m, k = len(A), len(B[0]), len(B)
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(k):
            if A[i][j]:
                for l in range(m):
                    C[i][l] = (C[i][l] + A[i][j] * B[j][l]) % mod
    return C

def mat_pow(M, p, mod):
    n = len(M)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    while p > 0:
        if p & 1:
            result = mat_mul(result, M, mod)
        M = mat_mul(M, M, mod)
        p >>= 1
    return result

def fibonacci(n, mod):
    if n <= 1:
        return n
    result = mat_pow([[1, 1], [1, 0]], n - 1, mod)
    return result[0][0]

General Linear Recurrences

Any recurrence f(n) = c1f(n-1) + c2f(n-2) + ... + ckf(n-k) can be converted into a k x k matrix. The matrix is called the companion matrix:

Example: f(n) = 2f(n-1) + 3f(n-2) + f(n-3)

The companion matrix is:

[[2, 3, 1], [1, 0, 0], [0, 1, 0]]

Multiply [f(n-1), f(n-2), f(n-3)] by this matrix to get [f(n), f(n-1), f(n-2)].

Raise to the (n-2)th power and multiply by the initial state vector to get f(n).

When to Reach for Matrix Exponentiation

1) Linear recurrence with large n (up to 1018). 2) DP with small state space but huge number of transitions (e.g., tiling problems, string counting). 3) Graph problems: count paths of length exactly k (raise adjacency matrix to k-th power). 4) Any time you see "compute f(1018)" in a problem.

12. Mobius Function & Inversion

The Mobius Function μ(n)

μ(1) = 1  |  μ(n) = 0 if n has a squared prime factor  |  μ(n) = (-1)k if n is a product of k distinct primes

Intuition: μ is the "sign" function for inclusion-exclusion on divisors. It tells you whether to add or subtract when inverting a sum over divisors.

Computing μ via Sieve

C++
vector<int> mobius_sieve(int n) {
    vector<int> mu(n + 1, 0);
    vector<bool> is_prime(n + 1, true);
    vector<int> primes;
    mu[1] = 1;
    for (int i = 2; i <= n; i++) {
        if (is_prime[i]) {
            primes.push_back(i);
            mu[i] = -1; // prime has one factor
        }
        for (int j = 0; j < primes.size() && (long long)i * primes[j] <= n; j++) {
            is_prime[i * primes[j]] = false;
            if (i % primes[j] == 0) {
                mu[i * primes[j]] = 0; // squared factor
                break;
            }
            mu[i * primes[j]] = -mu[i];
        }
    }
    return mu;
}
Python
def mobius_sieve(n):
    mu = [0] * (n + 1)
    mu[1] = 1
    is_prime = [True] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if is_prime[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            is_prime[i * p] = False
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu

Mobius Inversion Formula

If f(n) = ∑d|n g(d), then g(n) = ∑d|n μ(n/d) * f(d).

Why it matters: Many problems ask you to count pairs (i, j) with gcd(i, j) = k, or sum f(gcd(i, j)). Mobius inversion lets you reformulate these into sums you can compute efficiently.

Classic Application: Count Coprime Pairs

Problem: Count pairs (i, j) with 1 ≤ i ≤ n, 1 ≤ j ≤ m, gcd(i, j) = 1.

Answer = ∑d=1min(n,m) μ(d) * floor(n/d) * floor(m/d)
C++
// Count pairs with gcd = 1, using Mobius function
long long count_coprime_pairs(int n, int m, vector<int> &mu) {
    long long ans = 0;
    int lim = min(n, m);
    for (int d = 1; d <= lim; d++)
        ans += (long long)mu[d] * (n / d) * (m / d);
    return ans;
}

// Optimized with floor division blocks: O(sqrt(n) + sqrt(m))
long long count_coprime_fast(int n, int m, vector<long long> &prefix_mu) {
    long long ans = 0;
    int lim = min(n, m);
    for (int l = 1, r; l <= lim; l = r + 1) {
        r = min({lim, n / (n / l), m / (m / l)});
        ans += (prefix_mu[r] - prefix_mu[l - 1]) * (n / l) * (m / l);
    }
    return ans;
}
Floor Division Blocks

The values of floor(n/d) take at most O(2√n) distinct values as d goes from 1 to n. For each distinct value, the d's that produce it form a contiguous block. This lets you speed up any sum ∑ f(d) * floor(n/d) from O(n) to O(√n) by grouping. This optimization is essential for Mobius-based problems.

Key Property

d|n μ(d) = [n = 1]    (equals 1 if n = 1, else 0)

This is the foundation of why Mobius inversion works. It acts as the "identity" for the Dirichlet convolution.

13. FFT & Polynomial Multiplication

What FFT does: Multiplies two polynomials of degree n in O(n log n) instead of O(n2). This is not just for polynomial problems -- any convolution can be expressed as polynomial multiplication.

When You Need FFT in CP

The Idea (Intuition, Not Full Proof)

  1. Evaluate both polynomials at n special points (roots of unity) -- O(n log n) via divide and conquer
  2. Multiply the values pointwise -- O(n)
  3. Interpolate back to get the result polynomial -- O(n log n) (inverse FFT, which is almost the same as forward FFT)

NTT (Number Theoretic Transform) for Modular Arithmetic

In CP, you usually need results mod a prime p. NTT replaces complex roots of unity with primitive roots mod p. The standard MOD = 998244353 = 119 * 223 + 1 is specifically chosen because it has a primitive root and supports NTT up to 223 elements.

C++
const int MOD = 998244353;
const int g = 3; // primitive root of MOD

void ntt(vector<long long> &a, bool invert) {
    int n = a.size();
    for (int i = 1, j = 0; i < n; i++) {
        int bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) swap(a[i], a[j]);
    }
    for (int len = 2; len <= n; len <<= 1) {
        long long w = invert ? binpow(binpow(g, (MOD - 1) / len, MOD), MOD - 2, MOD)
                             : binpow(g, (MOD - 1) / len, MOD);
        for (int i = 0; i < n; i += len) {
            long long wn = 1;
            for (int j = 0; j < len / 2; j++) {
                long long u = a[i + j];
                long long v = a[i + j + len / 2] * wn % MOD;
                a[i + j] = (u + v) % MOD;
                a[i + j + len / 2] = (u - v + MOD) % MOD;
                wn = wn * w % MOD;
            }
        }
    }
    if (invert) {
        long long n_inv = binpow(n, MOD - 2, MOD);
        for (auto &x : a) x = x * n_inv % MOD;
    }
}

vector<long long> poly_multiply(vector<long long> a, vector<long long> b) {
    int result_size = a.size() + b.size() - 1;
    int n = 1;
    while (n < result_size) n <<= 1;
    a.resize(n); b.resize(n);
    ntt(a, false); ntt(b, false);
    for (int i = 0; i < n; i++) a[i] = a[i] * b[i] % MOD;
    ntt(a, true);
    a.resize(result_size);
    return a;
}
Python
# Python FFT using complex numbers (for non-modular problems)
import cmath

def fft(a, invert=False):
    n = len(a)
    if n == 1:
        return a
    a_even = fft(a[0::2], invert)
    a_odd = fft(a[1::2], invert)
    angle = 2 * cmath.pi / n * (-1 if invert else 1)
    w = 1
    wn = cmath.exp(1j * angle)
    result = [0] * n
    for i in range(n // 2):
        result[i] = a_even[i] + w * a_odd[i]
        result[i + n // 2] = a_even[i] - w * a_odd[i]
        if invert:
            result[i] /= 2
            result[i + n // 2] /= 2
        w *= wn
    return result

# For CP in Python, use numpy.fft or the NTT approach above translated to Python
Precision Issues with Complex FFT

Complex number FFT suffers from floating point errors. For exact integer results, round at the end. For modular results, always use NTT instead. In C++ CP, NTT is the standard. Python is too slow for FFT in most CP problems -- use C++ for FFT/NTT.

998244353 vs 109+7

If the problem uses MOD = 998244353, it is almost certainly hinting that NTT is needed. If MOD = 109+7, you might need to use three different NTT-friendly primes and combine results with CRT, or use complex FFT with care.

Bonus: Modular Arithmetic Cheat Sheet

This is the foundation everything above builds on. Internalize these rules and modular arithmetic becomes second nature.

Basic Rules

(a + b) mod m = ((a mod m) + (b mod m)) mod m
(a * b) mod m = ((a mod m) * (b mod m)) mod m
(a - b) mod m = ((a mod m) - (b mod m) + m) mod m
(a / b) mod m = (a * b-1) mod m   (b-1 is the modular inverse)
Division is NOT (a mod m) / (b mod m)

You cannot just divide the remainders. You must multiply by the modular inverse. This is the most common mistake beginners make with modular arithmetic.

The 109+7 Constant

Why do CP problems use 109+7 (1000000007)?

Common Modular Arithmetic Patterns in CP

C++
const long long MOD = 1e9 + 7;

// Safe addition
long long add(long long a, long long b) {
    return (a + b) % MOD;
}

// Safe subtraction (handles negative)
long long sub(long long a, long long b) {
    return (a - b % MOD + MOD) % MOD;
}

// Safe multiplication
long long mul(long long a, long long b) {
    return a % MOD * (b % MOD) % MOD;
}

// Safe division
long long divmod(long long a, long long b) {
    return mul(a, binpow(b, MOD - 2, MOD));
}

Exponent Towers: abc mod m

A classic hard problem. The key insight chains together Euler's theorem:

C++
// Compute a^b^c^... mod m using Euler's theorem recursively
// tower = [a, b, c, ...] read bottom to top
long long tower_mod(vector<long long> &tower, int idx, long long mod) {
    if (mod == 1) return 0;
    if (idx == tower.size()) return 1 % mod;
    long long phi_m = euler_totient(mod);
    long long exp = tower_mod(tower, idx + 1, phi_m);
    return binpow(tower[idx] % mod, exp + phi_m, mod);
    // +phi_m handles the case when gcd(a, mod) != 1
    // (generalized Euler's theorem: a^n ≡ a^(n mod φ(m) + φ(m)) when n ≥ log2(m))
}
Generalized Euler's Theorem

When gcd(a, m) is not 1, the standard Euler's theorem does not apply directly. But for n ≥ log2(m): an ≡ a(n mod φ(m)) + φ(m) (mod m). This is why the code above adds φ(m) to the exponent -- it ensures correctness even when a and m share factors.

Bonus: Primality Testing

Trial Division: O(√n)

Simple and sufficient for n up to about 1014 in CP.

C++
bool is_prime(long long n) {
    if (n < 2) return false;
    if (n < 4) return true;
    if (n % 2 == 0 || n % 3 == 0) return false;
    for (long long i = 5; i * i <= n; i += 6)
        if (n % i == 0 || n % (i + 2) == 0) return false;
    return true;
}
Python
def is_prime(n):
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0 or n % 3 == 0: return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0: return False
        i += 6
    return True

Miller-Rabin: O(k log2 n)

The idea: Based on Fermat's little theorem, but stronger. If n is prime, then for any a: either ad ≡ 1 (mod n), or ad*2r ≡ -1 (mod n) for some r, where n-1 = d * 2s.

Deterministic version for CP: For n < 3.3 * 1024, testing with bases {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37} is enough. For n < 3.2 * 1018 (fits in long long), bases {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37} give a deterministic answer.

C++
using u64 = unsigned long long;
using u128 = __uint128_t;

u64 mulmod(u64 a, u64 b, u64 m) {
    return (u128)a * b % m;
}

u64 powmod(u64 a, u64 b, u64 m) {
    u64 res = 1;
    a %= m;
    while (b > 0) {
        if (b & 1) res = mulmod(res, a, m);
        a = mulmod(a, a, m);
        b >>= 1;
    }
    return res;
}

bool miller_rabin(u64 n, u64 a) {
    if (n % a == 0) return n == a;
    u64 d = n - 1;
    int r = 0;
    while (d % 2 == 0) { d /= 2; r++; }
    u64 x = powmod(a, d, n);
    if (x == 1 || x == n - 1) return true;
    for (int i = 0; i < r - 1; i++) {
        x = mulmod(x, x, n);
        if (x == n - 1) return true;
    }
    return false;
}

bool is_prime_millerrabin(u64 n) {
    if (n < 2) return false;
    for (u64 a : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37})
        if (!miller_rabin(n, a)) return false;
    return true;
}
When to Use Miller-Rabin

When n is too large for a sieve (say n up to 1018) and you need to check individual numbers for primality. Often paired with Pollard's rho for factorization of large numbers. If you only need primes up to 107, just use a sieve.

Bonus: Divisor Functions & Multiplicative Functions

Number of Divisors: d(n) / τ(n) / σ0(n)

If n = p1a1 * p2a2 * ... * pkak, then d(n) = (a1+1)(a2+1)...(ak+1).

C++
int count_divisors(long long n) {
    int count = 1;
    for (long long p = 2; p * p <= n; p++) {
        int e = 0;
        while (n % p == 0) { n /= p; e++; }
        count *= (e + 1);
    }
    if (n > 1) count *= 2; // remaining prime factor with exponent 1
    return count;
}

Sum of Divisors: σ(n) / σ1(n)

If n = p1a1 * ... * pkak, then σ(n) = ∏ (piai+1 - 1) / (pi - 1).

Listing All Divisors

C++
vector<long long> get_divisors(long long n) {
    vector<long long> divs;
    for (long long i = 1; i * i <= n; i++) {
        if (n % i == 0) {
            divs.push_back(i);
            if (i != n / i)
                divs.push_back(n / i);
        }
    }
    sort(divs.begin(), divs.end());
    return divs;
}
Python
def get_divisors(n):
    divs = []
    i = 1
    while i * i <= n:
        if n % i == 0:
            divs.append(i)
            if i != n // i:
                divs.append(n // i)
        i += 1
    return sorted(divs)

Sieve for Number of Divisors / Sum of Divisors (1 to n)

C++
// Count divisors for all numbers 1..n in O(n log n)
vector<int> divisor_count_sieve(int n) {
    vector<int> d(n + 1, 0);
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j += i)
            d[j]++;
    return d;
}

// Sum of divisors for all numbers 1..n in O(n log n)
vector<long long> divisor_sum_sieve(int n) {
    vector<long long> s(n + 1, 0);
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j += i)
            s[j] += i;
    return s;
}
Multiplicative Functions

A function f is multiplicative if f(ab) = f(a)f(b) when gcd(a,b) = 1. The functions φ, μ, d, σ are all multiplicative. This means you only need to know their values on prime powers, then multiply. The linear sieve can compute any multiplicative function in O(n).

Harmonic Series Trick

The sum ∑i=1n floor(n/i) = O(n log n). This is because floor(n/i) counts how many multiples of i are ≤ n. This sum appears constantly in number theory problems. Compute it in O(√n) using floor division blocks (same technique as in Mobius section).

14. Practice Problems

Organized by topic. Solve these and you will have solid number theory fundamentals for CP.

Sieve & Primes

GCD & Extended GCD

Modular Inverse & Combinatorics

Binary & Matrix Exponentiation

CRT

Inclusion-Exclusion

Catalan Numbers

Mobius Function

FFT / NTT

Study Order Recommendation

Start with sieve and binary exponentiation (you will use these in almost every contest). Then learn modular inverse and combinatorics (needed for 80% of math problems). CRT and Euler's totient come next. Matrix exponentiation unlocks a whole class of problems. Save Mobius inversion and FFT for after you are comfortable with everything else -- they appear mainly in Div 1 C/D and harder.

CP Template Tip

Keep a pre-written template with: binpow, modular inverse, factorial/inverse-factorial precomputation, sieve, and matrix exponentiation. In a contest, you should be able to paste these in and use them instantly. The time to learn them is now -- the contest is not the time to debug your number theory library.