Every number theory algorithm, technique, and trick you need to crush math-heavy CP problems. From modular arithmetic fundamentals to FFT and Mobius inversion -- with full code in C++ and Python.
The sieve is the bread and butter of number theory in CP. If a problem mentions primes and n ≤ 10^7, you almost certainly need a sieve.
Idea: Mark all composite numbers by iterating through each prime and crossing off its multiples. We only need to check up to √n because any composite number ≤ n has a prime factor ≤ √n.
C++
vector<bool> sieve(int n) {
vector<bool> is_prime(n + 1, true);
is_prime[0] = is_prime[1] = false;
for (int i = 2; i * i <= n; i++) {
if (is_prime[i]) {
for (int j = i * i; j <= n; j += i)
is_prime[j] = false;
}
}
return is_prime;
}
Python
def sieve(n):
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return is_prime
Why it matters: The classic sieve marks some composites multiple times (e.g., 12 is crossed off by both 2 and 3). The linear sieve ensures each composite is marked exactly once by its smallest prime factor. This also gives you the smallest prime factor (SPF) array for free -- extremely useful for fast factorization.
C++
const int MAXN = 10000001;
int spf[MAXN]; // smallest prime factor
vector<int> primes;
void linear_sieve() {
for (int i = 2; i < MAXN; i++) {
if (spf[i] == 0) { // i is prime
spf[i] = i;
primes.push_back(i);
}
for (int j = 0; j < primes.size() && primes[j] <= spf[i] && (long long)i * primes[j] < MAXN; j++) {
spf[i * primes[j]] = primes[j];
}
}
}
Python
def linear_sieve(n):
    spf = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:
            spf[i] = i
            primes.append(i)
        for p in primes:
            if p > spf[i] or i * p > n:
                break
            spf[i * p] = p
    return spf, primes
Once you have the SPF array, you can factorize any number up to n in O(log n) by repeatedly dividing by its smallest prime factor. This is a game-changer when you need to factorize many numbers.
C++
// Factorize x using the SPF array -- O(log x)
map<int, int> factorize(int x) {
map<int, int> factors;
while (x > 1) {
factors[spf[x]]++;
x /= spf[x];
}
return factors;
}
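For completeness, a Python counterpart (a sketch: `build_spf` is a small helper defined here so the example is self-contained; in practice you would reuse the `spf` array from `linear_sieve` above):

```python
def build_spf(n):
    # smallest prime factor for every number up to n (simple sieve variant)
    spf = list(range(n + 1))
    for i in range(2, int(n ** 0.5) + 1):
        if spf[i] == i:  # i is still marked as prime
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    return spf

def factorize(x, spf):
    # repeatedly divide by the smallest prime factor -- O(log x)
    factors = {}
    while x > 1:
        p = spf[x]
        factors[p] = factors.get(p, 0) + 1
        x //= p
    return factors
```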
When to use: You need primes in a range [L, R] where R can be up to 10^12, but R - L is small (say ≤ 10^6). You cannot allocate an array of size 10^12, so you sieve only the segment [L, R].
Idea: First, sieve all primes up to √R using the normal sieve. Then, for each such prime p, mark all its multiples in the range [L, R].
C++
vector<bool> segmented_sieve(long long L, long long R) {
int lim = sqrt((double)R) + 1;
auto small_primes = sieve(lim); // basic sieve up to sqrt(R)
vector<bool> is_prime(R - L + 1, true);
if (L == 0) is_prime[0] = false;
if (L <= 1) is_prime[1 - L] = false;
for (int p = 2; p <= lim; p++) {
if (!small_primes[p]) continue;
// Find first multiple of p >= L
long long start = max((long long)p * p, ((L + p - 1) / p) * p);
for (long long j = start; j <= R; j += p)
is_prime[j - L] = false;
}
return is_prime;
}
In the segmented sieve, be careful with L = 0 or L = 1. Also ensure your starting multiple calculation does not overflow -- use long long throughout.
Intuition: gcd(a, b) = gcd(b, a mod b). Why? If d divides both a and b, it also divides a - kb = a mod b. So the set of common divisors stays the same, and we keep reducing until one side hits 0.
C++
// C++17 has __gcd and std::gcd, but know how to write it
int gcd(int a, int b) {
while (b) {
a %= b;
swap(a, b);
}
return a;
}
long long lcm(long long a, long long b) {
return a / gcd(a, b) * b; // divide first to avoid overflow
}
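A Python counterpart of the same loop (note that Python's math module already ships gcd, and lcm since 3.9):

```python
def gcd(a, b):
    # iterative Euclid: gcd(a, b) = gcd(b, a mod b)
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    return a // gcd(a, b) * b  # divide first, as in the C++ version
```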
What it does: Given a and b, find integers x and y such that ax + by = gcd(a, b). This is called Bezout's identity, and it always has a solution.
Why it matters: This gives us the modular inverse. If gcd(a, m) = 1, then ax + my = 1, which means ax ≡ 1 (mod m), so x is the modular inverse of a mod m.
gcd(30, 21) → gcd(21, 9) → gcd(9, 3) → gcd(3, 0) = 3
The division steps: 30 = 1*21 + 9, 21 = 2*9 + 3, 9 = 3*3 + 0.
Backtrack: 3 = 21 - 2*9 = 21 - 2*(30 - 1*21) = 3*21 - 2*30
So x = -2, y = 3, and indeed (-2)*30 + 3*21 = -60 + 63 = 3. Correct.
C++
int extgcd(int a, int b, int &x, int &y) {
if (b == 0) {
x = 1; y = 0;
return a;
}
int x1, y1;
int g = extgcd(b, a % b, x1, y1);
x = y1;
y = x1 - (a / b) * y1;
return g;
}
Python
def extgcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extgcd(b, a % b)
    return g, y1, x1 - (a // b) * y1
Finding the modular inverse is one of the most common operations in CP. If you need to compute a/b mod p, you actually compute a * b^(-1) mod p. The inverse of b mod p exists if and only if gcd(b, p) = 1.
If p is prime and gcd(a, p) = 1, then a^(p-1) ≡ 1 (mod p), so a^(-1) ≡ a^(p-2) (mod p). Use binary exponentiation to compute this in O(log p).
C++
long long power(long long base, long long exp, long long mod) {
long long result = 1;
base %= mod;
while (exp > 0) {
if (exp & 1) result = result * base % mod;
base = base * base % mod;
exp >>= 1;
}
return result;
}
long long mod_inverse_fermat(long long a, long long p) {
return power(a, p - 2, p); // only works when p is prime
}
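The Python counterpart is a one-liner via the built-in three-argument pow:

```python
def mod_inverse_fermat(a, p):
    # a^(p-2) mod p -- valid only when p is prime and p does not divide a
    return pow(a, p - 2, p)
```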
This works whenever gcd(a, m) = 1, even if m is not prime.
C++
long long mod_inverse_extgcd(long long a, long long m) {
long long x, y;
long long g = extgcd(a, m, x, y);
if (g != 1) return -1; // no inverse
return (x % m + m) % m; // ensure positive
}
Python
def mod_inverse_extgcd(a, m):
    g, x, _ = extgcd(a, m)
    if g != 1:
        return -1
    return x % m
When to use: You need inverses of many consecutive numbers mod p (e.g., for computing many nCr values). Uses the recurrence: inv[i] = -(p // i) * inv[p % i] mod p.
C++
const int MOD = 1e9 + 7;
vector<long long> inv(MAXN);
void precompute_inverses(int n) {
inv[1] = 1;
for (int i = 2; i <= n; i++)
inv[i] = (MOD - MOD / i) * inv[MOD % i] % MOD;
}
Python
def precompute_inverses(n, mod):
    inv = [0] * (n + 1)
    inv[1] = 1
    for i in range(2, n + 1):
        inv[i] = (mod - mod // i) * inv[mod % i] % mod
    return inv
In Python 3.8+, pow(a, -1, p) computes the modular inverse directly (it raises ValueError if the inverse does not exist). Under the hood it uses the extended Euclidean algorithm. For CP, this is the fastest way in Python.
When computing (a - b) % MOD, the result can be negative in C++. Always do ((a - b) % MOD + MOD) % MOD. Python handles negative mod correctly, but C++ does not.
The idea: To compute a^n, instead of multiplying a by itself n times (O(n)), decompose n into its binary representation and square repeatedly. This gives O(log n).
13 in binary = 1101. So 3^13 = 3^8 * 3^4 * 3^1
We compute: 3^1 = 3, 3^2 = 9, 3^4 = 81, 3^8 = 6561
Result: 3 * 81 * 6561 = 1594323
Only 4 squarings and 3 multiplications instead of 12 multiplications.
C++
long long binpow(long long a, long long b, long long mod) {
a %= mod;
long long res = 1;
while (b > 0) {
if (b & 1)
res = res * a % mod;
a = a * a % mod;
b >>= 1;
}
return res;
}
Python
def binpow(a, b, mod):
    a %= mod
    res = 1
    while b > 0:
        if b & 1:
            res = res * a % mod
        a = a * a % mod
        b >>= 1
    return res

# Or just use: pow(a, b, mod) -- Python's built-in is optimized
In C++, a * a can overflow even with long long if a is near 10^18. Use __int128 or reduce a modulo first. In most CP problems, MOD = 10^9+7, so a < 10^9 and a*a < 10^18, which fits in long long.
Intuition: Consider the set {1, 2, 3, ..., p-1}. Multiply every element by a (mod p). Since gcd(a, p) = 1, this is a bijection -- you get the same set back in a different order. So the product of all elements stays the same: a^(p-1) * (p-1)! ≡ (p-1)! (mod p). Cancel (p-1)! (which is coprime to p) to get a^(p-1) ≡ 1.
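The bijection is easy to check numerically -- a quick sketch with p = 7 and a = 3:

```python
p, a = 7, 3  # p prime, gcd(a, p) = 1
original = set(range(1, p))
mapped = {a * x % p for x in original}
# multiplying by a merely permutes {1, ..., p-1}
assert mapped == original
# hence a^(p-1) ≡ 1 (mod p)
assert pow(a, p - 1, p) == 1
```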
Compute 2^(10^18) mod (10^9+7).
Since 10^9+7 is prime: 2^(10^18) mod (10^9+7) = 2^(10^18 mod (10^9+6)) mod (10^9+7).
Compute 10^18 mod (10^9+6) first, then use binary exponentiation.
Any time you see "answer mod 10^9+7" and division is involved, Fermat's theorem gives you the modular inverse. It is the most common method in CP because the modulus is almost always prime.
φ(n) counts how many integers from 1 to n are coprime to n. This is one of the most important functions in number theory.
Factorize n and apply the formula. Time: O(√n).
C++
int euler_totient(int n) {
int result = n;
for (int p = 2; p * p <= n; p++) {
if (n % p == 0) {
while (n % p == 0) n /= p;
result -= result / p;
}
}
if (n > 1) result -= result / n;
return result;
}
Python
def euler_totient(n):
    result = n
    p = 2
    temp = n
    while p * p <= temp:
        if temp % p == 0:
            while temp % p == 0:
                temp //= p
            result -= result // p
        p += 1
    if temp > 1:
        result -= result // temp
    return result
C++
vector<int> totient_sieve(int n) {
vector<int> phi(n + 1);
iota(phi.begin(), phi.end(), 0); // phi[i] = i
for (int i = 2; i <= n; i++) {
if (phi[i] == i) { // i is prime
for (int j = i; j <= n; j += i)
phi[j] -= phi[j] / i;
}
}
return phi;
}
Fermat's is the special case where m = p (prime), so φ(p) = p - 1. Euler's theorem works for any modulus, not just primes. This means a^(-1) ≡ a^(φ(m)-1) (mod m) when gcd(a, m) = 1.
a^n mod m = a^(n mod φ(m)) mod m (when gcd(a, m) = 1). This is crucial for problems like "compute a^b^c mod m" where you need to reduce the tower of exponents.
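For instance, with a = 7 and m = 12 (so φ(12) = 4), reducing a huge exponent mod φ(m):

```python
a, m, phi = 7, 12, 4          # gcd(7, 12) = 1, phi(12) = 4
n = 10**18
reduced = n % phi             # 0, since 10^18 is divisible by 4
assert pow(a, n, m) == pow(a, reduced, m) == 1
```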
Problem: Given a system of congruences x ≡ a1 (mod m1), x ≡ a2 (mod m2), ..., find x.
Intuition: Think of it as reconstructing a number from its "remainders" in different modular systems. Like how knowing the time is 3 o'clock on a 12-hour clock and that it is 15:00 on a 24-hour clock together tell you more than either alone.
In practice, you almost always combine equations two at a time. Given x ≡ a1 (mod m1) and x ≡ a2 (mod m2):
C++
// Returns {r, m} such that x ≡ r (mod m), or {-1, -1} if no solution
pair<long long, long long> crt(long long a1, long long m1, long long a2, long long m2) {
long long x, y;
long long g = extgcd(m1, m2, x, y); // needs a long long overload of extgcd
if ((a2 - a1) % g != 0) return {-1, -1}; // no solution
long long lcm = m1 / g * m2;
long long diff = (a2 - a1) / g;
long long r = (a1 + m1 * (diff % (m2 / g) * (x % (m2 / g)) % (m2 / g))) % lcm;
return {(r % lcm + lcm) % lcm, lcm};
}
Python
def crt(a1, m1, a2, m2):
    """Combine x ≡ a1 (mod m1) and x ≡ a2 (mod m2)"""
    g, p, _ = extgcd(m1, m2)
    if (a2 - a1) % g != 0:
        return -1, -1  # no solution
    lcm = m1 // g * m2
    diff = (a2 - a1) // g
    r = (a1 + m1 * (diff * p % (m2 // g))) % lcm
    return r % lcm, lcm

# Combine multiple congruences
def crt_system(remainders, moduli):
    cur_a, cur_m = remainders[0], moduli[0]
    for i in range(1, len(remainders)):
        cur_a, cur_m = crt(cur_a, cur_m, remainders[i], moduli[i])
        if cur_m == -1:
            return -1, -1
    return cur_a, cur_m
x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
M = 3 * 5 * 7 = 105. Solution: x = 23. Verify: 23 mod 3 = 2, 23 mod 5 = 3, 23 mod 7 = 2.
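You can sanity-check the worked example by brute force over one full period:

```python
# smallest x in [0, 105) satisfying all three congruences
x = next(x for x in range(3 * 5 * 7)
         if x % 3 == 2 and x % 5 == 3 and x % 7 == 2)
assert x == 23
```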
1) Computing nCr mod a non-prime (factor it into prime powers, compute nCr mod each, combine with CRT). 2) Problems where constraints are given in different moduli. 3) Hash collision avoidance (use multiple mods).
Combinatorics is everywhere in CP. The core challenge is computing C(n, r) mod p efficiently.
The standard approach: Precompute fact[i] = i! mod p and ifact[i] = (i!)^(-1) mod p. Then C(n, r) = fact[n] * ifact[r] * ifact[n-r] mod p. This is O(1) per query after O(n) precomputation.
C++
const int MOD = 1e9 + 7;
const int MAXN = 2000005;
long long fact[MAXN], ifact[MAXN];
void precompute() {
fact[0] = 1;
for (int i = 1; i < MAXN; i++)
fact[i] = fact[i - 1] * i % MOD;
ifact[MAXN - 1] = power(fact[MAXN - 1], MOD - 2, MOD);
for (int i = MAXN - 2; i >= 0; i--)
ifact[i] = ifact[i + 1] * (i + 1) % MOD;
}
long long C(int n, int r) {
if (r < 0 || r > n) return 0;
return fact[n] * ifact[r] % MOD * ifact[n - r] % MOD;
}
Python
MOD = 10**9 + 7
def precompute(n):
fact = [1] * (n + 1)
for i in range(1, n + 1):
fact[i] = fact[i - 1] * i % MOD
ifact = [1] * (n + 1)
ifact[n] = pow(fact[n], MOD - 2, MOD)
for i in range(n - 1, -1, -1):
ifact[i] = ifact[i + 1] * (i + 1) % MOD
return fact, ifact
def C(n, r, fact, ifact):
if r < 0 or r > n:
return 0
return fact[n] * ifact[r] % MOD * ifact[n - r] % MOD
Instead of computing n inverse factorials with n modular exponentiations (O(n log p) total), compute only ifact[n] using one modular exponentiation, then get ifact[n-1] = ifact[n] * n, ifact[n-2] = ifact[n-1] * (n-1), etc. Total: O(n) + O(log p).
If n is small (≤ 5000), build Pascal's triangle with DP. Works with any modulus, not just primes.
C++
int C[5001][5001];
void build_pascal(int n, int mod) {
for (int i = 0; i <= n; i++) {
C[i][0] = 1;
for (int j = 1; j <= i; j++)
C[i][j] = (C[i-1][j-1] + C[i-1][j]) % mod;
}
}
When to use: n can be up to 10^18 but p is a small prime (say ≤ 10^6). You need C(a, b) mod p for 0 ≤ a, b < p (via factorials mod p), then multiply the binomials of the base-p digits of n and r.
C++
long long lucas(long long n, long long r, int p) {
// note: C here must use factorials precomputed mod p, not mod 1e9+7
if (r == 0) return 1;
return C(n % p, r % p) * lucas(n / p, r / p, p) % p;
}
Python
def lucas(n, r, p, fact, ifact):
    if r == 0:
        return 1
    ni, ri = n % p, r % p
    if ri > ni:
        return 0
    return C(ni, ri, fact, ifact) * lucas(n // p, r // p, p, fact, ifact) % p
Problem: Distribute n identical objects into k distinct bins. Answer: C(n + k - 1, k - 1) -- arrange n stars and k - 1 bars.
Distribute 5 candies among 3 kids (each kid can get 0): C(5 + 3 - 1, 3 - 1) = C(7, 2) = 21 ways.
If each kid must get at least 1: C(5 - 1, 3 - 1) = C(4, 2) = 6 ways.
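Both counts are easy to verify by brute force for the candy example:

```python
from itertools import product
from math import comb

# each kid can get 0..5 candies; count distributions summing to 5
free = sum(1 for c in product(range(6), repeat=3) if sum(c) == 5)
assert free == comb(7, 2) == 21

# each kid must get at least 1
positive = sum(1 for c in product(range(1, 6), repeat=3) if sum(c) == 5)
assert positive == comb(4, 2) == 6
```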
Intuition: When you add up the sizes of individual sets, elements in multiple sets get counted multiple times. Subtract pairwise intersections -- but then elements in three sets got subtracted too much. Add triple intersections back, and so on. The alternating signs ensure every element is counted exactly once.
C++
// Count numbers in [1, n] coprime to all primes in 'primes'
long long count_coprime(long long n, vector<long long> &primes) {
int m = primes.size();
long long result = 0;
for (int mask = 1; mask < (1 << m); mask++) {
long long prod = 1;
int bits = 0;
for (int i = 0; i < m; i++) {
if (mask & (1 << i)) {
prod *= primes[i];
if (prod > n) break;
bits++;
}
}
if (prod > n) continue;
if (bits % 2 == 1) result += n / prod;
else result -= n / prod;
}
return n - result; // n minus those divisible by at least one prime
}
Python
def count_coprime(n, primes):
    m = len(primes)
    result = 0
    for mask in range(1, 1 << m):
        prod = 1
        bits = 0
        for i in range(m):
            if mask & (1 << i):
                prod *= primes[i]
                bits += 1
        if prod > n:
            continue
        if bits % 2 == 1:
            result += n // prod
        else:
            result -= n // prod
    return n - result
A derangement is a permutation where no element is in its original position. The number of derangements D(n) can be derived using inclusion-exclusion.
C++
long long derangements(int n, long long mod) {
// D(n) = (n-1) * (D(n-1) + D(n-2))
if (n == 0) return 1;
if (n == 1) return 0;
long long prev2 = 1, prev1 = 0;
for (int i = 2; i <= n; i++) {
long long cur = (i - 1) * (prev1 + prev2) % mod;
prev2 = prev1;
prev1 = cur;
}
return prev1;
}
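A Python counterpart of the same recurrence:

```python
def derangements(n, mod):
    # D(n) = (n - 1) * (D(n-1) + D(n-2)), with D(0) = 1, D(1) = 0
    if n == 0:
        return 1 % mod
    prev2, prev1 = 1, 0
    for i in range(2, n + 1):
        prev2, prev1 = prev1, (i - 1) * (prev1 + prev2) % mod
    return prev1
```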
Let the prime factors of n be p1, ..., pk. Let Ai = numbers in [1, n] divisible by pi. Then φ(n) = n - |A1 ∪ ... ∪ Ak|. Apply inclusion-exclusion: φ(n) = n * (1 - 1/p1) * (1 - 1/p2) * ... * (1 - 1/pk). That is exactly the totient formula.
Inclusion-exclusion over k elements requires iterating over 2^k subsets. This is fine when k is small (the number of distinct prime factors of n is at most 15 for n ≤ 10^18). For larger k, look for ways to avoid full enumeration.
The first few: 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ...
Catalan numbers appear everywhere. Here is an incomplete list:
C++
long long catalan(int n) {
// Using precomputed fact[] and ifact[]
return fact[2 * n] * ifact[n] % MOD * ifact[n + 1] % MOD;
}
// Alternative: use the recurrence C(n) = C(2n,n) - C(2n,n+1)
long long catalan_alt(int n) {
return (C(2 * n, n) - C(2 * n, n + 1) + MOD) % MOD;
}
Python
def catalan(n, fact, ifact):
return fact[2 * n] * ifact[n] % MOD * ifact[n + 1] % MOD
This recurrence comes from choosing where to split. For valid parentheses: the first '(' matches some ')' at position 2k+1 (0-indexed), giving C_k ways inside and C_(n-1-k) ways outside.
The formula C(2n,n) - C(2n,n+1) comes from the reflection principle. Total paths from (0,0) to (2n,0) with +1/-1 steps = C(2n,n). Bad paths (those that go below 0) biject to paths to (2n,-2) via reflection, which is C(2n,n+1). Subtract to get Catalan.
Paths from (0,0) to (n,m) using right/up steps that never cross above the line y = x: if m ≤ n, the answer is C(n+m, m) - C(n+m, m-1). This generalizes Catalan (n = m gives C_n).
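A quick sketch comparing the ballot formula against a direct DP count (function names are mine):

```python
from math import comb

def ballot_formula(n, m):
    # paths (0,0) -> (n,m) staying weakly below y = x, assuming m <= n
    return comb(n + m, m) - (comb(n + m, m - 1) if m >= 1 else 0)

def ballot_brute(n, m):
    # dp[x][y] = number of right/up paths to (x, y) with y <= x throughout
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 1
    for x in range(n + 1):
        for y in range(min(x, m) + 1):
            if x > 0:
                dp[x][y] += dp[x - 1][y]
            if y > 0:
                dp[x][y] += dp[x][y - 1]
    return dp[n][m]
```

Setting n = m recovers the Catalan numbers, e.g. ballot_formula(3, 3) = 5.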
The big idea: Any linear recurrence can be expressed as a matrix multiplication. Once in matrix form, use binary exponentiation on the matrix to compute the n-th term in O(k^3 log n), where k is the matrix dimension.
C++
typedef vector<vector<long long>> Matrix;
Matrix multiply(const Matrix &A, const Matrix &B, long long mod) {
int n = A.size(), m = B[0].size(), k = B.size();
Matrix C(n, vector<long long>(m, 0));
for (int i = 0; i < n; i++)
for (int j = 0; j < k; j++)
if (A[i][j])
for (int l = 0; l < m; l++)
C[i][l] = (C[i][l] + A[i][j] * B[j][l]) % mod;
return C;
}
Matrix mat_pow(Matrix A, long long p, long long mod) {
int n = A.size();
Matrix result(n, vector<long long>(n, 0));
for (int i = 0; i < n; i++) result[i][i] = 1; // identity
while (p > 0) {
if (p & 1) result = multiply(result, A, mod);
A = multiply(A, A, mod);
p >>= 1;
}
return result;
}
long long fibonacci(long long n, long long mod) {
if (n <= 1) return n;
Matrix M = {{1, 1}, {1, 0}};
Matrix result = mat_pow(M, n - 1, mod);
return result[0][0];
}
Python
def mat_mul(A, B, mod):
    n, m, k = len(A), len(B[0]), len(B)
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(k):
            if A[i][j]:
                for l in range(m):
                    C[i][l] = (C[i][l] + A[i][j] * B[j][l]) % mod
    return C

def mat_pow(M, p, mod):
    n = len(M)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    while p > 0:
        if p & 1:
            result = mat_mul(result, M, mod)
        M = mat_mul(M, M, mod)
        p >>= 1
    return result

def fibonacci(n, mod):
    if n <= 1:
        return n
    result = mat_pow([[1, 1], [1, 0]], n - 1, mod)
    return result[0][0]
Any recurrence f(n) = c1*f(n-1) + c2*f(n-2) + ... + ck*f(n-k) can be converted into a k x k matrix, called the companion matrix. For example, for f(n) = 2f(n-1) + 3f(n-2) + f(n-3) the companion matrix is:
[[2, 3, 1], [1, 0, 0], [0, 1, 0]]
Multiply it by the state vector [f(n-1), f(n-2), f(n-3)] to get [f(n), f(n-1), f(n-2)].
Raise the matrix to the (n-2)th power and apply it to the initial state [f(2), f(1), f(0)] to get f(n).
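A numeric check of this construction: the companion matrix [[2, 3, 1], [1, 0, 0], [0, 1, 0]] corresponds to f(n) = 2f(n-1) + 3f(n-2) + f(n-3), and the matrix result must match a plain iterative evaluation (the initial values f(0) = f(1) = 1, f(2) = 2 are arbitrary choices for the test):

```python
MOD = 10**9 + 7

def mat_mul(A, B):
    n, m, k = len(A), len(B[0]), len(B)
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(k):
            if A[i][j]:
                for l in range(m):
                    C[i][l] = (C[i][l] + A[i][j] * B[j][l]) % MOD
    return C

def mat_pow(M, p):
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    while p:
        if p & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        p >>= 1
    return R

def f_matrix(n):
    # f(n) = 2f(n-1) + 3f(n-2) + f(n-3), f(0) = f(1) = 1, f(2) = 2
    if n <= 1:
        return 1
    if n == 2:
        return 2
    P = mat_pow([[2, 3, 1], [1, 0, 0], [0, 1, 0]], n - 2)
    # apply the power to the initial state [f(2), f(1), f(0)]
    return (P[0][0] * 2 + P[0][1] * 1 + P[0][2] * 1) % MOD

def f_naive(n):
    vals = [1, 1, 2]
    for i in range(3, n + 1):
        vals.append((2 * vals[-1] + 3 * vals[-2] + vals[-3]) % MOD)
    return vals[n]
```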
1) Linear recurrence with large n (up to 10^18). 2) DP with small state space but huge number of transitions (e.g., tiling problems, string counting). 3) Graph problems: count paths of length exactly k (raise the adjacency matrix to the k-th power). 4) Any time you see "compute f(10^18)" in a problem.
Intuition: μ is the "sign" function for inclusion-exclusion on divisors. It tells you whether to add or subtract when inverting a sum over divisors.
C++
vector<int> mobius_sieve(int n) {
vector<int> mu(n + 1, 0);
vector<bool> is_prime(n + 1, true);
vector<int> primes;
mu[1] = 1;
for (int i = 2; i <= n; i++) {
if (is_prime[i]) {
primes.push_back(i);
mu[i] = -1; // prime has one factor
}
for (int j = 0; j < primes.size() && (long long)i * primes[j] <= n; j++) {
is_prime[i * primes[j]] = false;
if (i % primes[j] == 0) {
mu[i * primes[j]] = 0; // squared factor
break;
}
mu[i * primes[j]] = -mu[i];
}
}
return mu;
}
Python
def mobius_sieve(n):
    mu = [0] * (n + 1)
    mu[1] = 1
    is_prime = [True] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if is_prime[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            is_prime[i * p] = False
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu
If f(n) = Σ_{d|n} g(d), then g(n) = Σ_{d|n} μ(n/d) * f(d).
Why it matters: Many problems ask you to count pairs (i, j) with gcd(i, j) = k, or sum f(gcd(i, j)). Mobius inversion lets you reformulate these into sums you can compute efficiently.
Problem: Count pairs (i, j) with 1 ≤ i ≤ n, 1 ≤ j ≤ m, gcd(i, j) = 1.
C++
// Count pairs with gcd = 1, using Mobius function
long long count_coprime_pairs(int n, int m, vector<int> &mu) {
long long ans = 0;
int lim = min(n, m);
for (int d = 1; d <= lim; d++)
ans += (long long)mu[d] * (n / d) * (m / d);
return ans;
}
// Optimized with floor division blocks: O(sqrt(n) + sqrt(m))
long long count_coprime_fast(int n, int m, vector<long long> &prefix_mu) {
long long ans = 0;
int lim = min(n, m);
for (int l = 1, r; l <= lim; l = r + 1) {
r = min({lim, n / (n / l), m / (m / l)});
ans += (prefix_mu[r] - prefix_mu[l - 1]) * (n / l) * (m / l);
}
return ans;
}
The values of floor(n/d) take at most about 2√n distinct values as d goes from 1 to n. For each distinct value, the d's that produce it form a contiguous block. This lets you speed up any sum ∑ f(d) * floor(n/d) from O(n) to O(√n) by grouping. This optimization is essential for Mobius-based problems.
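As a concrete sketch, here is the block decomposition applied to the basic sum of floor(n/d):

```python
def sum_floor_blocks(n):
    # sum of n // d for d = 1..n in O(sqrt(n)) via floor-division blocks
    total = 0
    d = 1
    while d <= n:
        q = n // d
        last = n // q  # largest d' with n // d' == q
        total += q * (last - d + 1)
        d = last + 1
    return total
```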
The identity Σ_{d|n} μ(d) = [n = 1] (1 if n = 1, else 0) is the foundation of why Mobius inversion works: μ is the inverse of the constant-1 function under Dirichlet convolution, and [n = 1] acts as the identity for that convolution.
What FFT does: Multiplies two polynomials of degree n in O(n log n) instead of O(n^2). This is not just for polynomial problems -- any convolution can be expressed as polynomial multiplication.
In CP, you usually need results mod a prime p. NTT replaces complex roots of unity with primitive roots mod p. The standard MOD = 998244353 = 119 * 2^23 + 1 is specifically chosen because it has a primitive root and supports NTT on up to 2^23 elements.
C++
const int MOD = 998244353;
const int g = 3; // primitive root of MOD
void ntt(vector<long long> &a, bool invert) {
int n = a.size();
for (int i = 1, j = 0; i < n; i++) {
int bit = n >> 1;
for (; j & bit; bit >>= 1) j ^= bit;
j ^= bit;
if (i < j) swap(a[i], a[j]);
}
for (int len = 2; len <= n; len <<= 1) {
long long w = invert ? binpow(binpow(g, (MOD - 1) / len, MOD), MOD - 2, MOD)
: binpow(g, (MOD - 1) / len, MOD);
for (int i = 0; i < n; i += len) {
long long wn = 1;
for (int j = 0; j < len / 2; j++) {
long long u = a[i + j];
long long v = a[i + j + len / 2] * wn % MOD;
a[i + j] = (u + v) % MOD;
a[i + j + len / 2] = (u - v + MOD) % MOD;
wn = wn * w % MOD;
}
}
}
if (invert) {
long long n_inv = binpow(n, MOD - 2, MOD);
for (auto &x : a) x = x * n_inv % MOD;
}
}
vector<long long> poly_multiply(vector<long long> a, vector<long long> b) {
int result_size = a.size() + b.size() - 1;
int n = 1;
while (n < result_size) n <<= 1;
a.resize(n); b.resize(n);
ntt(a, false); ntt(b, false);
for (int i = 0; i < n; i++) a[i] = a[i] * b[i] % MOD;
ntt(a, true);
a.resize(result_size);
return a;
}
Python
# Python FFT using complex numbers (for non-modular problems)
import cmath
def fft(a, invert=False):
    n = len(a)
    if n == 1:
        return a
    a_even = fft(a[0::2], invert)
    a_odd = fft(a[1::2], invert)
    angle = 2 * cmath.pi / n * (-1 if invert else 1)
    w = 1
    wn = cmath.exp(1j * angle)
    result = [0] * n
    for i in range(n // 2):
        result[i] = a_even[i] + w * a_odd[i]
        result[i + n // 2] = a_even[i] - w * a_odd[i]
        if invert:
            result[i] /= 2
            result[i + n // 2] /= 2
        w *= wn
    return result
# For CP in Python, use numpy.fft or the NTT approach above translated to Python
Complex number FFT suffers from floating point errors. For exact integer results, round at the end. For modular results, always use NTT instead. In C++ CP, NTT is the standard. Python is too slow for FFT in most CP problems -- use C++ for FFT/NTT.
If the problem uses MOD = 998244353, it is almost certainly hinting that NTT is needed. If MOD = 10^9+7, you might need to use three different NTT-friendly primes and combine results with CRT, or use complex FFT with care.
This is the foundation everything above builds on. Internalize these rules and modular arithmetic becomes second nature.
You cannot just divide the remainders. You must multiply by the modular inverse. This is the most common mistake beginners make with modular arithmetic.
Why do CP problems use 10^9+7 (1000000007)?
C++
const long long MOD = 1e9 + 7;
// Safe addition
long long add(long long a, long long b) {
return (a + b) % MOD;
}
// Safe subtraction (handles negative)
long long sub(long long a, long long b) {
return ((a - b) % MOD + MOD) % MOD;
}
// Safe multiplication
long long mul(long long a, long long b) {
return a % MOD * (b % MOD) % MOD;
}
// Safe division
long long divmod(long long a, long long b) {
return mul(a, binpow(b, MOD - 2, MOD));
}
A classic hard problem. The key insight chains together Euler's theorem:
C++
// Compute a^b^c^... mod m using Euler's theorem recursively
// tower = [a, b, c, ...] read bottom to top
long long tower_mod(vector<long long> &tower, int idx, long long mod) {
if (mod == 1) return 0;
if (idx == tower.size()) return 1 % mod;
long long phi_m = euler_totient(mod);
long long exp = tower_mod(tower, idx + 1, phi_m);
return binpow(tower[idx] % mod, exp + phi_m, mod);
// +phi_m handles the case when gcd(a, mod) != 1
// (generalized Euler: a^n ≡ a^(n mod φ(m) + φ(m)) for n ≥ log2(m));
// caution: always adding phi_m assumes each exponent in the tower is
// large enough -- for small towers, check whether it reaches φ(m)
}
When gcd(a, m) is not 1, the standard Euler's theorem does not apply directly. But for n ≥ log2(m): a^n ≡ a^((n mod φ(m)) + φ(m)) (mod m). This is why the code above adds φ(m) to the exponent -- it ensures correctness even when a and m share factors.
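A Python sketch that makes the condition explicit: it carries a capped copy of the true tower value at each level, so φ(m) is added to the exponent only when the exponent really is at least φ(m) (the function names and the CAP constant are mine):

```python
CAP = 1 << 62  # anything this large is "big enough" for the theorem

def euler_totient(n):
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def capped_pow(a, b):
    # min(a**b, CAP) without computing huge powers
    if b == 0:
        return 1
    if a <= 1:
        return a
    r = 1
    for _ in range(min(b, 64)):
        r *= a
        if r >= CAP:
            return CAP
    return r

def tower_mod(tower, mod):
    # tower = [a, b, c, ...] meaning a^(b^(c^...))
    def helper(idx, m):
        # returns (tower[idx:] mod m, min(true value, CAP))
        if idx == len(tower):
            return 1 % m, 1
        a = tower[idx]
        phi = euler_totient(m)
        e_mod, e_cap = helper(idx + 1, phi)
        # reduce the exponent with +phi only when the true exponent >= phi
        e = e_mod + phi if e_cap >= phi else e_cap
        return pow(a, e, m), capped_pow(a, e_cap)
    return helper(0, mod)[0]
```

The second test below is exactly the case where unconditionally adding φ(m) would give a wrong answer.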
Simple and sufficient for n up to about 10^14 in CP.
C++
bool is_prime(long long n) {
if (n < 2) return false;
if (n < 4) return true;
if (n % 2 == 0 || n % 3 == 0) return false;
for (long long i = 5; i * i <= n; i += 6)
if (n % i == 0 || n % (i + 2) == 0) return false;
return true;
}
Python
def is_prime(n):
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0 or n % 3 == 0: return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0: return False
        i += 6
    return True
The idea: Based on Fermat's little theorem, but stronger. Write n - 1 = d * 2^s with d odd. If n is prime, then for any a: either a^d ≡ 1 (mod n), or a^(d * 2^r) ≡ -1 (mod n) for some 0 ≤ r < s.
Deterministic version for CP: testing the 12 bases {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37} gives a deterministic answer for all n up to about 3 * 10^23 -- in particular, for every n that fits in a 64-bit integer.
C++
using u64 = unsigned long long;
using u128 = __uint128_t;
u64 mulmod(u64 a, u64 b, u64 m) {
return (u128)a * b % m;
}
u64 powmod(u64 a, u64 b, u64 m) {
u64 res = 1;
a %= m;
while (b > 0) {
if (b & 1) res = mulmod(res, a, m);
a = mulmod(a, a, m);
b >>= 1;
}
return res;
}
bool miller_rabin(u64 n, u64 a) {
if (n % a == 0) return n == a;
u64 d = n - 1;
int r = 0;
while (d % 2 == 0) { d /= 2; r++; }
u64 x = powmod(a, d, n);
if (x == 1 || x == n - 1) return true;
for (int i = 0; i < r - 1; i++) {
x = mulmod(x, x, n);
if (x == n - 1) return true;
}
return false;
}
bool is_prime_millerrabin(u64 n) {
if (n < 2) return false;
for (u64 a : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37})
if (!miller_rabin(n, a)) return false;
return true;
}
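A Python counterpart; arbitrary-precision integers make the 128-bit multiplication concern disappear:

```python
def is_prime_miller_rabin(n):
    # deterministic for 64-bit n with the 12 bases from the text
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n in bases:
        return True
    if any(n % p == 0 for p in bases):
        return False
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a composite witness
    return True
```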
When n is too large for a sieve (say n up to 10^18) and you need to check individual numbers for primality. Often paired with Pollard's rho for factorization of large numbers. If you only need primes up to 10^7, just use a sieve.
If n = p1^a1 * p2^a2 * ... * pk^ak, then d(n) = (a1+1)(a2+1)...(ak+1).
C++
int count_divisors(long long n) {
int count = 1;
for (long long p = 2; p * p <= n; p++) {
int e = 0;
while (n % p == 0) { n /= p; e++; }
count *= (e + 1);
}
if (n > 1) count *= 2; // remaining prime factor with exponent 1
return count;
}
If n = p1^a1 * ... * pk^ak, then σ(n) = ∏ (pi^(ai+1) - 1) / (pi - 1).
C++
vector<long long> get_divisors(long long n) {
vector<long long> divs;
for (long long i = 1; i * i <= n; i++) {
if (n % i == 0) {
divs.push_back(i);
if (i != n / i)
divs.push_back(n / i);
}
}
sort(divs.begin(), divs.end());
return divs;
}
Python
def get_divisors(n):
    divs = []
    i = 1
    while i * i <= n:
        if n % i == 0:
            divs.append(i)
            if i != n // i:
                divs.append(n // i)
        i += 1
    return sorted(divs)
C++
// Count divisors for all numbers 1..n in O(n log n)
vector<int> divisor_count_sieve(int n) {
vector<int> d(n + 1, 0);
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j += i)
d[j]++;
return d;
}
// Sum of divisors for all numbers 1..n in O(n log n)
vector<long long> divisor_sum_sieve(int n) {
vector<long long> s(n + 1, 0);
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j += i)
s[j] += i;
return s;
}
A function f is multiplicative if f(ab) = f(a)f(b) when gcd(a,b) = 1. The functions φ, μ, d, σ are all multiplicative. This means you only need to know their values on prime powers, then multiply. The linear sieve can compute any multiplicative function in O(n).
The sum Σ_{i=1..n} floor(n/i) = O(n log n). This is because floor(n/i) counts how many multiples of i are ≤ n. This sum appears constantly in number theory problems. Compute it in O(√n) using floor division blocks (same technique as in the Mobius section).
Organized by topic. Solve these and you will have solid number theory fundamentals for CP.
Start with sieve and binary exponentiation (you will use these in almost every contest). Then learn modular inverse and combinatorics (needed for 80% of math problems). CRT and Euler's totient come next. Matrix exponentiation unlocks a whole class of problems. Save Mobius inversion and FFT for after you are comfortable with everything else -- they appear mainly in Div 1 C/D and harder.
Keep a pre-written template with: binpow, modular inverse, factorial/inverse-factorial precomputation, sieve, and matrix exponentiation. In a contest, you should be able to paste these in and use them instantly. The time to learn them is now -- the contest is not the time to debug your number theory library.