The branch of mathematics built specifically for computer science. Logic, sets, graphs, counting, proofs -- this is where math and programming truly merge.
If you only study one branch of math for computer science, make it discrete math. Every if statement is logic. Every database query is set theory. Every algorithm analysis is Big-O. Every recursive function needs induction to prove correct. This page covers the mathematical foundations you will use every single day as a programmer.
Logic is the foundation of all mathematical reasoning -- and all programming. A proposition is a declarative statement that is either true or false, never both, never neither.
If you've written an if statement, you already do logic. Discrete math just gives you the formal vocabulary for what you're already doing:
# This code IS logic:
if user.is_admin and (not user.is_banned or user.has_override):
grant_access()
# In math notation: A ∧ (¬B ∨ C)
# Same thing, just different symbols
Every if/elif/else chain is an implication. Every while condition is a proposition evaluated each iteration. Boolean algebra IS programming logic -- the math just lets you prove things about it.
Not propositions: "What time is it?" (question), "Close the door" (command), "x + 5 = 10" (depends on x -- this is a predicate, not a proposition).
We combine propositions using logical operators (also called connectives). If you have written an if statement, you already know these.
| Operator | Symbol | Python | Meaning |
|---|---|---|---|
| AND | ∧ | and | True only when both are true |
| OR | ∨ | or | True when at least one is true |
| NOT | ¬ | not | Flips true to false, false to true |
| XOR | ⊕ | ^ | True when exactly one is true |
| Implication | → | -- | If A then B |
| Biconditional | ↔ | == | A if and only if B |
A truth table shows every possible combination of inputs and the resulting output. Here are the truth tables for the core operators.
| A | B | A ∧ B |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | F |
| A | B | A ∨ B |
|---|---|---|
| T | T | T |
| T | F | T |
| F | T | T |
| F | F | F |
| A | ¬A |
|---|---|
| T | F |
| F | T |
| A | B | A ⊕ B |
|---|---|---|
| T | T | F |
| T | F | T |
| F | T | T |
| F | F | F |
The implication A → B means "if A, then B." It is only false when A is true but B is false. Think of it as a promise: "If it rains, I will bring an umbrella." The promise is only broken if it rains and you don't bring one.
| A | B | A → B |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | T |
| F | F | T |
When A is false, the implication A → B is always true regardless of B. This feels counterintuitive but makes logical sense: a promise made under a false premise cannot be broken. "If pigs fly, then I will give you a million dollars" -- pigs don't fly, so you can't claim I broke my promise.
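Python has no implication operator, but A → B is logically equivalent to (not A) or B. A small sketch that prints the full truth table (the helper name implies is just for illustration):

```python
from itertools import product

def implies(a, b):
    # A -> B is false only when A is true and B is false
    return (not a) or b

for a, b in product([True, False], repeat=2):
    print(f"{a!s:5} -> {b!s:5} : {implies(a, b)}")
```

Note that the row with a false premise comes out true, matching the "promise under a false premise" intuition above.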
The biconditional A ↔ B means "A if and only if B" -- both must have the same truth value.
| A | B | A ↔ B |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | T |
Two expressions are logically equivalent if they have the same truth value for every possible input. The most famous equivalences are De Morgan's Laws:
In English: "not (A and B)" is the same as "not A or not B." And "not (A or B)" is the same as "not A and not B."
# These two are equivalent:
if not (is_admin and is_logged_in):
deny_access()
# Same as:
if (not is_admin) or (not is_logged_in):
deny_access()
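Because there are only four input combinations, De Morgan's Laws can be verified exhaustively:

```python
from itertools import product

# Check both laws for every (A, B) pair
for a, b in product([False, True], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))
print("De Morgan's Laws hold for all four input combinations")
```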
Other important equivalences:
Every if statement, every while loop condition, and every boolean expression in your code is propositional logic. De Morgan's Laws help you simplify complex conditions. Understanding implication helps you write correct assertions and contracts. Logic gates (AND, OR, NOT) are the physical building blocks of every CPU. Formal logic is also the foundation of SQL's WHERE clauses, type systems, and automated theorem proving.
A set is an unordered collection of distinct elements. Sets are one of the most fundamental concepts in all of mathematics -- and they show up everywhere in programming.
Sets are written with curly braces. Order does not matter, and duplicates are ignored.
We write x ∈ A to say "x is an element of A," and x ∉ A to say "x is not in A."
If A = {1, 2, 3}, then 2 ∈ A is true and 7 ∈ A is false.
| Operation | Symbol | Meaning | Python |
|---|---|---|---|
| Union | ∪ | Everything in A or B (or both) | A \| B |
| Intersection | ∩ | Only elements in both A and B | A & B |
| Difference | \ | Elements in A but not in B | A - B |
| Complement | Aᶜ | Everything not in A (relative to universe) | U - A |
Given A = {1, 2, 3, 4, 5} and B = {3, 4, 5, 6, 7}:
A ⊆ B means "A is a subset of B" -- every element of A is also in B. A ⊂ B means "A is a proper subset" -- A is a subset but A ≠ B.
A Venn diagram visualizes sets as overlapping circles. Imagine two circles, one for set A and one for set B, that partially overlap. The overlapping region represents A ∩ B (intersection). The entire area covered by both circles is A ∪ B (union). The area in A's circle but not in B's is A \ B (difference). This visual tool makes set relationships immediately intuitive.
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
print(A | B) # Union: {1, 2, 3, 4, 5, 6, 7}
print(A & B) # Intersection: {3, 4, 5}
print(A - B) # Difference: {1, 2}
print(A ^ B) # Symmetric difference (XOR): {1, 2, 6, 7}
print(3 in A) # Membership: True
print(len(A)) # Cardinality: 5
Sets are everywhere in computing. Database queries are set operations (SQL's UNION, INTERSECT, EXCEPT map directly to ∪, ∩, \). Type systems model types as sets of values. Python has a built-in set type with O(1) membership testing. Hash sets and hash maps are the most-used data structures in real-world code. Even IP address ranges and permission systems are modeled as sets.
A relation from set A to set B is any subset of the Cartesian product A × B (all possible ordered pairs). A relation describes how elements of A connect to elements of B.
Let A = {1, 2, 3} and B = {a, b}. The Cartesian product A × B is:
{(1,a), (1,b), (2,a), (2,b), (3,a), (3,b)}
A relation R might be: R = {(1,a), (2,b), (3,a)} -- this picks out specific connections.
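The same example in Python, using itertools.product for the Cartesian product:

```python
from itertools import product

A = {1, 2, 3}
B = {'a', 'b'}

cartesian = set(product(A, B))      # all 6 ordered pairs
R = {(1, 'a'), (2, 'b'), (3, 'a')}  # one particular relation

print(len(cartesian))  # 6
print(R <= cartesian)  # True: every relation is a subset of A x B
```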
Relations can have special properties:
A function f: A → B is a special relation where every element of A maps to exactly one element of B. A is the domain, B is the codomain.
Functions come in three important flavors:
| Type | Definition | Intuition |
|---|---|---|
| Injective (one-to-one) | Different inputs always give different outputs. If f(a) = f(b), then a = b. | No two inputs share an output. |
| Surjective (onto) | Every element of B is mapped to by some element of A. | Every output is "hit" by some input. |
| Bijective (one-to-one and onto) | Both injective and surjective. | Perfect pairing -- every input maps to a unique output, and every output is used. |
f(x) = 2x from integers to integers:
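Over a finite slice of the domain, injectivity can be checked by brute force. A sketch (the helper is_injective is just for illustration):

```python
def is_injective(f, domain):
    # injective: no two inputs share an output
    outputs = [f(x) for x in domain]
    return len(outputs) == len(set(outputs))

domain = range(-5, 6)
print(is_injective(lambda x: 2 * x, domain))  # True: doubling never collides
print(is_injective(lambda x: x * x, domain))  # False: (-2)**2 == 2**2
```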
If you have f: A → B and g: B → C, the composition g ∘ f (read "g of f") is defined as (g ∘ f)(x) = g(f(x)). Apply f first, then g.
Composition is not commutative: g ∘ f ≠ f ∘ g in general. In the example above, (x + 1)² ≠ x² + 1. This is like how in programming, the order of function calls matters: g(f(x)) is different from f(g(x)).
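Non-commutativity is easy to see in code, taking f(x) = x + 1 and g(x) = x²:

```python
def compose(g, f):
    # (g . f)(x) = g(f(x)): apply f first, then g
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: x * x

print(compose(g, f)(3))  # (3 + 1)^2 = 16
print(compose(f, g)(3))  # 3^2 + 1  = 10
```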
If f: A → B is bijective, then there exists an inverse function f⁻¹: B → A such that f⁻¹(f(x)) = x and f(f⁻¹(y)) = y. The inverse "undoes" the original function.
f(x) = 2x + 3. To find f⁻¹, solve y = 2x + 3 for x:
x = (y - 3) / 2, so f⁻¹(y) = (y - 3) / 2
Check: f⁻¹(f(5)) = f⁻¹(13) = (13 - 3)/2 = 5. It works.
Bijections are essential in cryptography -- encryption must be reversible (bijective) for decryption to work. Hash functions are surjective (many inputs can map to the same hash). Function composition is the basis of functional programming and middleware pipelines. Understanding injectivity helps you reason about whether a mapping can be reversed (critical for encoding/decoding).
Proofs are how mathematicians verify that something is always true, not just sometimes. As a programmer, proof techniques help you reason about correctness -- proving your algorithm works for all inputs, not just the ones you tested.
Start with what you know (premises), apply logical steps, arrive at the conclusion. The most straightforward approach.
Claim: If n is even, then n² is even.
Proof:
Assume the opposite of what you want to prove, then show that leads to something impossible (a contradiction). Since the assumption caused a contradiction, it must be false, so the original statement must be true.
Claim: √2 is irrational.
Proof:
This is the most important proof technique for computer science. Induction proves that a statement holds for all natural numbers by proving two things:
Think of it like dominoes: if the first one falls (base case) and each domino knocks over the next (inductive step), then all dominoes fall.
Claim: The sum 1 + 2 + 3 + ... + n = n(n+1)/2 for all n ≥ 1.
Base case (n = 1):
Left side: 1. Right side: 1(1+1)/2 = 1. They match. Base case holds.
Inductive hypothesis: Assume the formula holds for n = k:
Inductive step: Prove it holds for n = k + 1:
The formula holds for k + 1. By induction, it holds for all n ≥ 1. □
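A quick empirical spot check doesn't replace the proof (it only covers finitely many n), but it does catch typos in the formula:

```python
# Verify 1 + 2 + ... + n == n(n+1)/2 for the first hundred values
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("Formula matches for n = 1 .. 100")
```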
Instead of proving "if A then B," prove the logically equivalent statement "if not B then not A." Sometimes this direction is easier.
Claim: If n² is even, then n is even.
Contrapositive: If n is odd, then n² is odd.
Proof: If n is odd, then n = 2k + 1. So n² = (2k+1)² = 4k² + 4k + 1 = 2(2k² + 2k) + 1, which is odd. □
Induction is how you prove recursive algorithms are correct. Every time you write a recursive function, you are implicitly doing induction: the base case is your recursion base case, and the recursive call assumes the function works for smaller inputs (inductive hypothesis). Loop invariants use induction to prove loops are correct. Proof by contradiction is used to prove impossibility results, like "you can't sort faster than O(n log n) with comparisons."
Combinatorics is the mathematics of counting. How many ways can something happen? This is essential for analyzing algorithms, understanding probability, and solving optimization problems.
If you have m ways to do one thing and n ways to do another, there are m × n ways to do both.
A restaurant has 3 appetizers and 5 entrees. How many different meals (one appetizer + one entree) can you order?
3 × 5 = 15 meals
In CS terms: A 4-bit number has 2 × 2 × 2 × 2 = 2⁴ = 16 possible values. An 8-character password using 26 lowercase letters has 26⁸ ≈ 208 billion possibilities.
The factorial of n (written n!) is the product of all positive integers up to n:
Factorials grow incredibly fast: 10! = 3,628,800 and 20! = 2,432,902,008,176,640,000.
A permutation is an arrangement where order matters. How many ways can you arrange r items chosen from n items?
How many ways can 3 students win gold, silver, and bronze medals from a class of 10?
P(10, 3) = 10! / 7! = 10 × 9 × 8 = 720 ways
Order matters because gold-silver-bronze is a different outcome than bronze-silver-gold.
A combination is a selection where order does NOT matter. How many ways can you choose r items from n items?
Also written as the binomial coefficient C(n, r) -- n stacked over r in parentheses -- and read "n choose r."
How many ways can you pick a committee of 3 people from 10?
C(10, 3) = 10! / (3! × 7!) = 720 / 6 = 120 ways
Order doesn't matter because {Alice, Bob, Carol} is the same committee as {Carol, Alice, Bob}.
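Python's math module has both counts built in (math.perm and math.comb, available since Python 3.8):

```python
from math import perm, comb

print(perm(10, 3))  # 720 medal orderings (order matters)
print(comb(10, 3))  # 120 committees (order ignored)
# Each committee of 3 can be ordered 3! = 6 ways, which is exactly the ratio:
print(perm(10, 3) // comb(10, 3))  # 6
```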
Ask yourself: "Does the order matter?"
If you put n + 1 items into n containers, at least one container must hold more than one item. It sounds obvious, but it has powerful consequences.
Algorithm analysis relies on counting: "How many operations does this algorithm perform?" Combinatorics tells you the size of a search space (e.g., how many possible passwords exist), which determines whether brute force is feasible. Probability is built on counting (probability = favorable outcomes / total outcomes). The pigeonhole principle proves hash collisions are unavoidable, that lossless compression can't compress all files, and many other impossibility results.
How many ways can you distribute n identical items into k distinct boxes?
How many ways can you put 10 identical balls into 4 different boxes?
C(10 + 4 - 1, 4 - 1) = C(13, 3) = 286 ways
Visualization: Imagine 10 balls in a row: ○○○○○○○○○○. Adding 3 dividers creates 4 groups. We're choosing where to place 3 dividers among 13 positions.
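For small cases, the stars-and-bars count can be brute-force checked against every way of filling the boxes:

```python
from math import comb
from itertools import product

n, k = 10, 4  # 10 identical balls, 4 distinct boxes
formula = comb(n + k - 1, k - 1)
# Enumerate all (b1, b2, b3, b4) with each box holding 0..n balls
brute = sum(1 for boxes in product(range(n + 1), repeat=k) if sum(boxes) == n)
print(formula, brute)  # 286 286
```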
When counting elements in overlapping sets, use inclusion-exclusion to avoid double-counting:
In a class of 100 students, 60 take Math, 50 take Physics, and 20 take both. How many take at least one?
|Math ∪ Physics| = 60 + 50 - 20 = 90 students
The -20 removes those who were counted twice (in both classes).
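The same count with Python sets (the student IDs are made up so that the overlap is exactly 20):

```python
math_students = set(range(0, 60))      # 60 students
physics_students = set(range(40, 90))  # 50 students; IDs 40-59 take both

both = math_students & physics_students
either = math_students | physics_students

print(len(both))    # 20
print(len(either))  # 90 = 60 + 50 - 20
```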
How many arrangements of n objects where some are identical?
How many ways can you arrange the letters in "MISSISSIPPI"?
Total: 11 letters. M: 1, I: 4, S: 4, P: 2
11! / (1! × 4! × 4! × 2!) = 39,916,800 / (1 × 24 × 24 × 2) = 34,650 arrangements
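The same computation with factorials from the math module:

```python
from math import factorial
from collections import Counter

word = "MISSISSIPPI"
arrangements = factorial(len(word))
for count in Counter(word).values():
    arrangements //= factorial(count)  # divide out swaps of identical letters
print(arrangements)  # 34650
```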
Choosing r items from n types, where you can choose the same type multiple times:
An ice cream shop has 5 flavors. How many ways can you choose 3 scoops (repeats allowed)?
C(5 + 3 - 1, 3) = C(7, 3) = 35 ways
This includes choices like (chocolate, chocolate, vanilla) or (strawberry, strawberry, strawberry).
A derangement is a permutation where no element appears in its original position.
5 people put their hats in a pile and randomly pick one. How many ways can NO ONE get their own hat?
D(5) = 5! × (1 - 1 + 1/2 - 1/6 + 1/24 - 1/120)
= 120 × (0 + 0.5 - 0.167 + 0.042 - 0.008)
= 120 × 0.367 = 44 derangements
Fun fact: As n gets large, approximately 1/e (≈ 36.8%) of all permutations are derangements.
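Derangements also satisfy the recurrence D(n) = (n - 1)(D(n - 1) + D(n - 2)), which avoids the alternating sum entirely. A sketch:

```python
def derangements(n):
    # D(0) = 1, D(1) = 0, then D(n) = (n-1) * (D(n-1) + D(n-2))
    if n == 0:
        return 1
    if n == 1:
        return 0
    return (n - 1) * (derangements(n - 1) + derangements(n - 2))

print(derangements(5))  # 44: the hat-pile answer from above
```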
The binomial theorem expands (a + b)ⁿ:
= C(4,0)x⁴ + C(4,1)x³y + C(4,2)x²y² + C(4,3)xy³ + C(4,4)y⁴
= x⁴ + 4x³y + 6x²y² + 4xy³ + y⁴
Pascal's Triangle: The coefficients (1, 4, 6, 4, 1) form row 4 of Pascal's triangle!
| Identity | Meaning |
|---|---|
| C(n, k) = C(n, n-k) | Choosing k to include = choosing n-k to exclude |
| C(n, 0) = C(n, n) = 1 | One way to choose nothing or everything |
| C(n, k) = C(n-1, k-1) + C(n-1, k) | Pascal's Triangle rule |
| Σ C(n, k) = 2ⁿ | Total subsets of n items |
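All of these identities can be spot-checked with math.comb:

```python
from math import comb

n = 4
row = [comb(n, k) for k in range(n + 1)]
print(row)  # [1, 4, 6, 4, 1] -- row 4 of Pascal's triangle

assert comb(n, 1) == comb(n, n - 1)                   # symmetry
assert comb(n, 2) == comb(n - 1, 1) + comb(n - 1, 2)  # Pascal's rule
assert sum(row) == 2 ** n                             # total subsets
```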
A set of n elements has exactly 2ⁿ subsets (including empty set and the full set). This is why iterating over all subsets of n items takes O(2ⁿ) time -- exponential! For n=30, that's over a billion subsets. Understanding this helps you recognize when a brute-force approach is infeasible.
A graph G = (V, E) consists of vertices (nodes) V and edges (connections) E. Graphs are one of the most important data structures in computer science -- they model networks, relationships, maps, dependencies, and much more.
| Type | Description | Example |
|---|---|---|
| Undirected | Edges have no direction. If A connects to B, then B connects to A. | Facebook friendships |
| Directed | Edges have direction. A → B does not mean B → A. | Twitter follows, web links |
| Weighted | Edges have values (costs, distances, capacities). | Road maps with distances |
| Unweighted | All edges are equal -- no values attached. | Social connections |
There are two main ways to store a graph in code:
Adjacency Matrix -- a 2D array where matrix[i][j] = 1 if there is an edge from vertex i to vertex j.
# Graph: A--B, A--C, B--C
# A B C
# A [ 0, 1, 1 ]
# B [ 1, 0, 1 ]
# C [ 1, 1, 0 ]
matrix = [
[0, 1, 1],
[1, 0, 1],
[1, 1, 0]
]
Space: O(V²). Edge lookup: O(1). Best for dense graphs.
Adjacency List -- each vertex stores a list of its neighbors.
# Same graph: A--B, A--C, B--C
graph = {
'A': ['B', 'C'],
'B': ['A', 'C'],
'C': ['A', 'B']
}
Space: O(V + E). Edge lookup: O(degree). Best for sparse graphs (most real-world graphs).
Two fundamental ways to explore a graph:
| Algorithm | Strategy | Data Structure | Use Case |
|---|---|---|---|
| BFS (Breadth-First Search) | Explore all neighbors first, then their neighbors | Queue | Shortest path (unweighted), level-order |
| DFS (Depth-First Search) | Go as deep as possible, then backtrack | Stack / recursion | Cycle detection, topological sort, maze solving |
from collections import deque
def bfs(graph, start):
visited = set()
queue = deque([start])
visited.add(start)
while queue:
vertex = queue.popleft()
print(vertex, end=' ')
for neighbor in graph[vertex]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
def dfs(graph, start, visited=None):
if visited is None:
visited = set()
visited.add(start)
print(start, end=' ')
for neighbor in graph[start]:
if neighbor not in visited:
dfs(graph, neighbor, visited)
An Euler path visits every edge exactly once. An Euler circuit is an Euler path that starts and ends at the same vertex.
This is the problem that founded graph theory. The city of Königsberg had 7 bridges connecting 4 land masses. Euler proved it was impossible to walk across every bridge exactly once because more than two vertices had odd degree.
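Euler's condition can be checked from the degrees alone, assuming the graph is connected. Königsberg's four land masses have degrees 3, 3, 3, and 5 -- four odd-degree vertices, so no Euler path exists:

```python
def euler_path_possible(degrees):
    # A connected graph has an Euler path iff 0 or 2 vertices have odd degree
    odd = sum(1 for d in degrees if d % 2 == 1)
    return odd in (0, 2)

print(euler_path_possible([3, 3, 3, 5]))  # False: the Konigsberg bridges
print(euler_path_possible([2, 3, 3, 2]))  # True: exactly two odd vertices
```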
Graphs are arguably the most important data structure in CS. The internet is a graph (pages are vertices, links are edges). Social networks are graphs. GPS navigation uses weighted graphs with shortest-path algorithms (Dijkstra's). Package managers (npm, pip) use directed acyclic graphs for dependency resolution. Compilers build abstract syntax trees and control flow graphs. Databases use B-trees (tree-structured graphs) for indexing. If you learn one data structure deeply, make it graphs.
Number theory studies the properties of integers. It might seem abstract, but it is the mathematical backbone of cryptography, hashing, and error detection.
We say a divides b (written a | b) if b = a × k for some integer k. For example, 3 | 12 because 12 = 3 × 4.
A prime number is greater than 1 and divisible only by 1 and itself. The first primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29...
The Fundamental Theorem of Arithmetic says every integer greater than 1 can be written as a unique product of primes:
The Greatest Common Divisor GCD(a, b) is the largest number that divides both a and b. The Least Common Multiple LCM(a, b) is the smallest number that both a and b divide.
An efficient method to compute GCD using repeated division. This is one of the oldest algorithms known (over 2300 years old) and is still used today.
Find GCD(252, 105):
def gcd(a, b):
while b:
a, b = b, a % b
return a
print(gcd(252, 105)) # 21
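The LCM then drops out of the GCD via the identity gcd(a, b) × lcm(a, b) = a × b:

```python
from math import gcd

def lcm(a, b):
    # gcd(a, b) * lcm(a, b) == a * b, so divide the product by the gcd
    return a * b // gcd(a, b)

print(lcm(12, 18))    # 36
print(lcm(252, 105))  # 1260, using gcd(252, 105) = 21
```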
a mod n gives the remainder when a is divided by n. We write a ≡ b (mod n) to say "a and b have the same remainder when divided by n."
These properties allow you to compute modular arithmetic on very large numbers without overflow:
Compute (987654321 × 123456789) mod 1000000007 without overflow:
# Instead of computing the huge product first:
a = 987654321 % 1000000007
b = 123456789 % 1000000007
result = (a * b) % 1000000007
print(result) # Safe from overflow in languages with fixed-width integers
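The same trick extends to exponentiation: square-and-multiply reduces after every step, so no intermediate value exceeds the square of the modulus. Python's built-in three-argument pow(base, exp, mod) does this for you; a sketch of the idea:

```python
def power_mod(base, exp, mod):
    # square-and-multiply: take the remainder after every operation
    result = 1
    base %= mod
    while exp:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result

print(power_mod(7, 128, 13) == pow(7, 128, 13))  # True
```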
RSA encryption is built entirely on number theory: large prime numbers, modular exponentiation, and the difficulty of factoring. Hash functions use modular arithmetic extensively (e.g., hash(key) = key mod table_size). Checksums and error-detecting codes use modular arithmetic. Random number generators use modular arithmetic (linear congruential generators). Competitive programming problems frequently require modular arithmetic to handle large numbers (mod 10⁹ + 7 is the most common modulus).
Boolean algebra is the mathematics of true/false values (or equivalently, 1/0). It is the direct mathematical foundation of digital electronics and every computer ever built.
A boolean variable can only be 0 (false) or 1 (true). The three fundamental operations are:
| Operation | Notation | Logic Gate | Rule |
|---|---|---|---|
| AND | A · B or AB | AND gate | 1 only if both inputs are 1 |
| OR | A + B | OR gate | 1 if at least one input is 1 |
| NOT | A̅ or ¬A | NOT gate (inverter) | Flips 0 to 1, 1 to 0 |
Boolean algebra follows laws similar to regular algebra, with some unique ones:
| Law | AND Form | OR Form |
|---|---|---|
| Identity | A · 1 = A | A + 0 = A |
| Null | A · 0 = 0 | A + 1 = 1 |
| Idempotent | A · A = A | A + A = A |
| Complement | A · ¬A = 0 | A + ¬A = 1 |
| Commutative | A · B = B · A | A + B = B + A |
| Associative | (AB)C = A(BC) | (A+B)+C = A+(B+C) |
| Distributive | A(B+C) = AB + AC | A+(BC) = (A+B)(A+C) |
| De Morgan's | ¬(AB) = ¬A + ¬B | ¬(A+B) = ¬A · ¬B |
| Absorption | A(A+B) = A | A + AB = A |
Simplify: F = AB + A¬B + ¬AB
We reduced a complex expression to A + B (just an OR gate).
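A truth-table check confirms the simplification:

```python
from itertools import product

# F = AB + A(not B) + (not A)B should equal A + B for every input
for A, B in product([0, 1], repeat=2):
    F = (A and B) or (A and not B) or (not A and B)
    assert bool(F) == bool(A or B)
print("AB + A¬B + ¬AB == A + B for all inputs")
```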
In hardware, boolean operations are implemented as logic gates. Every CPU is built from billions of these gates.
A half adder adds two single-bit numbers. It needs two outputs: the sum and the carry.
| A | B | Sum | Carry |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
This is literally how your CPU adds numbers -- chains of logic gates.
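The truth table above is just Sum = A XOR B and Carry = A AND B, which maps directly onto Python's bitwise operators:

```python
def half_adder(a, b):
    # Sum = A XOR B, Carry = A AND B
    return a ^ b, a & b

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s, c = half_adder(a, b)
    print(f"{a} + {b} -> sum={s}, carry={c}")
```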
Boolean algebra is the math of hardware design. Every CPU, every memory chip, every digital circuit is built from boolean logic gates. Compiler optimizations use boolean simplification to reduce the number of instructions. Database query optimizers simplify WHERE clauses using boolean algebra. Understanding boolean algebra helps you write simpler, faster conditional logic in your code. The absorption law alone (A + AB = A) can simplify many complex if statements.
Big-O notation describes the upper bound on the growth rate of an algorithm's time (or space) as the input size grows. It answers: "How does this algorithm scale?"
We say f(n) = O(g(n)) if there exist constants c and n₀ such that f(n) ≤ c · g(n) for all n ≥ n₀. In plain English: for large enough inputs, f(n) grows no faster than g(n), up to a constant factor.
Big-O drops constants and lower-order terms because we care about scalability, not exact counts:
| Big-O | Name | Example | n=1000 |
|---|---|---|---|
| O(1) | Constant | Array index lookup, hash map get | 1 |
| O(log n) | Logarithmic | Binary search | ~10 |
| O(n) | Linear | Linear search, single loop | 1,000 |
| O(n log n) | Linearithmic | Merge sort, heap sort | ~10,000 |
| O(n²) | Quadratic | Bubble sort, nested loops | 1,000,000 |
| O(2ⁿ) | Exponential | Brute-force subsets | ~10³⁰¹ |
| O(n!) | Factorial | Brute-force permutations | ~10²⁵⁶⁷ |
Three rules of thumb:
def get_first(arr):
return arr[0] # Always one operation, regardless of array size
def find_max(arr):
max_val = arr[0]
for num in arr: # Loops through n elements
if num > max_val:
max_val = num
return max_val # O(n) -- one loop over the data
def has_duplicate(arr):
for i in range(len(arr)): # n iterations
for j in range(i + 1, len(arr)): # n iterations (roughly)
if arr[i] == arr[j]:
return True
return False # O(n^2) -- nested loop
def binary_search(arr, target):
lo, hi = 0, len(arr) - 1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
lo = mid + 1 # Eliminate half the data
else:
hi = mid - 1 # Eliminate half the data
return -1 # O(log n) -- halving each step
Why log n? If you halve n repeatedly: n → n/2 → n/4 → ... → 1. How many halves until you reach 1? That's log₂(n). This is the *inverse* of exponential growth.
When you have two inputs of different sizes, you need two variables. This is crucial and often confuses beginners.
def all_pairs(arr1, arr2):
"""Find all pairs from two arrays"""
pairs = []
for a in arr1: # N iterations (arr1 has N elements)
for b in arr2: # M iterations (arr2 has M elements)
pairs.append((a, b))
return pairs # O(N × M) -- every element of arr1 pairs with every element of arr2
# If arr1 has 100 elements and arr2 has 50 elements: 100 × 50 = 5,000 pairs
def process_both(arr1, arr2):
"""Process both arrays separately"""
total = 0
for a in arr1: # N iterations
total += a
for b in arr2: # M iterations (after arr1 loop finishes)
total += b
return total # O(N + M) -- sequential, not nested!
# If arr1 has 100 elements and arr2 has 50 elements: 100 + 50 = 150 operations
Nested loops = multiply. The inner loop runs completely for each iteration of the outer loop.
Sequential loops = add. One finishes, then the other runs.
For N=M=1000: O(N×M) = 1,000,000 operations. O(N+M) = 2,000 operations. That's a 500× difference!
If both arrays have the same size n, then O(N×M) = O(n×n) = O(n²). That's why nested loops over the same array are O(n²).
def same_array_pairs(arr):
for i in arr: # n iterations
for j in arr: # n iterations
print(i, j) # O(n²) because N = M = n
def merge_sorted(arr1, arr2):
"""Merge two sorted arrays into one sorted array"""
result = []
i, j = 0, 0
# Each element is visited at most once
while i < len(arr1) and j < len(arr2):
if arr1[i] <= arr2[j]:
result.append(arr1[i])
i += 1
else:
result.append(arr2[j])
j += 1
# Add remaining elements
result.extend(arr1[i:])
result.extend(arr2[j:])
return result
# Total: each element from arr1 and arr2 processed once = O(N + M)
Key insight: Total iterations ≤ N + M because we increment i or j each time, and they can only reach N and M respectively.
def sum_matrix(matrix):
"""Matrix with N rows and M columns"""
total = 0
for row in matrix: # N rows
for val in row: # M columns per row
total += val
return total # O(N × M) or O(rows × cols)
For a square matrix where N = M, this is O(n²). For a 100×3 matrix, it's O(100 × 3) = O(300) = O(N × M).
def check_windows(arr, k):
"""Check all windows of size k in array of size n"""
for i in range(len(arr) - k + 1): # N - K + 1 ≈ N windows
for j in range(k): # K elements per window
print(arr[i + j])
# O(N × K) total
# If K is a constant (like 3), this simplifies to O(N)
# If K could be as large as N, worst case is O(N²)
Time complexity analysis is fundamentally about counting operations. This connects directly to what we learned in sequences and series:
def triangular_loop(n):
for i in range(n):
for j in range(i): # j goes from 0 to i-1
print(i, j)
# Iterations: 0 + 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n²)
Even though the inner loop varies, the total is the sum of 0 + 1 + 2 + ... + (n-1), which equals n(n-1)/2. Using the formula for arithmetic series (covered in Sequences and Series), this is O(n²).
Recursion requires special analysis. The key is to count how many times the function is called and how much work each call does.
def factorial(n):
if n <= 1:
return 1
return n * factorial(n - 1)
# Calls: factorial(n) → factorial(n-1) → ... → factorial(1)
# That's n calls, each doing O(1) work
# Total: O(n)
Recurrence: T(n) = T(n-1) + O(1), which solves to T(n) = O(n)
def fib(n):
if n <= 1:
return n
return fib(n - 1) + fib(n - 2)
# Each call spawns TWO more calls!
# Call tree grows exponentially: 1 → 2 → 4 → 8 → ...
# Total calls ≈ 2ⁿ, so O(2ⁿ)
Recurrence: T(n) = T(n-1) + T(n-2) + O(1). The solution is O(φⁿ) where φ ≈ 1.618 (golden ratio), but we approximate as O(2ⁿ).
Why this is terrible: For n=50, this makes about 10¹⁵ calls. Dynamic programming fixes this to O(n)!
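Memoization turns the exponential call tree into a linear chain: each fib(k) is computed once and cached afterward (sketched here with functools.lru_cache):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025 -- about 50 distinct calls instead of ~10^15
```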
def binary_search_recursive(arr, target, lo, hi):
if lo > hi:
return -1
mid = (lo + hi) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
return binary_search_recursive(arr, target, mid + 1, hi)
else:
return binary_search_recursive(arr, target, lo, mid - 1)
# Each call halves the search space
# n → n/2 → n/4 → ... → 1 = log₂(n) calls
# O(log n)
Recurrence: T(n) = T(n/2) + O(1), which solves to T(n) = O(log n)
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid]) # T(n/2)
right = merge_sort(arr[mid:]) # T(n/2)
return merge(left, right) # O(n) to merge
# Recurrence: T(n) = 2T(n/2) + O(n)
# By Master Theorem: O(n log n)
Visual intuition: We split the array log(n) times (levels of recursion). At each level, we do O(n) total work merging. So: O(n) × O(log n) = O(n log n).
| Pattern | Recurrence | Complexity | Example |
|---|---|---|---|
| Single call, subtract 1 | T(n) = T(n-1) + O(1) | O(n) | Factorial, linear recursion |
| Single call, divide by 2 | T(n) = T(n/2) + O(1) | O(log n) | Binary search |
| Two calls, divide by 2 | T(n) = 2T(n/2) + O(1) | O(n) | Tree traversal (no extra work) |
| Two calls, divide by 2, linear work | T(n) = 2T(n/2) + O(n) | O(n log n) | Merge sort |
| Two calls, subtract 1 | T(n) = 2T(n-1) + O(1) | O(2ⁿ) | Towers of Hanoi |
| Fibonacci-style | T(n) = T(n-1) + T(n-2) | O(2ⁿ) | Naive Fibonacci |
Recursive time complexity is fundamentally about geometric series:
Understanding series formulas lets you analyze these patterns mathematically.
Recursive functions also use space for the call stack. Each recursive call adds a frame to the stack.
Big-O gives the upper bound (worst case). There are two related notations:
People often say "this algorithm is O(n)" when they mean Θ(n). Big-O is an upper bound, so it's technically true that linear search is O(n²) (it's never worse than quadratic), but that's not helpful. In practice, use Big-O for the tightest upper bound you can give.
Big-O is the language engineers use to discuss performance. When someone says "binary search is O(log n)," they're telling you it scales incredibly well. When your database query is O(n²), you know it will crawl on large datasets. Every technical interview expects you to analyze the time and space complexity of your solutions. Understanding Big-O is the difference between writing code that works on 100 items and code that works on 100 million items.
A recurrence relation defines a sequence where each term depends on previous terms. In CS, recurrences describe the running time of recursive algorithms.
Every recursive function has a recurrence relation hiding inside it. The recurrence literally says: "The time to solve a problem of size n equals the time to solve smaller pieces, plus the work to combine them."
If you can write a recursive function, you can write its recurrence. If you can solve the recurrence, you know the function's Big-O.
| Part | Meaning | In the Code |
|---|---|---|
| 2T(...) | Make 2 recursive calls | merge_sort(left); merge_sort(right); |
| T(n/2) | Each call gets half the input | Splitting array into two halves |
| + n | Do n work outside the recursion | Merging two sorted halves takes O(n) |
def merge_sort(arr):
if len(arr) <= 1: # T(1) = 1 (base case)
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid]) # T(n/2) (first recursive call)
right = merge_sort(arr[mid:]) # T(n/2) (second recursive call)
return merge(left, right) # + O(n) (combining step)
# Total: T(n) = 2T(n/2) + n
| Code Pattern | Recurrence | Solution |
|---|---|---|
| f(n-1) + O(1) work | T(n) = T(n-1) + 1 | O(n) |
| f(n-1) + O(n) work | T(n) = T(n-1) + n | O(n²) |
| f(n/2) + O(1) work | T(n) = T(n/2) + 1 | O(log n) |
| 2 × f(n/2) + O(n) work | T(n) = 2T(n/2) + n | O(n log n) |
| f(n-1) + f(n-2) | T(n) = T(n-1) + T(n-2) | O(2ⁿ) |
| 2 × f(n-1) | T(n) = 2T(n-1) | O(2ⁿ) |
The simplest method: keep substituting until you see a pattern, then find the closed form.
This is like factorial's call structure: one recursive call, O(1) work each time.
This is like selection sort: one recursive call, but O(n) work each time (shrinking).
See how the recurrence naturally produced an arithmetic series? That's the summation 1+2+...+n from the sequences page!
This is the most important one. Let's unroll it carefully:
Intuition: There are log₂(n) levels of recursion. At each level, the total work across all calls is n. So total work = n × log n.
Intuition: Each call halves the problem and does O(1) work. How many halvings until size 1? log₂(n).
For recurrences of the form T(n) = aT(n/b) + O(nᵈ), the Master Theorem gives the answer directly. No expansion needed -- just identify a, b, d and compare log_b(a) with d:
| Condition | Result | Intuition |
|---|---|---|
| log_b(a) < d | T(n) = O(nᵈ) | Combine work dominates. Recursion is "cheap" relative to the merge step. |
| log_b(a) = d | T(n) = O(nᵈ log n) | Work is evenly spread. Every level contributes equally. |
| log_b(a) > d | T(n) = O(n^(log_b a)) | Splitting dominates. The sheer number of subproblems is the bottleneck. |
Merge sort: T(n) = 2T(n/2) + O(n). a=2, b=2, d=1.
log₂(2) = 1 = d → Case 2 → T(n) = O(n¹ log n) = O(n log n)
Binary search: T(n) = T(n/2) + O(1). a=1, b=2, d=0.
log₂(1) = 0 = d → Case 2 → T(n) = O(n⁰ log n) = O(log n)
Karatsuba multiplication: T(n) = 3T(n/2) + O(n). a=3, b=2, d=1.
log₂(3) ≈ 1.58 > 1 = d → Case 3 → T(n) = O(n^1.58)
Strassen's matrix multiply: T(n) = 7T(n/2) + O(n²). a=7, b=2, d=2.
log₂(7) ≈ 2.81 > 2 = d → Case 3 → T(n) = O(n^2.81)
| Algorithm | Recurrence | a, b, d | Case | Result |
|---|---|---|---|---|
| Binary Search | T(n/2) + 1 | 1, 2, 0 | log=d | O(log n) |
| Merge Sort | 2T(n/2) + n | 2, 2, 1 | log=d | O(n log n) |
| QuickSelect avg | T(n/2) + n | 1, 2, 1 | log<d | O(n) |
| Balanced BST search | T(n/2) + 1 | 1, 2, 0 | log=d | O(log n) |
The Master Theorem only works for T(n) = aT(n/b) + O(nᵈ) -- divide-and-conquer recurrences where the problem size is divided by b.
It does NOT work for:
For these, use the expansion method or the recursion tree method.
Every recursive algorithm has a running time described by a recurrence relation. Merge sort, quicksort, binary search, tree traversals, divide-and-conquer -- all analyzed using recurrences. The Master Theorem lets you determine the Big-O of most divide-and-conquer algorithms by inspection. Understanding recurrences is understanding how recursive code scales.
Test your understanding of discrete math. Click on an answer to check it.