Why This Is the Most Important Math for CS

If you only study one branch of math for computer science, make it discrete math. Every if statement is logic. Every database query is set theory. Every algorithm analysis is Big-O. Every recursive function needs induction to prove correct. This page covers the mathematical foundations you will use every single day as a programmer.

Table of Contents

1. Logic and Propositions
2. Sets
3. Functions and Relations
4. Proof Techniques
5. Counting and Combinatorics
6. Graph Theory
7. Number Theory and Modular Arithmetic
8. Boolean Algebra
9. Big-O Notation
10. Recurrence Relations
11. Practice Quiz

1. Logic and Propositions

Logic is the foundation of all mathematical reasoning -- and all programming. A proposition is a declarative statement that is either true or false, never both, never neither.

You Already Know This

If you've written an if statement, you already do logic. Discrete math just gives you the formal vocabulary for what you're already doing:

# This code IS logic:
if user.is_admin and (not user.is_banned or user.has_override):
    grant_access()

# In math notation: A ∧ (¬B ∨ C)
# Same thing, just different symbols

Every if/elif/else chain is an implication. Every while condition is a proposition evaluated each iteration. Boolean algebra IS programming logic -- the math just lets you prove things about it.

Examples of Propositions
  • "5 is greater than 3" -- this is true
  • "The Earth is flat" -- this is false
  • "2 + 2 = 4" -- this is true

Not propositions: "What time is it?" (question), "Close the door" (command), "x + 5 = 10" (depends on x -- this is a predicate, not a proposition).

Logical Operators

We combine propositions using logical operators (also called connectives). If you have written an if statement, you already know these.

Operator        Symbol   Python   Meaning
AND             ∧        and      True only when both are true
OR              ∨        or       True when at least one is true
NOT             ¬        not      Flips true to false, false to true
XOR             ⊕        ^        True when exactly one is true
Implication     →        (none)   If A then B
Biconditional   ↔        ==       A if and only if B
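Python has no implication operator, but A → B is logically equivalent to ¬A ∨ B, so a common idiom is `(not a) or b`. A minimal sketch that prints the full implication truth table:

```python
from itertools import product

def implies(a, b):
    """A -> B is equivalent to (not A) or B."""
    return (not a) or b

# Print every combination of inputs and the result
for a, b in product([True, False], repeat=2):
    print(a, b, implies(a, b))
```

Note how the only row that prints False is `True False` -- the one case where a promise is broken.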

Truth Tables

A truth table shows every possible combination of inputs and the resulting output. Here are the truth tables for the core operators.

AND (∧) -- both must be true

A  B  A ∧ B
T  T  T
T  F  F
F  T  F
F  F  F

OR (∨) -- at least one must be true

A  B  A ∨ B
T  T  T
T  F  T
F  T  T
F  F  F

NOT (¬) -- flip the value

A  ¬A
T  F
F  T

XOR (⊕) -- exactly one must be true

A  B  A ⊕ B
T  T  F
T  F  T
F  T  T
F  F  F

Implication (→) and Biconditional (↔)

The implication A → B means "if A, then B." It is only false when A is true but B is false. Think of it as a promise: "If it rains, I will bring an umbrella." The promise is only broken if it rains and you don't bring one.

Implication (→)

A  B  A → B
T  T  T
T  F  F
F  T  T
F  F  T
Tricky Part

When A is false, the implication A → B is always true regardless of B. This feels counterintuitive but makes logical sense: a promise made under a false premise cannot be broken. "If pigs fly, then I will give you a million dollars" -- pigs don't fly, so you can't claim I broke my promise.

The biconditional A ↔ B means "A if and only if B" -- both must have the same truth value.

Biconditional (↔)

A  B  A ↔ B
T  T  T
T  F  F
F  T  F
F  F  T

Logical Equivalences

Two expressions are logically equivalent if they have the same truth value for every possible input. The most famous equivalences are De Morgan's Laws:

¬(A ∧ B) = ¬A ∨ ¬B
¬(A ∨ B) = ¬A ∧ ¬B

In English: "not (A and B)" is the same as "not A or not B." And "not (A or B)" is the same as "not A and not B."

De Morgan's in Python
# These two are equivalent:
if not (is_admin and is_logged_in):
    deny_access()

# Same as:
if (not is_admin) or (not is_logged_in):
    deny_access()

Why CS Cares -- Logic

Every if statement, every while loop condition, and every boolean expression in your code is propositional logic. De Morgan's Laws help you simplify complex conditions. Understanding implication helps you write correct assertions and contracts. Logic gates (AND, OR, NOT) are the physical building blocks of every CPU. Formal logic is also the foundation of SQL's WHERE clauses, type systems, and automated theorem proving.

Propositional Logic Laws (Axioms):

Identity: A ∧ T = A,   A ∨ F = A
Domination: A ∧ F = F,   A ∨ T = T
Idempotent: A ∧ A = A,   A ∨ A = A
Double Negation: ¬(¬A) = A
Complement: A ∧ ¬A = F,   A ∨ ¬A = T
Commutative: A ∧ B = B ∧ A,   A ∨ B = B ∨ A
Associative: (A ∧ B) ∧ C = A ∧ (B ∧ C)
Distributive: A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C)
De Morgan's: ¬(A ∧ B) = ¬A ∨ ¬B,   ¬(A ∨ B) = ¬A ∧ ¬B
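Each of these laws can be spot-checked exhaustively, since a formula with n variables has only 2ⁿ input combinations. A small sketch that brute-forces every assignment:

```python
from itertools import product

def equivalent(f, g, nvars):
    """True if two boolean functions agree on every input combination."""
    return all(f(*vals) == g(*vals)
               for vals in product([True, False], repeat=nvars))

# De Morgan's: not (A and B)  ==  (not A) or (not B)
assert equivalent(lambda a, b: not (a and b),
                  lambda a, b: (not a) or (not b), 2)

# Distributive: A and (B or C)  ==  (A and B) or (A and C)
assert equivalent(lambda a, b, c: a and (b or c),
                  lambda a, b, c: (a and b) or (a and c), 3)
print("All checks passed")
```

Exhaustive checking works here precisely because propositional logic has a finite input space -- a luxury most of mathematics doesn't have.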

2. Sets

A set is an unordered collection of distinct elements. Sets are one of the most fundamental concepts in all of mathematics -- and they show up everywhere in programming.

Set Notation

Sets are written with curly braces. Order does not matter, and duplicates are ignored.

A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
C = {apple, banana, cherry}

We write x ∈ A to say "x is an element of A," and x ∉ A to say "x is not in A."

Example

If A = {1, 2, 3}, then 2 ∈ A is true and 7 ∈ A is false.

Set Operations

Operation      Symbol   Meaning                                      Python
Union          ∪        Everything in A or B (or both)               A | B
Intersection   ∩        Only elements in both A and B                A & B
Difference     \        Elements in A but not in B                   A - B
Complement     Aᶜ       Everything not in A (relative to universe)   U - A
Example -- Set Operations

Given A = {1, 2, 3, 4, 5} and B = {3, 4, 5, 6, 7}:

  • A ∪ B = {1, 2, 3, 4, 5, 6, 7} -- everything from both
  • A ∩ B = {3, 4, 5} -- only the overlap
  • A \ B = {1, 2} -- in A but not B
  • B \ A = {6, 7} -- in B but not A

Subsets and Special Sets

A ⊆ B means "A is a subset of B" -- every element of A is also in B. A ⊂ B means "A is a proper subset" -- A is a subset but A ≠ B. The empty set ∅ is a subset of every set. The power set P(S) is the set of all subsets of S, and it has exactly 2^|S| elements:

|P(S)| = 2^|S|
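The power set P(S) -- the set of all subsets of S -- has 2^|S| elements, which is easy to verify with `itertools.combinations`; a quick sketch:

```python
from itertools import combinations

def power_set(s):
    """Return all subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

S = {1, 2, 3}
subsets = power_set(S)
print(len(subsets))       # 2^3 = 8
print(set() in subsets)   # the empty set is always included: True
```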

Venn Diagrams

A Venn diagram visualizes sets as overlapping circles. Imagine two circles, one for set A and one for set B, that partially overlap. The overlapping region represents A ∩ B (intersection). The entire area covered by both circles is A ∪ B (union). The area in A's circle but not in B's is A \ B (difference). This visual tool makes set relationships immediately intuitive.

Sets in Python
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}

print(A | B)   # Union:        {1, 2, 3, 4, 5, 6, 7}
print(A & B)   # Intersection: {3, 4, 5}
print(A - B)   # Difference:   {1, 2}
print(A ^ B)   # Symmetric difference (XOR): {1, 2, 6, 7}

print(3 in A)  # Membership: True
print(len(A))  # Cardinality: 5
Why CS Cares -- Sets

Sets are everywhere in computing. Database queries are set operations (SQL's UNION, INTERSECT, EXCEPT map directly to ∪, ∩, \). Type systems model types as sets of values. Python has a built-in set type with O(1) membership testing. Hash sets and hash maps are the most-used data structures in real-world code. Even IP address ranges and permission systems are modeled as sets.

Set Identity Laws:

Identity: A ∪ ∅ = A,   A ∩ U = A
Domination: A ∪ U = U,   A ∩ ∅ = ∅
Complement: A ∪ A' = U,   A ∩ A' = ∅
De Morgan's: (A ∪ B)' = A' ∩ B',   (A ∩ B)' = A' ∪ B'
Distributive: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

3. Functions and Relations

Relations

A relation from set A to set B is any subset of the Cartesian product A × B (all possible ordered pairs). A relation describes how elements of A connect to elements of B.

Example

Let A = {1, 2, 3} and B = {a, b}. The Cartesian product A × B is:

{(1,a), (1,b), (2,a), (2,b), (3,a), (3,b)}

A relation R might be: R = {(1,a), (2,b), (3,a)} -- this picks out specific connections.

Relations can have special properties:

  • Reflexive: every element relates to itself -- (a, a) ∈ R for all a ∈ A.
  • Symmetric: if (a, b) ∈ R, then (b, a) ∈ R.
  • Transitive: if (a, b) ∈ R and (b, c) ∈ R, then (a, c) ∈ R.

A relation that is reflexive, symmetric, and transitive is an equivalence relation -- it partitions a set into groups of "equal" elements, much like == in programming.

Functions

A function f: A → B is a special relation where every element of A maps to exactly one element of B. A is the domain, B is the codomain.

Functions come in three important flavors:

  • Injective (one-to-one): different inputs always give different outputs -- if f(a) = f(b), then a = b. No two inputs share an output.
  • Surjective (onto): every element of B is mapped to by some element of A. Every output is "hit" by some input.
  • Bijective (one-to-one and onto): both injective and surjective. A perfect pairing -- every input maps to a unique output, and every output is used.
Example

f(x) = 2x from integers to integers:

  • Injective? Yes -- different inputs give different outputs (if 2a = 2b then a = b).
  • Surjective? No -- odd numbers like 3 are never output by f.
  • Bijective? No -- not surjective.

Composition of Functions

If you have f: A → B and g: B → C, the composition g ∘ f (read "g of f") is defined as (g ∘ f)(x) = g(f(x)). Apply f first, then g.

f(x) = x + 1,   g(x) = x²
(g ∘ f)(x) = g(f(x)) = g(x + 1) = (x + 1)²
(f ∘ g)(x) = f(g(x)) = f(x²) = x² + 1
Order Matters

Composition is not commutative: g ∘ f ≠ f ∘ g in general. In the example above, (x + 1)² ≠ x² + 1. This is like how in programming, the order of function calls matters: g(f(x)) is different from f(g(x)).
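The non-commutativity above is easy to demonstrate in code; a minimal sketch using the same f and g:

```python
def compose(g, f):
    """Return g ∘ f, the function x -> g(f(x)): apply f first, then g."""
    return lambda x: g(f(x))

f = lambda x: x + 1      # f(x) = x + 1
g = lambda x: x * x      # g(x) = x^2

g_of_f = compose(g, f)
f_of_g = compose(f, g)

print(g_of_f(3))  # (3 + 1)^2 = 16
print(f_of_g(3))  # 3^2 + 1 = 10  -- different result, order matters
```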

Inverse Functions

If f: A → B is bijective, then there exists an inverse function f⁻¹: B → A such that f⁻¹(f(x)) = x and f(f⁻¹(y)) = y. The inverse "undoes" the original function.

Example

f(x) = 2x + 3. To find f⁻¹, solve y = 2x + 3 for x:

x = (y - 3) / 2, so f⁻¹(y) = (y - 3) / 2

Check: f⁻¹(f(5)) = f⁻¹(13) = (13 - 3)/2 = 5. It works.

Why CS Cares -- Functions

Bijections are essential in cryptography -- encryption must be reversible (bijective) for decryption to work. Hash functions are surjective (many inputs can map to the same hash). Function composition is the basis of functional programming and middleware pipelines. Understanding injectivity helps you reason about whether a mapping can be reversed (critical for encoding/decoding).

4. Proof Techniques

Proofs are how mathematicians verify that something is always true, not just sometimes. As a programmer, proof techniques help you reason about correctness -- proving your algorithm works for all inputs, not just the ones you tested.

Direct Proof

Start with what you know (premises), apply logical steps, arrive at the conclusion. The most straightforward approach.

Example -- Direct Proof

Claim: If n is even, then n² is even.

Proof:

  1. Assume n is even. By definition, n = 2k for some integer k.
  2. Then n² = (2k)² = 4k² = 2(2k²).
  3. Since 2k² is an integer, n² = 2(integer), so n² is even. □

Proof by Contradiction

Assume the opposite of what you want to prove, then show that leads to something impossible (a contradiction). Since the assumption caused a contradiction, it must be false, so the original statement must be true.

Example -- Proof by Contradiction

Claim: √2 is irrational.

Proof:

  1. Assume √2 is rational. Then √2 = a/b where a, b are integers with no common factors.
  2. Squaring both sides: 2 = a²/b², so a² = 2b².
  3. This means a² is even, so a must be even. Write a = 2c.
  4. Then (2c)² = 2b², so 4c² = 2b², so b² = 2c².
  5. This means b² is even, so b is even.
  6. Contradiction: Both a and b are even, so they share factor 2. But we said they have no common factors.
  7. Therefore √2 is irrational. □

Proof by Induction

This is the most important proof technique for computer science. Induction proves that a statement holds for all natural numbers by proving two things:

  1. Base case: Show the statement is true for n = 1 (or n = 0).
  2. Inductive step: Assume the statement is true for some arbitrary n = k (the inductive hypothesis), then prove it is true for n = k + 1.

Think of it like dominoes: if the first one falls (base case) and each domino knocks over the next (inductive step), then all dominoes fall.

Example -- Proof by Induction

Claim: The sum 1 + 2 + 3 + ... + n = n(n+1)/2 for all n ≥ 1.

Base case (n = 1):

Left side: 1. Right side: 1(1+1)/2 = 1. They match. Base case holds.

Inductive hypothesis: Assume the formula holds for n = k:

1 + 2 + 3 + ... + k = k(k+1)/2

Inductive step: Prove it holds for n = k + 1:

1 + 2 + ... + k + (k+1)
= k(k+1)/2 + (k+1)       // using the inductive hypothesis
= k(k+1)/2 + 2(k+1)/2
= (k+1)(k+2)/2
= (k+1)((k+1)+1)/2     // this is the formula with n = k+1

The formula holds for k + 1. By induction, it holds for all n ≥ 1. □
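A loop can spot-check the formula for many values of n. To be clear, this is evidence, not a proof -- only the induction above covers all n -- but it is a useful sanity check when you first derive a closed form:

```python
# Compare the direct sum against the closed-form formula n(n+1)/2.
for n in range(1, 1001):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("Formula matches for n = 1..1000")
```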

Proof by Contrapositive

Instead of proving "if A then B," prove the logically equivalent statement "if not B then not A." Sometimes this direction is easier.

Example

Claim: If n² is even, then n is even.

Contrapositive: If n is odd, then n² is odd.

Proof: If n is odd, then n = 2k + 1. So n² = (2k+1)² = 4k² + 4k + 1 = 2(2k² + 2k) + 1, which is odd. □

Why CS Cares -- Proofs

Induction is how you prove recursive algorithms are correct. Every time you write a recursive function, you are implicitly doing induction: the base case is your recursion base case, and the recursive call assumes the function works for smaller inputs (inductive hypothesis). Loop invariants use induction to prove loops are correct. Proof by contradiction is used to prove impossibility results, like "you can't sort faster than O(n log n) with comparisons."

5. Counting and Combinatorics

Combinatorics is the mathematics of counting. How many ways can something happen? This is essential for analyzing algorithms, understanding probability, and solving optimization problems.

The Multiplication Principle

If you have m ways to do one thing and n ways to do another, there are m × n ways to do both.

Example

A restaurant has 3 appetizers and 5 entrees. How many different meals (one appetizer + one entree) can you order?

3 × 5 = 15 meals

In CS terms: A 4-bit number has 2 × 2 × 2 × 2 = 2⁴ = 16 possible values. An 8-character password using 26 lowercase letters has 26⁸ ≈ 209 billion possibilities.

Factorials

The factorial of n (written n!) is the product of all positive integers up to n:

n! = n × (n-1) × (n-2) × ... × 2 × 1

5! = 5 × 4 × 3 × 2 × 1 = 120
0! = 1   (by definition)

Factorials grow incredibly fast: 10! = 3,628,800 and 20! = 2,432,902,008,176,640,000.

Permutations -- Order Matters

A permutation is an arrangement where order matters. How many ways can you arrange r items chosen from n items?

P(n, r) = n! / (n - r)!
Example

How many ways can 3 students win gold, silver, and bronze medals from a class of 10?

P(10, 3) = 10! / 7! = 10 × 9 × 8 = 720 ways

Order matters because gold-silver-bronze is a different outcome than bronze-silver-gold.

Combinations -- Order Doesn't Matter

A combination is a selection where order does NOT matter. How many ways can you choose r items from n items?

C(n, r) = n! / (r! × (n - r)!)

Also written as the binomial coefficient (n over r) and read "n choose r."

Example

How many ways can you pick a committee of 3 people from 10?

C(10, 3) = 10! / (3! × 7!) = 720 / 6 = 120 ways

Order doesn't matter because {Alice, Bob, Carol} is the same committee as {Carol, Alice, Bob}.

Permutations vs. Combinations

Ask yourself: "Does the order matter?"

  • Arranging books on a shelf → Permutation (order matters)
  • Choosing a team from a group → Combination (order doesn't matter)
  • Creating a password → Permutation ("abc" ≠ "cba")
  • Dealing a poker hand → Combination (same cards regardless of deal order)
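Python's standard library computes both directly (`math.perm` and `math.comb`, available since Python 3.8), which makes the medal and committee examples above one-liners:

```python
import math

# Gold/silver/bronze from a class of 10: order matters -> permutation
print(math.perm(10, 3))  # 720

# Committee of 3 from 10: order doesn't matter -> combination
print(math.comb(10, 3))  # 120

# Each combination corresponds to 3! = 6 orderings of the same people
print(math.perm(10, 3) // math.factorial(3))  # 120 again
```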

The Pigeonhole Principle

If you put n + 1 items into n containers, at least one container must hold more than one item. It sounds obvious, but it has powerful consequences.

Examples
  • In any group of 367 people, at least two share a birthday (366 possible days + 1).
  • If you have 5 pairs of socks in a dark room and grab 6, you must have at least one matching pair.
  • CS application: If a hash function maps to 2³² values, then with 2³² + 1 inputs, at least two must collide (same hash). This is why hash collisions are inevitable.
Why CS Cares -- Counting

Algorithm analysis relies on counting: "How many operations does this algorithm perform?" Combinatorics tells you the size of a search space (e.g., how many possible passwords exist), which determines whether brute force is feasible. Probability is built on counting (probability = favorable outcomes / total outcomes). The pigeonhole principle proves hash collisions are unavoidable, that lossless compression can't compress all files, and many other impossibility results.

Stars and Bars (Distributing Identical Items)

How many ways can you distribute n identical items into k distinct boxes?

Number of ways = C(n + k - 1, k - 1) = C(n + k - 1, n)
Example

How many ways can you put 10 identical balls into 4 different boxes?

C(10 + 4 - 1, 4 - 1) = C(13, 3) = 286 ways

Visualization: Imagine 10 balls in a row: ○○○○○○○○○○. Adding 3 dividers creates 4 groups. We're choosing where to place 3 dividers among 13 positions.

The Inclusion-Exclusion Principle

When counting elements in overlapping sets, use inclusion-exclusion to avoid double-counting:

|A ∪ B| = |A| + |B| - |A ∩ B|

|A ∪ B ∪ C| = |A| + |B| + |C| - |A ∩ B| - |A ∩ C| - |B ∩ C| + |A ∩ B ∩ C|
Example

In a class of 100 students, 60 take Math, 50 take Physics, and 20 take both. How many take at least one?

|Math ∪ Physics| = 60 + 50 - 20 = 90 students

The -20 removes those who were counted twice (in both classes).
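The class example maps directly onto Python sets. A sketch using made-up student IDs (1-60 take Math, 41-90 take Physics, so IDs 41-60 take both):

```python
math_students = set(range(1, 61))        # 60 students take Math
physics_students = set(range(41, 91))    # 50 students take Physics

both = math_students & physics_students
assert len(both) == 20                   # 20 take both

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
union_size = (len(math_students) + len(physics_students) - len(both))
assert union_size == len(math_students | physics_students)
print(union_size)  # 90
```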

Permutations with Repetition

How many arrangements of n objects where some are identical?

n! / (n₁! × n₂! × ... × nₖ!)

Where n₁, n₂, ..., nₖ are the counts of each identical type
Example

How many ways can you arrange the letters in "MISSISSIPPI"?

Total: 11 letters. M: 1, I: 4, S: 4, P: 2

11! / (1! × 4! × 4! × 2!) = 39,916,800 / (1 × 24 × 24 × 2) = 34,650 arrangements

Combinations with Repetition

Choosing r items from n types, where you can choose the same type multiple times:

C(n + r - 1, r)    (same as stars and bars)
Example

An ice cream shop has 5 flavors. How many ways can you choose 3 scoops (repeats allowed)?

C(5 + 3 - 1, 3) = C(7, 3) = 35 ways

This includes choices like (chocolate, chocolate, vanilla) or (strawberry, strawberry, strawberry).

Derangements

A derangement is a permutation where no element appears in its original position.

D(n) = n! × (1 - 1/1! + 1/2! - 1/3! + ... + (-1)ⁿ/n!)
Example

5 people put their hats in a pile and randomly pick one. How many ways can NO ONE get their own hat?

D(5) = 5! × (1 - 1 + 1/2 - 1/6 + 1/24 - 1/120)

= 120 × (0 + 0.5 - 0.167 + 0.042 - 0.008)

= 120 × 0.367 = 44 derangements

Fun fact: As n gets large, approximately 1/e (≈ 36.8%) of all permutations are derangements.
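For small n you can count derangements by brute force -- generate every permutation and keep the ones with no fixed point. A sketch confirming D(5) = 44:

```python
from itertools import permutations

def count_derangements(n):
    """Count permutations of 0..n-1 where no element stays in place."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

print(count_derangements(5))        # 44 of the 120 permutations
print(count_derangements(5) / 120)  # ~0.367, close to 1/e
```

Brute force is O(n!), so this only works for tiny n; the formula (or the recurrence D(n) = (n-1)(D(n-1) + D(n-2))) scales much better.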

Binomial Theorem

The binomial theorem expands (a + b)ⁿ:

(a + b)ⁿ = Σ C(n, k) × aⁿ⁻ᵏ × bᵏ    for k = 0 to n
Example: Expand (x + y)⁴

= C(4,0)x⁴ + C(4,1)x³y + C(4,2)x²y² + C(4,3)xy³ + C(4,4)y⁴

= x⁴ + 4x³y + 6x²y² + 4xy³ + y⁴

Pascal's Triangle: The coefficients (1, 4, 6, 4, 1) form row 4 of Pascal's triangle!
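A row of Pascal's triangle is just C(n, k) for k = 0..n, so `math.comb` reproduces the expansion coefficients directly:

```python
import math

n = 4
coeffs = [math.comb(n, k) for k in range(n + 1)]
print(coeffs)  # [1, 4, 6, 4, 1] -- row 4 of Pascal's triangle

# Sanity check: setting a = b = 1 in the binomial theorem gives
# (1 + 1)^n = sum of the coefficients = 2^n
assert sum(coeffs) == 2 ** n
```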

Combinatorial Identities

Identity                              Meaning
C(n, k) = C(n, n-k)                   Choosing k to include = choosing n-k to exclude
C(n, 0) = C(n, n) = 1                 One way to choose nothing or everything
C(n, k) = C(n-1, k-1) + C(n-1, k)     Pascal's Triangle rule
Σ C(n, k) = 2ⁿ                        Total subsets of n items
CS Application: Subsets and Power Sets

A set of n elements has exactly 2ⁿ subsets (including empty set and the full set). This is why iterating over all subsets of n items takes O(2ⁿ) time -- exponential! For n=30, that's over a billion subsets. Understanding this helps you recognize when a brute-force approach is infeasible.
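A standard trick makes the 2ⁿ count concrete: treat an n-bit integer as a subset, where bit i means "item i is in." Counting from 0 to 2ⁿ - 1 visits every subset exactly once:

```python
def all_subsets(items):
    """Enumerate all 2^n subsets of items via an n-bit counter."""
    n = len(items)
    for mask in range(2 ** n):                       # 0 .. 2^n - 1
        yield [items[i] for i in range(n) if mask & (1 << i)]

subsets = list(all_subsets(['a', 'b', 'c']))
print(len(subsets))  # 2^3 = 8, including [] and ['a', 'b', 'c']
```

The loop itself is the O(2ⁿ) cost made visible -- add one item and the number of iterations doubles.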

6. Graph Theory

A graph G = (V, E) consists of vertices (nodes) V and edges (connections) E. Graphs are one of the most important data structures in computer science -- they model networks, relationships, maps, dependencies, and much more.

Types of Graphs

Type Description Example
Undirected Edges have no direction. If A connects to B, then B connects to A. Facebook friendships
Directed Edges have direction. A → B does not mean B → A. Twitter follows, web links
Weighted Edges have values (costs, distances, capacities). Road maps with distances
Unweighted All edges are equal -- no values attached. Social connections

Graph Representations

There are two main ways to store a graph in code:

Adjacency Matrix -- a 2D array where matrix[i][j] = 1 if there is an edge from vertex i to vertex j.

Adjacency Matrix
# Graph: A--B, A--C, B--C
#     A  B  C
# A [ 0, 1, 1 ]
# B [ 1, 0, 1 ]
# C [ 1, 1, 0 ]

matrix = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0]
]

Space: O(V²). Edge lookup: O(1). Best for dense graphs.

Adjacency List -- each vertex stores a list of its neighbors.

Adjacency List
# Same graph: A--B, A--C, B--C
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'C'],
    'C': ['A', 'B']
}

Space: O(V + E). Edge lookup: O(degree). Best for sparse graphs (most real-world graphs).

Key Graph Concepts

Graph Traversals

Two fundamental ways to explore a graph:

Algorithm Strategy Data Structure Use Case
BFS (Breadth-First Search) Explore all neighbors first, then their neighbors Queue Shortest path (unweighted), level-order
DFS (Depth-First Search) Go as deep as possible, then backtrack Stack / recursion Cycle detection, topological sort, maze solving
BFS and DFS in Python
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    visited.add(start)
    while queue:
        vertex = queue.popleft()
        print(vertex, end=' ')
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    print(start, end=' ')
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

Euler Paths and Circuits

An Euler path visits every edge exactly once. An Euler circuit is an Euler path that starts and ends at the same vertex.

The Königsberg Bridge Problem

This is the problem that founded graph theory. The city of Königsberg had 7 bridges connecting 4 land masses. Euler proved it was impossible to walk across every bridge exactly once because more than two vertices had odd degree.
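Euler's criterion is easy to check in code: count the odd-degree vertices. A sketch using the degrees of Königsberg's four land masses (5, 3, 3, 3 -- summing to 14, twice the 7 bridges):

```python
def odd_degree_count(degrees):
    """Number of vertices with odd degree."""
    return sum(1 for d in degrees if d % 2 == 1)

# Euler's result (for a connected graph):
#   Euler circuit exists <=> 0 vertices of odd degree
#   Euler path exists    <=> 0 or 2 vertices of odd degree
konigsberg = [5, 3, 3, 3]
print(odd_degree_count(konigsberg))   # 4 -> no Euler path or circuit

triangle = [2, 2, 2]                  # cycle A-B-C-A
print(odd_degree_count(triangle))     # 0 -> Euler circuit exists
```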

Why CS Cares -- Graphs

Graphs are arguably the most important data structure in CS. The internet is a graph (pages are vertices, links are edges). Social networks are graphs. GPS navigation uses weighted graphs with shortest-path algorithms (Dijkstra's). Package managers (npm, pip) use directed acyclic graphs for dependency resolution. Compilers build abstract syntax trees and control flow graphs. Databases use B-trees (tree-structured graphs) for indexing. If you learn one data structure deeply, make it graphs.

Handshaking Lemma:

In any undirected graph: Σ deg(v) = 2|E|

The sum of all vertex degrees equals twice the number of edges. Every edge contributes 1 to each of its endpoints. Consequence: the number of vertices with odd degree is always even.

7. Number Theory and Modular Arithmetic

Number theory studies the properties of integers. It might seem abstract, but it is the mathematical backbone of cryptography, hashing, and error detection.

Divisibility

We say a divides b (written a | b) if b = a × k for some integer k. For example, 3 | 12 because 12 = 3 × 4.

Prime Numbers and Factorization

A prime number is greater than 1 and divisible only by 1 and itself. The first primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29...

The Fundamental Theorem of Arithmetic says every integer greater than 1 can be written as a unique product of primes:

60 = 2² × 3 × 5
84 = 2² × 3 × 7
100 = 2² × 5²

GCD and LCM

The Greatest Common Divisor GCD(a, b) is the largest number that divides both a and b. The Least Common Multiple LCM(a, b) is the smallest number that both a and b divide.

GCD(12, 18) = 6
LCM(12, 18) = 36

Useful identity: GCD(a, b) × LCM(a, b) = a × b
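Python's `math.gcd` is built in, and the identity above gives you LCM for free (Python 3.9+ also ships `math.lcm` directly):

```python
import math

def lcm(a, b):
    """Least common multiple via gcd(a, b) * lcm(a, b) = a * b."""
    return a * b // math.gcd(a, b)

print(math.gcd(12, 18))  # 6
print(lcm(12, 18))       # 36
print(6 * 36 == 12 * 18) # the identity holds: True
```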

The Euclidean Algorithm

An efficient method to compute GCD using repeated division. This is one of the oldest algorithms known (over 2300 years old) and is still used today.

GCD(a, b) = GCD(b, a mod b)     until b = 0
Example -- Euclidean Algorithm

Find GCD(252, 105):

  • GCD(252, 105): 252 mod 105 = 42, so GCD(105, 42)
  • GCD(105, 42): 105 mod 42 = 21, so GCD(42, 21)
  • GCD(42, 21): 42 mod 21 = 0, so GCD = 21
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

print(gcd(252, 105))  # 21

Modular Arithmetic

a mod n gives the remainder when a is divided by n. We write a ≡ b (mod n) to say "a and b have the same remainder when divided by n."

17 mod 5 = 2    (17 = 3×5 + 2)
23 mod 7 = 2    (23 = 3×7 + 2)
100 mod 3 = 1   (100 = 33×3 + 1)

Properties of Modular Arithmetic

These properties allow you to compute modular arithmetic on very large numbers without overflow:

(a + b) mod n = ((a mod n) + (b mod n)) mod n
(a × b) mod n = ((a mod n) × (b mod n)) mod n
(a - b) mod n = ((a mod n) - (b mod n) + n) mod n
Example -- Why This Matters

Compute (987654321 × 123456789) mod 1000000007 without overflow:

# Instead of computing the huge product first:
a = 987654321 % 1000000007
b = 123456789 % 1000000007
result = (a * b) % 1000000007
print(result)  # Safe from overflow in languages with fixed-width integers
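For repeated multiplication -- modular exponentiation -- Python's built-in three-argument `pow(base, exp, mod)` applies these properties at every step, computing huge powers without ever forming the full number:

```python
m = 1_000_000_007

# 2^100 mod (10^9 + 7), computed by fast modular exponentiation
fast = pow(2, 100, m)

# Same answer as the naive approach, but pow never builds the
# 31-digit intermediate value
assert fast == (2 ** 100) % m
print(fast)
```

This operation is the workhorse of RSA and Diffie-Hellman, where exponents have hundreds of digits and the naive power would not fit in memory.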
Why CS Cares -- Number Theory

RSA encryption is built entirely on number theory: large prime numbers, modular exponentiation, and the difficulty of factoring. Hash functions use modular arithmetic extensively (e.g., hash(key) = key mod table_size). Checksums and error-detecting codes use modular arithmetic. Random number generators use modular arithmetic (linear congruential generators). Competitive programming problems frequently require modular arithmetic to handle large numbers (mod 10⁹ + 7 is the most common modulus).

8. Boolean Algebra

Boolean algebra is the mathematics of true/false values (or equivalently, 1/0). It is the direct mathematical foundation of digital electronics and every computer ever built.

Boolean Variables and Operations

A boolean variable can only be 0 (false) or 1 (true). The three fundamental operations are:

Operation Notation Logic Gate Rule
AND A · B or AB AND gate 1 only if both inputs are 1
OR A + B OR gate 1 if at least one input is 1
NOT A̅ or ¬A NOT gate (inverter) Flips 0 to 1, 1 to 0

Boolean Laws and Simplification

Boolean algebra follows laws similar to regular algebra, with some unique ones:

Law           AND Form               OR Form
Identity      A · 1 = A              A + 0 = A
Null          A · 0 = 0              A + 1 = 1
Idempotent    A · A = A              A + A = A
Complement    A · ¬A = 0             A + ¬A = 1
Commutative   A · B = B · A          A + B = B + A
Associative   (AB)C = A(BC)          (A+B)+C = A+(B+C)
Distributive  A(B+C) = AB + AC       A+(BC) = (A+B)(A+C)
De Morgan's   ¬(AB) = ¬A + ¬B        ¬(A+B) = ¬A · ¬B
Absorption    A(A+B) = A             A + AB = A
Example -- Boolean Simplification

Simplify: F = AB + A¬B + ¬AB

  • F = A(B + ¬B) + ¬AB    // Factor out A
  • F = A(1) + ¬AB    // Complement law: B + ¬B = 1
  • F = A + ¬AB    // Identity law: A · 1 = A
  • F = A + B    // Absorption: A + ¬AB = A + B

We reduced a complex expression to A + B (just an OR gate).
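Since the expression has only two variables, the simplification can be verified by checking all four input rows; a small sketch:

```python
from itertools import product

def F(a, b):
    """Original expression: AB + A(not B) + (not A)B."""
    return (a and b) or (a and not b) or (not a and b)

# F must agree with the simplified form A + B on every input
for a, b in product([True, False], repeat=2):
    assert F(a, b) == (a or b)
print("F simplifies to A + B")
```

Hardware designers do exactly this kind of equivalence checking (at much larger scale) before fabricating a circuit.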

Logic Gates and Circuits

In hardware, boolean operations are implemented as logic gates. Every CPU is built from billions of these gates.

Half Adder -- Adding Two Bits

A half adder adds two single-bit numbers. It needs two outputs: the sum and the carry.

Sum = A ⊕ B    (XOR)
Carry = A · B    (AND)
A  B  Sum  Carry
0  0  0    0
0  1  1    0
1  0  1    0
1  1  0    1

This is literally how your CPU adds numbers -- chains of logic gates.
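The two gate equations translate directly into Python's bitwise operators; a minimal sketch of a half adder:

```python
def half_adder(a, b):
    """Add two bits; returns (sum, carry)."""
    return a ^ b, a & b   # XOR gives the sum bit, AND gives the carry

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, s, c)
# 1 + 1 = binary 10: sum bit 0, carry 1
```

Chaining a second stage that also accepts a carry-in gives a full adder, and a row of full adders is exactly how a CPU's ALU adds multi-bit numbers.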

Why CS Cares -- Boolean Algebra

Boolean algebra is the math of hardware design. Every CPU, every memory chip, every digital circuit is built from boolean logic gates. Compiler optimizations use boolean simplification to reduce the number of instructions. Database query optimizers simplify WHERE clauses using boolean algebra. Understanding boolean algebra helps you write simpler, faster conditional logic in your code. The absorption law alone (A + AB = A) can simplify many complex if statements.

9. Big-O Notation

Big-O notation describes the upper bound on the growth rate of an algorithm's time (or space) as the input size grows. It answers: "How does this algorithm scale?"

What Big-O Means

We say f(n) = O(g(n)) if there exist constants c and n0 such that f(n) ≤ c · g(n) for all n ≥ n0. In plain English: for large enough inputs, f(n) grows no faster than g(n), up to a constant factor.

f(n) = O(g(n)) means: f grows at most as fast as g

Big-O drops constants and lower-order terms because we care about scalability, not exact counts:

Common Complexities

Big-O Name Example n=1000
O(1) Constant Array index lookup, hash map get 1
O(log n) Logarithmic Binary search ~10
O(n) Linear Linear search, single loop 1,000
O(n log n) Linearithmic Merge sort, heap sort ~10,000
O(n²) Quadratic Bubble sort, nested loops 1,000,000
O(2ⁿ) Exponential Brute-force subsets ~10³⁰¹
O(n!) Factorial Brute-force permutations ~10²⁵⁶⁷

How to Determine Big-O

Three rules of thumb:

  1. Drop constants: O(3n) = O(n)
  2. Keep the dominant term: O(n² + n) = O(n²)
  3. Multiply nested operations: A loop inside a loop = multiply their complexities

Examples with Code

O(1) -- Constant Time
def get_first(arr):
    return arr[0]  # Always one operation, regardless of array size
O(n) -- Linear Time
def find_max(arr):
    max_val = arr[0]
    for num in arr:       # Loops through n elements
        if num > max_val:
            max_val = num
    return max_val       # O(n) -- one loop over the data
O(n²) -- Quadratic Time
def has_duplicate(arr):
    for i in range(len(arr)):         # n iterations
        for j in range(i + 1, len(arr)):  # n iterations (roughly)
            if arr[i] == arr[j]:
                return True
    return False                       # O(n^2) -- nested loop
O(log n) -- Logarithmic Time
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1           # Eliminate half the data
        else:
            hi = mid - 1           # Eliminate half the data
    return -1                       # O(log n) -- halving each step

Why log n? If you halve n repeatedly: n → n/2 → n/4 → ... → 1. How many halves until you reach 1? That's log₂(n). This is the *inverse* of exponential growth.

O(N × M) vs O(N + M) -- Two Variables!

When you have two inputs of different sizes, you need two variables. This is crucial and often confuses beginners.

O(N × M) -- Nested Loops Over Different Inputs
def all_pairs(arr1, arr2):
    """Find all pairs from two arrays"""
    pairs = []
    for a in arr1:          # N iterations (arr1 has N elements)
        for b in arr2:      # M iterations (arr2 has M elements)
            pairs.append((a, b))
    return pairs             # O(N × M) -- every element of arr1 pairs with every element of arr2

# If arr1 has 100 elements and arr2 has 50 elements: 100 × 50 = 5,000 pairs
O(N + M) -- Sequential Loops Over Different Inputs
def process_both(arr1, arr2):
    """Process both arrays separately"""
    total = 0
    for a in arr1:           # N iterations
        total += a
    for b in arr2:           # M iterations (after arr1 loop finishes)
        total += b
    return total              # O(N + M) -- sequential, not nested!

# If arr1 has 100 elements and arr2 has 50 elements: 100 + 50 = 150 operations
Critical Difference: × vs +

Nested loops = multiply. The inner loop runs completely for each iteration of the outer loop.

Sequential loops = add. One finishes, then the other runs.

For N=M=1000: O(N×M) = 1,000,000 operations. O(N+M) = 2,000 operations. That's a 500× difference!

O(N²) vs O(N×M) -- When N = M

If both arrays have the same size n, then O(N×M) = O(n×n) = O(n²). That's why nested loops over the same array are O(n²).

def same_array_pairs(arr):
    for i in arr:       # n iterations
        for j in arr:   # n iterations
            print(i, j)  # O(n²) because N = M = n

Understanding O(N + M) in Real Code

Merging Two Sorted Arrays -- O(N + M)
def merge_sorted(arr1, arr2):
    """Merge two sorted arrays into one sorted array"""
    result = []
    i, j = 0, 0

    # Each element is visited at most once
    while i < len(arr1) and j < len(arr2):
        if arr1[i] <= arr2[j]:
            result.append(arr1[i])
            i += 1
        else:
            result.append(arr2[j])
            j += 1

    # Add remaining elements
    result.extend(arr1[i:])
    result.extend(arr2[j:])
    return result
    # Total: each element from arr1 and arr2 processed once = O(N + M)

Key insight: Total iterations ≤ N + M because we increment i or j each time, and they can only reach N and M respectively.
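One way to convince yourself of the bound is to instrument the merge with a counter. A sketch (merge_sorted_counted is a hypothetical variant of the function above):

```python
def merge_sorted_counted(arr1, arr2):
    """Like merge_sorted, but counts loop iterations to show ops <= N + M."""
    result = []
    ops = 0
    i, j = 0, 0
    while i < len(arr1) and j < len(arr2):
        ops += 1  # each iteration advances i or j, so at most N + M iterations
        if arr1[i] <= arr2[j]:
            result.append(arr1[i])
            i += 1
        else:
            result.append(arr2[j])
            j += 1
    result.extend(arr1[i:])
    result.extend(arr2[j:])
    return result, ops

merged, ops = merge_sorted_counted([1, 3, 5], [2, 4, 6, 8])
print(merged)        # [1, 2, 3, 4, 5, 6, 8]
print(ops <= 3 + 4)  # True
```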

Analyzing More Complex Patterns

O(N × M) -- Matrix Traversal
def sum_matrix(matrix):
    """Matrix with N rows and M columns"""
    total = 0
    for row in matrix:           # N rows
        for val in row:          # M columns per row
            total += val
    return total                  # O(N × M) or O(rows × cols)

For a square matrix where N = M, this is O(n²). For a 100×3 matrix, that's 100 × 3 = 300 operations; in general the bound is O(N × M).

O(N × K) -- Inner Loop of Fixed Size
def check_windows(arr, k):
    """Check all windows of size k in array of size n"""
    for i in range(len(arr) - k + 1):  # N - K + 1 ≈ N windows
        for j in range(k):              # K elements per window
            print(arr[i + j])
    # O(N × K) total
    # If K is a constant (like 3), this simplifies to O(N)
    # If K could be as large as N, worst case is O(N²)

The Math Behind It: Summing Operations

Time complexity analysis is fundamentally about counting operations. This connects directly to what we learned in sequences and series:

Single loop (1 to n): 1 + 1 + 1 + ... + 1 = n times → O(n)

Nested loops: n × n = n² operations → O(n²)

Sum of 1 to n: 1 + 2 + 3 + ... + n = n(n+1)/2 → O(n²)
Why Triangular Patterns are O(n²)
def triangular_loop(n):
    for i in range(n):
        for j in range(i):  # j goes from 0 to i-1
            print(i, j)
    # Iterations: 0 + 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n²)

Even though the inner loop varies, the total is the sum of 0 + 1 + 2 + ... + (n-1), which equals n(n-1)/2. Using the formula for arithmetic series (covered in Sequences and Series), this is O(n²).
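The claim is easy to check numerically. A small sketch comparing the actual loop count against the closed-form formula:

```python
def triangular_count(n):
    """Count total iterations of the triangular nested loop."""
    ops = 0
    for i in range(n):
        for j in range(i):  # inner loop runs i times
            ops += 1
    return ops

# The loop count matches n(n-1)/2 exactly
for n in (5, 10, 100):
    print(n, triangular_count(n), n * (n - 1) // 2)
```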

Recursive Time Complexity

Recursion requires special analysis. The key is to count how many times the function is called and how much work each call does.

O(n) Recursion -- Linear
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
    # Calls: factorial(n) → factorial(n-1) → ... → factorial(1)
    # That's n calls, each doing O(1) work
    # Total: O(n)

Recurrence: T(n) = T(n-1) + O(1), which solves to T(n) = O(n)

O(2ⁿ) Recursion -- Exponential (Naive Fibonacci)
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
    # Each call spawns TWO more calls!
    # Call tree grows exponentially: 1 → 2 → 4 → 8 → ...
    # Total calls ≈ 2ⁿ, so O(2ⁿ)

Recurrence: T(n) = T(n-1) + T(n-2) + O(1). The solution is O(φⁿ) where φ ≈ 1.618 (golden ratio), but we approximate as O(2ⁿ).

Why this is terrible: For n=50, this makes tens of billions of calls (roughly 4 × 10¹⁰). Dynamic programming fixes this to O(n)!
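For contrast, here is a minimal memoized sketch (using functools.lru_cache; the name fib_memo is illustrative) showing how caching collapses the call tree to O(n):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized Fibonacci: each distinct n is computed once -> O(n) calls."""
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(50))  # 12586269025, computed instantly
```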

O(log n) Recursion -- Binary Search
def binary_search_recursive(arr, target, lo, hi):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_search_recursive(arr, target, mid + 1, hi)
    else:
        return binary_search_recursive(arr, target, lo, mid - 1)
    # Each call halves the search space
    # n → n/2 → n/4 → ... → 1 = log₂(n) calls
    # O(log n)

Recurrence: T(n) = T(n/2) + O(1), which solves to T(n) = O(log n)

O(n log n) Recursion -- Merge Sort
def merge_sort(arr):
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left = merge_sort(arr[:mid])     # T(n/2)
    right = merge_sort(arr[mid:])    # T(n/2)
    return merge(left, right)         # O(n) to merge

    # Recurrence: T(n) = 2T(n/2) + O(n)
    # By Master Theorem: O(n log n)

Visual intuition: We split the array log(n) times (levels of recursion). At each level, we do O(n) total work merging. So: O(n) × O(log n) = O(n log n).
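The merge_sort sketch above calls a merge helper it never defines. A minimal version (the same two-pointer idea as merge_sorted earlier) makes it runnable:

```python
def merge(left, right):
    """Merge two sorted lists in O(len(left) + len(right)) time."""
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # leftovers are already sorted
    result.extend(right[j:])
    return result

print(merge([1, 3, 5], [2, 4]))  # [1, 2, 3, 4, 5]
```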

Recognizing Recursive Patterns

| Pattern | Recurrence | Complexity | Example |
|---|---|---|---|
| Single call, subtract 1 | T(n) = T(n-1) + O(1) | O(n) | Factorial, linear recursion |
| Single call, divide by 2 | T(n) = T(n/2) + O(1) | O(log n) | Binary search |
| Two calls, divide by 2 | T(n) = 2T(n/2) + O(1) | O(n) | Tree traversal (no extra work) |
| Two calls, divide by 2, linear work | T(n) = 2T(n/2) + O(n) | O(n log n) | Merge sort |
| Two calls, subtract 1 | T(n) = 2T(n-1) + O(1) | O(2ⁿ) | Towers of Hanoi |
| Fibonacci-style | T(n) = T(n-1) + T(n-2) | O(2ⁿ) | Naive Fibonacci |
The Math Connection

Recursive time complexity is fundamentally about geometric series:

  • O(2ⁿ): Each level doubles the work: 1 + 2 + 4 + 8 + ... + 2ⁿ = 2ⁿ⁺¹ - 1 ≈ 2ⁿ
  • O(n log n): We have log(n) levels, each doing O(n) work
  • O(log n): Dividing by 2 each time means log₂(n) steps to reach 1

Understanding series formulas lets you analyze these patterns mathematically.

Space Complexity of Recursion

Recursive functions also use space for the call stack. Each recursive call adds a frame to the stack.

Space complexity = maximum depth of recursion × space per call
Space Complexity Examples
  • factorial(n): Depth is n, each frame is O(1) → O(n) space
  • binary_search: Depth is log(n), each frame is O(1) → O(log n) space
  • merge_sort: Depth is log(n), but we also create new arrays → O(n) space
  • naive fib: Maximum depth is n (following one branch) → O(n) space
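You can observe the stack depth directly by threading a depth counter through the recursion. A sketch (factorial_with_depth is a hypothetical instrumented variant of factorial):

```python
def factorial_with_depth(n, depth=1):
    """Return (n!, deepest stack frame reached)."""
    if n <= 1:
        return 1, depth                # base case is the deepest frame
    value, max_depth = factorial_with_depth(n - 1, depth + 1)
    return n * value, max_depth

value, max_depth = factorial_with_depth(10)
print(value, max_depth)  # 3628800 10 -- depth grows linearly with n
```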

Big-Omega and Big-Theta

Big-O gives an upper bound: the algorithm takes at most this long, up to a constant factor. Two related notations complete the picture:

  • Big-Omega (Ω) gives a lower bound: the algorithm takes at least this long.
  • Big-Theta (Θ) gives a tight bound: the running time is both O(f(n)) and Ω(f(n)).

Common Mistake

People often say "this algorithm is O(n)" when they mean Θ(n). Big-O is an upper bound, so it's technically true that linear search is O(n²) (it's never worse than quadratic), but that's not helpful. In practice, use Big-O for the tightest upper bound you can give.

Why CS Cares -- Big-O

Big-O is the language engineers use to discuss performance. When someone says "binary search is O(log n)," they're telling you it scales incredibly well. When your database query is O(n²), you know it will crawl on large datasets. Every technical interview expects you to analyze the time and space complexity of your solutions. Understanding Big-O is the difference between writing code that works on 100 items and code that works on 100 million items.

10. Recurrence Relations

A recurrence relation defines a sequence where each term depends on previous terms. In CS, recurrences describe the running time of recursive algorithms.

The Core Idea

Every recursive function has a recurrence relation hiding inside it. The recurrence literally says: "The time to solve a problem of size n equals the time to solve smaller pieces, plus the work to combine them."

If you can write a recursive function, you can write its recurrence. If you can solve the recurrence, you know the function's Big-O.

Reading Recurrences Like Code

T(n) = 2T(n/2) + n    // Merge sort
T(n) = T(n-1) + 1    // Linear recursion (like factorial)
T(n) = 2T(n-1) + 1    // Towers of Hanoi
T(n) = T(n-1) + T(n-2)    // Fibonacci
How to Read T(n) = 2T(n/2) + n
| Part | Meaning | In the Code |
|---|---|---|
| 2T(...) | Make 2 recursive calls | merge_sort(left); merge_sort(right) |
| T(n/2) | Each call gets half the input | Splitting the array into two halves |
| + n | Do n work outside the recursion | Merging two sorted halves takes O(n) |

def merge_sort(arr):
    if len(arr) <= 1:                  # T(1) = 1  (base case)
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])        # T(n/2)    (first recursive call)
    right = merge_sort(arr[mid:])       # T(n/2)    (second recursive call)
    return merge(left, right)            # + O(n)    (combining step)
    # Total: T(n) = 2T(n/2) + n

More Code → Recurrence Translations

| Code Pattern | Recurrence | Solution |
|---|---|---|
| f(n-1) + O(1) work | T(n) = T(n-1) + 1 | O(n) |
| f(n-1) + O(n) work | T(n) = T(n-1) + n | O(n²) |
| f(n/2) + O(1) work | T(n) = T(n/2) + 1 | O(log n) |
| 2 × f(n/2) + O(n) work | T(n) = 2T(n/2) + n | O(n log n) |
| f(n-1) + f(n-2) | T(n) = T(n-1) + T(n-2) | O(2ⁿ) |
| 2 × f(n-1) | T(n) = 2T(n-1) | O(2ⁿ) |

Solving Recurrences by Expansion (Unrolling)

The simplest method: keep substituting until you see a pattern, then find the closed form.

Example 1 -- T(n) = T(n-1) + 1, T(1) = 1

This is like factorial's call structure: one recursive call, O(1) work each time.

  • T(n) = T(n-1) + 1
  • = [T(n-2) + 1] + 1 = T(n-2) + 2
  • = [T(n-3) + 1] + 2 = T(n-3) + 3
  • Pattern spotted: T(n) = T(n-k) + k
  • Base case when n-k = 1, so k = n-1:
  • T(n) = T(1) + (n-1) = 1 + (n-1) = n = O(n)
Example 2 -- T(n) = T(n-1) + n, T(1) = 1

This is like selection sort: one recursive call, but O(n) work each time (shrinking).

  • T(n) = T(n-1) + n
  • = [T(n-2) + (n-1)] + n = T(n-2) + (n-1) + n
  • = T(n-3) + (n-2) + (n-1) + n
  • Pattern: T(n) = T(1) + 2 + 3 + ... + n
  • = 1 + (2 + 3 + ... + n) = n(n+1)/2 = O(n²)

See how the recurrence naturally produced an arithmetic series? That's the summation 1+2+...+n from the sequences page!

Example 3 -- T(n) = 2T(n/2) + n, T(1) = 1 (Merge Sort)

This is the most important one. Let's unroll it carefully:

  • T(n) = 2T(n/2) + n
  • = 2[2T(n/4) + n/2] + n = 4T(n/4) + n + n = 4T(n/4) + 2n
  • = 4[2T(n/8) + n/4] + 2n = 8T(n/8) + n + 2n = 8T(n/8) + 3n
  • Pattern: After k expansions: T(n) = 2ᵏ × T(n/2ᵏ) + kn
  • Base case when n/2ᵏ = 1, so k = log₂(n):
  • T(n) = n × T(1) + n × log₂(n) = n + n log n = O(n log n)

Intuition: There are log₂(n) levels of recursion. At each level, the total work across all calls is n. So total work = n × log n.
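The unrolled pattern can be checked numerically: evaluate the recurrence directly and compare it to the closed form n log₂(n) + n. A sketch (it assumes n is a power of 2 so the halves stay integral):

```python
import math

def T(n):
    """Evaluate T(n) = 2T(n/2) + n with T(1) = 1, for n a power of 2."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for n in (2, 8, 64, 1024):
    print(n, T(n), int(n * math.log2(n) + n))  # the last two columns agree
```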

Example 4 -- T(n) = T(n/2) + 1, T(1) = 1 (Binary Search)
  • T(n) = T(n/2) + 1
  • = T(n/4) + 1 + 1 = T(n/4) + 2
  • = T(n/8) + 3
  • Pattern: T(n) = T(n/2ᵏ) + k
  • Base case when n/2ᵏ = 1, so k = log₂(n):
  • T(n) = T(1) + log₂(n) = 1 + log n = O(log n)

Intuition: Each call halves the problem and does O(1) work. How many halvings until size 1? log₂(n).
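That halving count is exactly what a short loop measures (a trivial sketch):

```python
def halvings(n):
    """How many times can n be halved (integer division) before reaching 1?"""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

print(halvings(1024))       # 10, i.e. log2(1024)
print(halvings(1_000_000))  # 19 -- a million items need only ~20 probes
```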

The Expansion Method -- Step by Step
  1. Write out T(n) in terms of T(smaller)
  2. Expand T(smaller) using the same rule -- do this 2-3 times
  3. Spot the pattern -- how does the expression look after k expansions?
  4. Find k -- what value of k hits the base case?
  5. Substitute k and simplify to get the closed form

The Master Theorem (Simplified)

For recurrences of the form T(n) = aT(n/b) + O(nᵈ), the Master Theorem gives the answer directly. No expansion needed -- just identify a, b, d and compare:

T(n) = a × T(n/b) + O(nᵈ)

a = number of recursive calls
b = factor by which the input shrinks
d = exponent of the work done outside the recursion

Compare log_b(a) to d:

| Condition | Result | Intuition |
|---|---|---|
| log_b(a) < d | T(n) = O(nᵈ) | Combine work dominates. Recursion is "cheap" relative to the merge step. |
| log_b(a) = d | T(n) = O(nᵈ log n) | Work is evenly spread. Every level contributes equally. |
| log_b(a) > d | T(n) = O(n^(log_b a)) | Splitting dominates. The sheer number of subproblems is the bottleneck. |
Applying the Master Theorem

Merge sort: T(n) = 2T(n/2) + O(n). a=2, b=2, d=1.

log₂(2) = 1 = d → Case 2 → T(n) = O(n¹ log n) = O(n log n)

Binary search: T(n) = T(n/2) + O(1). a=1, b=2, d=0.

log₂(1) = 0 = d → Case 2 → T(n) = O(n⁰ log n) = O(log n)

Karatsuba multiplication: T(n) = 3T(n/2) + O(n). a=3, b=2, d=1.

log₂(3) ≈ 1.58 > 1 = d → Case 3 → T(n) = O(n^1.58)

Strassen's matrix multiply: T(n) = 7T(n/2) + O(n²). a=7, b=2, d=2.

log₂(7) ≈ 2.81 > 2 = d → Case 3 → T(n) = O(n^2.81)
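The three-way comparison mechanizes nicely. A sketch of the simplified Master Theorem as a function (the name and output strings are illustrative):

```python
import math

def master_theorem(a, b, d):
    """Classify T(n) = a*T(n/b) + O(n^d) via the simplified Master Theorem."""
    crit = math.log(a, b)              # the critical exponent log_b(a)
    if math.isclose(crit, d):
        return f"O(n^{d} log n)" if d > 0 else "O(log n)"
    if crit < d:
        return f"O(n^{d})"             # combine work dominates
    return f"O(n^{crit:.2f})"          # splitting dominates

print(master_theorem(2, 2, 1))  # merge sort -> O(n^1 log n)
print(master_theorem(1, 2, 0))  # binary search -> O(log n)
print(master_theorem(3, 2, 1))  # Karatsuba -> O(n^1.58)
```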

Master Theorem Cheat: Common Algorithms
| Algorithm | Recurrence | a, b, d | Case | Result |
|---|---|---|---|---|
| Binary Search | T(n/2) + 1 | 1, 2, 0 | log = d | O(log n) |
| Merge Sort | 2T(n/2) + n | 2, 2, 1 | log = d | O(n log n) |
| QuickSelect (average) | T(n/2) + n | 1, 2, 1 | log < d | O(n) |
| Balanced BST search | T(n/2) + 1 | 1, 2, 0 | log = d | O(log n) |
When the Master Theorem Does NOT Apply

The Master Theorem only works for T(n) = aT(n/b) + O(nᵈ) -- divide-and-conquer recurrences where the problem size is divided by b.

It does NOT work for:

  • T(n) = T(n-1) + T(n-2) -- subtracting, not dividing (Fibonacci)
  • T(n) = T(n-1) + n -- subtracting, not dividing (selection sort)
  • T(n) = T(n/3) + T(2n/3) + n -- unequal splits

For these, use the expansion method or the recursion tree method.

Why CS Cares -- Recurrences

Every recursive algorithm has a running time described by a recurrence relation. Merge sort, quicksort, binary search, tree traversals, divide-and-conquer -- all analyzed using recurrences. The Master Theorem lets you determine the Big-O of most divide-and-conquer algorithms by inspection. Understanding recurrences is understanding how recursive code scales.

11. Practice Quiz

Test your understanding of discrete math. Each question is followed by its worked answer.

1. What is the value of ¬(A ∧ B) when A = True and B = False?

A ∧ B = True ∧ False = False. Then ¬(False) = True. This is also consistent with De Morgan's Law: ¬(A ∧ B) = ¬A ∨ ¬B = False ∨ True = True.

2. How many elements are in the power set of {a, b, c, d}?

The power set has 2^|S| elements. |{a, b, c, d}| = 4, so |P(S)| = 2⁴ = 16. The power set contains all subsets: the empty set, all singletons, all pairs, all triples, and the full set itself.

3. What is the time complexity of an algorithm with the recurrence T(n) = 2T(n/2) + n?

Using the Master Theorem: a=2, b=2, d=1. Since log₂(2) = 1 = d, T(n) = O(nᵈ log n) = O(n log n). This is the recurrence for merge sort.

4. In how many ways can you choose a committee of 3 people from a group of 8?

Order doesn't matter (a committee, not a ranking), so use combinations: C(8, 3) = 8! / (3! × 5!) = (8 × 7 × 6) / (3 × 2 × 1) = 336 / 6 = 56.

5. What is GCD(48, 18) using the Euclidean algorithm?

GCD(48, 18): 48 mod 18 = 12, so GCD(18, 12). 18 mod 12 = 6, so GCD(12, 6). 12 mod 6 = 0, so GCD = 6.
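The trace above is exactly what the Euclidean algorithm computes in code. A minimal sketch:

```python
def gcd(a, b):
    """Euclidean algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b:
        a, b = b, a % b
    return a

print(gcd(48, 18))  # 6
```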