Vectors, matrices, and transformations -- the math that powers everything from 3D graphics to neural networks.
Linear algebra is the branch of mathematics that deals with vectors (ordered lists of numbers), matrices (grids of numbers), and linear transformations (functions that move, stretch, and rotate space). If regular algebra is about solving for one unknown number, linear algebra is about solving for many unknowns at once -- and understanding the structure of the space they live in.
Think of it this way: a single variable x is just a number on a number line. But in the real world, data almost never comes as a single number. An image is a grid of pixel values. A point in 3D space has three coordinates. A neural network has millions of parameters. Linear algebra gives you the tools to work with all of that at once, efficiently and elegantly.
Regular algebra deals with single numbers. Linear algebra deals with lists and grids of numbers. That's it at its core. A vector is a list, a matrix is a grid, and the operations tell you how to combine and transform them.
| CS Field | How Linear Algebra is Used |
|---|---|
| Machine Learning / AI | Neural networks are built from matrix multiplications. Training involves gradient vectors. Data lives in high-dimensional vector spaces. |
| Computer Graphics | Every rotation, scaling, and projection in 3D graphics is a matrix operation. GPUs are literally designed for matrix math. |
| Game Development | Character positions are vectors. Camera angles, physics simulations, and collision detection all use linear algebra. |
| Data Science | Datasets are matrices. PCA (dimensionality reduction) uses eigenvectors. Recommendation systems use matrix factorization. |
| Image Processing | Images are matrices of pixels. Filters (blur, sharpen, edge detect) are matrix operations called convolutions. |
| Cryptography | Many encryption schemes (like Hill cipher) rely on matrix operations and modular arithmetic. |
Linear algebra has a reputation for being abstract and hard. That reputation comes from how it's traditionally taught -- theorem-proof style with no applications. On this page we focus on what things mean and how they're used in code. You will understand this.
A vector is an ordered list of numbers. That's the whole definition. In 2D, a vector has two components. In 3D, three. In machine learning, vectors can have thousands or millions of components.
Geometrically, you can think of a 2D or 3D vector as an arrow pointing from the origin to a point. The vector [3, 2] points 3 units right and 2 units up. But vectors don't have to be spatial -- they can represent anything: colors, audio samples, user preferences, word meanings.
To add or subtract vectors, you just add or subtract their corresponding components. Both vectors must have the same number of components.
Problem: Add vectors a = [2, 5, -1] and b = [3, -2, 4]
a + b = [2+3, 5+(-2), -1+4]
a + b = [5, 3, 3]
CS context: If a player moves [2, 5, -1] then [3, -2, 4], their total displacement is [5, 3, 3].
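The displacement example above can be sketched in plain Python (the helper name add_vectors is made up for illustration, not a library function):

```python
def add_vectors(a, b):
    # Component-wise addition; both vectors need the same length.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return [x + y for x, y in zip(a, b)]

# The player-displacement example from above:
print(add_vectors([2, 5, -1], [3, -2, 4]))  # [5, 3, 3]
```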
A scalar is just a regular number (as opposed to a vector). Multiplying a vector by a scalar multiplies every component by that number.
Problem: Multiply v = [4, -2, 6] by scalar 3
3 * v = [3*4, 3*(-2), 3*6]
3 * v = [12, -6, 18]
CS context: Scaling a velocity vector by 3 makes the object move 3 times faster in the same direction.
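Scalar multiplication is just as short in code. A minimal sketch (scale_vector is an illustrative name):

```python
def scale_vector(s, v):
    # Multiply every component by the scalar s.
    return [s * x for x in v]

print(scale_vector(3, [4, -2, 6]))  # [12, -6, 18]
```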
The magnitude (or length or norm) of a vector tells you how long it is. It's the distance from the origin to the point the vector represents.
Problem: Find the magnitude of v = [3, 4]
||v|| = √(3² + 4²)
||v|| = √(9 + 16)
||v|| = √25
||v|| = 5
CS context: If v represents a velocity, then ||v|| = 5 is the speed (how fast, ignoring direction).
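The magnitude formula translates directly to code using the standard library:

```python
import math

def magnitude(v):
    # Square root of the sum of squared components.
    return math.sqrt(sum(x * x for x in v))

print(magnitude([3, 4]))  # 5.0
```

For 2D or 3D points, math.hypot(3, 4) computes the same thing.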
A unit vector is a vector with magnitude 1. It represents a pure direction with no scaling. To make any vector into a unit vector, divide it by its magnitude.
Problem: Find the unit vector of v = [3, 4]
||v|| = 5 (from previous example)
v̂ = [3/5, 4/5] = [0.6, 0.8]
Check: √(0.6² + 0.8²) = √(0.36 + 0.64) = √1 = 1
Unit vectors come up constantly in game dev and graphics. When you want to move a character "toward the enemy" at a fixed speed, you compute the direction vector (enemy_pos - player_pos), normalize it to a unit vector, then multiply by the speed. Direction times speed equals velocity.
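That chase-the-enemy recipe can be sketched as follows (the positions and speed are made-up example values):

```python
import math

def normalize(v):
    # Divide by the magnitude to get a unit vector (pure direction).
    m = math.sqrt(sum(x * x for x in v))
    if m == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / m for x in v]

player_pos = [0.0, 0.0]
enemy_pos = [3.0, 4.0]
speed = 10.0

# Direction times speed equals velocity.
direction = normalize([e - p for e, p in zip(enemy_pos, player_pos)])
velocity = [speed * d for d in direction]
print(velocity)  # [6.0, 8.0]
```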
The dot product (or scalar product) takes two vectors and returns a single number. You multiply corresponding components and add them up.
Problem: Find the dot product of a = [2, 3, -1] and b = [4, -1, 5]
a · b = (2)(4) + (3)(-1) + (-1)(5)
a · b = 8 + (-3) + (-5)
a · b = 0
Note: When the dot product is 0 (and neither vector is the zero vector), the vectors are perpendicular (orthogonal).
The dot product tells you how much two vectors point in the same direction. It connects to the angle between them through this formula: cos(θ) = (a · b) / (||a|| ||b||).
Problem: Find the angle between a = [1, 0] and b = [1, 1]
a · b = (1)(1) + (0)(1) = 1
||a|| = √(1 + 0) = 1
||b|| = √(1 + 1) = √2
cos(θ) = 1 / (1 * √2) = 1/√2
θ = 45 degrees
The dot product is everywhere in CS. In lighting calculations, you dot the surface normal with the light direction to figure out how bright a surface is. In recommendation engines, you dot a user preference vector with a product feature vector to predict how much they'll like it. In NLP, cosine similarity (which uses dot product) measures how similar two word embeddings are.
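Both the perpendicularity check and the angle formula fit in a few lines (dot and angle_between are illustrative helper names):

```python
import math

def dot(a, b):
    # Multiply corresponding components and add them up.
    return sum(x * y for x, y in zip(a, b))

def angle_between(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    return math.degrees(math.acos(dot(a, b) / (na * nb)))

print(dot([2, 3, -1], [4, -1, 5]))    # 0 -> perpendicular
print(angle_between([1, 0], [1, 1]))  # ~45.0
```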
The cross product takes two 3D vectors and returns a new vector that is perpendicular to both inputs. It's mainly used in 3D graphics for computing surface normals.
Problem: Find the cross product of a = [1, 0, 0] and b = [0, 1, 0]
a × b = [(0)(0) - (0)(1), (0)(0) - (1)(0), (1)(1) - (0)(0)]
a × b = [0, 0, 1]
CS context: The x-axis crossed with the y-axis gives the z-axis. This is how 3D coordinate systems are defined.
The cross product is not commutative: a × b = -(b × a). The order matters -- it flips the direction of the result. Also, the cross product only works in 3D.
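With NumPy, np.cross computes this directly, and swapping the arguments flips the sign:

```python
import numpy as np

a = np.array([1, 0, 0])  # x-axis
b = np.array([0, 1, 0])  # y-axis

print(np.cross(a, b))  # [0 0 1] -- the z-axis
print(np.cross(b, a))  # [ 0  0 -1] -- reversed order flips the direction
```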
A matrix is a rectangular grid of numbers arranged in rows and columns. Just as a vector is a list, a matrix is a table. We describe a matrix by its dimensions: an m × n matrix has m rows and n columns.
Matrices show up everywhere in CS: an image is a matrix of pixel values, a spreadsheet is a matrix, a neural network layer is defined by a weight matrix, and a 3D transformation is a 4×4 matrix.
Just like vectors, you add/subtract matrices element by element. Both matrices must have the same dimensions.
Problem: Add matrices A and B
A = | 1 2 | B = | 5 6 |
| 3 4 | | 7 8 |
A + B = | 1+5 2+6 | = | 6 8 |
| 3+7 4+8 | | 10 12 |
Multiply every element in the matrix by the scalar.
3 * | 1 2 | = | 3 6 |
| 4 5 | | 12 15 |
This is the big one. Matrix multiplication is not element-wise. It uses a "row times column" pattern. Each entry in the result is the dot product of a row from the first matrix with a column from the second matrix.
Matrix multiplication requires the inner dimensions to match. A (2×3) matrix can multiply a (3×4) matrix, giving a (2×4) result. But a (2×3) CANNOT multiply a (2×4) because 3 does not equal 2.
Problem: Multiply A (2×3) by B (3×2)
A = | 1 2 3 | B = | 7 8 |
| 4 5 6 | | 9 10 |
| 11 12 |
Result will be 2×2.
Position (1,1): Row 1 of A · Col 1 of B
= (1)(7) + (2)(9) + (3)(11) = 7 + 18 + 33 = 58
Position (1,2): Row 1 of A · Col 2 of B
= (1)(8) + (2)(10) + (3)(12) = 8 + 20 + 36 = 64
Position (2,1): Row 2 of A · Col 1 of B
= (4)(7) + (5)(9) + (6)(11) = 28 + 45 + 66 = 139
Position (2,2): Row 2 of A · Col 2 of B
= (4)(8) + (5)(10) + (6)(12) = 32 + 50 + 72 = 154
AB = | 58 64 |
| 139 154 |
Matrix multiplication is NOT commutative: AB does not equal BA. In fact, even if AB is defined, BA might not be (because the dimensions might not match the other way). This matters in graphics -- applying rotation then translation gives a different result than translation then rotation.
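The row-times-column rule can be written as a triple loop. This sketch (matmul is our own helper, not a library call) reproduces the worked example above:

```python
def matmul(A, B):
    # C[i][j] = dot product of row i of A with column j of B.
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul(A, B))  # [[58, 64], [139, 154]]
```

Real libraries implement this same idea with blocked, vectorized kernels; in NumPy it's simply A @ B.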
The identity matrix (I) is the matrix equivalent of the number 1. It's a square matrix with 1s on the diagonal and 0s everywhere else. Multiplying any matrix by I leaves it unchanged.
In 3D graphics, the identity matrix represents "no transformation." When you reset a model's transform, you set it to the identity matrix. Every transformation starts from identity.
The transpose of a matrix (written Aᵀ) flips rows and columns. Row 1 becomes column 1, row 2 becomes column 2, and so on. An m×n matrix becomes n×m.
A = | 1 2 3 | Aᵀ = | 1 4 |
| 4 5 6 | | 2 5 |
| 3 6 |
The 2×3 matrix became a 3×2 matrix.
Problem: Given A = | 2 1 | and B = | 0 3 |, find 2A - B
| 3 4 | | 1 2 |
Step 1: Compute 2A
2A = | 4 2 |
| 6 8 |
Step 2: Subtract B
2A - B = | 4-0 2-3 | = | 4 -1 |
| 6-1 8-2 | | 5 6 |
One of the most powerful uses of matrices is representing and solving systems of linear equations. Instead of writing out multiple equations, you pack everything into a single matrix equation.
System of equations:
2x + 3y = 7
4x - y = 1
Matrix form Ax = b:
| 2 3 | | x | | 7 |
| 4 -1 | * | y | = | 1 |
A = coefficient matrix, x = unknowns vector, b = constants vector
This isn't just a notation trick. Phrasing problems as Ax = b lets you use matrix operations to solve them -- and computers are incredibly fast at matrix operations. This is how physics simulators solve thousands of equations simultaneously, and how machine learning models find optimal parameters.
Gaussian elimination is a systematic method for solving systems of equations by transforming the matrix into a simpler form. You create an augmented matrix (A with b appended) and perform row operations until the solution is clear.
The three allowed row operations are:
1. Swap two rows.
2. Multiply a row by a nonzero constant.
3. Add a multiple of one row to another row.
Problem: Solve the system
x + y + z = 6
2x + 3y + z = 14
x + y + 2z = 9
Step 1: Write the augmented matrix
| 1 1 1 | 6 |
| 2 3 1 | 14 |
| 1 1 2 | 9 |
Step 2: R2 = R2 - 2*R1 (eliminate x from row 2)
| 1 1 1 | 6 |
| 0 1 -1 | 2 |
| 1 1 2 | 9 |
Step 3: R3 = R3 - R1 (eliminate x from row 3)
| 1 1 1 | 6 |
| 0 1 -1 | 2 |
| 0 0 1 | 3 |
Step 4: Back-substitute from the bottom
Row 3: z = 3
Row 2: y - z = 2 → y = 2 + 3 = 5
Row 1: x + y + z = 6 → x = 6 - 5 - 3 = -2
Solution: x = -2, y = 5, z = 3
Verify: (-2) + 5 + 3 = 6, 2(-2) + 3(5) + 3 = 14, (-2) + 5 + 2(3) = 9
The goal of Gaussian elimination is to reach row echelon form, where:
1. Any all-zero rows are at the bottom.
2. Each row's leading (first nonzero) entry is to the right of the leading entry in the row above.
3. All entries below a leading entry are zero.
In practice, you rarely hand-compute Gaussian elimination. Libraries like NumPy handle it. But understanding the process is essential for debugging numerical issues, understanding computational complexity (it's O(n³)), and knowing when a system has no solution or infinite solutions.
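For reference, here is elimination plus back-substitution as a short function, with partial pivoting for numerical stability (solve_gauss is an illustrative name):

```python
def solve_gauss(A, b):
    # Gaussian elimination with partial pivoting, then back-substitution.
    # A is a list of rows, b the constants; returns the solution vector.
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Pivot: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from all rows below.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitute from the bottom row up.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# The worked example from above:
print(solve_gauss([[1, 1, 1], [2, 3, 1], [1, 1, 2]], [6, 14, 9]))
# [-2.0, 5.0, 3.0]
```

np.linalg.solve does the same job far more robustly; this is just the algorithm laid bare.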
The determinant is a single number computed from a square matrix that tells you important things about the matrix. Think of it as a measure of how much the matrix "stretches" or "squishes" space. For a 2×2 matrix with rows [a, b] and [c, d], the formula is det = ad - bc.
Problem: Find det(A) where A = | 3 2 |
| 1 4 |
det(A) = (3)(4) - (2)(1)
det(A) = 12 - 2
det(A) = 10
Geometrically, the determinant represents the scaling factor for area (2D) or volume (3D) when you apply the matrix as a transformation.
Scaling by 2 in x and 3 in y:
A = | 2 0 |
| 0 3 |
det(A) = (2)(3) - (0)(0) = 6
A 1×1 unit square becomes a 2×3 rectangle. Area goes from 1 to 6. The determinant is 6.
A = | 2 4 |
| 1 2 |
det(A) = (2)(2) - (4)(1) = 4 - 4 = 0
This matrix has no inverse. Row 1 is just 2 times row 2 -- the rows are "linearly dependent." The transformation collapses 2D space into a line.
For larger matrices, you expand along a row or column (cofactor expansion). For a 3×3 matrix:
A = | 1 2 3 |
| 4 5 6 |
| 7 8 0 |
det(A) = 1(5*0 - 6*8) - 2(4*0 - 6*7) + 3(4*8 - 5*7)
= 1(0 - 48) - 2(0 - 42) + 3(32 - 35)
= -48 + 84 + (-9)
= 27
Computing determinants by cofactor expansion gets extremely expensive for large matrices -- it's O(n!) in the naive approach. Real libraries use LU decomposition (based on Gaussian elimination) which is O(n³). You should understand what determinants mean, but let NumPy compute them.
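For completeness, the cofactor expansion fits in a short recursive function (det here is our own teaching helper, shadowing nothing from NumPy):

```python
def det(M):
    # Cofactor expansion along the first row. O(n!) -- teaching only;
    # use np.linalg.det (LU-based, O(n^3)) for real work.
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[3, 2], [1, 4]]))                   # 10
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 0]]))  # 27
```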
The inverse of a matrix A, written A⁻¹, is the matrix that "undoes" A. When you multiply A by its inverse, you get the identity matrix: A·A⁻¹ = A⁻¹·A = I.
Think of it like division for matrices. If multiplication by A transforms space in some way, multiplication by A⁻¹ reverses that transformation exactly.
A matrix has an inverse if and only if its determinant is not zero. If det(A) = 0, the matrix is called singular and has no inverse. This makes intuitive sense: if A collapses a dimension (det = 0), you can't recover the lost information.
Problem: Find A⁻¹ where A = | 4 7 |
| 2 6 |
Step 1: det(A) = (4)(6) - (7)(2) = 24 - 14 = 10
Step 2: Apply the 2×2 inverse formula: for A = | a b |, A⁻¹ = (1/det(A)) * | d -b |
                                               | c d |                     | -c  a |
A⁻¹ = (1/10) * | 6 -7 |
               | -2 4 |
A⁻¹ = | 0.6 -0.7 |
      | -0.2 0.4 |
Verify: A * A⁻¹ should equal I
| 4 7 | * | 0.6 -0.7 | = | 4(0.6)+7(-0.2) 4(-0.7)+7(0.4) |
| 2 6 | | -0.2 0.4 | | 2(0.6)+6(-0.2) 2(-0.7)+6(0.4) |
= | 2.4-1.4 -2.8+2.8 | = | 1 0 |
| 1.2-1.2 -1.4+2.4 | | 0 1 |
If you have the equation Ax = b and you know A⁻¹, you can solve for x directly:
Problem: Solve using the inverse from above
4x + 7y = 5
2x + 6y = 4
x = A⁻¹b = | 0.6 -0.7 | * | 5 |
| -0.2 0.4 | | 4 |
x = | 0.6(5) + (-0.7)(4) | = | 3 - 2.8 | = | 0.2 |
| -0.2(5) + 0.4(4) | | -1 + 1.6 | | 0.6 |
Solution: x = 0.2, y = 0.6
In practice, solving Ax = b by computing A⁻¹ explicitly is inefficient and numerically unstable. Real code uses LU decomposition or other factorization methods. In NumPy, use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b. But the inverse concept is still essential for understanding the theory.
This is the concept that makes most students panic, but the idea is surprisingly simple. When you multiply most vectors by a matrix, they change both direction and magnitude. But some special vectors only get scaled -- they keep pointing in the same direction. These are eigenvectors, and the scaling factor is the eigenvalue.
The defining equation is Av = λv, where A is a square matrix, v is a nonzero vector (the eigenvector), and λ is a scalar (the eigenvalue). In words: "When I apply transformation A to vector v, I get back the same vector v, just scaled by λ."
Starting from Av = λv, we rearrange: Av - λv = 0, which factors to (A - λI)v = 0. A nonzero solution v exists only when A - λI is singular, i.e. when det(A - λI) = 0. Solving this characteristic equation gives the eigenvalues.
Problem: Find eigenvalues and eigenvectors of
A = | 4 1 |
| 2 3 |
Step 1: Set up A - λI
A - λI = | 4-λ 1 |
| 2 3-λ |
Step 2: Set determinant to zero
det(A - λI) = (4-λ)(3-λ) - (1)(2) = 0
12 - 4λ - 3λ + λ² - 2 = 0
λ² - 7λ + 10 = 0
(λ - 5)(λ - 2) = 0
Eigenvalues: λ1 = 5, λ2 = 2
Step 3: Find eigenvectors for each λ
For λ1 = 5: Solve (A - 5I)v = 0
| -1 1 | * | v1 | = | 0 |
| 2 -2 | | v2 | | 0 |
-v1 + v2 = 0 → v2 = v1
Eigenvector: v1 = [1, 1] (or any scalar multiple)
For λ2 = 2: Solve (A - 2I)v = 0
| 2 1 | * | v1 | = | 0 |
| 2 1 | | v2 | | 0 |
2v1 + v2 = 0 → v2 = -2v1
Eigenvector: v2 = [1, -2] (or any scalar multiple)
Verification for λ1 = 5, v = [1, 1]:
Av = | 4 1 | * | 1 | = | 5 | = 5 * | 1 | = λv
| 2 3 | | 1 | | 5 | | 1 |
| Application | How Eigenvalues/Eigenvectors Are Used |
|---|---|
| Google PageRank | Web pages are nodes in a giant matrix. The principal eigenvector of the link matrix gives page importance rankings. |
| PCA (Data Science) | Eigenvectors of the covariance matrix point in the directions of greatest variance. This lets you reduce dimensionality while keeping the most important patterns. |
| Stability Analysis | If all eigenvalues of a system's matrix have magnitude < 1, the system is stable. Used in control systems and simulation. |
| Image Compression | SVD (closely related to eigendecomposition) lets you approximate images using only the most significant components. |
| Graph Algorithms | Eigenvalues of the adjacency matrix reveal graph properties like connectivity and clustering structure (spectral graph theory). |
Think of eigenvectors as the "natural axes" of a transformation. A matrix might do complicated things to most vectors, but along its eigenvectors, the action is simple: just stretching. Eigenvalues tell you how much stretching happens along each axis. This is why they simplify so many problems -- they reveal the matrix's fundamental behavior.
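Power iteration -- the idea behind PageRank -- finds the dominant eigenvector by just applying A over and over. A minimal sketch using the matrix from the worked example (power_iteration is an illustrative name):

```python
import numpy as np

def power_iteration(A, steps=100):
    # Repeatedly apply A and renormalize; the vector converges toward
    # the eigenvector with the largest-magnitude eigenvalue.
    v = np.ones(A.shape[0])
    for _ in range(steps):
        v = A @ v
        v = v / np.linalg.norm(v)
    eigenvalue = v @ A @ v  # Rayleigh quotient (v is unit length)
    return eigenvalue, v

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam, v = power_iteration(A)
print(round(lam, 6))  # 5.0 -- the dominant eigenvalue
print(v)              # proportional to [1, 1]
```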
Every matrix represents a linear transformation -- a function that takes vectors in, moves/stretches/rotates them, and outputs new vectors. When you multiply a vector by a matrix, you're applying that transformation. This is the connection between abstract matrix math and real visual/spatial effects.
| Transformation | Matrix (rows separated by ;) | Effect |
|---|---|---|
| Rotation by θ | [cosθ, -sinθ; sinθ, cosθ] | Rotates all points by angle θ around the origin |
| Scaling | [sx, 0; 0, sy] | Stretches x by sx and y by sy |
| Reflection (x-axis) | [1, 0; 0, -1] | Flips vertically (negates y) |
| Reflection (y-axis) | [-1, 0; 0, 1] | Flips horizontally (negates x) |
| Shear (horizontal) | [1, k; 0, 1] | Slants by factor k along the x-axis |
Problem: Rotate the point [3, 1] by 90 degrees counterclockwise
θ = 90 degrees, so cos(90) = 0, sin(90) = 1
R = | 0 -1 |
| 1 0 |
R * [3, 1] = | 0*3 + (-1)*1 | = | -1 |
| 1*3 + 0*1 | | 3 |
The point [3, 1] rotated to [-1, 3]. You can verify this is correct: the distance from origin is preserved (√10 in both cases), and the angle increased by 90 degrees.
Problem: Scale point [2, 3] by 2x horizontally and 0.5x vertically
S = | 2 0 |
| 0 0.5 |
S * [2, 3] = | 2*2 + 0*3 | = | 4 |
| 0*2 + 0.5*3 | | 1.5 |
The x-coordinate doubled and the y-coordinate halved.
Here's the elegant part: applying two transformations in sequence is the same as multiplying their matrices together. If you want to first scale, then rotate, you compute R * S and use the resulting matrix.
When composing transformations, read right to left. In T = R * S, the vector first gets multiplied by S (scaling), then the result gets multiplied by R (rotation). The rightmost matrix acts first. This trips up everyone at first.
In 3D graphics, objects go through a chain of transformations: Model (position in world) → View (camera perspective) → Projection (3D to 2D screen). The MVP matrix (Model-View-Projection) is the product of all three. GPUs compute millions of these matrix multiplications per frame -- it's what they're built for.
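A quick 2D demonstration that composition order matters, using a 90-degree rotation R and an x-axis scale S:

```python
import numpy as np

theta = np.radians(90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate 90 degrees CCW
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                       # scale x by 2

p = np.array([1.0, 0.0])

# T = R @ S: S acts first (scale), then R (rotate).
print(np.round(R @ S @ p, 6))  # [0. 2.] -- scaled to (2,0), then rotated up
# T = S @ R: R acts first (rotate), then S (scale).
print(np.round(S @ R @ p, 6))  # [0. 1.] -- rotated to (0,1); x-scale changes nothing
```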
A neural network layer is fundamentally a matrix multiplication followed by an activation function. The weights between layers form a matrix. The forward pass (computing the output) is just repeated matrix-vector multiplication.
Training adjusts the weights using gradients (vectors of partial derivatives), and the gradient computation involves matrix transposes and chain-rule multiplications. The entire deep learning stack is linear algebra plus calculus.
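A toy forward pass makes the "matrix multiply plus activation" structure concrete. The weights below are made up for illustration, not trained values:

```python
import numpy as np

def relu(x):
    # Common activation function: zero out negative values.
    return np.maximum(0, x)

W1 = np.array([[0.5, -0.2],
               [0.1,  0.8]])  # first layer: 2 inputs -> 2 hidden units
W2 = np.array([[1.0, -1.0]])  # second layer: 2 hidden -> 1 output

x = np.array([1.0, 2.0])
hidden = relu(W1 @ x)         # matrix-vector multiply, then activation
output = W2 @ hidden
print(np.round(output, 6))    # [-1.6]
```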
A grayscale image is literally a matrix where each entry is a pixel brightness (0-255). Color images are three matrices stacked (R, G, B channels). Convolution -- the core operation in image filters and CNNs -- slides a small matrix (the kernel) across the image and computes dot products at each position.
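A naive "valid" convolution sketch shows the sliding-dot-product idea (real libraries are heavily optimized, and ML frameworks typically skip the kernel flip, so strictly speaking this is cross-correlation):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; at each position, take the sum of
    # element-wise products (a dot product between the window and kernel).
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
blur = np.ones((2, 2)) / 4      # 2x2 averaging (blur) kernel
print(convolve2d(image, blur))  # [[3. 4.] [6. 7.]]
```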
Netflix, Spotify, and Amazon use matrix factorization for recommendations. You have a giant matrix of users × items, mostly empty (users have only rated a few things). The trick: decompose this into two smaller matrices -- one capturing user preferences, one capturing item features. The product approximates the full matrix, filling in the blanks with predictions.
Every vertex in a 3D scene goes through a chain of 4×4 matrix transformations. Using 4×4 matrices (instead of 3×3) through a technique called homogeneous coordinates lets you represent translation as a matrix multiplication too.
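Homogeneous coordinates work the same way in 2D with 3×3 matrices: append a 1 to each point, and translation becomes a matrix multiply like everything else (translation is an illustrative helper name):

```python
import numpy as np

def translation(tx, ty):
    # 2D translation as a 3x3 homogeneous matrix.
    return np.array([[1, 0, tx],
                     [0, 1, ty],
                     [0, 0,  1]], dtype=float)

p = np.array([3.0, 1.0, 1.0])  # point (3, 1) with homogeneous 1 appended
moved = translation(5, -2) @ p
print(moved[:2])               # [ 8. -1.]
```

3D graphics does exactly this one dimension up, which is why transform matrices are 4×4.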
```python
import numpy as np

# --- Vectors ---
a = np.array([2, 3, -1])
b = np.array([4, -1, 5])
print("Addition:", a + b)               # [6 2 4]
print("Dot product:", np.dot(a, b))     # 0 (perpendicular!)
print("Magnitude:", np.linalg.norm(a))  # 3.742

# --- Matrices ---
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
print("Matrix multiply:", A @ B)         # @ is matrix multiply
print("Transpose:", A.T)
print("Determinant:", np.linalg.det(A))  # -2.0
print("Inverse:", np.linalg.inv(A))

# --- Solving Ax = b ---
A = np.array([[2, 3],
              [4, -1]])
b = np.array([7, 1])
x = np.linalg.solve(A, b)  # Better than inv(A) @ b
print("Solution:", x)      # [0.714 1.857] (approximately)

# --- Eigenvalues ---
A = np.array([[4, 1],
              [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)    # [5. 2.]
print("Eigenvectors:", eigenvectors)  # columns are eigenvectors

# --- 2D Rotation ---
import math
theta = math.radians(90)
R = np.array([[math.cos(theta), -math.sin(theta)],
              [math.sin(theta),  math.cos(theta)]])
point = np.array([3, 1])
print("Rotated:", R @ point)  # [-1. 3.] (up to floating-point noise)
```
NumPy's @ operator is matrix multiplication (same as np.matmul). The * operator on arrays is element-wise multiplication, which is NOT the same thing. This distinction trips up beginners constantly. Use @ for matrix math, * for element-wise operations.
Test your understanding of the key linear algebra concepts.