Chapter 1

Vectors & Spaces

A vector is more than an arrow — it is a direction paired with a magnitude. Vectors are the fundamental objects of linear algebra: they can be added, scaled, and combined. Understanding how they behave unlocks the geometry behind machine learning, physics, computer graphics, and more.

Vector Arithmetic

A vector in $\mathbb{R}^n$ is an ordered list of $n$ real numbers. In two dimensions we write $\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$, which you can picture as an arrow from the origin to the point $(v_1, v_2)$. The numbers $v_1$ and $v_2$ are its components.

Two operations define everything: addition and scalar multiplication. Both work component-wise:

$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}, \qquad c\mathbf{v} = \begin{pmatrix} cv_1 \\ cv_2 \end{pmatrix}$$

Geometrically, addition places vectors tip-to-tail: slide the tail of $\mathbf{v}$ to the tip of $\mathbf{u}$, and the sum $\mathbf{u} + \mathbf{v}$ is the arrow from the origin to where $\mathbf{v}$'s tip now lands. Scalar multiplication by $c > 1$ stretches the arrow; by $0 < c < 1$ it shrinks it; by $c < 0$ it flips the arrow's direction and scales its length by $|c|$.
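The component-wise rules translate almost directly into code. The following is a minimal sketch in plain Python, assuming vectors are stored as tuples; the function names `add` and `scale` are illustrative choices, not part of any library.

```python
def add(u, v):
    """Component-wise sum of two vectors with the same number of components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(u, v))


def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return tuple(c * a for a in v)


u = (1, 2)
v = (3, -1)

print(add(u, v))     # (4, 1)
print(scale(2, v))   # (6, -2)   stretched by a factor of 2
print(scale(-1, v))  # (-3, 1)   same length, opposite direction
```

Tuples are used here only because they are immutable and print cleanly; a numerical library would be the usual choice for larger vectors.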

The magnitude (or length) of $\mathbf{v}$ is:

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2}$$

This is just the Pythagorean theorem applied to the components. A vector with magnitude $1$ is called a unit vector. Any nonzero vector can be normalized to a unit vector by dividing by its magnitude: $\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$.
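As a quick check of the magnitude and normalization formulas, here is a small sketch in plain Python. The names `magnitude` and `normalize` are illustrative, and the zero vector is rejected explicitly because it has no direction to preserve.

```python
import math


def magnitude(v):
    """Length of v: the square root of the sum of squared components."""
    return math.sqrt(sum(a * a for a in v))


def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    length = magnitude(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / length for a in v)


v = (3.0, 4.0)
print(magnitude(v))   # 5.0
print(normalize(v))   # (0.6, 0.8)
```

The example uses the 3-4-5 right triangle so the result can be verified by hand.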

Two Ways to See Every Operation

Every vector operation has an algebraic form (formulas on components) and a geometric interpretation (what happens to arrows). Fluency in linear algebra means moving freely between both. When you add vectors algebraically, you should simultaneously see the tip-to-tail picture in your head — and vice versa.

Dot Product

The dot product is a way to multiply two vectors that produces a single number (a scalar). For $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$:

$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2$$

This has a beautiful geometric interpretation. If $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$, then:

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$$

The dot product measures how much the two vectors point in the same direction. When $\theta = 0°$ they point the same way and the dot product takes its largest possible value for those lengths, $\|\mathbf{u}\|\,\|\mathbf{v}\|$. When $\theta = 90°$, $\cos 90° = 0$, so $\mathbf{u} \cdot \mathbf{v} = 0$: the vectors are orthogonal (perpendicular). When $\theta > 90°$ the dot product is negative, meaning the vectors point away from each other.

Notice that $\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 = \|\mathbf{v}\|^2$, so the magnitude satisfies $\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$. The dot product gives us length for free.
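The algebraic and geometric forms of the dot product can be tied together in a few lines. This sketch, in plain Python with illustrative function names, computes the dot product from components and then recovers the angle by solving the geometric formula for $\theta$. The clamp before `math.acos` is a defensive assumption to guard against floating-point rounding pushing the cosine slightly outside $[-1, 1]$.

```python
import math


def dot(u, v):
    """Sum of products of matching components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))


def angle_between(u, v):
    """Angle between two nonzero vectors, in degrees."""
    norm_u = math.sqrt(dot(u, u))  # magnitude via the dot product
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("the angle is undefined for the zero vector")
    cos_theta = dot(u, v) / (norm_u * norm_v)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


print(dot((1, 2), (3, 4)))                    # 11
print(dot((1, 0), (0, 1)))                    # 0   orthogonal
print(dot((1, 0), (-1, 1)))                   # -1  angle exceeds 90 degrees
print(round(angle_between((1, 0), (0, 1))))   # 90
print(round(angle_between((1, 0), (-1, 1))))  # 135
```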

Why Orthogonality Matters

Orthogonal vectors carry no shared information: neither has any component along the other. In a coordinate system, the $x$- and $y$-axes are perpendicular precisely for this reason, so knowing your $x$-coordinate tells you nothing about your $y$-coordinate. This principle drives everything from signal processing (separating frequencies) to statistics (uncorrelated variables).

Span & Basis

A linear combination of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is any expression of the form $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$ where the $c_i$ are real numbers. The set of all possible linear combinations is called the span:

$$\text{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} = \{\, c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \;\mid\; c_i \in \mathbb{R} \,\}$$

Think of span as the question: "what is every destination I can reach using these vectors as building blocks, with any scaling I choose?" Two parallel nonzero vectors in $\mathbb{R}^2$ span only a line: no matter how you scale and add them, you stay on that line. Two non-parallel vectors span all of $\mathbb{R}^2$, so you can reach any point in the plane.
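One way to tell the two cases apart in code is the $2 \times 2$ determinant $u_1 v_2 - u_2 v_1$, which is zero exactly when the two vectors are parallel. The determinant is not introduced in the text above, so treat it as an assumed tool here; the function names are illustrative. The sketch uses exact integer components, since comparing a floating-point determinant to zero would need a tolerance.

```python
def det2(u, v):
    """2x2 determinant of the matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]


def describe_span(u, v):
    """Classify the span of two vectors in the plane as a point, line, or plane."""
    if det2(u, v) != 0:
        return "plane"   # non-parallel: every point is reachable
    if any(u) or any(v):
        return "line"    # parallel, but at least one vector is nonzero
    return "point"       # both are the zero vector: only the origin


print(describe_span((1, 2), (2, 4)))  # line   (the second is twice the first)
print(describe_span((1, 0), (1, 1)))  # plane
print(describe_span((0, 0), (0, 0)))  # point
```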

A set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others; none is redundant. A basis is a linearly independent set that spans the whole space. It is a minimal spanning set: remove any vector from a basis and some points can no longer be reached.

The standard basis for $\mathbb{R}^2$: $\mathbf{e}_1 = (1,0)^T$, $\mathbf{e}_2 = (0,1)^T$, the unit vectors along the coordinate axes.
Any two non-parallel vectors in $\mathbb{R}^2$ form a valid basis.
Three vectors in $\mathbb{R}^2$ are always linearly dependent: at least one of them can be written as a linear combination of the other two.
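The last point can be made concrete. When the first two of three vectors are not parallel, the third is always some combination $a\mathbf{u} + b\mathbf{v}$ of them, and the coefficients can be found by solving a $2 \times 2$ linear system. The sketch below uses Cramer's rule, which is an assumed technique not covered in the text, and `Fraction` to keep the arithmetic exact for integer inputs. The function names are illustrative.

```python
from fractions import Fraction


def det2(u, v):
    """2x2 determinant of the matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]


def combination_coefficients(u, v, w):
    """Return (a, b) such that a*u + b*v equals w.

    Assumes integer components and that u and v are not parallel.
    """
    d = det2(u, v)
    if d == 0:
        raise ValueError("u and v are parallel, so they do not form a basis")
    a = Fraction(det2(w, v), d)  # Cramer's rule: replace u's column with w
    b = Fraction(det2(u, w), d)  # Cramer's rule: replace v's column with w
    return a, b


u, v, w = (1, 1), (1, -1), (3, 1)
a, b = combination_coefficients(u, v, w)
print(a, b)  # 2 1

# Check: 2*(1, 1) + 1*(1, -1) gives back w
print((a * u[0] + b * v[0], a * u[1] + b * v[1]) == w)  # True
```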

The number of vectors in any basis for a space is always the same, no matter which basis you choose. This number is the dimension of the space. $\mathbb{R}^2$ has dimension $2$; $\mathbb{R}^3$ has dimension $3$.

A Basis Is a Language

Choosing a basis is choosing a coordinate system — a language for describing the space. Change the basis and the coordinates of every point change, but the underlying geometry stays the same. This is why the choice of basis matters so much in applications: a well-chosen basis can simplify a problem dramatically; a poor one can obscure it.
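To see the same point described in two languages, the sketch below rebuilds a point from its coordinates relative to a given basis. The basis named `diagonal` is a hypothetical example chosen for illustration, and `from_coordinates` is an illustrative name.

```python
def from_coordinates(coords, basis):
    """Rebuild the point c1*b1 + c2*b2 + ... from its coordinates in a basis."""
    point = [0] * len(basis[0])
    for c, b in zip(coords, basis):
        for i, component in enumerate(b):
            point[i] += c * component
    return tuple(point)


standard = [(1, 0), (0, 1)]
diagonal = [(1, 1), (1, -1)]  # hypothetical alternative basis

# Different coordinates, same underlying point
print(from_coordinates((3, 1), standard))  # (3, 1)
print(from_coordinates((2, 1), diagonal))  # (3, 1)
```

The coordinate pairs $(3, 1)$ and $(2, 1)$ differ, yet both name the same location in the plane; only the description has changed.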