Vector Basics

Magnitude — \(||s||\)

This is the length of the vector. Literally just the Pythagorean theorem: add the squares of all the components and take the square root.

\[||s|| = \sqrt{s_{1}^{2} + s_{2}^{2} + \dots + s_{n}^{2}} = \sqrt{\sum_{i=1}^{n} s_{i}^{2}}\]
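
A quick numpy check of the formula (example vector made up):

```python
import numpy as np

s = np.array([3.0, 4.0, 12.0])
print(np.sqrt(np.sum(s**2)))   # 13.0, since 9 + 16 + 144 = 169
print(np.linalg.norm(s))       # same thing, built in
```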

Dot Product — \(s \cdot r\)

The dot product is basically multiplying two vectors together element by element and then summing all the individual products. It's called the dot product because of the dot symbol used to write it. Alternative names: inner product, scalar product. It requires two vectors of equal dimensions.

\[\sum_{i=1}^{n} s_{i}r_{i} = (s_{1} \cdot r_{1})+(s_{2} \cdot r_{2}) + \dots + (s_{n} \cdot r_{n})\] \[\alpha = s \cdot r = \langle s, r \rangle = s^{T}r = ||s|| \cdot ||r|| \cos(\theta_{sr})\]

The cool thing about it is that it gives you the angle \(\theta\) between the two vectors. If the angle is greater than 90°, the dot product is negative.

To get the angle you can just:

\[\cos(\theta) = \frac{s \cdot r}{|s| \cdot |r|}\] \[\theta = \arccos\left(\frac{s \cdot r}{|s| \cdot |r|}\right)\]
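
A quick numpy sketch of getting the angle this way (example vectors made up):

```python
import numpy as np

s = np.array([1.0, 0.0])
r = np.array([1.0, 1.0])

cos_theta = (s @ r) / (np.linalg.norm(s) * np.linalg.norm(r))
theta = np.arccos(cos_theta)
print(np.degrees(theta))   # 45.0, give or take float error
```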

Dot Product Properties

There's a longer list of dot product properties out there; the main ones:

  1. Distributive over addition: for any vectors \(a\), \(b\) and \(c\), \(a \cdot (b + c) = a \cdot b + a \cdot c\)

  2. Associative with scalar multiplication: for any vectors \(a\) and \(b\) and any real number \(c\), \((ca) \cdot b = a \cdot (cb) = c(a \cdot b)\)

Scalar Projection — \(s\) onto \(r\)

(Figure: \(s\) projected onto \(r\), with the adjacent side \(t\) as the shadow.)

This is the length of the adjacent side \(t\) in the right triangle formed by dropping a perpendicular onto \(r\) from \(s\) — like a shadow.

\[|s| \cdot \cos(\theta) = \frac{s \cdot r}{|r|}\]

Vector Projection — \(s\) onto \(r\)

This is the vector being projected onto (\(r\)), scaled. The answer is a vector in the same direction as \(r\) but with a different length: the length of the shadow.

\[r \left(\frac{s \cdot r}{r \cdot r}\right) = r \left(\frac{s \cdot r}{|r| \cdot |r|}\right)\]
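
A quick numpy sketch of both projections (example vectors made up):

```python
import numpy as np

s = np.array([3.0, 4.0])
r = np.array([1.0, 0.0])

scalar_proj = (s @ r) / np.linalg.norm(r)   # length of the "shadow" of s on r
vector_proj = r * (s @ r) / (r @ r)         # that shadow as an actual vector along r
print(scalar_proj)    # 3.0
print(vector_proj)    # [3. 0.]
```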

Changing Basis

(Figure: a vector \(v\) shown against the new basis vectors \(b_{1}\) and \(b_{2}\).)

\(\newcommand{\ihat}{\mathbf {\hat{\imath}}} \newcommand{\jhat}{\mathbf {\hat{\jmath}}} \newcommand{\vect}{\mathbf} \newcommand\mycolv[1]{\begin{bmatrix}#1\end{bmatrix}}\) A vector is basically defined as the sum of the \(\ihat\) and \(\jhat\) basis vectors, each scaled by the corresponding coordinate of the vector's end point; for example, the basis \(e\) uses the unit vectors \(\mycolv{1\\0}\) and \(\mycolv{0\\1}\).

\[v_{e} = \mycolv{ v_{\ihat} \\ v_{\jhat} } = v_{\ihat}\ihat + v_{\jhat}\jhat\]

To change from one basis to another, such as from \(e\) to \(b\), you project the vector onto each of the new basis vectors and divide by that basis vector's squared length (assuming the new basis vectors are orthogonal to each other; otherwise you need to use matrix transformations):

\[v_{b_{1}} = \frac{v_{e} \cdot b_{1}}{|b_{1}|^{2}}\] \[v_{b_{2}} = \frac{v_{e} \cdot b_{2}}{|b_{2}|^{2}}\]

The vector \(v\) written in the \(b\) basis is then just these two newly-found components stacked up:

\[v_{b} = \mycolv{v_{b_{1}} \\ v_{b_{2}}}\]
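
A quick numpy sketch of the whole process, assuming the made-up orthogonal basis \(b_{1} = (1, 1)\), \(b_{2} = (1, -1)\):

```python
import numpy as np

v_e = np.array([5.0, -1.0])      # vector in the standard basis e
b1  = np.array([1.0, 1.0])       # new basis vectors, orthogonal to each other
b2  = np.array([1.0, -1.0])

v_b1 = (v_e @ b1) / (b1 @ b1)    # 4 / 2 = 2
v_b2 = (v_e @ b2) / (b2 @ b2)    # 6 / 2 = 3
print(np.array([v_b1, v_b2]))    # [2. 3.]  (v in the b basis)

# sanity check: rebuilding v_e from the components
print(v_b1 * b1 + v_b2 * b2)     # [ 5. -1.]
```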

Matrices

Matrix Transformations


A vector is just the sum of the two basis vectors, scaled. Therefore, you can kind of factor the vector apart when you transform it by a matrix.

Assumption: \(\mycolv{ X \\ Y } = X\mycolv{ 1 \\ 0 } + Y\mycolv{ 0 \\ 1 }\)

Therefore: \(\begin{bmatrix}a & b\\c & d\end{bmatrix} \mycolv{ X \\ Y } = X \mycolv{ a \\ c } + Y\mycolv{ b \\ d }\)

Giving us: \(\mycolv{ Xa + Yb \\ Xc + Yd }\)
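
A quick numpy check that multiplying by a matrix really is the same as scaling and adding its columns (numbers made up):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # columns = where the two basis vectors land
v = np.array([5.0, 6.0])         # X = 5, Y = 6

print(v[0] * A[:, 0] + v[1] * A[:, 1])   # X*(first column) + Y*(second column) = [17. 39.]
print(A @ v)                             # same thing: [17. 39.]
```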

Types of Transformations

https://www.thechalkface.net/resources/matrix_transformations.pdf

I(dentity Matrix) — doesn’t change anything; its columns are just the original basis vectors: \(\begin{bmatrix}1 & 0\\0 & 1\end{bmatrix}\)

Flip in Place — swaps the two basis vectors, keeping the angle between them the same: \(\begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}\)

Scale Matrix — squishes/stretches space: \(\begin{bmatrix}3 & 0\\0 & 2\end{bmatrix}\)

Horizontal Reflection: \(\begin{bmatrix}-1 & 0\\0 & 1\end{bmatrix}\)

Inversion / Diagonal Reflection — reflects through the origin (horizontal + vertical reflection at once): \(\begin{bmatrix}-1 & 0\\0 & -1\end{bmatrix}\)

Shear — unit square becomes a parallelogram: \(\begin{bmatrix}1 & 1\\0 & 1\end{bmatrix}\)

Rotation — by 90° anticlockwise: \(\begin{bmatrix}0 & -1\\1& 0\end{bmatrix}\)

Rotation by \(\theta\) — (in 2D): \(\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}\)

Rotation by \(\theta\) — (in 3D, preserving the \(Z\) axis): \(\begin{bmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix}\)
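
A quick numpy check of the 2D rotation matrix: rotating the x unit vector by 90° should land it on the y unit vector.

```python
import numpy as np

theta = np.radians(90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.round(R @ np.array([1.0, 0.0])))   # [0. 1.]
```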

Composition + Combination

\[\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} W & X \\ Y & Z \end{bmatrix} = \begin{bmatrix} aW+bY & aX+bZ \\ cW+dY & cX+dZ \end{bmatrix}\]

Order Matters!!!!! Matrix multiplication is NOT commutative! But it IS associative!

\[A \cdot B \neq B \cdot A\] \[A (B \cdot C) = (A \cdot B) C\]
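
A quick numpy check with some made-up matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])

print(np.array_equal(A @ B, B @ A))              # False, not commutative
print(np.array_equal(A @ (B @ C), (A @ B) @ C))  # True, associative
```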

Converting to Echelon Form - Gaussian Elimination

Echelon form is when it looks like \(\begin{bmatrix} 1 & X & X \\ 0 & 1 & X \\ 0 & 0 & 1 \end{bmatrix}\)

You can get there by progressively subtracting (multiples of) one row from another, and then dividing each row by whatever you need to scale its leading entry down to 1 (e.g. the row \(\begin{bmatrix}0 & 0 & -7\end{bmatrix}\) with right-hand side \(\begin{bmatrix}14\end{bmatrix}\) becomes \(\begin{bmatrix}0 & 0 & 1\end{bmatrix}\) with \(\begin{bmatrix}-2\end{bmatrix}\)). Remember, the numbers in the matrix represent the coefficients of the unknowns A, B, C, etc.

Backsubstitution

Once the matrix is in echelon form, it’s trivial to find the values of A, B, C, etc.: starting from the bottom row, each row gives you one value, which you then plug into the row above it.
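
A minimal numpy sketch of elimination plus back-substitution (assuming no zero pivots turn up, so no row swaps are needed; the example system is made up):

```python
import numpy as np

def solve_by_elimination(A, b):
    """Row-reduce to echelon form, then back-substitute.
    Assumes no zero pivots show up, so no row swapping is done."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)

    # forward elimination: scale each pivot to 1, zero out everything below it
    for i in range(n):
        pivot = A[i, i]
        A[i] /= pivot
        b[i] /= pivot
        for k in range(i + 1, n):
            factor = A[k, i]
            A[k] -= factor * A[i]
            b[k] -= factor * b[i]

    # back-substitution: start at the bottom row and plug values into the rows above
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = b[i] - A[i, i + 1:] @ x[i + 1:]
    return x

A = np.array([[1, 1, 1],
              [3, 2, 1],
              [2, 1, 2]])
b = np.array([6, 10, 10])
print(solve_by_elimination(A, b))   # [1. 2. 3.], same as np.linalg.solve(A, b)
```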

Finding the Inverse Matrix

\[A \cdot A' = I\] \[\begin{bmatrix} A_{1} & B_{1} & C_{1} \\ A_{2} & B_{2} & C_{2} \\ A_{3} & B_{3} & C_{3} \end{bmatrix} \begin{bmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]

The inverse of a matrix is the matrix that will get you to the identity matrix when you multiply it with the original matrix. This is also a special case where the order doesn’t matter: \(A \cdot A' = I = A' \cdot A\)

The way to get the inverse is to hold the two matrices A and I side by side, and whatever you do to A, you also do to I (a kind of symmetry). You first get A into echelon form, then you keep going and turn it into the identity matrix.

\[\text{Start: } \begin{bmatrix} A_{1} & B_{1} & C_{1} \\ A_{2} & B_{2} & C_{2} \\ A_{3} & B_{3} & C_{3} \end{bmatrix} \;\Big|\; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\] \[\text{Step 1 (echelon form): } \begin{bmatrix} 1 & B_{1} & C_{1} \\ 0 & 1 & C_{2} \\ 0 & 0 & 1 \end{bmatrix} \;\Big|\; \begin{bmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{bmatrix}\] \[\text{Step 2 (identity form): } \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \;\Big|\; \begin{bmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{bmatrix}\]

Once A is in identity matrix form, all the same transformations have also been done to the identity matrix, and the right-hand matrix will be the inverse of A.
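
In practice you can let numpy do the row reduction for you; a quick check that both orders give the identity (matrix made up):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [3.0, 2.0, 1.0],
              [2.0, 1.0, 2.0]])

A_inv = np.linalg.inv(A)                  # the library does the row reduction for you
print(np.allclose(A @ A_inv, np.eye(3)))  # True
print(np.allclose(A_inv @ A, np.eye(3)))  # True, order doesn't matter for the inverse
```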

Determinants

The determinant is basically the factor by which the transformation scales areas (or volumes in 3D). You can compute it by hand, but mostly you can just use built-in functions like det(A). If the matrix's columns aren't linearly independent, the determinant is 0 (e.g. in 2D the whole plane gets squashed onto a line, or in 3D onto a plane with no volume), and the matrix also doesn't have an inverse: transformations done with it can't be undone.
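
A quick numpy sketch (matrices made up):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 2.0]])      # stretches by 3 and 2, so areas scale by 6
print(np.linalg.det(A))         # 6.0

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])      # second column is just 2x the first (not independent)
print(np.linalg.det(B))         # ~0, everything gets squashed onto a line, no inverse
```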

Multiplying Matrices

\[\begin{bmatrix} A_{11} & A_{12} & A_{13} \\A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \cdot \begin{bmatrix} B_{11} & B_{12} & B_{13} \\B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{bmatrix}\]

If you have a matrix \(A\) & \(B\) and you want to multiply them to get element \((AB)_{23}\), what you do is multiply row \(A2\) by column \(B3\) — Row 2, Column 3 — remember, order matters!
\((AB)_{23} = A_{21}B_{13} + A_{22}B_{23} + A_{23}B_{33}\)

\[(AB)_{ik} = \sum_{j}a_{ij}b_{jk}\]
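
A quick numpy check of the row-times-column rule (matrices made up):

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)    # [[1,2,3],[4,5,6],[7,8,9]]
B = np.arange(10, 19).reshape(3, 3)   # [[10,11,12],[13,14,15],[16,17,18]]

# element (2,3) in the notes' 1-indexed notation = index [1, 2] in numpy
manual = A[1, 0]*B[0, 2] + A[1, 1]*B[1, 2] + A[1, 2]*B[2, 2]
print(manual)           # 4*12 + 5*15 + 6*18 = 231
print((A @ B)[1, 2])    # 231
```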

Linear Weighted Combination

Also known as a “linear mixture” or “weighted combination”; the scalars are called coefficients or weights. Multiply different vectors by different scalars and then add the vectors together to produce a single vector.

\[w = \lambda_{1} v_{1} + \lambda_{2} v_{2} + \dots + \lambda_{n} v_{n}\]

Outer Product

Take a column vector (length M) and a row vector (length N) and multiply them, column times row, to create an MxN matrix.
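
A quick numpy sketch (vectors made up):

```python
import numpy as np

col = np.array([1, 2, 3])     # M = 3
row = np.array([10, 20])      # N = 2
print(np.outer(col, row))     # 3x2 matrix: [[10 20] [20 40] [30 60]]
```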

Einstein Summation Notation

The most horribly pretentious and useless notation ever; even the name gives me the cringe heebie-jeebies. It simplifies the previous equation into just:

\[(AB)_{ik} = A_{ij}B_{jk}\]

Which is barely even a simplification. The trick is that the summation sign is just left implicit: any index that appears twice (here \(j\)) is automatically summed over. Here’s some other syntax from this notation:

Set Notation: \(\{e_{i}\}_{i} = \{ e_{1}, e_{2}, \dots, e_{n} \}\)
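
numpy's einsum uses exactly this convention; a quick check with made-up matrices:

```python
import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)

# the repeated index j is summed over automatically: (AB)_ik = A_ij B_jk
print(np.allclose(np.einsum('ij,jk->ik', A, B), A @ B))   # True
```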

Multiplying Non-Square Matrices

Suppose you have two differently-shaped matrices:

\[\begin{bmatrix} A_{11} & A_{12} & A_{13} \\A_{21} & A_{22} & A_{23} \end{bmatrix} \cdot \begin{bmatrix} B_{11} & B_{12} & B_{13} & B_{14} \\B_{21} & B_{22} & B_{23} & B_{24} \\ B_{31} & B_{32} & B_{33} & B_{34} \end{bmatrix}\]

It’s still possible to multiply them, as long as the rows of the first one are the same length as the columns of the second one (they share the same \(j\) from the Einstein summation). The answer will have the same number of rows as the first matrix, and the same number of columns as the second.

If you multiply a row vector by a column vector you get just a number (a 1x1 result), but if you multiply a column vector by a row vector you get a full matrix with the column's height and the row's width.
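
A quick numpy check of the shapes (sizes made up):

```python
import numpy as np

A = np.random.rand(2, 3)        # 2 rows, 3 columns
B = np.random.rand(3, 4)        # 3 rows, 4 columns
print((A @ B).shape)            # (2, 4): rows from the first, columns from the second

row = np.random.rand(1, 3)
col = np.random.rand(3, 1)
print((row @ col).shape)        # (1, 1), "just a number"
print((col @ row).shape)        # (3, 3), a full matrix
```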

Multiplication vs Dot Product

The dot product is a single number whereas matrix multiplication produces another matrix.

Changing Basis With Matrices

\(R = \frac{1}{\sqrt{2}} \mycolv{1 & -1 \\ 1 & 1 }\) \(B = \mycolv{3 & 1 \\ 1 & 1 }\) \(B' = \frac{1}{2}\mycolv{1 & -1 \\ -1 & 3 }\)

Using matrix transformations is necessary when the basis vectors are not orthogonal, so you can't just do a projection. For example, say you have a vector \(\mycolv{X \\ Y }\) that's written in a weird basis \(B\), and you want to do a 45 degree rotation \(R\) on it. You don't know what the transformation matrix for a 45 degree rotation looks like in that basis, so what you can do is convert the vector into the standard basis, run the rotation transformation, and then convert it back into the weird basis using the inverse \(B'\). Written out, the order of these transformations reads backwards (right to left), like so:

\[B' R B = R_{B}\]

(lol) The transformation matrix \(R\) is essentially wrapped around by \(B\) and \(B'\).
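
A quick numpy sketch of the sandwich, using the \(R\) and \(B\) above:

```python
import numpy as np

B = np.array([[3.0, 1.0],
              [1.0, 1.0]])                      # columns are the weird basis vectors
R = (1 / np.sqrt(2)) * np.array([[1.0, -1.0],
                                 [1.0,  1.0]])  # 45 degree rotation in the standard basis

R_B = np.linalg.inv(B) @ R @ B                  # B' R B, the rotation expressed in the B basis

v_B = np.array([1.0, 0.0])                      # some vector written in B coordinates
# rotating in B coordinates matches going to the standard basis, rotating, and coming back
print(np.allclose(B @ (R_B @ v_B), R @ (B @ v_B)))   # True
```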

Transposition

\[A_{ij}^{T} = A_{ji}\]

This is basically just flipping the columns and rows like in google sheets.

What’s cool about this is that if A is orthonormal (square, with columns that are orthogonal to each other and of unit length), then \(A^{T}\) is actually the inverse of A! When you multiply them, you get the identity matrix.
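
A quick numpy check using a rotation matrix, which is orthonormal (angle made up):

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation matrices are orthonormal

print(np.allclose(A.T @ A, np.eye(2)))    # True, A transpose times A is the identity
print(np.allclose(A.T, np.linalg.inv(A))) # True, the transpose really is the inverse
```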

Gram-Schmidt Process

For example, say you want to make an orthonormal basis but all you have is 3 random (linearly independent) vectors.

  1. Take the first vector and divide it by its length to make it a unit vector.
  2. Take the second vector, subtract its projection onto that first unit vector (leaving only the part perpendicular to it), and normalize what's left.
  3. Do the same with the third vector, subtracting its projections onto both of the previous unit vectors before normalizing.
  4. Profit: you now have an orthonormal basis (see the code sketch below).

Once you have the orthonormal basis \(E\) (its vectors as the columns of a matrix), a transformation \(T_{E}\) that is easy to describe in that basis can be converted back into the original basis by wrapping it in \(E\), just like the change of basis earlier:

\[T = E\,T_{E}\,E^{-1}\]
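
A minimal numpy sketch of the process (the three starting vectors are made up; `gram_schmidt` is just an illustrative helper, not a library function):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors into an orthonormal basis
    by subtracting projections and normalizing."""
    basis = []
    for v in vectors:
        for e in basis:
            v = v - (v @ e) * e        # remove the part of v lying along e
        basis.append(v / np.linalg.norm(v))
    return np.column_stack(basis)      # basis vectors as columns (like E above)

E = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])
print(np.allclose(E.T @ E, np.eye(3)))   # True, the result is orthonormal
```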

Eigenvectors & Eigenvalues

Whenever you do a transformation, it's possible that some of the vectors stay on their original span: their direction is unchanged, even if their length isn't. For example, if you stretched a square vertically, the horizontal vector would remain completely unchanged, while the vertical one would (say) double in length but keep the same span. The horizontal vector is an eigenvector with eigenvalue 1; the vertical vector is also an eigenvector, because it still points in the same direction, but since its length has doubled, its eigenvalue is 2.

Calculating Eigenvectors

\[A x = \lambda x\]

Where \(x\) is a vector, \(A\) is an \(n \times n\) (square) transformation matrix, and \(\lambda\) is a scalar (the eigenvalue). The above equation can be rearranged, adding in the identity matrix \(I\) so that the subtraction makes sense:

\[(A - \lambda I)x = 0\]

In this equation, either \(x\) must be 0 or the bracketed matrix must squash \(x\) down to 0. But if \(x\) is 0 it has no length or direction, which is just the trivial solution, so the only interesting case is the latter. You can test whether the bracketed matrix squashes space down to nothing by checking that its determinant is 0:

\[det(A-\lambda I) = 0\]

General Example (2D)

\[A = \mycolv{a&b\\c&d}\] \[det \left( \mycolv{a&b\\c&d} - \mycolv{\lambda & 0 \\ 0 & \lambda} \right) = 0\]

When you evaluate this determinant, you get a “characteristic polynomial”

\[\lambda^{2} - (a+d)\lambda +ad - bc = 0\]

The eigenvalues are the solutions to this equation, and when you plug them back into the original expression, you get the eigenvectors.

Specific Example (2D)

Using a transformation where the eigenvectors are known, for example, a vertical stretch:

\[A = \mycolv{1 & 0 \\ 0 & 2}\] \[det \left( \mycolv{ 1-\lambda & 0\\0 & 2 - \lambda} \right) = (1 - \lambda)(2-\lambda) = 0\]

This equation must have solutions at \(\lambda = 1\) and \(\lambda = 2\)

When you plug this back into the equation \((A - \lambda I)x = 0\) you get:

\[@ \lambda = 1: \mycolv{1-1 & 0\\ 0 & 2-1} \mycolv{x_{1}\\x_{2} } = \mycolv{0&0\\0&1}\mycolv{x_{1}\\x_{2}} = \mycolv{0\\x_{2}} = 0\] \[@ \lambda = 2: \mycolv{1-2 & 0\\ 0 & 2-2} \mycolv{x_{1}\\x_{2}} = \mycolv{-1&0\\0&0}\mycolv{x_{1}\\x_{2}} = \mycolv{-x_{1}\\0} = 0\]

In the first one, the \(x_{2}\) term must be 0, but the \(x_{1}\) term can be anything. This means that any horizontal vector lying along that span will be an eigenvector (for \(\lambda = 1\)), and similarly the second solution gives the vertical ones:

\[@ \lambda = 1: x = \begin{pmatrix}t\\0\end{pmatrix}\] \[@\lambda = 2: x = \begin{pmatrix}0\\t\end{pmatrix}\]
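
A quick numpy check of this example (numpy returns the eigenvectors as the columns of the second output):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])

vals, vecs = np.linalg.eig(A)
print(vals)    # [1. 2.]
print(vecs)    # identity matrix: the horizontal and vertical unit vectors as columns

# check A x = lambda x for the first eigenpair
print(np.allclose(A @ vecs[:, 0], vals[0] * vecs[:, 0]))   # True
```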

Another Example (no eigenvalues)

A 90° anticlockwise rotation: \(A = \mycolv{0 & -1 \\ 1 & 0}\). Every vector changes direction, so there are no real eigenvectors; the characteristic polynomial \(\lambda^{2} + 1 = 0\) has no real solutions.

Eigenbasis

There may be many situations where you have to repeatedly/recursively perform the same transformation on a vector, for example in a time series.

Imagine you have a particle at position \(v_{0}\), and you have a transformation \(T = \mycolv{a&b\\c&d}\) that describes how it moves in one second, or one step. To get the position of \(v\) at a moment \(n\) seconds after the start, you basically apply the transformation \(n\) times, repeatedly.

To get the first step, you apply \(T\) to \(v_{0}\), resulting in \(v_{1}\).

\[v_{1} = Tv_{0}\]

The next step of that series would be to apply the same \(T\) to \(v_{1}\) to get \(v_{2}\), and so on, for every second passed.

\[v_{n} = T^{n}v_{0}\]

With a normal matrix, this is hard to do, but if you have it in a diagonal form, it becomes very easy:

\[T^{n} = \mycolv{a&0&0\\0&b&0\\0&0&c}^{n} = \mycolv{a^{n}&0&0\\0&b^{n}&0\\0&0&c^{n}}\]

The way to change it into this diagonal form is to move into an eigenbasis. To get the conversion matrix \(C\), you plug in each of the eigenvectors as columns (remember, each column of a transformation matrix just represents the new location of one of the original unit vectors):

\[C = \mycolv{ \vert & \vert & \vert \\ x_{1} & x_{2} & x_{3} \\ \vert & \vert & \vert }\] where \(x_{1}, x_{2}, x_{3}\) are the eigenvectors of \(T\).

When you convert \(T\) into this eigenbasis (\(D = C^{-1}TC\)) you get the diagonal form \(D = \mycolv{\lambda_{1}&0&0\\0&\lambda_{2}&0\\0&0&\lambda_{3}}\)

So basically, you change it into the eigenbasis, then apply the transformation, and then turn it back into the original basis.

\[T = CDC^{-1}\]

But this represents only one step in the recursive function. If you chain them, it becomes:

\[T^{2} = CDC^{-1}CDC^{-1}\]

But since \(C^{-1}C\) just gets you back to where you started, you can remove them and condense it to just \(CDDC^{-1}\), or \(CD^{2}C^{-1}\). The generic formula for n times is then:

\[T^{n}= CD^{n}C^{-1}\]
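
A minimal numpy sketch of this shortcut (the matrix \(T\) and the power \(n\) are made up):

```python
import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 2.0]])

vals, C = np.linalg.eig(T)     # columns of C are the eigenvectors
n = 5
T_n = C @ np.diag(vals**n) @ np.linalg.inv(C)           # C D^n C^-1
print(np.allclose(T_n, np.linalg.matrix_power(T, n)))   # True
```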

Page Rank Algorithm

\[r^{i+1} = d(Lr^{i}) + \frac{1-d}{n}\]

Where \(L\) is the link matrix, \(r\) is the vector of page ranks, \(d\) is the damping factor and \(n\) is the number of pages; you iterate this until \(r\) stops changing.
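
A minimal numpy sketch of iterating this until it settles (the link matrix is a made-up 3-page example whose columns sum to 1):

```python
import numpy as np

# hypothetical 3-page link matrix: column j holds the probabilities of
# jumping from page j to each of the other pages
L = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
d, n = 0.85, L.shape[0]

r = np.ones(n) / n                   # start with equal rank everywhere
for _ in range(100):
    r = d * (L @ r) + (1 - d) / n    # the iteration above
print(r)                             # converged page ranks
```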

Resources