@[toc]

Scalars, Vectors, Matrices, and Tensors

  • Scalar: A scalar is a single number, distinct from most other objects studied in linear algebra (which are usually arrays of numbers). We denote scalars with italic lowercase letters, e.g., \(x\).
  • Vector: A vector is a column of numbers. These numbers are ordered, and each individual number can be identified by its position (index) in the sequence. Vectors are typically denoted with bold lowercase letters, e.g., \({\bf x}\).
    $$
{\bf x} = \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right] \tag{1}
    $$
  • Matrix: A matrix is a 2D array where each element is determined by two indices (instead of one). Matrices are usually denoted with bold uppercase letters, e.g., \({\bf A}\).
    $$
{\bf A} = \left[ \begin{matrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{matrix} \right] \tag{2}
    $$
  • Tensor: In some contexts, we consider arrays with more than two dimensions. Generally, an array whose elements are distributed on a regular grid of coordinates across multiple dimensions is called a tensor, denoted by \(\sf A\). The element of tensor \(\sf A\) at coordinates \((x,y,z)\) is written as \(A_{x,y,z}\).

Python Code Implementation

Creating a regular 2D matrix

import numpy as np

m = np.mat([[1, 2, 3], [4, 5, 6]])
print(m)

Output:

[[1 2 3]
 [4 5 6]]

Creating a \(3 \times 2\) matrix of zeros using zeros and a matrix of ones using ones

import numpy as np

# 3 x 2 matrix of zeros
m0 = np.mat(np.zeros((3, 2)))
print(m0)

# 3 x 2 matrix of ones
m1 = np.mat(np.ones((3, 2)))
print(m1)

Output:

[[0. 0.]
 [0. 0.]
 [0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]

Creating an identity matrix (see the section on identity matrices and inverse matrices for details)

import numpy as np

m = np.mat(np.eye(3, 3, dtype=int))
print(m)

Output:

[[1 0 0]
 [0 1 0]
 [0 0 1]]
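
The tensor from the definitions above can be sketched as a multi-dimensional NumPy array (np.mat only supports 2D, so this minimal sketch uses np.array; the values are arbitrary)

import numpy as np

# a 2 x 2 x 2 tensor (3-D array); each element is addressed by three indices
t = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(t.shape)
print(t[0, 1, 1])  # the element at coordinates (0, 1, 1)

Output:

(2, 2, 2)
4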

Transpose

Transpose is a crucial operation on matrices. The transpose of a matrix is its mirror image across the main diagonal (the diagonal running from the top-left corner to the bottom-right corner). The transpose of matrix \(\bf A\) is denoted \({\bf A}^\top\), defined by:
$$
({\bf A}^\top)_{i,j} = A_{j,i} \tag{3}
$$
A scalar can be viewed as a matrix with a single element. Thus, the transpose of a scalar equals itself: \(a = a^\top\).
$$
{\bf A} = \left[ \begin{matrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{matrix} \right] \Rightarrow {\bf A}^\top = \left[ \begin{matrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{matrix} \right] \tag{4}
$$

The transpose of a matrix is an operation satisfying the following rules (assuming all operations are valid):
1. \(({\bf A}^\top)^\top = {\bf A}\)
2. \(({\bf A} + {\bf B})^\top = {\bf A}^\top + {\bf B}^\top\)
3. \((\lambda {\bf A})^\top = \lambda {\bf A}^\top\)
4. \(({\bf A}{\bf B})^\top = {\bf B}^\top {\bf A}^\top\)

In deep learning, we use unconventional notation where matrix addition with a vector is allowed: \({\bf C} = {\bf A} + {\bf a}\), where \(C_{i,j} = A_{i,j} + a_j\). This means the vector \(\bf a\) is added to each row of matrix \(\bf A\). This implicit replication of vector \(\bf a\) to many rows is called broadcasting.
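
A minimal sketch of broadcasting with NumPy arrays (the array names and values are illustrative)

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
a = np.array([10, 20, 30])
# the vector a is implicitly replicated and added to every row of A
C = A + a
print(C)

Output:

[[11 22 33]
 [14 25 36]]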

Python Code Implementation

Matrix Transpose

# coding=utf-8
import numpy as np

m = np.mat([[1, 2, 3], [4, 5, 6]])
print('Before transpose:\n%s' % m)
t = m.T
print('After transpose:\n%s' % t)

Output:

Before transpose:
[[1 2 3]
 [4 5 6]]
After transpose:
[[1 4]
 [2 5]
 [3 6]]
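
As a quick numerical check of rule 4 above, \(({\bf A}{\bf B})^\top = {\bf B}^\top {\bf A}^\top\), here is a small sketch using two arbitrary matrices

import numpy as np

A = np.mat([[1, 2, 3], [4, 5, 6]])
B = np.mat([[11, 12], [13, 14], [15, 16]])
# (AB)^T should equal B^T A^T
print(np.array_equal((A * B).T, B.T * A.T))

Output:

True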

Matrix Operations

1. Matrix Addition

Definition: Let \(\bf A = (a_{i,j})\) and \(\bf B = (b_{i,j})\) be two \(m \times n\) matrices. Their sum \(\bf A + B\) is defined as:
$$
{\bf A} + {\bf B} = \left[ \begin{matrix} a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & \cdots & a_{1,n} + b_{1,n} \\ a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} & \cdots & a_{2,n} + b_{2,n} \\ \vdots & \vdots & \vdots & \vdots \\ a_{m,1} + b_{m,1} & a_{m,2} + b_{m,2} & \cdots & a_{m,n} + b_{m,n} \end{matrix} \right] \tag{5}
$$

Note: Two matrices can only be added if they have the same dimensions.
Matrix addition satisfies the following laws (let \(\bf A, B, C\) be \(m \times n\) matrices):
1. \({\bf A} + {\bf B} = {\bf B} + {\bf A}\)
2. \(({\bf A} + {\bf B}) + {\bf C} = {\bf A} + ({\bf B} + {\bf C})\)

Python Code Implementation

Adding two matrices of the same size

import numpy as np

m1 = np.mat([[1, 2, 3], [4, 5, 6]])
m2 = np.mat([[11, 12, 13], [14, 15, 16]])
print("m1 + m2 = \n%s " % (m1 + m2))

Output:

m1 + m2 = 
[[12 14 16]
 [18 20 22]] 

2. Matrix Multiplication

Definition of Scalar-Matrix Multiplication: The product of a scalar \(\lambda\) and matrix \(\bf A\) is denoted \(\lambda {\bf A}\) or \({\bf A} \lambda\), defined by:
$$
\lambda {\bf A} = {\bf A} \lambda = \left[ \begin{matrix} \lambda a_{1,1} & \lambda a_{1,2} & \cdots & \lambda a_{1,n} \\ \lambda a_{2,1} & \lambda a_{2,2} & \cdots & \lambda a_{2,n} \\ \vdots & \vdots & \vdots & \vdots \\ \lambda a_{m,1} & \lambda a_{m,2} & \cdots & \lambda a_{m,n} \end{matrix} \right] \tag{6}
$$

Scalar multiplication satisfies:
1. \((\lambda \mu){\bf A} = \lambda (\mu {\bf A})\)
2. \((\lambda + \mu){\bf A} = \lambda {\bf A} + \mu {\bf A}\)
3. \(\lambda({\bf A} + {\bf B}) = \lambda {\bf A} + \lambda {\bf B}\)
Matrix addition and scalar multiplication are collectively called linear operations on matrices.
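
A small sketch of scalar multiplication in NumPy (the matrix is an arbitrary example)

import numpy as np

m = np.mat([[1, 2, 3], [4, 5, 6]])
# multiply every element by the scalar 2
print(2 * m)

Output:

[[ 2  4  6]
 [ 8 10 12]]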

Definition of Matrix-Matrix Multiplication: Let \(\bf A = (a_{i,j})\) be an \(m \times s\) matrix and \(\bf B = (b_{i,j})\) be an \(s \times n\) matrix. Their product \(\bf C = {\bf A}{\bf B}\) is an \(m \times n\) matrix defined by:
$$
{\bf C} = {\bf A}{\bf B} \tag{7}
$$
where the element \(c_{i,j}\) is computed as:
$$
\left[ \begin{matrix} a_{i,1} & a_{i,2} & \cdots & a_{i,s} \end{matrix} \right] \left[ \begin{matrix} b_{1,j} \\ b_{2,j} \\ \vdots \\ b_{s,j} \end{matrix} \right] = \sum_{k=1}^s a_{i,k} b_{k,j} = c_{i,j} \tag{8}
$$
For example:
$$
\left[ \begin{matrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \end{matrix} \right] \left[ \begin{matrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \\ b_{3,1} & b_{3,2} \end{matrix} \right] = \left[ \begin{matrix} a_{1,1}b_{1,1} + a_{1,2}b_{2,1} + a_{1,3}b_{3,1} & a_{1,1}b_{1,2} + a_{1,2}b_{2,2} + a_{1,3}b_{3,2} \\ a_{2,1}b_{1,1} + a_{2,2}b_{2,1} + a_{2,3}b_{3,1} & a_{2,1}b_{1,2} + a_{2,2}b_{2,2} + a_{2,3}b_{3,2} \end{matrix} \right] \tag{9}
$$

Matrix multiplication is not commutative in general, but it is associative and distributive whenever the operations involved are defined:
1. \(({\bf A}{\bf B}){\bf C} = {\bf A}({\bf B}{\bf C})\)
2. \(\lambda ({\bf A}{\bf B}) = (\lambda {\bf A}){\bf B} = {\bf A}(\lambda {\bf B})\) (where \(\lambda\) is a scalar)
3. \({\bf A}({\bf B} + {\bf C}) = {\bf A}{\bf B} + {\bf A}{\bf C}\), \(({\bf B} + {\bf C}){\bf A} = {\bf B}{\bf A} + {\bf C}{\bf A}\)

Python Code Implementation

Multiplying a \(2 \times 3\) matrix with a \(3 \times 2\) matrix

import numpy as np

m1 = np.mat([[1, 2, 3], [4, 5, 6]])
m2 = np.mat([[11, 12], [13, 14], [15, 16]])
print("m1 * m2 = \n%s " % (m1 * m2))

Output:

m1 * m2 = 
[[ 82  88]
 [199 214]] 
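
To illustrate that the order of the factors matters, here is a small sketch with two square matrices (chosen arbitrarily)

import numpy as np

A = np.mat([[1, 2], [3, 4]])
B = np.mat([[0, 1], [1, 0]])
print("A * B =\n%s" % (A * B))
print("B * A =\n%s" % (B * A))

Output:

A * B =
[[2 1]
 [4 3]]
B * A =
[[3 4]
 [1 2]]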

Identity Matrix and Inverse Matrix

The identity matrix \(\bf I_n\) is a square matrix with 1s on the main diagonal and 0s elsewhere:
$$
{\bf I}_3 = \left[ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix} \right] \tag{10}
$$

Definition of Inverse Matrix: For an \(n \times n\) matrix \(\bf A\), if there exists an \(n \times n\) matrix \(\bf B\) such that:
$$
{\bf A}{\bf B} = {\bf B}{\bf A} = {\bf I} \tag{11}
$$
then \(\bf A\) is invertible, and \(\bf B\) is called the inverse of \(\bf A\), denoted \(\bf A^{-1}\). A matrix is invertible if and only if its determinant is non-zero (\(|{\bf A}| \neq 0\)), and:
$$
{\bf A}^{-1} = \frac{1}{|{\bf A}|} {\bf A}^\ast \tag{12}
$$
where \({\bf A}^\ast\) is the adjugate matrix of \(\bf A\).

Python Code Implementation

Calculating the identity matrix

import numpy as np

m = np.mat(np.eye(3, 3, dtype=int))
print(m)

Output:

[[1 0 0]
 [0 1 0]
 [0 0 1]]

Calculating the inverse of a \(3 \times 3\) matrix

# coding=utf-8
import numpy as np

m = np.mat([[2, 0, 0], [0, 4, 0], [0, 0, 8]])
I = m.I
print('Matrix:\n%s\nInverse matrix:\n%s' % (m, I))

Output:

Matrix:
[[2 0 0]
 [0 4 0]
 [0 0 8]]
Inverse matrix:
[[0.5   0.    0.   ]
 [0.    0.25  0.   ]
 [0.    0.    0.125]]
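
As a quick sanity check that the computed inverse satisfies equation (11), the product \({\bf A}{\bf A}^{-1}\) should give the identity matrix (a minimal sketch)

import numpy as np

m = np.mat([[2, 0, 0], [0, 4, 0], [0, 0, 8]])
# the product of a matrix and its inverse is the identity matrix
print(m * m.I)

Output:

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]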

Calculating the determinant of a \(3 \times 3\) matrix

# coding=utf-8
import numpy as np

m = np.mat([[2, 0, 0], [0, 4, 0], [0, 0, 8]])
d = np.linalg.det(m)
print(d)

Output:

64.0

Calculating the adjugate matrix of a \(3 \times 3\) matrix

import numpy as np

m = np.mat([[2, 0, 0], [0, 4, 0], [0, 0, 8]])
i = m.I
d = np.linalg.det(m)
a = i * d
print(a)

Output:

[[32.  0.  0.]
 [ 0. 16.  0.]
 [ 0.  0.  8.]]

Linear Dependence and Span

Linear Combination: To analyze how many solutions the equation \({\bf A}{\bf x} = {\bf b}\) has, we can interpret the columns of \(\bf A\) as directions leading away from the origin (the point given by the zero vector); the number of ways to reach \(\bf b\) is then the number of choices of scalars \(x_i\) such that:
$$
{\bf A}{\bf x} = \sum_i x_i {\bf A}_{:,i} \tag{13}
$$
where \({\bf A}_{:,i}\) is the \(i\)-th column of \(\bf A\).

Span: Formally, the span of a set of vectors \(\{{\bf v}^{(1)}, {\bf v}^{(2)}, \dots, {\bf v}^{(n)}\}\) is the set of all points obtainable as linear combinations of those vectors:
$$
\sum_i c_i {\bf v}^{(i)} \tag{14}
$$
Determining whether \({\bf A}{\bf x} = {\bf b}\) has a solution therefore amounts to testing whether \(\bf b\) lies in the span of the columns of \(\bf A\), known as the column space (or range) of \(\bf A\).
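
A minimal sketch of equation (13) in NumPy, showing that \({\bf A}{\bf x}\) is the same linear combination of the columns of \(\bf A\) (the arrays here are arbitrary examples)

import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
x = np.array([2, -1])
# A @ x equals x_1 * (column 1) + x_2 * (column 2)
print(A @ x)
print(x[0] * A[:, 0] + x[1] * A[:, 1])

Output:

[0 2 4]
[0 2 4]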


Norms

Norm: In machine learning, we often use norms to measure the “size” of vectors. For \(p \ge 1\), the \(L^p\) norm is defined as:
$$
\|{\bf x}\|_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}} \tag{15}
$$
Norms are functions satisfying:
- \(f({\bf x}) = 0 \iff {\bf x} = 0\)
- \(f({\bf x} + {\bf y}) \leq f({\bf x}) + f({\bf y})\) (triangle inequality)
- \(\forall \alpha \in \mathbb{R}, f(\alpha {\bf x}) = |\alpha| f({\bf x})\)

When \(p = 2\), the \(L^2\) norm is the Euclidean norm, representing the distance from the origin to the point defined by \(\bf x\).

When \(p = \infty\), the \(L^\infty\) norm is the max norm, which is the largest absolute value among the elements of the vector:
$$
\|{\bf x}\|_\infty = \max_i |x_i| \tag{16}
$$
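
A minimal sketch computing the \(L^1\), \(L^2\), and max norms with np.linalg.norm (the vector is an arbitrary example)

import numpy as np

x = np.array([3, -4])
print(np.linalg.norm(x, 1))       # L1 norm: |3| + |-4| = 7
print(np.linalg.norm(x, 2))       # L2 (Euclidean) norm: sqrt(3^2 + 4^2) = 5
print(np.linalg.norm(x, np.inf))  # max norm: max(|3|, |-4|) = 4

Output:

7.0
5.0
4.0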

