.. _sec_linear-algebra:
Linear Algebra
==============
Now that you can store and manipulate data, let us briefly review the
subset of basic linear algebra that you will need to understand and
implement most of models covered in this book. Below, we introduce the
basic mathematical objects, arithmetic, and operations in linear
algebra, expressing each of them through mathematical notation and the
corresponding implementation in code.
Scalars
-------
If you never studied linear algebra or machine learning, then your past
experience with math probably consisted of thinking about one number at
a time. And, if you ever balanced a checkbook or even paid for dinner at
a restaurant then you already know how to do basic things like adding
and multiplying pairs of numbers. For example, the temperature in Palo
Alto is :math:`52` degrees Fahrenheit. Formally, we call values
consisting of just one numerical quantity *scalars*. If you wanted to
convert this value to Celsius (the metric system's more sensible
temperature scale), you would evaluate the expression
:math:`c = \frac{5}{9}(f - 32)`, setting :math:`f` to :math:`52`. In
this equation, each of the terms---\ :math:`5`, :math:`9`, and
:math:`32`---are scalar values. The placeholders :math:`c` and :math:`f`
are called *variables* and they represent unknown scalar values.
In this book, we adopt the mathematical notation where scalar variables
are denoted by ordinary lower-cased letters (e.g., :math:`x`, :math:`y`,
and :math:`z`). We denote the space of all (continuous) *real-valued*
scalars by :math:`\mathbb{R}`. For expedience, we will punt on rigorous
definitions of what precisely *space* is, but just remember for now that
the expression :math:`x \in \mathbb{R}` is a formal way to say that
:math:`x` is a real-valued scalar. The symbol :math:`\in` can be
pronounced "in" and simply denotes membership in a set. Analogously, we
could write :math:`x, y \in \{0, 1\}` to state that :math:`x` and
:math:`y` are numbers whose value can only be :math:`0` or :math:`1`.
A scalar is represented by a tensor with just one element. In the next
snippet, we instantiate two scalars and perform some familiar arithmetic
operations with them, namely addition, multiplication, division, and
exponentiation.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
from mxnet import np, npx
npx.set_np()
x = np.array(3.0)
y = np.array(2.0)
x + y, x * y, x / y, x ** y
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array(5.), array(6.), array(1.5), array(9.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
import torch
x = torch.tensor(3.0)
y = torch.tensor(2.0)
x + y, x * y, x / y, x**y
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
import tensorflow as tf
x = tf.constant(3.0)
y = tf.constant(2.0)
x + y, x * y, x / y, x**y
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
,
,
)
.. raw:: html
.. raw:: html
Vectors
-------
You can think of a vector as simply a list of scalar values. We call
these values the *elements* (*entries* or *components*) of the vector.
When our vectors represent examples from our dataset, their values hold
some real-world significance. For example, if we were training a model
to predict the risk that a loan defaults, we might associate each
applicant with a vector whose components correspond to their income,
length of employment, number of previous defaults, and other factors. If
we were studying the risk of heart attacks hospital patients potentially
face, we might represent each patient by a vector whose components
capture their most recent vital signs, cholesterol levels, minutes of
exercise per day, etc. In math notation, we will usually denote vectors
as bold-faced, lower-cased letters (e.g., :math:`\mathbf{x}`,
:math:`\mathbf{y}`, and :math:`\mathbf{z})`.
We work with vectors via one-dimensional tensors. In general tensors can
have arbitrary lengths, subject to the memory limits of your machine.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = np.arange(4)
x
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([0., 1., 2., 3.])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = torch.arange(4)
x
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([0, 1, 2, 3])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = tf.range(4)
x
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
We can refer to any element of a vector by using a subscript. For
example, we can refer to the :math:`i^\mathrm{th}` element of
:math:`\mathbf{x}` by :math:`x_i`. Note that the element :math:`x_i` is
a scalar, so we do not bold-face the font when referring to it.
Extensive literature considers column vectors to be the default
orientation of vectors, so does this book. In math, a vector
:math:`\mathbf{x}` can be written as
.. math:: \mathbf{x} =\begin{bmatrix}x_{1} \\x_{2} \\ \vdots \\x_{n}\end{bmatrix},
:label: eq_vec_def
where :math:`x_1, \ldots, x_n` are elements of the vector. In code, we
access any element by indexing into the tensor.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x[3]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(3.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x[3]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(3)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x[3]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Length, Dimensionality, and Shape
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let us revisit some concepts from :numref:`sec_ndarray`. A vector is
just an array of numbers. And just as every array has a length, so does
every vector. In math notation, if we want to say that a vector
:math:`\mathbf{x}` consists of :math:`n` real-valued scalars, we can
express this as :math:`\mathbf{x} \in \mathbb{R}^n`. The length of a
vector is commonly called the *dimension* of the vector.
As with an ordinary Python array, we can access the length of a tensor
by calling Python's built-in ``len()`` function.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
len(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
4
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
len(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
4
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
len(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
4
.. raw:: html
.. raw:: html
When a tensor represents a vector (with precisely one axis), we can also
access its length via the ``.shape`` attribute. The shape is a tuple
that lists the length (dimensionality) along each axis of the tensor.
For tensors with just one axis, the shape has just one element.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(4,)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
torch.Size([4])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
TensorShape([4])
.. raw:: html
.. raw:: html
Note that the word "dimension" tends to get overloaded in these contexts
and this tends to confuse people. To clarify, we use the dimensionality
of a *vector* or an *axis* to refer to its length, i.e., the number of
elements of a vector or an axis. However, we use the dimensionality of a
tensor to refer to the number of axes that a tensor has. In this sense,
the dimensionality of some axis of a tensor will be the length of that
axis.
Matrices
--------
Just as vectors generalize scalars from order zero to order one,
matrices generalize vectors from order one to order two. Matrices, which
we will typically denote with bold-faced, capital letters (e.g.,
:math:`\mathbf{X}`, :math:`\mathbf{Y}`, and :math:`\mathbf{Z}`), are
represented in code as tensors with two axes.
In math notation, we use :math:`\mathbf{A} \in \mathbb{R}^{m \times n}`
to express that the matrix :math:`\mathbf{A}` consists of :math:`m` rows
and :math:`n` columns of real-valued scalars. Visually, we can
illustrate any matrix :math:`\mathbf{A} \in \mathbb{R}^{m \times n}` as
a table, where each element :math:`a_{ij}` belongs to the
:math:`i^{\mathrm{th}}` row and :math:`j^{\mathrm{th}}` column:
.. math:: \mathbf{A}=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix}.
:label: eq_matrix_def
For any :math:`\mathbf{A} \in \mathbb{R}^{m \times n}`, the shape of
:math:`\mathbf{A}` is (:math:`m`, :math:`n`) or :math:`m \times n`.
Specifically, when a matrix has the same number of rows and columns, its
shape becomes a square; thus, it is called a *square matrix*.
We can create an :math:`m \times n` matrix by specifying a shape with
two components :math:`m` and :math:`n` when calling any of our favorite
functions for instantiating a tensor.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = np.arange(20).reshape(5, 4)
A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.],
[16., 17., 18., 19.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = torch.arange(20).reshape(5, 4)
A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = tf.reshape(tf.range(20), (5, 4))
A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
We can access the scalar element :math:`a_{ij}` of a matrix
:math:`\mathbf{A}` in :eq:`eq_matrix_def` by specifying the indices
for the row (:math:`i`) and column (:math:`j`), such as
:math:`[\mathbf{A}]_{ij}`. When the scalar elements of a matrix
:math:`\mathbf{A}`, such as in :eq:`eq_matrix_def`, are not given,
we may simply use the lower-case letter of the matrix :math:`\mathbf{A}`
with the index subscript, :math:`a_{ij}`, to refer to
:math:`[\mathbf{A}]_{ij}`. To keep notation simple, commas are inserted
to separate indices only when necessary, such as :math:`a_{2, 3j}` and
:math:`[\mathbf{A}]_{2i-1, 3}`.
Sometimes, we want to flip the axes. When we exchange a matrix's rows
and columns, the result is called the *transpose* of the matrix.
Formally, we signify a matrix :math:`\mathbf{A}`'s transpose by
:math:`\mathbf{A}^\top` and if :math:`\mathbf{B} = \mathbf{A}^\top`,
then :math:`b_{ij} = a_{ji}` for any :math:`i` and :math:`j`. Thus, the
transpose of :math:`\mathbf{A}` in :eq:`eq_matrix_def` is a
:math:`n \times m` matrix:
.. math::
\mathbf{A}^\top =
\begin{bmatrix}
a_{11} & a_{21} & \dots & a_{m1} \\
a_{12} & a_{22} & \dots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \dots & a_{mn}
\end{bmatrix}.
Now we access a matrix's transpose in code.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.T
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 0., 4., 8., 12., 16.],
[ 1., 5., 9., 13., 17.],
[ 2., 6., 10., 14., 18.],
[ 3., 7., 11., 15., 19.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.T
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 0, 4, 8, 12, 16],
[ 1, 5, 9, 13, 17],
[ 2, 6, 10, 14, 18],
[ 3, 7, 11, 15, 19]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.transpose(A)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
As a special type of the square matrix, a *symmetric matrix*
:math:`\mathbf{A}` is equal to its transpose:
:math:`\mathbf{A} = \mathbf{A}^\top`. Here we define a symmetric matrix
``B``.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[1., 2., 3.],
[2., 0., 4.],
[3., 4., 5.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[1, 2, 3],
[2, 0, 4],
[3, 4, 5]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B = tf.constant([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Now we compare ``B`` with its transpose.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B == B.T
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B == B.T
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[True, True, True],
[True, True, True],
[True, True, True]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B == tf.transpose(B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Matrices are useful data structures: they allow us to organize data that
have different modalities of variation. For example, rows in our matrix
might correspond to different houses (data examples), while columns
might correspond to different attributes. This should sound familiar if
you have ever used spreadsheet software or have read
:numref:`sec_pandas`. Thus, although the default orientation of a
single vector is a column vector, in a matrix that represents a tabular
dataset, it is more conventional to treat each data example as a row
vector in the matrix. And, as we will see in later chapters, this
convention will enable common deep learning practices. For example,
along the outermost axis of a tensor, we can access or enumerate
minibatches of data examples, or just data examples if no minibatch
exists.
Tensors
-------
Just as vectors generalize scalars, and matrices generalize vectors, we
can build data structures with even more axes. Tensors ("tensors" in
this subsection refer to algebraic objects) give us a generic way of
describing :math:`n`-dimensional arrays with an arbitrary number of
axes. Vectors, for example, are first-order tensors, and matrices are
second-order tensors. Tensors are denoted with capital letters of a
special font face (e.g., :math:`\mathsf{X}`, :math:`\mathsf{Y}`, and
:math:`\mathsf{Z}`) and their indexing mechanism (e.g., :math:`x_{ijk}`
and :math:`[\mathsf{X}]_{1, 2i-1, 3}`) is similar to that of matrices.
Tensors will become more important when we start working with images,
which arrive as :math:`n`-dimensional arrays with 3 axes corresponding
to the height, width, and a *channel* axis for stacking the color
channels (red, green, and blue). For now, we will skip over higher order
tensors and focus on the basics.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = np.arange(24).reshape(2, 3, 4)
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]],
[[12., 13., 14., 15.],
[16., 17., 18., 19.],
[20., 21., 22., 23.]]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = torch.arange(24).reshape(2, 3, 4)
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = tf.reshape(tf.range(24), (2, 3, 4))
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Basic Properties of Tensor Arithmetic
-------------------------------------
Scalars, vectors, matrices, and tensors ("tensors" in this subsection
refer to algebraic objects) of an arbitrary number of axes have some
nice properties that often come in handy. For example, you might have
noticed from the definition of an elementwise operation that any
elementwise unary operation does not change the shape of its operand.
Similarly, given any two tensors with the same shape, the result of any
binary elementwise operation will be a tensor of that same shape. For
example, adding two matrices of the same shape performs elementwise
addition over these two matrices.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = np.arange(20).reshape(5, 4)
B = A.copy() # Assign a copy of `A` to `B` by allocating new memory
A, A + B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.],
[16., 17., 18., 19.]]),
array([[ 0., 2., 4., 6.],
[ 8., 10., 12., 14.],
[16., 18., 20., 22.],
[24., 26., 28., 30.],
[32., 34., 36., 38.]]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone() # Assign a copy of `A` to `B` by allocating new memory
A, A + B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.],
[16., 17., 18., 19.]]),
tensor([[ 0., 2., 4., 6.],
[ 8., 10., 12., 14.],
[16., 18., 20., 22.],
[24., 26., 28., 30.],
[32., 34., 36., 38.]]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = tf.reshape(tf.range(20, dtype=tf.float32), (5, 4))
B = A # No cloning of `A` to `B` by allocating new memory
A, A + B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
)
.. raw:: html
.. raw:: html
Specifically, elementwise multiplication of two matrices is called their
*Hadamard product* (math notation :math:`\odot`). Consider matrix
:math:`\mathbf{B} \in \mathbb{R}^{m \times n}` whose element of row
:math:`i` and column :math:`j` is :math:`b_{ij}`. The Hadamard product
of matrices :math:`\mathbf{A}` (defined in :eq:`eq_matrix_def`) and
:math:`\mathbf{B}`
.. math::
\mathbf{A} \odot \mathbf{B} =
\begin{bmatrix}
a_{11} b_{11} & a_{12} b_{12} & \dots & a_{1n} b_{1n} \\
a_{21} b_{21} & a_{22} b_{22} & \dots & a_{2n} b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} b_{m1} & a_{m2} b_{m2} & \dots & a_{mn} b_{mn}
\end{bmatrix}.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A * B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 0., 1., 4., 9.],
[ 16., 25., 36., 49.],
[ 64., 81., 100., 121.],
[144., 169., 196., 225.],
[256., 289., 324., 361.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A * B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 0., 1., 4., 9.],
[ 16., 25., 36., 49.],
[ 64., 81., 100., 121.],
[144., 169., 196., 225.],
[256., 289., 324., 361.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A * B
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Multiplying or adding a tensor by a scalar also does not change the
shape of the tensor, where each element of the operand tensor will be
added or multiplied by the scalar.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = 2
X = np.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([[[ 2., 3., 4., 5.],
[ 6., 7., 8., 9.],
[10., 11., 12., 13.]],
[[14., 15., 16., 17.],
[18., 19., 20., 21.],
[22., 23., 24., 25.]]]),
(2, 3, 4))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([[[ 2, 3, 4, 5],
[ 6, 7, 8, 9],
[10, 11, 12, 13]],
[[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]]),
torch.Size([2, 3, 4]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = 2
X = tf.reshape(tf.range(24), (2, 3, 4))
a + X, (a * X).shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
TensorShape([2, 3, 4]))
.. raw:: html
.. raw:: html
.. _subseq_lin-alg-reduction:
Reduction
---------
One useful operation that we can perform with arbitrary tensors is to
calculate the sum of their elements. In mathematical notation, we
express sums using the :math:`\sum` symbol. To express the sum of the
elements in a vector :math:`\mathbf{x}` of length :math:`d`, we write
:math:`\sum_{i=1}^d x_i`. In code, we can just call the function for
calculating the sum.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = np.arange(4)
x, x.sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([0., 1., 2., 3.]), array(6.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = torch.arange(4, dtype=torch.float32)
x, x.sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([0., 1., 2., 3.]), tensor(6.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = tf.range(4, dtype=tf.float32)
x, tf.reduce_sum(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
)
.. raw:: html
.. raw:: html
We can express sums over the elements of tensors of arbitrary shape. For
example, the sum of the elements of an :math:`m \times n` matrix
:math:`\mathbf{A}` could be written
:math:`\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}`.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.shape, A.sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
((5, 4), array(190.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.shape, A.sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(torch.Size([5, 4]), tensor(190.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.shape, tf.reduce_sum(A)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(TensorShape([5, 4]), )
.. raw:: html
.. raw:: html
By default, invoking the function for calculating the sum *reduces* a
tensor along all its axes to a scalar. We can also specify the axes
along which the tensor is reduced via summation. Take matrices as an
example. To reduce the row dimension (axis 0) by summing up elements of
all the rows, we specify ``axis=0`` when invoking the function. Since
the input matrix reduces along axis 0 to generate the output vector, the
dimension of axis 0 of the input is lost in the output shape.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([40., 45., 50., 55.]), (4,))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([40., 45., 50., 55.]), torch.Size([4]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A_sum_axis0 = tf.reduce_sum(A, axis=0)
A_sum_axis0, A_sum_axis0.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
TensorShape([4]))
.. raw:: html
.. raw:: html
Specifying ``axis=1`` will reduce the column dimension (axis 1) by
summing up elements of all the columns. Thus, the dimension of axis 1 of
the input is lost in the output shape.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([ 6., 22., 38., 54., 70.]), (5,))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([ 6., 22., 38., 54., 70.]), torch.Size([5]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A_sum_axis1 = tf.reduce_sum(A, axis=1)
A_sum_axis1, A_sum_axis1.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
TensorShape([5]))
.. raw:: html
.. raw:: html
Reducing a matrix along both rows and columns via summation is
equivalent to summing up all the elements of the matrix.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.sum(axis=[0, 1]) # Same as `A.sum()`
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(190.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.sum(axis=[0, 1]) # Same as `A.sum()`
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(190.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.reduce_sum(A, axis=[0, 1]) # Same as `tf.reduce_sum(A)`
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
A related quantity is the *mean*, which is also called the *average*. We
calculate the mean by dividing the sum by the total number of elements.
In code, we could just call the function for calculating the mean on
tensors of arbitrary shape.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.mean(), A.sum() / A.size
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array(9.5), array(9.5))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.mean(), A.sum() / A.numel()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor(9.5000), tensor(9.5000))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.reduce_mean(A), tf.reduce_sum(A) / tf.size(A).numpy()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
)
.. raw:: html
.. raw:: html
Likewise, the function for calculating the mean can also reduce a tensor
along the specified axes.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.mean(axis=0), A.sum(axis=0) / A.shape[0]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([ 8., 9., 10., 11.]), array([ 8., 9., 10., 11.]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.mean(axis=0), A.sum(axis=0) / A.shape[0]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([ 8., 9., 10., 11.]), tensor([ 8., 9., 10., 11.]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.reduce_mean(A, axis=0), tf.reduce_sum(A, axis=0) / A.shape[0]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
)
.. raw:: html
.. raw:: html
.. _subseq_lin-alg-non-reduction:
Non-Reduction Sum
~~~~~~~~~~~~~~~~~
However, sometimes it can be useful to keep the number of axes unchanged
when invoking the function for calculating the sum or mean.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
sum_A = A.sum(axis=1, keepdims=True)
sum_A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 6.],
[22.],
[38.],
[54.],
[70.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
sum_A = A.sum(axis=1, keepdims=True)
sum_A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 6.],
[22.],
[38.],
[54.],
[70.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
sum_A = tf.reduce_sum(A, axis=1, keepdims=True)
sum_A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
For instance, since ``sum_A`` still keeps its two axes after summing
each row, we can divide ``A`` by ``sum_A`` with broadcasting.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A / sum_A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[0. , 0.16666667, 0.33333334, 0.5 ],
[0.18181819, 0.22727273, 0.27272728, 0.3181818 ],
[0.21052632, 0.23684211, 0.2631579 , 0.28947368],
[0.22222222, 0.24074075, 0.25925925, 0.2777778 ],
[0.22857143, 0.24285714, 0.25714287, 0.27142859]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A / sum_A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[0.0000, 0.1667, 0.3333, 0.5000],
[0.1818, 0.2273, 0.2727, 0.3182],
[0.2105, 0.2368, 0.2632, 0.2895],
[0.2222, 0.2407, 0.2593, 0.2778],
[0.2286, 0.2429, 0.2571, 0.2714]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A / sum_A
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
If we want to calculate the cumulative sum of elements of ``A`` along
some axis, say ``axis=0`` (row by row), we can call the ``cumsum``
function. This function will not reduce the input tensor along any axis.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.cumsum(axis=0)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 0., 1., 2., 3.],
[ 4., 6., 8., 10.],
[12., 15., 18., 21.],
[24., 28., 32., 36.],
[40., 45., 50., 55.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.cumsum(axis=0)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 0., 1., 2., 3.],
[ 4., 6., 8., 10.],
[12., 15., 18., 21.],
[24., 28., 32., 36.],
[40., 45., 50., 55.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.cumsum(A, axis=0)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Dot Products
------------
So far, we have only performed elementwise operations, sums, and
averages. And if this was all we could do, linear algebra probably would
not deserve its own section. However, one of the most fundamental
operations is the dot product. Given two vectors
:math:`\mathbf{x}, \mathbf{y} \in \mathbb{R}^d`, their *dot product*
:math:`\mathbf{x}^\top \mathbf{y}` (or
:math:`\langle \mathbf{x}, \mathbf{y} \rangle`) is a sum over the
products of the elements at the same position:
:math:`\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i`.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
y = np.ones(4)
x, y, np.dot(x, y)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([0., 1., 2., 3.]), array([1., 1., 1., 1.]), array(6.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
y = torch.ones(4, dtype = torch.float32)
x, y, torch.dot(x, y)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
y = tf.ones(4, dtype=tf.float32)
x, y, tf.tensordot(x, y, axes=1)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(,
,
)
.. raw:: html
.. raw:: html
Note that we can express the dot product of two vectors equivalently by
performing an elementwise multiplication and then a sum:
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.sum(x * y)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(6.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.sum(x * y)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(6.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.reduce_sum(x * y)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Dot products are useful in a wide range of contexts. For example, given
some set of values, denoted by a vector
:math:`\mathbf{x} \in \mathbb{R}^d` and a set of weights denoted by
:math:`\mathbf{w} \in \mathbb{R}^d`, the weighted sum of the values in
:math:`\mathbf{x}` according to the weights :math:`\mathbf{w}` could be
expressed as the dot product :math:`\mathbf{x}^\top \mathbf{w}`. When
the weights are non-negative and sum to one (i.e.,
:math:`\left(\sum_{i=1}^{d} {w_i} = 1\right)`), the dot product
expresses a *weighted average*. After normalizing two vectors to have
the unit length, the dot products express the cosine of the angle
between them. We will formally introduce this notion of *length* later
in this section.
Matrix-Vector Products
----------------------
Now that we know how to calculate dot products, we can begin to
understand *matrix-vector products*. Recall the matrix
:math:`\mathbf{A} \in \mathbb{R}^{m \times n}` and the vector
:math:`\mathbf{x} \in \mathbb{R}^n` defined and visualized in
:eq:`eq_matrix_def` and :eq:`eq_vec_def` respectively. Let us
start off by visualizing the matrix :math:`\mathbf{A}` in terms of its
row vectors
.. math::
\mathbf{A}=
\begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_m \\
\end{bmatrix},
where each :math:`\mathbf{a}^\top_{i} \in \mathbb{R}^n` is a row vector
representing the :math:`i^\mathrm{th}` row of the matrix
:math:`\mathbf{A}`.
The matrix-vector product :math:`\mathbf{A}\mathbf{x}` is simply a
column vector of length :math:`m`, whose :math:`i^\mathrm{th}` element
is the dot product :math:`\mathbf{a}^\top_i \mathbf{x}`:
.. math::
\mathbf{A}\mathbf{x}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_m \\
\end{bmatrix}\mathbf{x}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \mathbf{x} \\
\mathbf{a}^\top_{2} \mathbf{x} \\
\vdots\\
\mathbf{a}^\top_{m} \mathbf{x}\\
\end{bmatrix}.
We can think of multiplication by a matrix
:math:`\mathbf{A}\in \mathbb{R}^{m \times n}` as a transformation that
projects vectors from :math:`\mathbb{R}^{n}` to :math:`\mathbb{R}^{m}`.
These transformations turn out to be remarkably useful. For example, we
can represent rotations as multiplications by a square matrix. As we
will see in subsequent chapters, we can also use matrix-vector products
to describe the most intensive calculations required when computing each
layer in a neural network given the values of the previous layer.
.. raw:: html
.. raw:: html
Expressing matrix-vector products in code with tensors, we use the same
``dot`` function as for dot products. When we call ``np.dot(A, x)`` with
a matrix ``A`` and a vector ``x``, the matrix-vector product is
performed. Note that the column dimension of ``A`` (its length along
axis 1) must be the same as the dimension of ``x`` (its length).
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.shape, x.shape, np.dot(A, x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
((5, 4), (4,), array([ 14., 38., 62., 86., 110.]))
.. raw:: html
.. raw:: html
Expressing matrix-vector products in code with tensors, we use the
``mv`` function. When we call ``torch.mv(A, x)`` with a matrix ``A`` and
a vector ``x``, the matrix-vector product is performed. Note that the
column dimension of ``A`` (its length along axis 1) must be the same as
the dimension of ``x`` (its length).
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.shape, x.shape, torch.mv(A, x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(torch.Size([5, 4]), torch.Size([4]), tensor([ 14., 38., 62., 86., 110.]))
.. raw:: html
.. raw:: html
Expressing matrix-vector products in code with tensors, we use the
``matvec`` function. When we call ``tf.linalg.matvec(A, x)`` with a
matrix ``A`` and a vector ``x``, the matrix-vector product is performed.
Note that the column dimension of ``A`` (its length along axis 1) must
be the same as the dimension of ``x`` (its length).
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A.shape, x.shape, tf.linalg.matvec(A, x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(TensorShape([5, 4]),
TensorShape([4]),
)
.. raw:: html
.. raw:: html
Matrix-Matrix Multiplication
----------------------------
If you have gotten the hang of dot products and matrix-vector products,
then *matrix-matrix multiplication* should be straightforward.
Say that we have two matrices
:math:`\mathbf{A} \in \mathbb{R}^{n \times k}` and
:math:`\mathbf{B} \in \mathbb{R}^{k \times m}`:
.. math::
\mathbf{A}=\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk} \\
\end{bmatrix},\quad
\mathbf{B}=\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1m} \\
b_{21} & b_{22} & \cdots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{k1} & b_{k2} & \cdots & b_{km} \\
\end{bmatrix}.
Denote by :math:`\mathbf{a}^\top_{i} \in \mathbb{R}^k` the row vector
representing the :math:`i^\mathrm{th}` row of the matrix
:math:`\mathbf{A}`, and let :math:`\mathbf{b}_{j} \in \mathbb{R}^k` be
the column vector from the :math:`j^\mathrm{th}` column of the matrix
:math:`\mathbf{B}`. To produce the matrix product
:math:`\mathbf{C} = \mathbf{A}\mathbf{B}`, it is easiest to think of
:math:`\mathbf{A}` in terms of its row vectors and :math:`\mathbf{B}` in
terms of its column vectors:
.. math::
\mathbf{A}=
\begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_n \\
\end{bmatrix},
\quad \mathbf{B}=\begin{bmatrix}
\mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{m} \\
\end{bmatrix}.
Then the matrix product :math:`\mathbf{C} \in \mathbb{R}^{n \times m}`
is produced as we simply compute each element :math:`c_{ij}` as the dot
product :math:`\mathbf{a}^\top_i \mathbf{b}_j`:
.. math::
\mathbf{C} = \mathbf{AB} = \begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_n \\
\end{bmatrix}
\begin{bmatrix}
\mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{m} \\
\end{bmatrix}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \mathbf{b}_1 & \mathbf{a}^\top_{1}\mathbf{b}_2& \cdots & \mathbf{a}^\top_{1} \mathbf{b}_m \\
\mathbf{a}^\top_{2}\mathbf{b}_1 & \mathbf{a}^\top_{2} \mathbf{b}_2 & \cdots & \mathbf{a}^\top_{2} \mathbf{b}_m \\
\vdots & \vdots & \ddots &\vdots\\
\mathbf{a}^\top_{n} \mathbf{b}_1 & \mathbf{a}^\top_{n}\mathbf{b}_2& \cdots& \mathbf{a}^\top_{n} \mathbf{b}_m
\end{bmatrix}.
We can think of the matrix-matrix multiplication :math:`\mathbf{AB}` as
simply performing :math:`m` matrix-vector products and stitching the
results together to form an :math:`n \times m` matrix. In the following
snippet, we perform matrix multiplication on ``A`` and ``B``.
Here, \ ``A`` is a matrix with 5 rows and 4 columns, and ``B`` is a
matrix with 4 rows and 3 columns. After multiplication, we obtain a
matrix with 5 rows and 3 columns.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B = np.ones(shape=(4, 3))
np.dot(A, B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 6., 6., 6.],
[22., 22., 22.],
[38., 38., 38.],
[54., 54., 54.],
[70., 70., 70.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B = torch.ones(4, 3)
torch.mm(A, B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 6., 6., 6.],
[22., 22., 22.],
[38., 38., 38.],
[54., 54., 54.],
[70., 70., 70.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
B = tf.ones((4, 3), tf.float32)
tf.matmul(A, B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Matrix-matrix multiplication can be simply called *matrix
multiplication*, and should not be confused with the Hadamard product.
.. _subsec_lin-algebra-norms:
Norms
-----
Some of the most useful operators in linear algebra are *norms*.
Informally, the norm of a vector tells us how *big* a vector is. The
notion of *size* under consideration here concerns not dimensionality
but rather the magnitude of the components.
In linear algebra, a vector norm is a function :math:`f` that maps a
vector to a scalar, satisfying a handful of properties. Given any vector
:math:`\mathbf{x}`, the first property says that if we scale all the
elements of a vector by a constant factor :math:`\alpha`, its norm also
scales by the *absolute value* of the same constant factor:
.. math:: f(\alpha \mathbf{x}) = |\alpha| f(\mathbf{x}).
The second property is the familiar triangle inequality:
.. math:: f(\mathbf{x} + \mathbf{y}) \leq f(\mathbf{x}) + f(\mathbf{y}).
The third property simply says that the norm must be non-negative:
.. math:: f(\mathbf{x}) \geq 0.
That makes sense, as in most contexts the smallest *size* for anything
is 0. The final property requires that the smallest norm is achieved and
only achieved by a vector consisting of all zeros.
.. math:: \forall i, [\mathbf{x}]_i = 0 \Leftrightarrow f(\mathbf{x})=0.
You might notice that norms sound a lot like measures of distance. And
if you remember Euclidean distances (think Pythagoras' theorem) from
grade school, then the concepts of non-negativity and the triangle
inequality might ring a bell. In fact, the Euclidean distance is a norm:
specifically it is the :math:`L_2` norm. Suppose that the elements in
the :math:`n`-dimensional vector :math:`\mathbf{x}` are
:math:`x_1, \ldots, x_n`.
The :math:`L_2` *norm* of :math:`\mathbf{x}` is the square root of the
sum of the squares of the vector elements:
.. math:: \|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2},
where the subscript :math:`2` is often omitted in :math:`L_2` norms,
i.e., :math:`\|\mathbf{x}\|` is equivalent to :math:`\|\mathbf{x}\|_2`.
In code, we can calculate the :math:`L_2` norm of a vector as follows.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
u = np.array([3, -4])
np.linalg.norm(u)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(5.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
u = torch.tensor([3.0, -4.0])
torch.norm(u)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(5.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
u = tf.constant([3.0, -4.0])
tf.norm(u)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
In deep learning, we work more often with the squared :math:`L_2` norm.
You will also frequently encounter the :math:`L_1` *norm*, which is
expressed as the sum of the absolute values of the vector elements:
.. math:: \|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|.
As compared with the :math:`L_2` norm, it is less influenced by
outliers. To calculate the :math:`L_1` norm, we compose the absolute
value function with a sum over the elements.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.abs(u).sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(7.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.abs(u).sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(7.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.reduce_sum(tf.abs(u))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Both the :math:`L_2` norm and the :math:`L_1` norm are special cases of
the more general :math:`L_p` *norm*:
.. math:: \|\mathbf{x}\|_p = \left(\sum_{i=1}^n \left|x_i \right|^p \right)^{1/p}.
Analogous to :math:`L_2` norms of vectors, the *Frobenius norm* of a
matrix :math:`\mathbf{X} \in \mathbb{R}^{m \times n}` is the square root
of the sum of the squares of the matrix elements:
.. math:: \|\mathbf{X}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}.
The Frobenius norm satisfies all the properties of vector norms. It
behaves as if it were an :math:`L_2` norm of a matrix-shaped vector.
Invoking the following function will calculate the Frobenius norm of a
matrix.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.linalg.norm(np.ones((4, 9)))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(6.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.norm(torch.ones((4, 9)))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(6.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.norm(tf.ones((4, 9)))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
.. _subsec_norms_and_objectives:
Norms and Objectives
~~~~~~~~~~~~~~~~~~~~
While we do not want to get too far ahead of ourselves, we can plant
some intuition already about why these concepts are useful. In deep
learning, we are often trying to solve optimization problems: *maximize*
the probability assigned to observed data; *minimize* the distance
between predictions and the ground-truth observations. Assign vector
representations to items (like words, products, or news articles) such
that the distance between similar items is minimized, and the distance
between dissimilar items is maximized. Oftentimes, the objectives,
perhaps the most important components of deep learning algorithms
(besides the data), are expressed as norms.
More on Linear Algebra
----------------------
In just this section, we have taught you all the linear algebra that you
will need to understand a remarkable chunk of modern deep learning.
There is a lot more to linear algebra and a lot of that mathematics is
useful for machine learning. For example, matrices can be decomposed
into factors, and these decompositions can reveal low-dimensional
structure in real-world datasets. There are entire subfields of machine
learning that focus on using matrix decompositions and their
generalizations to high-order tensors to discover structure in datasets
and solve prediction problems. But this book focuses on deep learning.
And we believe you will be much more inclined to learn more mathematics
once you have gotten your hands dirty deploying useful machine learning
models on real datasets. So while we reserve the right to introduce more
mathematics much later on, we will wrap up this section here.
If you are eager to learn more about linear algebra, you may refer to
either the `online appendix on linear algebraic
operations `__
or other excellent resources
:cite:`Strang.1993,Kolter.2008,Petersen.Pedersen.ea.2008`.
Summary
-------
- Scalars, vectors, matrices, and tensors are basic mathematical
objects in linear algebra.
- Vectors generalize scalars, and matrices generalize vectors.
- Scalars, vectors, matrices, and tensors have zero, one, two, and an
arbitrary number of axes, respectively.
- A tensor can be reduced along the specified axes by ``sum`` and
``mean``.
- Elementwise multiplication of two matrices is called their Hadamard
product. It is different from matrix multiplication.
- In deep learning, we often work with norms such as the :math:`L_1`
norm, the :math:`L_2` norm, and the Frobenius norm.
- We can perform a variety of operations over scalars, vectors,
matrices, and tensors.
Exercises
---------
1. Prove that the transpose of a matrix :math:`\mathbf{A}`'s transpose
is :math:`\mathbf{A}`: :math:`(\mathbf{A}^\top)^\top = \mathbf{A}`.
2. Given two matrices :math:`\mathbf{A}` and :math:`\mathbf{B}`, show
that the sum of transposes is equal to the transpose of a sum:
:math:`\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top`.
3. Given any square matrix :math:`\mathbf{A}`, is
:math:`\mathbf{A} + \mathbf{A}^\top` always symmetric? Why?
4. We defined the tensor ``X`` of shape (2, 3, 4) in this section. What
is the output of ``len(X)``?
5. For a tensor ``X`` of arbitrary shape, does ``len(X)`` always
correspond to the length of a certain axis of ``X``? What is that
axis?
6. Run ``A / A.sum(axis=1)`` and see what happens. Can you analyze the
reason?
7. When traveling between two points in Manhattan, what is the distance
that you need to cover in terms of the coordinates, i.e., in terms of
avenues and streets? Can you travel diagonally?
8. Consider a tensor with shape (2, 3, 4). What are the shapes of the
summation outputs along axis 0, 1, and 2?
9. Feed a tensor with 3 or more axes to the ``linalg.norm`` function and
observe its output. What does this function compute for tensors of
arbitrary shape?
.. raw:: html
.. raw:: html
`Discussions `__
.. raw:: html
.. raw:: html
`Discussions `__
.. raw:: html
.. raw:: html
`Discussions `__
.. raw:: html
.. raw:: html