.. _sec_ndarray:
Data Manipulation
=================
In order to get anything done, we need some way to store and manipulate
data. Generally, there are two important things we need to do with data:
(i) acquire them; and (ii) process them once they are inside the
computer. There is no point in acquiring data without some way to store
them, so let us get our hands dirty first by playing with synthetic data.
To start, we introduce the :math:`n`-dimensional array, which is also
called the *tensor*.
If you have worked with NumPy, the most widely used scientific computing
package in Python, then you will find this section familiar. No matter
which framework you use, its *tensor class* (``ndarray`` in MXNet,
``Tensor`` in both PyTorch and TensorFlow) is similar to NumPy's
``ndarray`` with a few killer features. First, the tensor class can run
computations on GPUs for acceleration, whereas NumPy only supports CPU
computation.
Second, the tensor class supports automatic differentiation. These
properties make the tensor class suitable for deep learning. Throughout
the book, when we say tensors, we are referring to instances of the
tensor class unless otherwise stated.
Getting Started
---------------
In this section, we aim to get you up and running, equipping you with
the basic math and numerical computing tools that you will build on as
you progress through the book. Do not worry if you struggle to grok some
of the mathematical concepts or library functions. The following
sections will revisit this material in the context of practical examples
and it will sink in. On the other hand, if you already have some
background and want to go deeper into the mathematical content, just
skip this section.
.. raw:: html
.. raw:: html
To start, we import the ``np`` (``numpy``) and ``npx``
(``numpy_extension``) modules from MXNet. Here, the ``np`` module
includes functions supported by NumPy, while the ``npx`` module contains
a set of extensions developed to empower deep learning within a
NumPy-like environment. When using tensors, we almost always invoke the
``set_np`` function: this is for compatibility of tensor processing by
other components of MXNet.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
from mxnet import np, npx
npx.set_np()
.. raw:: html
.. raw:: html
To start, we import ``torch``. Note that although the library is called
PyTorch, the module we import is ``torch``, not ``pytorch``.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
import torch
.. raw:: html
.. raw:: html
To start, we import ``tensorflow``. As the name is a little long, we
often import it with a short alias ``tf``.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
import tensorflow as tf
.. raw:: html
.. raw:: html
A tensor represents a (possibly multi-dimensional) array of numerical
values. With one axis, a tensor is called a *vector*. With two axes, a
tensor is called a *matrix*. With :math:`k > 2` axes, we drop the
specialized names and just refer to the object as a
:math:`k^\mathrm{th}` *order tensor*.
.. raw:: html
.. raw:: html
MXNet provides a variety of functions for creating new tensors
prepopulated with values. For example, by invoking ``arange(n)``, we can
create a vector of evenly spaced values, starting at 0 (included) and
ending at ``n`` (not included). By default, the interval size is
:math:`1`. Unless otherwise specified, new tensors are stored in main
memory and designated for CPU-based computation.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = np.arange(12)
x
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
.. raw:: html
.. raw:: html
PyTorch provides a variety of functions for creating new tensors
prepopulated with values. For example, by invoking ``arange(n)``, we can
create a vector of evenly spaced values, starting at 0 (included) and
ending at ``n`` (not included). By default, the interval size is
:math:`1`. Unless otherwise specified, new tensors are stored in main
memory and designated for CPU-based computation.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = torch.arange(12, dtype=torch.float32)
x
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
.. raw:: html
.. raw:: html
TensorFlow provides a variety of functions for creating new tensors
prepopulated with values. For example, by invoking ``range(n)``, we can
create a vector of evenly spaced values, starting at 0 (included) and
ending at ``n`` (not included). By default, the interval size is
:math:`1`. Unless otherwise specified, new tensors are stored in main
memory and designated for CPU-based computation.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = tf.range(12, dtype=tf.float32)
x
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(12,), dtype=float32, numpy=
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.],
      dtype=float32)>
.. raw:: html
.. raw:: html
We can access a tensor's *shape* (the length along each axis) by
inspecting its ``shape`` property.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(12,)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
torch.Size([12])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.shape
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
TensorShape([12])
.. raw:: html
.. raw:: html
If we just want to know the total number of elements in a tensor, i.e.,
the product of all of the shape elements, we can inspect its size.
Because we are dealing with a vector here, the single element of its
``shape`` is identical to its size.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.size
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
12
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x.numel()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
12
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.size(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=int32, numpy=12>
.. raw:: html
.. raw:: html
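For a vector the size and the single shape entry coincide, but for tensors with more axes the size is the product of all shape entries. The following minimal sketch checks this. It uses plain NumPy as a framework-neutral stand-in (an assumption made for illustration; the names ``t`` and ``num_elements`` are hypothetical and not part of the running example).

```python
import math

import numpy as np  # stand-in for the framework tensor classes

# A tensor with three axes of lengths 2, 3, and 4.
t = np.zeros((2, 3, 4))

# The total number of elements is the product of the shape entries.
num_elements = math.prod(t.shape)  # 2 * 3 * 4

print(t.shape, t.size, num_elements)
```

The same check should carry over to the framework tensors by swapping in the size accessor that each framework provides.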
To change the shape of a tensor without altering either the number of
elements or their values, we can invoke the ``reshape`` function. For
example, we can transform our tensor, ``x``, from a row vector with
shape (12,) to a matrix with shape (3, 4). This new tensor contains the
exact same values, but views them as a matrix organized as 3 rows and 4
columns. Although the shape has changed, the elements and the total
size have not.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = x.reshape(3, 4)
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = x.reshape(3, 4)
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = tf.reshape(x, (3, 4))
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)>
.. raw:: html
.. raw:: html
Specifying every dimension manually is unnecessary. If our
target shape is a matrix with shape (height, width), then once we know
the width, the height is determined implicitly: it is the number of
elements divided by the width. Why should we have to perform
the division ourselves? In the example above, to get a matrix with 3
rows, we specified both that it should have 3 rows and 4 columns.
Fortunately, tensors can automatically work out one dimension given the
rest. We invoke this capability by placing ``-1`` for the dimension that
we would like to be inferred automatically. In our case, instead of
calling ``x.reshape(3, 4)``, we could have equivalently called
``x.reshape(-1, 4)`` or ``x.reshape(3, -1)`` (in TensorFlow,
``tf.reshape(x, (-1, 4))`` or ``tf.reshape(x, (3, -1))``).
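The inference of a ``-1`` dimension can be checked with a short sketch. It uses plain NumPy as a stand-in for the framework tensor classes (an assumption made so that the snippet does not depend on any one framework); the variables ``a``, ``b``, and ``c`` are illustrative names only.

```python
import numpy as np  # stand-in for the framework tensor classes

x = np.arange(12)

a = x.reshape(3, 4)    # both dimensions given explicitly
b = x.reshape(-1, 4)   # rows inferred: 12 / 4 = 3
c = x.reshape(3, -1)   # columns inferred: 12 / 3 = 4
print(a.shape, b.shape, c.shape)

# Inference only works when the known dimensions divide the size evenly.
try:
    x.reshape(-1, 5)
except ValueError as err:
    print("cannot reshape:", err)
```

At most one dimension can be left as ``-1``, since otherwise the remaining shape would not be uniquely determined.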
Typically, we will want our matrices initialized either with zeros,
ones, some other constants, or numbers randomly sampled from a specific
distribution. We can create a tensor with all elements set to 0 and a
shape of (2, 3, 4) as follows:
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.zeros((2, 3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.zeros((2, 3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.zeros((2, 3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Similarly, we can create tensors with each element set to 1 as follows:
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.ones((2, 3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.ones((2, 3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.ones((2, 3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Often, we want to randomly sample the values for each element in a
tensor from some probability distribution. For example, when we
construct arrays to serve as parameters in a neural network, we will
typically initialize their values randomly. The following snippet
creates a tensor with shape (3, 4). Each of its elements is randomly
sampled from a standard Gaussian (normal) distribution with a mean of 0
and a standard deviation of 1.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.random.normal(0, 1, size=(3, 4))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 2.2122064 , 1.1630787 , 0.7740038 , 0.4838046 ],
[ 1.0434403 , 0.29956347, 1.1839255 , 0.15302546],
[ 1.8917114 , -1.1688148 , -1.2347414 , 1.5580711 ]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.randn(3, 4)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[-0.6715, -0.2678, 0.1801, 0.3640],
[ 0.8030, 0.5554, -1.0327, 0.1885],
[-1.1527, -1.7143, 0.6783, -0.3666]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.random.normal(shape=[3, 4])
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
We can also specify the exact values for each element in the desired
tensor by supplying a Python list (or list of lists) containing the
numerical values. Here, the outermost list corresponds to axis 0, and
the inner list to axis 1.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[2., 1., 4., 3.],
[1., 2., 3., 4.],
[4., 3., 2., 1.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[2, 1, 4, 3],
[1, 2, 3, 4],
[4, 3, 2, 1]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.constant([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[2, 1, 4, 3],
       [1, 2, 3, 4],
       [4, 3, 2, 1]], dtype=int32)>
.. raw:: html
.. raw:: html
Operations
----------
This book is not about software engineering. Our interests are not
limited to simply reading and writing data from/to arrays. We want to
perform mathematical operations on those arrays. Some of the simplest
and most useful operations are the *elementwise* operations. These apply
a standard scalar operation to each element of an array. For functions
that take two arrays as inputs, elementwise operations apply some
standard binary operator on each pair of corresponding elements from the
two arrays. We can create an elementwise function from any function that
maps from a scalar to a scalar.
In mathematical notation, we would denote such a *unary* scalar operator
(taking one input) by the signature
:math:`f: \mathbb{R} \rightarrow \mathbb{R}`. This just means that the
function is mapping from any real number (:math:`\mathbb{R}`) onto
another. Likewise, we denote a *binary* scalar operator (taking two real
inputs, and yielding one output) by the signature
:math:`f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}`. Given any two
vectors :math:`\mathbf{u}` and :math:`\mathbf{v}` *of the same shape*,
and a binary operator :math:`f`, we can produce a vector
:math:`\mathbf{c} = F(\mathbf{u},\mathbf{v})` by setting
:math:`c_i \gets f(u_i, v_i)` for all :math:`i`, where :math:`c_i, u_i`,
and :math:`v_i` are the :math:`i^\mathrm{th}` elements of vectors
:math:`\mathbf{c}, \mathbf{u}`, and :math:`\mathbf{v}`. Here, we
produced the vector-valued
:math:`F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d` by
*lifting* the scalar function to an elementwise vector operation.
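To make the idea of lifting concrete, here is a minimal sketch in pure Python. The helper ``lift`` is a hypothetical name introduced only for illustration; it is not part of any of the frameworks, which implement elementwise operations far more efficiently.

```python
from typing import Callable, List, Sequence


def lift(f: Callable[[float, float], float]) -> Callable[[Sequence[float], Sequence[float]], List[float]]:
    """Turn a binary scalar function f into an elementwise vector function F."""
    def F(u: Sequence[float], v: Sequence[float]) -> List[float]:
        if len(u) != len(v):
            raise ValueError("u and v must have the same length")
        # c_i = f(u_i, v_i) for every index i
        return [f(u_i, v_i) for u_i, v_i in zip(u, v)]
    return F


add = lift(lambda a, b: a + b)
power = lift(lambda a, b: a ** b)

u = [1.0, 2.0, 4.0, 8.0]
v = [2.0, 2.0, 2.0, 2.0]
print(add(u, v))    # [3.0, 4.0, 6.0, 10.0]
print(power(u, v))  # [1.0, 4.0, 16.0, 64.0]
```

The explicit length check mirrors the requirement that the two vectors have the same shape.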
The common standard arithmetic operators (``+``, ``-``, ``*``, ``/``,
and ``**``) have all been *lifted* to elementwise operations that can be
applied to any two tensors of the same shape, whatever that shape is. In
the following example, we use commas to form a 5-element tuple, where
each element is the result of an elementwise operation.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = np.array([1, 2, 4, 8])
y = np.array([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y # The ** operator is exponentiation
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([ 3., 4., 6., 10.]),
array([-1., 0., 2., 6.]),
array([ 2., 4., 8., 16.]),
array([0.5, 1. , 2. , 4. ]),
array([ 1., 4., 16., 64.]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y # The ** operator is exponentiation
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([ 3., 4., 6., 10.]),
tensor([-1., 0., 2., 6.]),
tensor([ 2., 4., 8., 16.]),
tensor([0.5000, 1.0000, 2.0000, 4.0000]),
tensor([ 1., 4., 16., 64.]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
x = tf.constant([1.0, 2, 4, 8])
y = tf.constant([2.0, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y # The ** operator is exponentiation
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
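(<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 3.,  4.,  6., 10.], dtype=float32)>,
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([-1.,  0.,  2.,  6.], dtype=float32)>,
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 2.,  4.,  8., 16.], dtype=float32)>,
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.5, 1. , 2. , 4. ], dtype=float32)>,
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 1.,  4., 16., 64.], dtype=float32)>)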
.. raw:: html
.. raw:: html
Many more operations can be applied elementwise, including unary
operators like exponentiation.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
np.exp(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([2.7182817e+00, 7.3890562e+00, 5.4598148e+01, 2.9809580e+03])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
torch.exp(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.exp(x)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
In addition to elementwise computations, we can also perform linear
algebra operations, including vector dot products and matrix
multiplication. We will explain the crucial bits of linear algebra (with
no assumed prior knowledge) in :numref:`sec_linear-algebra`.
We can also *concatenate* multiple tensors together, stacking them
end-to-end to form a larger tensor. We just need to provide a list of
tensors and tell the system along which axis to concatenate. The example
below shows what happens when we concatenate two matrices along rows
(axis 0, the first element of the shape) vs. columns (axis 1, the second
element of the shape). We can see that the first output tensor's axis-0
length (:math:`6`) is the sum of the two input tensors' axis-0 lengths
(:math:`3 + 3`); while the second output tensor's axis-1 length
(:math:`8`) is the sum of the two input tensors' axis-1 lengths
(:math:`4 + 4`).
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = np.arange(12).reshape(3, 4)
Y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
np.concatenate([X, Y], axis=0), np.concatenate([X, Y], axis=1)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 2., 1., 4., 3.],
[ 1., 2., 3., 4.],
[ 4., 3., 2., 1.]]),
array([[ 0., 1., 2., 3., 2., 1., 4., 3.],
[ 4., 5., 6., 7., 1., 2., 3., 4.],
[ 8., 9., 10., 11., 4., 3., 2., 1.]]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 2., 1., 4., 3.],
[ 1., 2., 3., 4.],
[ 4., 3., 2., 1.]]),
tensor([[ 0., 1., 2., 3., 2., 1., 4., 3.],
[ 4., 5., 6., 7., 1., 2., 3., 4.],
[ 8., 9., 10., 11., 4., 3., 2., 1.]]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X = tf.reshape(tf.range(12, dtype=tf.float32), (3, 4))
Y = tf.constant([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
tf.concat([X, Y], axis=0), tf.concat([X, Y], axis=1)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
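(<tf.Tensor: shape=(6, 4), dtype=float32, numpy=
 array([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]], dtype=float32)>,
 <tf.Tensor: shape=(3, 8), dtype=float32, numpy=
 array([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]], dtype=float32)>)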
.. raw:: html
.. raw:: html
Sometimes, we want to construct a binary tensor via *logical
statements*. Take ``X == Y`` as an example. For each position, if ``X``
and ``Y`` are equal at that position, the corresponding entry in the new
tensor is ``True`` (1 when cast to a number), meaning that the logical
statement ``X == Y`` is true at that position; otherwise the entry is
``False`` (0).
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X == Y
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[False, True, False, True],
[False, False, False, False],
[False, False, False, False]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X == Y
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[False, True, False, True],
[False, False, False, False],
[False, False, False, False]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X == Y
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3, 4), dtype=bool, numpy=
array([[False,  True, False,  True],
       [False, False, False, False],
       [False, False, False, False]])>
.. raw:: html
.. raw:: html
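The Boolean result of a comparison can be turned into the 0/1 form by casting, and summing it counts the positions where the statement holds. The sketch below illustrates this with plain NumPy, used here as an assumed stand-in for the framework tensor classes; ``mask`` is an illustrative name.

```python
import numpy as np  # stand-in for the framework tensor classes

X = np.arange(12, dtype=np.float32).reshape(3, 4)
Y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]], dtype=np.float32)

mask = X == Y                  # Boolean tensor, one entry per position
print(mask.astype(np.int32))   # True -> 1, False -> 0
print(int(mask.sum()))         # number of positions where X equals Y
```

Other comparisons such as ``X < Y`` or ``X > Y`` produce Boolean tensors in the same way.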
Summing all the elements in the tensor yields a tensor with only one
element.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X.sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array(66.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X.sum()
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor(66.)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
tf.reduce_sum(X)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=float32, numpy=66.0>
.. raw:: html
.. raw:: html
.. _subsec_broadcasting:
Broadcasting Mechanism
----------------------
In the above section, we saw how to perform elementwise operations on
two tensors of the same shape. Under certain conditions, even when
shapes differ, we can still perform elementwise operations by invoking
the *broadcasting mechanism*. This mechanism works in the following way:
First, expand one or both arrays by copying elements appropriately so
that after this transformation, the two tensors have the same shape.
Second, carry out the elementwise operations on the resulting arrays.
In most cases, we broadcast along an axis where an array initially only
has length 1, such as in the following example:
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = np.arange(3).reshape(3, 1)
b = np.arange(2).reshape(1, 2)
a, b
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([[0.],
[1.],
[2.]]),
array([[0., 1.]]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([[0],
[1],
[2]]),
tensor([[0, 1]]))
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = tf.reshape(tf.range(3), (3, 1))
b = tf.reshape(tf.range(2), (1, 2))
a, b
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
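(<tf.Tensor: shape=(3, 1), dtype=int32, numpy=
 array([[0],
        [1],
        [2]], dtype=int32)>,
 <tf.Tensor: shape=(1, 2), dtype=int32, numpy=array([[0, 1]], dtype=int32)>)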
.. raw:: html
.. raw:: html
Since ``a`` and ``b`` are :math:`3\times1` and :math:`1\times2` matrices
respectively, their shapes do not match up if we want to add them.
Broadcasting expands both matrices into a larger :math:`3\times2`
matrix as follows: the single column of ``a`` is replicated across
columns and the single row of ``b`` is replicated across rows, and then
the two are added elementwise.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a + b
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[0., 1.],
[1., 2.],
[2., 3.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a + b
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[0, 1],
[1, 2],
[2, 3]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a + b
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[0, 1],
       [1, 2],
       [2, 3]], dtype=int32)>
.. raw:: html
.. raw:: html
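The two conceptual steps of broadcasting (expand, then operate elementwise) can be spelled out explicitly. The following sketch does so with plain NumPy, assumed here as a stand-in for the framework tensor classes; ``a_big`` and ``b_big`` are illustrative names. Note that ``np.broadcast_to`` returns read-only views rather than physical copies, so the copying is only conceptual.

```python
import numpy as np  # stand-in for the framework tensor classes

a = np.arange(3).reshape(3, 1)
b = np.arange(2).reshape(1, 2)

# Step 1: expand both operands to the common shape (3, 2).
a_big = np.broadcast_to(a, (3, 2))   # the single column of a is repeated
b_big = np.broadcast_to(b, (3, 2))   # the single row of b is repeated

# Step 2: ordinary elementwise addition on equal shapes.
explicit = a_big + b_big
implicit = a + b                     # broadcasting applied automatically
print(explicit)
print(np.array_equal(explicit, implicit))

# Shapes that cannot be reconciled raise an error instead.
try:
    np.arange(3) + np.arange(2)
except ValueError as err:
    print("broadcast failed:", err)
```

The failing case shows the "under certain conditions" caveat: two axes of lengths 3 and 2 cannot be matched because neither has length 1.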
Indexing and Slicing
--------------------
Just as in any other Python array, elements in a tensor can be accessed
by index. The first element has index 0, and ranges include the start
index but stop *before* the end index.
As in standard Python lists, we can access elements according to their
relative position to the end of the list by using negative indices.
Thus, ``[-1]`` selects the last element and ``[1:3]`` selects the second
and the third elements as follows:
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[-1], X[1:3]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([ 8., 9., 10., 11.]),
array([[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]]))
Beyond reading, we can also write elements of a matrix by specifying
indices.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[1, 2] = 9
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[ 0., 1., 2., 3.],
[ 4., 5., 9., 7.],
[ 8., 9., 10., 11.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[-1], X[1:3]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([ 8., 9., 10., 11.]),
tensor([[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]]))
Beyond reading, we can also write elements of a matrix by specifying
indices.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[1, 2] = 9
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[ 0., 1., 2., 3.],
[ 4., 5., 9., 7.],
[ 8., 9., 10., 11.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[-1], X[1:3]
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
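(<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 8.,  9., 10., 11.], dtype=float32)>,
 <tf.Tensor: shape=(2, 4), dtype=float32, numpy=
 array([[ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]], dtype=float32)>)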
``Tensors`` in TensorFlow are immutable, and cannot be assigned to.
``Variables`` in TensorFlow are mutable containers of state that support
assignments. Keep in mind that gradients in TensorFlow do not flow
backwards through ``Variable`` assignments.
Beyond assigning a value to the entire ``Variable``, we can write
elements of a ``Variable`` by specifying indices.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X_var = tf.Variable(X)
X_var[1, 2].assign(9)
X_var
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
If we want to assign multiple elements the same value, we simply index
all of them and then assign them the value. For instance, ``[0:2, :]``
accesses the first and second rows, where ``:`` takes all the elements
along axis 1 (column). While we discussed indexing for matrices, this
obviously also works for vectors and for tensors of more than 2
dimensions.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[0:2, :] = 12
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
array([[12., 12., 12., 12.],
[12., 12., 12., 12.],
[ 8., 9., 10., 11.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X[0:2, :] = 12
X
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
tensor([[12., 12., 12., 12.],
[12., 12., 12., 12.],
[ 8., 9., 10., 11.]])
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
X_var = tf.Variable(X)
X_var[0:2, :].assign(tf.ones(X_var[0:2, :].shape, dtype=tf.float32) * 12)
X_var
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
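As noted above, the same indexing rules apply to vectors and to tensors with more than two axes. The sketch below illustrates this with plain NumPy, assumed here as a stand-in for the mutable framework tensors; ``v`` and ``T`` are illustrative names.

```python
import numpy as np  # stand-in for the framework tensor classes

v = np.arange(5)                      # vector: one index per element
T = np.arange(24).reshape(2, 3, 4)    # three axes: one index per axis

print(v[-1], v[1:3])       # last element; second and third elements
print(T[0, 1, 2])          # a single element, one index per axis
print(T[:, 0, :].shape)    # first row of every matrix along axis 0

# Assign the same value to a block of elements.
T[1, :, 0:2] = -1
print(T[1])
```

Each axis takes its own index or slice, separated by commas, and ``:`` selects everything along that axis.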
Saving Memory
-------------
Running operations can cause new memory to be allocated to host results.
For example, if we write ``Y = Y + X``, the name ``Y`` stops referring to
the tensor that it used to point to and instead points at the newly
allocated memory. In the following example, we demonstrate this with
Python's ``id()`` function, which returns an identifier that is unique
to the referenced object during its lifetime (in CPython, its address in
memory). After running ``Y = Y + X``, we will find
that ``id(Y)`` has changed. That is because Python
first evaluates ``Y + X``, allocating new memory for the result and then
makes ``Y`` point to this new location in memory.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
before = id(Y)
Y = Y + X
id(Y) == before
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
False
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
before = id(Y)
Y = Y + X
id(Y) == before
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
False
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
before = id(Y)
Y = Y + X
id(Y) == before
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
False
.. raw:: html
.. raw:: html
This might be undesirable for two reasons. First, we do not want to run
around allocating memory unnecessarily all the time. In machine
learning, we might have hundreds of megabytes of parameters and update
all of them multiple times per second. Typically, we will want to
perform these updates *in place*. Second, we might point at the same
parameters from multiple variables. If we do not update in place, other
references will still point to the old memory location, making it
possible for parts of our code to inadvertently reference stale
parameters.
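The second problem, stale references, can be reproduced in a few lines. The sketch below uses plain NumPy as an assumed stand-in for a mutable tensor class; ``params`` and ``alias`` are hypothetical names for a parameter array and a second reference to it.

```python
import numpy as np  # stand-in for a mutable tensor class

params = np.zeros(3)
alias = params             # a second name for the same array

params = params + 1        # allocates a new array; alias is left behind
print(alias, params)       # alias still holds the old values
print(alias is params)     # False: the two names now refer to different objects

params = np.zeros(3)
alias = params
params += 1                # in-place update of the existing array
print(alias, params)       # both names see the new values
print(alias is params)     # True: still one shared object
```

In the first half, any code holding ``alias`` would keep reading the outdated values without any error being raised.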
.. raw:: html
.. raw:: html
Fortunately, performing in-place operations is easy. We can assign the
result of an operation to a previously allocated array with slice
notation, e.g., ``Y[:] = <expression>``. To illustrate this concept, we
first create a new matrix ``Z`` with the same shape as another ``Y``,
using ``zeros_like`` to allocate a block of :math:`0` entries.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
Z = np.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
id(Z): 140030434598464
id(Z): 140030434598464
If the value of ``X`` is not reused in subsequent computations, we can
also use ``X[:] = X + Y`` or ``X += Y`` to reduce the memory overhead of
the operation.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
before = id(X)
X += Y
id(X) == before
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
True
.. raw:: html
.. raw:: html
Fortunately, performing in-place operations is easy. We can assign the
result of an operation to a previously allocated array with slice
notation, e.g., ``Y[:] = <expression>``. To illustrate this concept, we
first create a new matrix ``Z`` with the same shape as another ``Y``,
using ``zeros_like`` to allocate a block of :math:`0` entries.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
id(Z): 139771322421872
id(Z): 139771322421872
If the value of ``X`` is not reused in subsequent computations, we can
also use ``X[:] = X + Y`` or ``X += Y`` to reduce the memory overhead of
the operation.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
before = id(X)
X += Y
id(X) == before
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
True
.. raw:: html
.. raw:: html
``Variables`` are mutable containers of state in TensorFlow. They
provide a way to store your model parameters. We can assign the result
of an operation to a ``Variable`` with ``assign``. To illustrate this
concept, we create a ``Variable`` ``Z`` with the same shape as another
tensor ``Y``, using ``zeros_like`` to allocate a block of :math:`0`
entries.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
Z = tf.Variable(tf.zeros_like(Y))
print('id(Z):', id(Z))
Z.assign(X + Y)
print('id(Z):', id(Z))
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
id(Z): 139835681563600
id(Z): 139835681563600
Even once you store state persistently in a ``Variable``, you may want
to reduce your memory usage further by avoiding excess allocations for
tensors that are not your model parameters.
Because TensorFlow ``Tensors`` are immutable and gradients do not flow
through ``Variable`` assignments, TensorFlow does not provide an
explicit way to run an individual operation in-place.
However, TensorFlow provides the ``tf.function`` decorator to wrap
computation inside of a TensorFlow graph that gets compiled and
optimized before running. This allows TensorFlow to prune unused values,
and to re-use prior allocations that are no longer needed. This
minimizes the memory overhead of TensorFlow computations.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
@tf.function
def computation(X, Y):
    Z = tf.zeros_like(Y)  # This unused value will be pruned out
    A = X + Y  # Allocations will be re-used when no longer needed
    B = A + Y
    C = B + Y
    return C + Y

computation(X, Y)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
.. raw:: html
.. raw:: html
Conversion to Other Python Objects
----------------------------------
.. raw:: html
.. raw:: html
Converting to a NumPy tensor (``ndarray``), or vice versa, is easy. The
converted result does not share memory. This minor inconvenience is
actually quite important: when you perform operations on the CPU or on
GPUs, you do not want to halt computation, waiting to see whether the
NumPy package of Python might want to be doing something else with the
same chunk of memory.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = X.asnumpy()
B = np.array(A)
type(A), type(B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(numpy.ndarray, mxnet.numpy.ndarray)
.. raw:: html
.. raw:: html
Converting to a NumPy tensor (``ndarray``), or vice versa, is easy. A
PyTorch tensor on the CPU and the NumPy array obtained from it share
their underlying memory, so changing one through an in-place operation
will also change the other.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(numpy.ndarray, torch.Tensor)
.. raw:: html
.. raw:: html
Converting to a NumPy tensor (``ndarray``), or vice versa, is easy. The
converted result does not share memory. This minor inconvenience is
actually quite important: when you perform operations on the CPU or on
GPUs, you do not want to halt computation, waiting to see whether the
NumPy package of Python might want to be doing something else with the
same chunk of memory.
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
A = X.numpy()
B = tf.constant(A)
type(A), type(B)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(numpy.ndarray, tensorflow.python.framework.ops.EagerTensor)
.. raw:: html
.. raw:: html
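Whether two array objects share memory can be checked empirically: write to one in place and see whether the other changes. The sketch below stays within NumPy (an assumption, so that it runs without any deep learning framework installed): ``view`` stands in for a conversion that shares a buffer and ``copy`` for one that does not.

```python
import numpy as np

original = np.arange(4, dtype=np.float32)
shared = original.view()    # same underlying buffer as original
copied = original.copy()    # independent buffer

original[0] = 100.0         # in-place write to the original

# The view reflects the write, the copy does not.
print(np.shares_memory(original, shared), shared[0])
print(np.shares_memory(original, copied), copied[0])
```

The same test can be applied to the result of a framework conversion to confirm the sharing behavior described for that framework.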
To convert a size-1 tensor to a Python scalar, we can invoke the
``item`` function or Python's built-in functions.
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = np.array([3.5])
a, a.item(), float(a), int(a)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([3.5]), 3.5, 3.5, 3)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(tensor([3.5000]), 3.5, 3.5, 3)
.. raw:: html
.. raw:: html
.. raw:: latex
\diilbookstyleinputcell
.. code:: python
a = tf.constant([3.5]).numpy()
a, a.item(), float(a), int(a)
.. raw:: latex
\diilbookstyleoutputcell
.. parsed-literal::
:class: output
(array([3.5], dtype=float32), 3.5, 3.5, 3)
.. raw:: html
.. raw:: html
Summary
-------
- The main interface to store and manipulate data for deep learning is
the tensor (:math:`n`-dimensional array). It provides a variety of
functionalities including basic mathematical operations, broadcasting,
indexing, slicing, memory saving, and conversion to other Python
objects.
Exercises
---------
1. Run the code in this section. Change the conditional statement
``X == Y`` in this section to ``X < Y`` or ``X > Y``, and then see
what kind of tensor you can get.
2. Replace the two tensors used in the broadcasting example with tensors
of other shapes, e.g., 3-dimensional tensors. Is the result what you
expected?
.. raw:: html
.. raw:: html
`Discussions `__
.. raw:: html
.. raw:: html