Parameter Management
====================
Once we have chosen an architecture and set our hyperparameters, we
proceed to the training loop, where our goal is to find parameter values
that minimize our loss function. After training, we will need these
parameters in order to make future predictions. Additionally, we will
sometimes wish to extract the parameters either to reuse them in some
other context, to save our model to disk so that it may be executed in
other software, or for examination in the hope of gaining scientific
understanding.
Most of the time, we will be able to ignore the nitty-gritty details of
how parameters are declared and manipulated, relying on deep learning
frameworks to do the heavy lifting. However, when we move away from
stacked architectures with standard layers, we will sometimes need to
get into the weeds of declaring and manipulating parameters. In this
section, we cover the following:
- Accessing parameters for debugging, diagnostics, and visualizations.
- Parameter initialization.
- Sharing parameters across different model components.
We start by focusing on an MLP with one hidden layer.
**MXNet**
.. code:: python
from mxnet import init, np, npx
from mxnet.gluon import nn
npx.set_np()
net = nn.Sequential()
net.add(nn.Dense(8, activation='relu'))
net.add(nn.Dense(1))
net.initialize() # Use the default initialization method
X = np.random.uniform(size=(2, 4))
net(X) # Forward computation
.. parsed-literal::
:class: output
array([[0.0054572 ],
[0.00488594]])
**PyTorch**
.. code:: python
import torch
from torch import nn
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)
.. parsed-literal::
:class: output
tensor([[0.0731],
        [0.1613]], grad_fn=<AddmmBackward0>)
**TensorFlow**
.. code:: python
import tensorflow as tf
net = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(4, activation=tf.nn.relu),
tf.keras.layers.Dense(1),
])
X = tf.random.uniform((2, 4))
net(X)
.. parsed-literal::
:class: output
Parameter Access
----------------
Let us start with how to access parameters from the models that you
already know. When a model is defined via the ``Sequential`` class, we
can first access any layer by indexing into the model as though it were
a list. Each layer's parameters are conveniently located in its
parameter attribute. We can inspect the parameters of the second
fully-connected layer as follows.
**MXNet**
.. code:: python
print(net[1].params)
.. parsed-literal::
:class: output
dense1_ (
Parameter dense1_weight (shape=(1, 8), dtype=float32)
Parameter dense1_bias (shape=(1,), dtype=float32)
)
**PyTorch**
.. code:: python
print(net[2].state_dict())
.. parsed-literal::
:class: output
OrderedDict([('weight', tensor([[-0.2665, 0.1665, -0.2725, -0.1111, 0.1236, -0.2030, 0.3430, 0.3423]])), ('bias', tensor([0.0211]))])
**TensorFlow**
.. code:: python
print(net.layers[2].weights)
.. parsed-literal::
:class: output
The output tells us a few important things. First, this fully-connected
layer contains two parameters, corresponding to that layer's weights and
biases, respectively. Both are stored as single precision floats
(float32). Note that the names of the parameters allow us to uniquely
identify each layer's parameters, even in a network containing hundreds
of layers.
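For instance, in PyTorch we can see this naming scheme on a small
hypothetical stack (a sketch for illustration only): each parameter's
qualified name encodes the index of the submodule that owns it, so no
two parameters can collide.

.. code:: python

from torch import nn

# Three identically-shaped layers still get distinct parameter names
deep = nn.Sequential(*[nn.Linear(4, 4) for _ in range(3)])
print([name for name, _ in deep.named_parameters()])
# ['0.weight', '0.bias', '1.weight', '1.bias', '2.weight', '2.bias']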
Targeted Parameters
~~~~~~~~~~~~~~~~~~~
Note that each parameter is represented as an instance of the parameter
class. To do anything useful with the parameters, we first need to
access the underlying numerical values. There are several ways to do
this. Some are simpler while others are more general. The following code
extracts the bias from the second neural network layer, which returns a
parameter class instance, and further accesses that parameter's value.
**MXNet**
.. code:: python
print(type(net[1].bias))
print(net[1].bias)
print(net[1].bias.data())
.. parsed-literal::
:class: output
<class 'mxnet.gluon.parameter.Parameter'>
Parameter dense1_bias (shape=(1,), dtype=float32)
[0.]
Parameters are complex objects, containing values, gradients, and
additional information. That's why we need to request the value
explicitly.
In addition to the value, each parameter also allows us to access the
gradient. Because we have not invoked backpropagation for this network
yet, it is in its initial state.
.. code:: python
net[1].weight.grad()
.. parsed-literal::
:class: output
array([[0., 0., 0., 0., 0., 0., 0., 0.]])
**PyTorch**
.. code:: python
print(type(net[2].bias))
print(net[2].bias)
print(net[2].bias.data)
.. parsed-literal::
:class: output
<class 'torch.nn.parameter.Parameter'>
Parameter containing:
tensor([0.0211], requires_grad=True)
tensor([0.0211])
Parameters are complex objects, containing values, gradients, and
additional information. That's why we need to request the value
explicitly.
In addition to the value, each parameter also allows us to access the
gradient. Because we have not invoked backpropagation for this network
yet, it is in its initial state.
.. code:: python
net[2].weight.grad == None
.. parsed-literal::
:class: output
True
**TensorFlow**
.. code:: python
print(type(net.layers[2].weights[1]))
print(net.layers[2].weights[1])
print(tf.convert_to_tensor(net.layers[2].weights[1]))
.. parsed-literal::
:class: output
<class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>
<tf.Variable 'dense_1/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>
tf.Tensor([0.], shape=(1,), dtype=float32)
All Parameters at Once
~~~~~~~~~~~~~~~~~~~~~~
When we need to perform operations on all parameters, accessing them
one-by-one can grow tedious. The situation can grow especially unwieldy
when we work with more complex blocks (e.g., nested blocks), since we
would need to recurse through the entire tree to extract each
sub-block's parameters. Below we demonstrate accessing the parameters of
the first fully-connected layer vs. accessing all layers.
**MXNet**
.. code:: python
print(net[0].collect_params())
print(net.collect_params())
.. parsed-literal::
:class: output
dense0_ (
Parameter dense0_weight (shape=(8, 4), dtype=float32)
Parameter dense0_bias (shape=(8,), dtype=float32)
)
sequential0_ (
Parameter dense0_weight (shape=(8, 4), dtype=float32)
Parameter dense0_bias (shape=(8,), dtype=float32)
Parameter dense1_weight (shape=(1, 8), dtype=float32)
Parameter dense1_bias (shape=(1,), dtype=float32)
)
**PyTorch**
.. code:: python
print(*[(name, param.shape) for name, param in net[0].named_parameters()])
print(*[(name, param.shape) for name, param in net.named_parameters()])
.. parsed-literal::
:class: output
('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
**TensorFlow**
.. code:: python
print(net.layers[1].weights)
print(net.get_weights())
.. parsed-literal::
:class: output
[<tf.Variable 'dense/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[ 0.6733095 , -0.1968227 ,  0.70611614,  0.42580146],
       [ 0.8420772 , -0.05097824, -0.29724634,  0.7435817 ],
       [-0.55750954, -0.36873615,  0.8461582 , -0.75422806],
       [-0.09947252, -0.71967113, -0.06134313, -0.12178153]],
      dtype=float32)>, <tf.Variable 'dense/bias:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>]
[array([[ 0.6733095 , -0.1968227 , 0.70611614, 0.42580146],
[ 0.8420772 , -0.05097824, -0.29724634, 0.7435817 ],
[-0.55750954, -0.36873615, 0.8461582 , -0.75422806],
[-0.09947252, -0.71967113, -0.06134313, -0.12178153]],
dtype=float32), array([0., 0., 0., 0.], dtype=float32), array([[ 0.6852075 ],
[-0.81062627],
[ 0.14000857],
[ 0.5764358 ]], dtype=float32), array([0.], dtype=float32)]
This provides us with another way of accessing the parameters of the
network as follows.
**MXNet**
.. code:: python
net.collect_params()['dense1_bias'].data()
.. parsed-literal::
:class: output
array([0.])
**PyTorch**
.. code:: python
net.state_dict()['2.bias'].data
.. parsed-literal::
:class: output
tensor([0.0211])
**TensorFlow**
.. code:: python
net.get_weights()[1]
.. parsed-literal::
:class: output
array([0., 0., 0., 0.], dtype=float32)
Collecting Parameters from Nested Blocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let us see how the parameter naming conventions work if we nest multiple
blocks inside each other. For that we first define a function that
produces blocks (a block factory, so to speak) and then combine these
inside yet larger blocks.
**MXNet**
.. code:: python
def block1():
    net = nn.Sequential()
    net.add(nn.Dense(32, activation='relu'))
    net.add(nn.Dense(16, activation='relu'))
    return net

def block2():
    net = nn.Sequential()
    for _ in range(4):
        # Nested here
        net.add(block1())
    return net
rgnet = nn.Sequential()
rgnet.add(block2())
rgnet.add(nn.Dense(10))
rgnet.initialize()
rgnet(X)
.. parsed-literal::
:class: output
array([[-6.3465846e-09, -1.1096752e-09, 6.4161787e-09, 6.6354140e-09,
-1.1265507e-09, 1.3284951e-10, 9.3619388e-09, 3.2229084e-09,
5.9429879e-09, 8.8181435e-09],
[-8.6219423e-09, -7.5150686e-10, 8.3133251e-09, 8.9321128e-09,
-1.6740003e-09, 3.2405989e-10, 1.2115976e-08, 4.4926449e-09,
8.0741742e-09, 1.2075874e-08]])
**PyTorch**
.. code:: python
def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # Nested here
        net.add_module(f'block {i}', block1())
    return net
rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
rgnet(X)
.. parsed-literal::
:class: output
tensor([[-0.3015],
        [-0.3015]], grad_fn=<AddmmBackward0>)
**TensorFlow**
.. code:: python
def block1(name):
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4, activation=tf.nn.relu)],
        name=name)

def block2():
    net = tf.keras.Sequential()
    for i in range(4):
        # Nested here
        net.add(block1(name=f'block-{i}'))
    return net
rgnet = tf.keras.Sequential()
rgnet.add(block2())
rgnet.add(tf.keras.layers.Dense(1))
rgnet(X)
.. parsed-literal::
:class: output
Now that we have designed the network, let us see how it is organized.
**MXNet**
.. code:: python
print(rgnet.collect_params)
print(rgnet.collect_params())
.. parsed-literal::
:class: output
<bound method Block.collect_params of Sequential(
  (0): Sequential(
    (0): Sequential(
      (0): Dense(4 -> 32, Activation(relu))
      (1): Dense(32 -> 16, Activation(relu))
    )
    (1): Sequential(
      (0): Dense(16 -> 32, Activation(relu))
      (1): Dense(32 -> 16, Activation(relu))
    )
    (2): Sequential(
      (0): Dense(16 -> 32, Activation(relu))
      (1): Dense(32 -> 16, Activation(relu))
    )
    (3): Sequential(
      (0): Dense(16 -> 32, Activation(relu))
      (1): Dense(32 -> 16, Activation(relu))
    )
  )
  (1): Dense(16 -> 10, linear)
)>
sequential1_ (
  Parameter dense2_weight (shape=(32, 4), dtype=float32)
  Parameter dense2_bias (shape=(32,), dtype=float32)
  Parameter dense3_weight (shape=(16, 32), dtype=float32)
  Parameter dense3_bias (shape=(16,), dtype=float32)
  Parameter dense4_weight (shape=(32, 16), dtype=float32)
  Parameter dense4_bias (shape=(32,), dtype=float32)
  Parameter dense5_weight (shape=(16, 32), dtype=float32)
  Parameter dense5_bias (shape=(16,), dtype=float32)
  Parameter dense6_weight (shape=(32, 16), dtype=float32)
  Parameter dense6_bias (shape=(32,), dtype=float32)
  Parameter dense7_weight (shape=(16, 32), dtype=float32)
  Parameter dense7_bias (shape=(16,), dtype=float32)
  Parameter dense8_weight (shape=(32, 16), dtype=float32)
  Parameter dense8_bias (shape=(32,), dtype=float32)
  Parameter dense9_weight (shape=(16, 32), dtype=float32)
  Parameter dense9_bias (shape=(16,), dtype=float32)
  Parameter dense10_weight (shape=(10, 16), dtype=float32)
  Parameter dense10_bias (shape=(10,), dtype=float32)
)
**PyTorch**
.. code:: python
print(rgnet)
.. parsed-literal::
:class: output
Sequential(
  (0): Sequential(
    (block 0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 1): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 3): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)
**TensorFlow**
.. code:: python
print(rgnet.summary())
.. parsed-literal::
:class: output
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
sequential_2 (Sequential)    (2, 4)                    80
_________________________________________________________________
dense_6 (Dense)              (2, 1)                    5
=================================================================
Total params: 85
Trainable params: 85
Non-trainable params: 0
_________________________________________________________________
None
Since the layers are hierarchically nested, we can also access them as
though indexing through nested lists. For instance, we can access the
first major block, within it the second sub-block, and within that the
bias of the first layer, as follows.
**MXNet**
.. code:: python
rgnet[0][1][0].bias.data()
.. parsed-literal::
:class: output
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
**PyTorch**
.. code:: python
rgnet[0][1][0].bias.data
.. parsed-literal::
:class: output
tensor([-0.1396, 0.2621, -0.4331, -0.2222, -0.3755, -0.0421, -0.4612, -0.4498])
**TensorFlow**
.. code:: python
rgnet.layers[0].layers[1].layers[1].weights[1]
.. parsed-literal::
:class: output
Parameter Initialization
------------------------
Now that we know how to access the parameters, let us look at how to
initialize them properly. We discussed the need for proper
initialization in :numref:`sec_numerical_stability`. The deep learning
framework provides default random initializations to its layers.
However, we often want to initialize our weights according to various
other protocols. The framework provides the most commonly used
protocols and also allows us to create custom initializers.
By default, MXNet initializes weight parameters by randomly drawing from
a uniform distribution :math:`U(-0.07, 0.07)`, clearing bias parameters
to zero. MXNet's ``init`` module provides a variety of preset
initialization methods.
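As a quick sanity check (a sketch, not part of the original recipe), we
can re-initialize the network from above and confirm that every weight
lies inside the default :math:`U(-0.07, 0.07)` range:

.. code:: python

net.initialize(force_reinit=True)
w = net[0].weight.data()
# Both checks should print True under the default uniform initializer
print(w.min() >= -0.07, w.max() <= 0.07)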
By default, PyTorch initializes weight and bias matrices uniformly by
drawing from a range that is computed according to the input and output
dimension. PyTorch's ``nn.init`` module provides a variety of preset
initialization methods.
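Concretely (a sketch, assuming current PyTorch defaults), the weights
and biases of an ``nn.Linear`` layer are drawn from :math:`U(-a, a)`
with :math:`a = 1/\sqrt{\text{fan\_in}}`:

.. code:: python

import math
from torch import nn

lin = nn.Linear(4, 8)
bound = 1 / math.sqrt(lin.in_features)  # assumed default bound
print(lin.weight.abs().max() <= bound, lin.bias.abs().max() <= bound)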
By default, Keras initializes weight matrices uniformly by drawing from
a range that is computed according to the input and output dimension,
and the bias parameters are all set to zero. TensorFlow provides a
variety of initialization methods both in the root module and the
``keras.initializers`` module.
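To illustrate (a sketch assuming the Glorot uniform default), the
kernel of a freshly built ``Dense`` layer should lie within
:math:`\pm\sqrt{6/(\text{fan\_in}+\text{fan\_out})}` and the bias
should be all zeros:

.. code:: python

import tensorflow as tf

layer = tf.keras.layers.Dense(8)
layer.build(input_shape=(None, 4))  # fan_in=4, fan_out=8
limit = (6 / (4 + 8)) ** 0.5  # Glorot uniform bound
print(bool(tf.reduce_max(tf.abs(layer.kernel)) <= limit))
print(bool(tf.reduce_all(layer.bias == 0)))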
Built-in Initialization
~~~~~~~~~~~~~~~~~~~~~~~
Let us begin by calling on built-in initializers. The code below
initializes all weight parameters as Gaussian random variables with
standard deviation 0.01, with all bias parameters cleared to zero.
**MXNet**
.. code:: python
# Here `force_reinit` ensures that parameters are freshly initialized even if
# they were already initialized previously
net.initialize(init=init.Normal(sigma=0.01), force_reinit=True)
net[0].weight.data()[0]
.. parsed-literal::
:class: output
array([-0.00324057, -0.00895028, -0.00698632, 0.01030831])
**PyTorch**
.. code:: python
def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]
.. parsed-literal::
:class: output
(tensor([ 0.0007, 0.0208, 0.0103, -0.0094]), tensor(0.))
**TensorFlow**
.. code:: python
net = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(
4, activation=tf.nn.relu,
kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.01),
bias_initializer=tf.zeros_initializer()),
tf.keras.layers.Dense(1)])
net(X)
net.weights[0], net.weights[1]
.. parsed-literal::
:class: output
We can also initialize all the parameters to a given constant value
(say, 1).
**MXNet**
.. code:: python
net.initialize(init=init.Constant(1), force_reinit=True)
net[0].weight.data()[0]
.. parsed-literal::
:class: output
array([1., 1., 1., 1.])
**PyTorch**
.. code:: python
def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]
.. parsed-literal::
:class: output
(tensor([1., 1., 1., 1.]), tensor(0.))
**TensorFlow**
.. code:: python
net = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(
4, activation=tf.nn.relu,
kernel_initializer=tf.keras.initializers.Constant(1),
bias_initializer=tf.zeros_initializer()),
tf.keras.layers.Dense(1),
])
net(X)
net.weights[0], net.weights[1]
.. parsed-literal::
:class: output
(<tf.Variable 'dense_9/kernel:0' shape=(4, 4) dtype=float32, numpy=
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], dtype=float32)>,
<tf.Variable 'dense_9/bias:0' shape=(4,) dtype=float32, numpy=array([0., 0., 0., 0.], dtype=float32)>)
We can also apply different initializers for certain blocks. For
example, below we initialize the first layer with the Xavier initializer
and initialize the second layer to a constant value of 42.
**MXNet**
.. code:: python
net[0].weight.initialize(init=init.Xavier(), force_reinit=True)
net[1].initialize(init=init.Constant(42), force_reinit=True)
print(net[0].weight.data()[0])
print(net[1].weight.data())
.. parsed-literal::
:class: output
[-0.17594433 0.02314097 -0.1992535 0.09509248]
[[42. 42. 42. 42. 42. 42. 42. 42.]]
**PyTorch**
.. code:: python
def xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)
.. parsed-literal::
:class: output
tensor([-0.2886, -0.6428, -0.2568, 0.0093])
tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])
**TensorFlow**
.. code:: python
net = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(
4,
activation=tf.nn.relu,
kernel_initializer=tf.keras.initializers.GlorotUniform()),
tf.keras.layers.Dense(
1, kernel_initializer=tf.keras.initializers.Constant(42)),
])
net(X)
print(net.layers[1].weights[0])
print(net.layers[2].weights[0])
.. parsed-literal::
:class: output
Custom Initialization
~~~~~~~~~~~~~~~~~~~~~
Sometimes, the initialization methods we need are not provided by the
deep learning framework. In the example below, we define an initializer
for any weight parameter :math:`w` using the following strange
distribution:
.. math::
\begin{aligned}
w \sim \begin{cases}
U(5, 10) & \text{ with probability } \frac{1}{4} \\
0 & \text{ with probability } \frac{1}{2} \\
U(-10, -5) & \text{ with probability } \frac{1}{4}
\end{cases}
\end{aligned}
**MXNet**
Here we define a subclass of the ``Initializer`` class. Usually, we only
need to implement the ``_init_weight`` function which takes a tensor
argument (``data``) and assigns to it the desired initialized values.
.. code:: python
class MyInit(init.Initializer):
    def _init_weight(self, name, data):
        print('Init', name, data.shape)
        data[:] = np.random.uniform(-10, 10, data.shape)
        data *= np.abs(data) >= 5

net.initialize(MyInit(), force_reinit=True)
net[0].weight.data()[:2]
.. parsed-literal::
:class: output
Init dense0_weight (8, 4)
Init dense1_weight (1, 8)
.. parsed-literal::
:class: output
array([[ 0. , -0. , -0. , 8.522827 ],
[ 0. , -8.828651 , -0. , -5.6012006]])
**PyTorch**
Again, we implement a ``my_init`` function to apply to ``net``.
.. code:: python
def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5

net.apply(my_init)
net[0].weight[:2]
.. parsed-literal::
:class: output
Init weight torch.Size([8, 4])
Init weight torch.Size([1, 8])
.. parsed-literal::
:class: output
tensor([[ 0.0000, -7.4797, -8.2019,  9.8475],
        [ 5.0957,  7.4569, -0.0000, -0.0000]], grad_fn=<SliceBackward0>)
**TensorFlow**
Here we define a subclass of ``Initializer`` and implement the
``__call__`` function that returns the desired tensor given the shape
and data type.
.. code:: python
class MyInit(tf.keras.initializers.Initializer):
    def __call__(self, shape, dtype=None):
        data = tf.random.uniform(shape, -10, 10, dtype=dtype)
        factor = (tf.abs(data) >= 5)
        factor = tf.cast(factor, tf.float32)
        return data * factor
net = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(
4,
activation=tf.nn.relu,
kernel_initializer=MyInit()),
tf.keras.layers.Dense(1),
])
net(X)
print(net.layers[1].weights[0])
.. parsed-literal::
:class: output
Note that we always have the option of setting parameters directly.
**MXNet**
.. code:: python
net[0].weight.data()[:] += 1
net[0].weight.data()[0, 0] = 42
net[0].weight.data()[0]
.. parsed-literal::
:class: output
array([42. , 1. , 1. , 9.522827])
A note for advanced users: if you want to adjust parameters within an
``autograd`` scope, you need to use ``set_data`` to avoid confusing the
automatic differentiation mechanics.
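A minimal sketch of what that looks like (assuming the ``net`` defined
above):

.. code:: python

from mxnet import autograd

with autograd.record():
    y = net(X)
    # While recording, adjust parameters via `set_data`, not indexed writes
    net[0].weight.set_data(net[0].weight.data() * 0.5)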
**PyTorch**
.. code:: python
net[0].weight.data[:] += 1
net[0].weight.data[0, 0] = 42
net[0].weight.data[0]
.. parsed-literal::
:class: output
tensor([42.0000, -6.4797, -7.2019, 10.8475])
**TensorFlow**
.. code:: python
net.layers[1].weights[0][:].assign(net.layers[1].weights[0] + 1)
net.layers[1].weights[0][0, 0].assign(42)
net.layers[1].weights[0]
.. parsed-literal::
:class: output
Tied Parameters
---------------
Often, we want to share parameters across multiple layers. Let us see
how to do this elegantly. In the following we allocate a dense layer and
then use its parameters specifically to set those of another layer.
**MXNet**
.. code:: python
net = nn.Sequential()
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Dense(8, activation='relu')
net.add(nn.Dense(8, activation='relu'),
shared,
nn.Dense(8, activation='relu', params=shared.params),
nn.Dense(10))
net.initialize()
X = np.random.uniform(size=(2, 20))
net(X)
# Check whether the parameters are the same
print(net[1].weight.data()[0] == net[2].weight.data()[0])
net[1].weight.data()[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[1].weight.data()[0] == net[2].weight.data()[0])
.. parsed-literal::
:class: output
[ True True True True True True True True]
[ True True True True True True True True]
This example shows that the parameters of the second and third layer are
tied. They are not just equal, they are represented by the same exact
tensor. Thus, if we change one of the parameters, the other one changes,
too. You might wonder, when parameters are tied what happens to the
gradients? Since the model parameters contain gradients, the gradients
of the second hidden layer and the third hidden layer are added together
during backpropagation.
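We can verify both claims directly (a sketch reusing ``net`` and ``X``
from above): the two layers hold the same ``Parameter`` object, so a
single gradient buffer receives both layers' contributions.

.. code:: python

from mxnet import autograd

with autograd.record():
    loss = net(X).sum()
loss.backward()
# Same Parameter object, hence one shared (accumulated) gradient
print(net[1].weight is net[2].weight)
print((net[1].weight.grad() == net[2].weight.grad()).all())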
**PyTorch**
.. code:: python
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
shared, nn.ReLU(),
shared, nn.ReLU(),
nn.Linear(8, 1))
net(X)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[2].weight.data[0] == net[4].weight.data[0])
.. parsed-literal::
:class: output
tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])
This example shows that the parameters of the second and third layer are
tied. They are not just equal, they are represented by the same exact
tensor. Thus, if we change one of the parameters, the other one changes,
too. You might wonder, when parameters are tied what happens to the
gradients? Since the model parameters contain gradients, the gradients
of the second hidden layer and the third hidden layer are added together
during backpropagation.
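The same check in PyTorch (a sketch reusing ``net`` and ``X`` from
above): since ``net[2]`` and ``net[4]`` are literally the same module,
one ``.grad`` tensor accumulates contributions from both positions.

.. code:: python

net.zero_grad()
net(X).sum().backward()
# One module, one gradient tensor
print(net[2].weight.grad is net[4].weight.grad)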
**TensorFlow**
.. code:: python
# tf.keras behaves a bit differently. It removes the duplicate layer
# automatically
shared = tf.keras.layers.Dense(4, activation=tf.nn.relu)
net = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
shared,
shared,
tf.keras.layers.Dense(1),
])
net(X)
# Check whether the parameters are different
print(len(net.layers) == 3)
.. parsed-literal::
:class: output
True
Summary
-------
- We have several ways to access, initialize, and tie model parameters.
- We can use custom initialization.
Exercises
---------
1. Use the ``FancyMLP`` model defined in
:numref:`sec_model_construction` and access the parameters of the
various layers.
2. Look at the initialization module documentation to explore different
initializers.
3. Construct an MLP containing a shared parameter layer and train it.
During the training process, observe the model parameters and
gradients of each layer.
4. Why is sharing parameters a good idea?