4.3. Concise Implementation of Multilayer Perceptrons

As you might expect, by relying on the high-level APIs, we can implement MLPs even more concisely.

# MXNet
from mxnet import gluon, init, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()

# PyTorch
import torch
from torch import nn
from d2l import torch as d2l

# TensorFlow
import tensorflow as tf
from d2l import tensorflow as d2l

4.3.1. Model

Compared with our concise implementation of softmax regression (Section 3.7), the only difference is that we add two fully connected layers where previously we added one. The first is the hidden layer, which contains 256 hidden units and applies the ReLU activation function. The second is the output layer, with 10 units, one per Fashion-MNIST class.

The MXNet, PyTorch, and TensorFlow versions of the model follow, in that order.

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'),
        nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))

net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    # Draw the weights of every linear layer from N(0, 0.01^2)
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

net = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10)])
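
To make the size of this model concrete, the following is a minimal sketch in plain Python, independent of any of the three frameworks, that lists the weight and bias shapes of each dense layer and counts the learnable parameters. The helper names `mlp_shapes` and `count_parameters` are hypothetical and introduced only for illustration. The `(inputs, outputs)` weight layout is an assumption of the sketch, and individual frameworks may store the transpose.

```python
def mlp_shapes(num_inputs, num_hiddens, num_outputs):
    """Return (weight_shape, bias_shape) for each dense layer of an MLP.

    Hypothetical helper: weights are written as (fan_in, fan_out).
    """
    sizes = [num_inputs, *num_hiddens, num_outputs]
    return [((fan_in, fan_out), (fan_out,))
            for fan_in, fan_out in zip(sizes, sizes[1:])]


def count_parameters(shapes):
    """Sum the number of weights and biases over all layers."""
    total = 0
    for (fan_in, fan_out), (bias_len,) in shapes:
        total += fan_in * fan_out + bias_len
    return total


# The model above: 784 flattened pixels -> 256 hidden units -> 10 outputs
shapes = mlp_shapes(784, [256], 10)
num_params = count_parameters(shapes)
print(shapes)
print(num_params)  # 203530
```

Passing an empty list of hidden sizes, `mlp_shapes(784, [], 10)`, describes the softmax regression model, and a longer list such as `[256, 128]` describes a deeper MLP.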

The training loop is exactly the same as when we implemented softmax regression. This modularity lets us separate the model architecture from orthogonal considerations such as data loading, the loss function, and the optimizer.

The MXNet, PyTorch, and TensorFlow versions of the training code follow, in that order.

batch_size, lr, num_epochs = 256, 0.1, 10
loss = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

batch_size, lr, num_epochs = 256, 0.1, 10
loss = nn.CrossEntropyLoss(reduction='none')
trainer = torch.optim.SGD(net.parameters(), lr=lr)

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

batch_size, lr, num_epochs = 256, 0.1, 10
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
trainer = tf.keras.optimizers.SGD(learning_rate=lr)

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
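
All three loss objects above take the raw outputs (logits) of the network rather than probabilities. In the PyTorch version, `reduction='none'` asks for one loss value per example instead of a batch average, and in the TensorFlow version `from_logits=True` tells the loss that the predictions are logits. As a rough sketch of the quantity being computed, not the frameworks' actual implementation, the per-example softmax cross-entropy can be written in plain Python as below. The function name `cross_entropy_from_logits` and the toy logits are made up for illustration.

```python
import math


def cross_entropy_from_logits(logits, label):
    """Per-example loss: -log softmax(logits)[label].

    Subtracting the maximum logit before exponentiating avoids overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Toy batch of two examples with three classes (illustrative values only)
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
batch_labels = [0, 2]

# One loss per example, analogous to an unreduced loss
per_example = [cross_entropy_from_logits(z, y)
               for z, y in zip(batch_logits, batch_labels)]

# Averaging over the batch gives a single scalar
mean_loss = sum(per_example) / len(per_example)
print(per_example, mean_loss)
```

For the second example all logits are equal, so the predicted distribution is uniform over three classes and the loss equals log 3.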

4.3.2. Summary

Using high-level APIs, we can implement MLPs much more concisely.
For the same classification problem, the implementation of an MLP is the same as that of softmax regression except for additional hidden layers with activation functions.

4.3.3. Exercises

Try adding different numbers of hidden layers (you may also modify the learning rate). What setting works best?
Try out different activation functions. Which one works best?
Try different schemes for initializing the weights. What method works best?
