5.1. Layers and Blocks¶ Open the notebook in SageMaker Studio Lab
When we first introduced neural networks, we focused on linear models with a single output. Here, the entire model consists of just a single neuron. Note that a single neuron (i) takes some set of inputs; (ii) generates a corresponding scalar output; and (iii) has a set of associated parameters that can be updated to optimize some objective function of interest. Then, once we started thinking about networks with multiple outputs, we leveraged vectorized arithmetic to characterize an entire layer of neurons. Just like individual neurons, layers (i) take a set of inputs, (ii) generate corresponding outputs, and (iii) are described by a set of tunable parameters. When we worked through softmax regression, a single layer was itself the model. However, even when we subsequently introduced MLPs, we could still think of the model as retaining this same basic structure.
Interestingly, for MLPs, both the entire model and its constituent layers share this structure. The entire model takes in raw inputs (the features), generates outputs (the predictions), and possesses parameters (the combined parameters from all constituent layers). Likewise, each individual layer ingests inputs (supplied by the previous layer) generates outputs (the inputs to the subsequent layer), and possesses a set of tunable parameters that are updated according to the signal that flows backwards from the subsequent layer.
While you might think that neurons, layers, and models give us enough abstractions to go about our business, it turns out that we often find it convenient to speak about components that are larger than an individual layer but smaller than the entire model. For example, the ResNet-152 architecture, which is wildly popular in computer vision, possesses hundreds of layers. These layers consist of repeating patterns of groups of layers. Implementing such a network one layer at a time can grow tedious. This concern is not just hypothetical—such design patterns are common in practice. The ResNet architecture mentioned above won the 2015 ImageNet and COCO computer vision competitions for both recognition and detection () and remains a go-to architecture for many vision tasks. Similar architectures in which layers are arranged in various repeating patterns are now ubiquitous in other domains, including natural language processing and speech.
To implement these complex networks, we introduce the concept of a neural network block. A block could describe a single layer, a component consisting of multiple layers, or the entire model itself! One benefit of working with the block abstraction is that they can be combined into larger artifacts, often recursively. This is illustrated in Fig. 5.1.1. By defining code to generate blocks of arbitrary complexity on demand, we can write surprisingly compact code and still implement complex neural networks.
From a programing standpoint, a block is represented by a class. Any subclass of it must define a forward propagation function that transforms its input into output and must store any necessary parameters. Note that some blocks do not require any parameters at all. Finally a block must possess a backpropagation function, for purposes of calculating gradients. Fortunately, due to some behind-the-scenes magic supplied by the auto differentiation (introduced in Section 2.5) when defining our own block, we only need to worry about parameters and the forward propagation function.
To begin, we revisit the code that we used to implement MLPs (Section 4.3). The following code generates a network with one fully-connected hidden layer with 256 units and ReLU activation, followed by a fully-connected output layer with 10 units (no activation function).
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()
X = np.random.uniform(size=(2, 20))
net(X)
array([[ 0.06240274, -0.03268593, 0.02582653, 0.02254181, -0.03728798,
-0.04253785, 0.00540612, -0.01364185, -0.09915454, -0.02272737],
[ 0.02816679, -0.03341204, 0.03565665, 0.02506384, -0.04136416,
-0.04941844, 0.01738529, 0.01081963, -0.09932579, -0.01176296]])
In this example, we constructed our model by instantiating an
nn.Sequential
, assigning the returned object to the net
variable. Next, we repeatedly call its add
function, appending
layers in the order that they should be executed. In short,
nn.Sequential
defines a special kind of Block
, the class that
presents a block in Gluon. It maintains an ordered list of constituent
Block
s. The add
function simply facilitates the addition of
each successive Block
to the list. Note that each layer is an
instance of the Dense
class which is itself a subclass of Block
.
The forward propagation (forward
) function is also remarkably
simple: it chains each Block
in the list together, passing the
output of each as the input to the next. Note that until now, we have
been invoking our models via the construction net(X)
to obtain their
outputs. This is actually just shorthand for net.forward(X)
, a slick
Python trick achieved via the Block
class’s __call__
function.
import torch
from torch import nn
from torch.nn import functional as F
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)
net(X)
tensor([[-0.0227, 0.0174, -0.2084, 0.2129, 0.4705, 0.2038, -0.0006, 0.0777,
0.0678, -0.0851],
[ 0.0294, 0.1090, -0.1026, 0.2531, 0.4979, 0.0136, -0.0542, 0.0829,
-0.1305, -0.0116]], grad_fn=<AddmmBackward0>)
In this example, we constructed our model by instantiating an
nn.Sequential
, with layers in the order that they should be executed
passed as arguments. In short, nn.Sequential
defines a special kind
of Module
, the class that presents a block in PyTorch. It maintains
an ordered list of constituent Module
s. Note that each of the two
fully-connected layers is an instance of the Linear
class which is
itself a subclass of Module
. The forward propagation (forward
)
function is also remarkably simple: it chains each block in the list
together, passing the output of each as the input to the next. Note that
until now, we have been invoking our models via the construction
net(X)
to obtain their outputs. This is actually just shorthand for
net.__call__(X)
.
import tensorflow as tf
net = tf.keras.models.Sequential([
tf.keras.layers.Dense(256, activation=tf.nn.relu),
tf.keras.layers.Dense(10),
])
X = tf.random.uniform((2, 20))
net(X)
<tf.Tensor: shape=(2, 10), dtype=float32, numpy=
array([[-0.3606038 , 0.10865784, 0.00642361, -0.33575526, 0.07707725,
-0.11766398, -0.05345164, 0.04037177, 0.11260508, 0.23689133],
[-0.3886392 , 0.12902877, 0.04228858, -0.1410903 , 0.00691184,
0.06830215, -0.00436489, -0.06233275, 0.25432587, 0.2820235 ]],
dtype=float32)>
In this example, we constructed our model by instantiating an
keras.models.Sequential
, with layers in the order that they should
be executed passed as arguments. In short, Sequential
defines a
special kind of keras.Model
, the class that presents a block in
Keras. It maintains an ordered list of constituent Model
s. Note
that each of the two fully-connected layers is an instance of the
Dense
class which is itself a subclass of Model
. The forward
propagation (call
) function is also remarkably simple: it chains
each block in the list together, passing the output of each as the input
to the next. Note that until now, we have been invoking our models via
the construction net(X)
to obtain their outputs. This is actually
just shorthand for net.call(X)
, a slick Python trick achieved via
the Block class’s __call__
function.
5.1.1. A Custom Block¶
Perhaps the easiest way to develop intuition about how a block works is to implement one ourselves. Before we implement our own custom block, we briefly summarize the basic functionality that each block must provide:
Ingest input data as arguments to its forward propagation function.
Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation function. Typically this happens automatically.
Store and provide access to those parameters necessary to execute the forward propagation computation.
Initialize model parameters as needed.
Ingest input data as arguments to its forward propagation function.
Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of dimension 20 but returns an output of dimension 256.
Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation function. Typically this happens automatically.
Store and provide access to those parameters necessary to execute the forward propagation computation.
Initialize model parameters as needed.
Ingest input data as arguments to its forward propagation function.
Generate an output by having the forward propagation function return a value. Note that the output may have a different shape from the input. For example, the first fully-connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation function. Typically this happens automatically.
Store and provide access to those parameters necessary to execute the forward propagation computation.
Initialize model parameters as needed.
In the following snippet, we code up a block from scratch corresponding
to an MLP with one hidden layer with 256 hidden units, and a
10-dimensional output layer. Note that the MLP
class below inherits
the class that represents a block. We will heavily rely on the parent
class’s functions, supplying only our own constructor (the __init__
function in Python) and the forward propagation function.
class MLP(nn.Block):
# Declare a layer with model parameters. Here, we declare two
# fully-connected layers
def __init__(self, **kwargs):
# Call the constructor of the `MLP` parent class `Block` to perform
# the necessary initialization. In this way, other function arguments
# can also be specified during class instantiation, such as the model
# parameters, `params` (to be described later)
super().__init__(**kwargs)
self.hidden = nn.Dense(256, activation='relu') # Hidden layer
self.out = nn.Dense(10) # Output layer
# Define the forward propagation of the model, that is, how to return the
# required model output based on the input `X`
def forward(self, X):
return self.out(self.hidden(X))
class MLP(nn.Module):
# Declare a layer with model parameters. Here, we declare two fully
# connected layers
def __init__(self):
# Call the constructor of the `MLP` parent class `Module` to perform
# the necessary initialization. In this way, other function arguments
# can also be specified during class instantiation, such as the model
# parameters, `params` (to be described later)
super().__init__()
self.hidden = nn.Linear(20, 256) # Hidden layer
self.out = nn.Linear(256, 10) # Output layer
# Define the forward propagation of the model, that is, how to return the
# required model output based on the input `X`
def forward(self, X):
# Note here we use the funtional version of ReLU defined in the
# nn.functional module.
return self.out(F.relu(self.hidden(X)))
class MLP(tf.keras.Model):
# Declare a layer with model parameters. Here, we declare two fully
# connected layers
def __init__(self):
# Call the constructor of the `MLP` parent class `Model` to perform
# the necessary initialization. In this way, other function arguments
# can also be specified during class instantiation, such as the model
# parameters, `params` (to be described later)
super().__init__()
# Hidden layer
self.hidden = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
self.out = tf.keras.layers.Dense(units=10) # Output layer
# Define the forward propagation of the model, that is, how to return the
# required model output based on the input `X`
def call(self, X):
return self.out(self.hidden((X)))
Let us first focus on the forward propagation function. Note that it
takes X
as the input, calculates the hidden representation with the
activation function applied, and outputs its logits. In this MLP
implementation, both layers are instance variables. To see why this is
reasonable, imagine instantiating two MLPs, net1
and net2
, and
training them on different data. Naturally, we would expect them to
represent two different learned models.
We instantiate the MLP’s layers in the constructor and subsequently
invoke these layers on each call to the forward propagation function.
Note a few key details. First, our customized __init__
function
invokes the parent class’s __init__
function via
super().__init__()
sparing us the pain of restating boilerplate code
applicable to most blocks. We then instantiate our two fully-connected
layers, assigning them to self.hidden
and self.out
. Note that
unless we implement a new operator, we need not worry about the
backpropagation function or parameter initialization. The system will
generate these functions automatically. Let us try this out.
net = MLP()
net.initialize()
net(X)
array([[-0.03989595, -0.10414709, 0.06799038, 0.05245074, 0.0252606 ,
-0.00640342, 0.04182098, -0.01665318, -0.02067345, -0.07863816],
[-0.03612847, -0.07210435, 0.09159479, 0.07890773, 0.02494171,
-0.01028665, 0.01732427, -0.02843244, 0.03772651, -0.06671703]])
net = MLP()
net(X)
tensor([[-0.0724, 0.0468, 0.1732, 0.0501, 0.0392, -0.0757, 0.1858, -0.0532,
0.0358, 0.0045],
[-0.0090, 0.1694, 0.0956, 0.0717, 0.1298, -0.0841, -0.0033, -0.0285,
0.0767, 0.0682]], grad_fn=<AddmmBackward0>)
net = MLP()
net(X)
<tf.Tensor: shape=(2, 10), dtype=float32, numpy=
array([[-0.30532813, 0.01244949, 0.0536986 , 0.17476092, -0.05998001,
-0.12069365, 0.04688525, 0.00311747, 0.37231156, -0.16164199],
[-0.38690534, -0.1730837 , -0.0425095 , 0.35398513, 0.0011123 ,
-0.27669314, 0.13319898, -0.17433885, 0.26914883, -0.3020901 ]],
dtype=float32)>
A key virtue of the block abstraction is its versatility. We can
subclass a block to create layers (such as the fully-connected layer
class), entire models (such as the MLP
class above), or various
components of intermediate complexity. We exploit this versatility
throughout the following chapters, such as when addressing convolutional
neural networks.
5.1.2. The Sequential Block¶
We can now take a closer look at how the Sequential
class works.
Recall that Sequential
was designed to daisy-chain other blocks
together. To build our own simplified MySequential
, we just need to
define two key function: 1. A function to append blocks one by one to a
list. 2. A forward propagation function to pass an input through the
chain of blocks, in the same order as they were appended.
The following MySequential
class delivers the same functionality of
the default Sequential
class.
class MySequential(nn.Block):
def add(self, block):
# Here, `block` is an instance of a `Block` subclass, and we assume
# that it has a unique name. We save it in the member variable
# `_children` of the `Block` class, and its type is OrderedDict. When
# the `MySequential` instance calls the `initialize` function, the
# system automatically initializes all members of `_children`
self._children[block.name] = block
def forward(self, X):
# OrderedDict guarantees that members will be traversed in the order
# they were added
for block in self._children.values():
X = block(X)
return X
The add
function adds a single block to the ordered dictionary
_children
. You might wonder why every Gluon Block
possesses a
_children
attribute and why we used it rather than just define a
Python list ourselves. In short the chief advantage of _children
is
that during our block’s parameter initialization, Gluon knows to look
inside the _children
dictionary to find sub-blocks whose parameters
also need to be initialized.
class MySequential(nn.Module):
def __init__(self, *args):
super().__init__()
for idx, module in enumerate(args):
# Here, `module` is an instance of a `Module` subclass. We save it
# in the member variable `_modules` of the `Module` class, and its
# type is OrderedDict
self._modules[str(idx)] = module
def forward(self, X):
# OrderedDict guarantees that members will be traversed in the order
# they were added
for block in self._modules.values():
X = block(X)
return X
In the __init__
method, we add every module to the ordered
dictionary _modules
one by one. You might wonder why every
Module
possesses a _modules
attribute and why we used it rather
than just define a Python list ourselves. In short the chief advantage
of _modules
is that during our module’s parameter initialization,
the system knows to look inside the _modules
dictionary to find
sub-modules whose parameters also need to be initialized.
class MySequential(tf.keras.Model):
def __init__(self, *args):
super().__init__()
self.modules = []
for block in args:
# Here, `block` is an instance of a `tf.keras.layers.Layer`
# subclass
self.modules.append(block)
def call(self, X):
for module in self.modules:
X = module(X)
return X
When our MySequential
’s forward propagation function is invoked,
each added block is executed in the order in which they were added. We
can now reimplement an MLP using our MySequential
class.
net = MySequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()
net(X)
array([[-0.0764568 , -0.01130233, 0.04952145, -0.04651389, -0.04131571,
-0.05884131, -0.06213811, 0.01311471, -0.01379425, -0.02514282],
[-0.05124623, 0.00711232, -0.00155933, -0.07555379, -0.06675334,
-0.01762914, 0.00589085, 0.0144719 , -0.04330775, 0.03317727]])
net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(X)
tensor([[-0.0191, -0.0987, 0.0176, 0.0457, 0.0628, 0.0057, -0.2619, 0.0883,
0.0726, -0.1936],
[ 0.0812, -0.2205, 0.1459, 0.0116, -0.0114, 0.0599, -0.2580, 0.0913,
0.1160, -0.2189]], grad_fn=<AddmmBackward0>)
net = MySequential(
tf.keras.layers.Dense(units=256, activation=tf.nn.relu),
tf.keras.layers.Dense(10))
net(X)
<tf.Tensor: shape=(2, 10), dtype=float32, numpy=
array([[-0.11438183, 0.1560243 , -0.29495043, 0.08101085, -0.0454475 ,
0.16877568, 0.16795623, 0.46220922, 0.4001017 , -0.20763123],
[-0.27060828, -0.0067683 , -0.07590526, -0.07891521, -0.2944404 ,
0.05093757, 0.13403754, 0.5880474 , 0.33582437, -0.08513789]],
dtype=float32)>
Note that this use of MySequential
is identical to the code we
previously wrote for the Sequential
class (as described in
Section 4.3).
5.1.3. Executing Code in the Forward Propagation Function¶
The Sequential
class makes model construction easy, allowing us to
assemble new architectures without having to define our own class.
However, not all architectures are simple daisy chains. When greater
flexibility is required, we will want to define our own blocks. For
example, we might want to execute Python’s control flow within the
forward propagation function. Moreover, we might want to perform
arbitrary mathematical operations, not simply relying on predefined
neural network layers.
You might have noticed that until now, all of the operations in our
networks have acted upon our network’s activations and its parameters.
Sometimes, however, we might want to incorporate terms that are neither
the result of previous layers nor updatable parameters. We call these
constant parameters. Say for example that we want a layer that
calculates the function
\(f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}\),
where \(\mathbf{x}\) is the input, \(\mathbf{w}\) is our
parameter, and \(c\) is some specified constant that is not updated
during optimization. So we implement a FixedHiddenMLP
class as
follows.
class FixedHiddenMLP(nn.Block):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# Random weight parameters created with the `get_constant` function
# are not updated during training (i.e., constant parameters)
self.rand_weight = self.params.get_constant(
'rand_weight', np.random.uniform(size=(20, 20)))
self.dense = nn.Dense(20, activation='relu')
def forward(self, X):
X = self.dense(X)
# Use the created constant parameters, as well as the `relu` and `dot`
# functions
X = npx.relu(np.dot(X, self.rand_weight.data()) + 1)
# Reuse the fully-connected layer. This is equivalent to sharing
# parameters with two fully-connected layers
X = self.dense(X)
# Control flow
while np.abs(X).sum() > 1:
X /= 2
return X.sum()
class FixedHiddenMLP(nn.Module):
def __init__(self):
super().__init__()
# Random weight parameters that will not compute gradients and
# therefore keep constant during training
self.rand_weight = torch.rand((20, 20), requires_grad=False)
self.linear = nn.Linear(20, 20)
def forward(self, X):
X = self.linear(X)
# Use the created constant parameters, as well as the `relu` and `mm`
# functions
X = F.relu(torch.mm(X, self.rand_weight) + 1)
# Reuse the fully-connected layer. This is equivalent to sharing
# parameters with two fully-connected layers
X = self.linear(X)
# Control flow
while X.abs().sum() > 1:
X /= 2
return X.sum()
class FixedHiddenMLP(tf.keras.Model):
def __init__(self):
super().__init__()
self.flatten = tf.keras.layers.Flatten()
# Random weight parameters created with `tf.constant` are not updated
# during training (i.e., constant parameters)
self.rand_weight = tf.constant(tf.random.uniform((20, 20)))
self.dense = tf.keras.layers.Dense(20, activation=tf.nn.relu)
def call(self, inputs):
X = self.flatten(inputs)
# Use the created constant parameters, as well as the `relu` and
# `matmul` functions
X = tf.nn.relu(tf.matmul(X, self.rand_weight) + 1)
# Reuse the fully-connected layer. This is equivalent to sharing
# parameters with two fully-connected layers
X = self.dense(X)
# Control flow
while tf.reduce_sum(tf.math.abs(X)) > 1:
X /= 2
return tf.reduce_sum(X)
In this FixedHiddenMLP
model, we implement a hidden layer whose
weights (self.rand_weight
) are initialized randomly at instantiation
and are thereafter constant. This weight is not a model parameter and
thus it is never updated by backpropagation. The network then passes the
output of this “fixed” layer through a fully-connected layer.
Note that before returning the output, our model did something unusual.
We ran a while-loop, testing on the condition its \(L_1\) norm is
larger than \(1\), and dividing our output vector by \(2\) until
it satisfied the condition. Finally, we returned the sum of the entries
in X
. To our knowledge, no standard neural network performs this
operation. Note that this particular operation may not be useful in any
real-world task. Our point is only to show you how to integrate
arbitrary code into the flow of your neural network computations.
net = FixedHiddenMLP()
net.initialize()
net(X)
array(0.52637565)
net = FixedHiddenMLP()
net(X)
tensor(-0.2081, grad_fn=<SumBackward0>)
net = FixedHiddenMLP()
net(X)
<tf.Tensor: shape=(), dtype=float32, numpy=0.81018776>
We can mix and match various ways of assembling blocks together. In the following example, we nest blocks in some creative ways.
class NestMLP(nn.Block):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.net = nn.Sequential()
self.net.add(nn.Dense(64, activation='relu'),
nn.Dense(32, activation='relu'))
self.dense = nn.Dense(16, activation='relu')
def forward(self, X):
return self.dense(self.net(X))
chimera = nn.Sequential()
chimera.add(NestMLP(), nn.Dense(20), FixedHiddenMLP())
chimera.initialize()
chimera(X)
array(0.9772054)
class NestMLP(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
nn.Linear(64, 32), nn.ReLU())
self.linear = nn.Linear(32, 16)
def forward(self, X):
return self.linear(self.net(X))
chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
chimera(X)
tensor(-0.1082, grad_fn=<SumBackward0>)
class NestMLP(tf.keras.Model):
def __init__(self):
super().__init__()
self.net = tf.keras.Sequential()
self.net.add(tf.keras.layers.Dense(64, activation=tf.nn.relu))
self.net.add(tf.keras.layers.Dense(32, activation=tf.nn.relu))
self.dense = tf.keras.layers.Dense(16, activation=tf.nn.relu)
def call(self, inputs):
return self.dense(self.net(inputs))
chimera = tf.keras.Sequential()
chimera.add(NestMLP())
chimera.add(tf.keras.layers.Dense(20))
chimera.add(FixedHiddenMLP())
chimera(X)
<tf.Tensor: shape=(), dtype=float32, numpy=0.5907136>
5.1.4. Efficiency¶
The avid reader might start to worry about the efficiency of some of these operations. After all, we have lots of dictionary lookups, code execution, and lots of other Pythonic things taking place in what is supposed to be a high-performance deep learning library. The problems of Python’s global interpreter lock are well known. In the context of deep learning, we may worry that our extremely fast GPU(s) might have to wait until a puny CPU runs Python code before it gets another job to run. The best way to speed up Python is by avoiding it altogether.
One way that Gluon does this is by allowing for hybridization, which will be described later. Here, the Python interpreter executes a block the first time it is invoked. The Gluon runtime records what is happening and the next time around it short-circuits calls to Python. This can accelerate things considerably in some cases but care needs to be taken when control flow (as above) leads down different branches on different passes through the net. We recommend that the interested reader checks out the hybridization section (Section 12.1) to learn about compilation after finishing the current chapter.
The avid reader might start to worry about the efficiency of some of these operations. After all, we have lots of dictionary lookups, code execution, and lots of other Pythonic things taking place in what is supposed to be a high-performance deep learning library. The problems of Python’s global interpreter lock are well known. In the context of deep learning, we may worry that our extremely fast GPU(s) might have to wait until a puny CPU runs Python code before it gets another job to run.
The avid reader might start to worry about the efficiency of some of these operations. After all, we have lots of dictionary lookups, code execution, and lots of other Pythonic things taking place in what is supposed to be a high-performance deep learning library. The problems of Python’s global interpreter lock are well known. In the context of deep learning, we may worry that our extremely fast GPU(s) might have to wait until a puny CPU runs Python code before it gets another job to run. The best way to speed up Python is by avoiding it altogether.
5.1.5. Summary¶
Layers are blocks.
Many layers can comprise a block.
Many blocks can comprise a block.
A block can contain code.
Blocks take care of lots of housekeeping, including parameter initialization and backpropagation.
Sequential concatenations of layers and blocks are handled by the
Sequential
block.
5.1.6. Exercises¶
What kinds of problems will occur if you change
MySequential
to store blocks in a Python list?Implement a block that takes two blocks as an argument, say
net1
andnet2
and returns the concatenated output of both networks in the forward propagation. This is also called a parallel block.Assume that you want to concatenate multiple instances of the same network. Implement a factory function that generates multiple instances of the same block and build a larger network from it.