.. _sec_pooling:

Pooling
=======


Often, as we process images, we want to gradually reduce the spatial
resolution of our hidden representations, aggregating information so
that the higher up we go in the network, the larger the receptive field
(in the input) to which each hidden node is sensitive.

Our ultimate task frequently asks some global question about the image,
e.g., *does it contain a cat?* So typically the units of our final layer
should be sensitive to the entire input. By gradually aggregating
information, yielding coarser and coarser maps, we accomplish this goal
of ultimately learning a global representation, while keeping all of the
advantages of convolutional layers at the intermediate layers of
processing.
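
To make the growth of the receptive field concrete, here is a minimal
sketch in plain Python. The helper name ``receptive_field`` and the two
layer stacks are hypothetical and chosen only for illustration; each
layer is described by an assumed ``(window_size, stride)`` pair along
one axis.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels, along one axis) of a
    single output unit after applying `layers` in order.

    Each layer is a (window_size, stride) pair. Padding does not change
    the size of the receptive field, so it is ignored here.
    """
    size, jump = 1, 1  # field size and spacing between units, in input pixels
    for window, stride in layers:
        size += (window - 1) * jump
        jump *= stride
    return size


# Hypothetical stack: 3x3 conv, 2x2 pool (stride 2), 3x3 conv, 2x2 pool (stride 2)
with_pooling = [(3, 1), (2, 2), (3, 1), (2, 2)]
# The same two convolutions without any pooling in between
without_pooling = [(3, 1), (3, 1)]

print(receptive_field(with_pooling))     # 10
print(receptive_field(without_pooling))  # 5
```

In this toy setting, interleaving strided pooling doubles the receptive
field reached by the same two convolutions.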

Moreover, when detecting lower-level features, such as edges (as
discussed in :numref:`sec_conv_layer`), we often want our
representations to be somewhat invariant to translation. For instance,
if we take the image ``X`` with a sharp delineation between black and
white and shift the whole image by one pixel to the right, i.e.,
``Z[i, j + 1] = X[i, j]``, then the output for the new image ``Z`` might
be vastly different, because the edge will have shifted by one pixel. In
reality, objects hardly ever occur at exactly the same place. In fact,
even with a tripod and a stationary object, vibration of the camera due
to the movement of the shutter might shift everything by a pixel or so
(high-end cameras are loaded with special features to address this
problem).
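
The effect of such a shift can be sketched on a single image row. The
snippet below is a framework-free illustration that assumes a simple
``[1, -1]`` difference kernel as the edge detector; the helper name
``horizontal_edges`` is hypothetical.

```python
def horizontal_edges(row):
    """Cross-correlate a 1-D row with the difference kernel [1, -1]."""
    return [row[j] - row[j + 1] for j in range(len(row) - 1)]


x = [1, 1, 1, 0, 0, 0]   # white (1) on the left, black (0) on the right
z = [x[0]] + x[:-1]      # shift right by one pixel, repeating the border pixel

print(horizontal_edges(x))  # [0, 0, 1, 0, 0]
print(horizontal_edges(z))  # [0, 0, 0, 1, 0]
```

The detected edge moves by exactly one position, so any later layer that
looks for the edge at a fixed location sees a different input.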

This section introduces *pooling layers*, which serve the dual purposes
of mitigating the sensitivity of convolutional layers to location and of
spatially downsampling representations.

Maximum Pooling and Average Pooling
-----------------------------------

Like convolutional layers, *pooling* operators consist of a fixed-shape
window that is slid over all regions in the input according to its
stride, computing a single output for each location traversed by the
fixed-shape window (sometimes known as the *pooling window*). However,
unlike the cross-correlation computation of the inputs and kernels in
the convolutional layer, the pooling layer contains no parameters (there
is no *kernel*). Instead, pooling operators are deterministic, typically
calculating either the maximum or the average value of the elements in
the pooling window. These operations are called *maximum pooling* (*max
pooling* for short) and *average pooling*, respectively.

In both cases, as with the cross-correlation operator, we can think of
the pooling window as starting from the upper-left of the input tensor
and sliding across the input tensor from left to right and top to
bottom. At each location that the pooling window hits, it computes the
maximum or average value of the input subtensor in the window, depending
on whether max or average pooling is employed.

.. _fig_pooling:

.. figure:: ../img/pooling.svg

   Maximum pooling with a pooling window shape of :math:`2\times 2`. The
   shaded portions are the first output element as well as the input
   tensor elements used for the output computation:
   :math:`\max(0, 1, 3, 4)=4`.


The output tensor in :numref:`fig_pooling` has a height of 2 and a
width of 2. The four elements are derived from the maximum value in each
pooling window:

.. math::


   \max(0, 1, 3, 4)=4,\\
   \max(1, 2, 4, 5)=5,\\
   \max(3, 4, 6, 7)=7,\\
   \max(4, 5, 7, 8)=8.

A pooling layer with a pooling window shape of :math:`p \times q` is
called a :math:`p \times q` pooling layer. The pooling operation is
called :math:`p \times q` pooling.

Let us return to the object edge detection example mentioned at the
beginning of this section. Now we will use the output of the
convolutional layer as the input for :math:`2\times 2` maximum pooling.
Denote the convolutional layer input by ``X`` and the pooling layer
output by ``Y``. Regardless of whether the values of ``X[i, j]`` and
``X[i, j + 1]`` differ, or the values of ``X[i, j + 1]`` and
``X[i, j + 2]`` differ, the pooling layer outputs ``Y[i, j] = 1``. That
is to say, with the :math:`2\times 2` maximum pooling layer, we can
still detect the pattern recognized by the convolutional layer as long
as it moves by no more than one element in height or width.
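
The following framework-free sketch illustrates this claim on nested
Python lists. It assumes a ``[[1, -1]]`` edge-detection kernel and a
stride of 1; the helpers ``corr2d`` and ``max_pool2d`` here are
simplified stand-ins written for this example, not the library
functions.

```python
def corr2d(X, K):
    """2-D cross-correlation of nested lists X and K (no padding, stride 1)."""
    k_h, k_w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b]
                 for a in range(k_h) for b in range(k_w))
             for j in range(len(X[0]) - k_w + 1)]
            for i in range(len(X) - k_h + 1)]


def max_pool2d(X, p_h, p_w):
    """Maximum pooling with stride 1 over a nested list X."""
    return [[max(X[i + a][j + b] for a in range(p_h) for b in range(p_w))
             for j in range(len(X[0]) - p_w + 1)]
            for i in range(len(X) - p_h + 1)]


K = [[1, -1]]                                # horizontal edge detector
X = [[1, 1, 1, 0, 0, 0] for _ in range(4)]   # edge between columns 2 and 3
Z = [[1, 1, 1, 1, 0, 0] for _ in range(4)]   # same image shifted right by one

edges_X, edges_Z = corr2d(X, K), corr2d(Z, K)
pooled_X = max_pool2d(edges_X, 2, 2)
pooled_Z = max_pool2d(edges_Z, 2, 2)

print(edges_X[0], edges_Z[0])    # [0, 0, 1, 0, 0] [0, 0, 0, 1, 0]
print(pooled_X[0], pooled_Z[0])  # [0, 1, 1, 0] [0, 0, 1, 1]
```

The convolution outputs disagree at every position where an edge is
detected, whereas both pooled outputs equal 1 at column 2, so the edge
is still reported there after the one-pixel shift.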

In the code below, we implement the forward propagation of the pooling
layer in the ``pool2d`` function. This function is similar to the
``corr2d`` function in :numref:`sec_conv_layer`. However, here we have
no kernel, computing the output as either the maximum or the average of
each region in the input.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#mxnet-1-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-1-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-1-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-1-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    from mxnet import np, npx
    from mxnet.gluon import nn
    from d2l import mxnet as d2l
    
    npx.set_np()
    
    def pool2d(X, pool_size, mode='max'):
        p_h, p_w = pool_size
        Y = np.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                if mode == 'max':
                    Y[i, j] = X[i: i + p_h, j: j + p_w].max()
                elif mode == 'avg':
                    Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
        return Y


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-1-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    import torch
    from torch import nn
    from d2l import torch as d2l
    
    def pool2d(X, pool_size, mode='max'):
        p_h, p_w = pool_size
        Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                if mode == 'max':
                    Y[i, j] = X[i: i + p_h, j: j + p_w].max()
                elif mode == 'avg':
                    Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
        return Y


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-1-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    import tensorflow as tf
    
    
    def pool2d(X, pool_size, mode='max'):
        p_h, p_w = pool_size
        Y = tf.Variable(tf.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)))
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                if mode == 'max':
                    Y[i, j].assign(tf.reduce_max(X[i: i + p_h, j: j + p_w]))
                elif mode == 'avg':
                    Y[i, j].assign(tf.reduce_mean(X[i: i + p_h, j: j + p_w]))
        return Y


.. raw:: html

     </div>


.. raw:: html

     </div>

We can construct the input tensor ``X`` in :numref:`fig_pooling` to
validate the output of the two-dimensional maximum pooling layer.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#mxnet-3-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-3-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-3-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-3-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
    pool2d(X, (2, 2))


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[4., 5.],
           [7., 8.]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-3-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
    pool2d(X, (2, 2))


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[4., 5.],
            [7., 8.]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-3-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
    pool2d(X, (2, 2))


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
    array([[4., 5.],
           [7., 8.]], dtype=float32)>


.. raw:: html

     </div>


.. raw:: html

     </div>

We can also experiment with the average pooling layer.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#mxnet-5-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-5-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-5-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-5-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d(X, (2, 2), 'avg')


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[2., 3.],
           [5., 6.]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-5-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d(X, (2, 2), 'avg')


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[2., 3.],
            [5., 6.]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-5-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d(X, (2, 2), 'avg')


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
    array([[2., 3.],
           [5., 6.]], dtype=float32)>


.. raw:: html

     </div>


.. raw:: html

     </div>

Padding and Stride
------------------

As with convolutional layers, pooling layers can also change the output
shape. And as before, we can alter the operation to achieve a desired
output shape by padding the input and adjusting the stride. We can
demonstrate the use of padding and strides in pooling layers via the
built-in two-dimensional maximum pooling layer from the deep learning
framework. We first construct an input tensor ``X`` whose shape has four
dimensions, where the number of examples (batch size) and number of
channels are both 1.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-7-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-7-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-7-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-7-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = np.arange(16, dtype=np.float32).reshape((1, 1, 4, 4))
    X


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[[[ 0.,  1.,  2.,  3.],
             [ 4.,  5.,  6.,  7.],
             [ 8.,  9., 10., 11.],
             [12., 13., 14., 15.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-7-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
    X


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[[[ 0.,  1.,  2.,  3.],
              [ 4.,  5.,  6.,  7.],
              [ 8.,  9., 10., 11.],
              [12., 13., 14., 15.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-7-2">

It is important to note that TensorFlow prefers and is optimized for
*channels-last* input, so the channel dimension comes last in the shape
below.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))
    X


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Tensor: shape=(1, 4, 4, 1), dtype=float32, numpy=
    array([[[[ 0.],
             [ 1.],
             [ 2.],
             [ 3.]],
    
            [[ 4.],
             [ 5.],
             [ 6.],
             [ 7.]],
    
            [[ 8.],
             [ 9.],
             [10.],
             [11.]],
    
            [[12.],
             [13.],
             [14.],
             [15.]]]], dtype=float32)>


.. raw:: html

     </div>


.. raw:: html

     </div>

By default, the framework's built-in pooling class sets the stride to
the shape of the pooling window. Below, we use a pooling window of shape
``(3, 3)``, so we get a stride shape of ``(3, 3)`` by default.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#mxnet-9-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-9-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-9-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-9-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2D(3)
    # Because there are no model parameters in the pooling layer, we do not need
    # to call the parameter initialization function
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[[[10.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-9-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2d(3)
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[[[10.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-9-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3])
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Tensor: shape=(1, 1, 1, 1), dtype=float32, numpy=array([[[[10.]]]], dtype=float32)>


.. raw:: html

     </div>


.. raw:: html

     </div>
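
The single output element above follows from the usual output-size
arithmetic along each axis. The helper below is a hypothetical sketch
(the name ``pool_output_size`` is ours, not part of any framework). It
assumes that the same amount of padding is added on both sides of an
axis and that windows which do not fit entirely are dropped.

```python
def pool_output_size(n, window, stride=None, padding=0):
    """Output length along one axis for an input of length `n`.

    Assumes `padding` elements on both sides and floor rounding, i.e.,
    windows that do not fit entirely into the padded input are dropped.
    """
    if stride is None:
        stride = window  # default: stride equals the pooling window size
    return (n + 2 * padding - window) // stride + 1


print(pool_output_size(4, 3))                       # 1
print(pool_output_size(4, 3, stride=2, padding=1))  # 2
print(pool_output_size(4, 2, stride=1))             # 3
```

For the :math:`4\times 4` input used here, a window of 3 with the
default stride of 3 fits only once per axis, which is why the pooled
result contains a single element.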

The stride and padding can be manually specified.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#mxnet-11-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-11-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-11-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-11-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2D(3, padding=1, strides=2)
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[[[ 5.,  7.],
             [13., 15.]]]])


Of course, we can specify an arbitrary rectangular pooling window and
specify the padding and stride for height and width, respectively.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2D((2, 3), padding=(0, 1), strides=(2, 3))
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[[[ 5.,  7.],
             [13., 15.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-11-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2d(3, padding=1, stride=2)
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[[[ 5.,  7.],
              [13., 15.]]]])


Of course, we can specify an arbitrary rectangular pooling window and
specify the padding and stride for height and width, respectively.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2d((2, 3), stride=(2, 3), padding=(0, 1))
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[[[ 5.,  7.],
              [13., 15.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-11-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    paddings = tf.constant([[0, 0], [1, 0], [1, 0], [0, 0]])
    X_padded = tf.pad(X, paddings, "CONSTANT")
    pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid',
                                       strides=2)
    pool2d(X_padded)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
    array([[[[ 5.],
             [ 7.]],
    
            [[13.],
             [15.]]]], dtype=float32)>


Of course, we can specify an arbitrary rectangular pooling window and
specify the padding and stride for height and width, respectively.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    paddings = tf.constant([[0, 0], [0, 0], [1, 1], [0, 0]])
    X_padded = tf.pad(X, paddings, "CONSTANT")
    
    pool2d = tf.keras.layers.MaxPool2D(pool_size=[2, 3], padding='valid',
                                       strides=(2, 3))
    pool2d(X_padded)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
    array([[[[ 5.],
             [ 7.]],
    
            [[13.],
             [15.]]]], dtype=float32)>


.. raw:: html

     </div>


.. raw:: html

     </div>
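
To see what the built-in layers compute, the following framework-free
sketch reimplements maximum pooling with padding and stride on nested
Python lists. The name ``padded_max_pool2d`` and the choice to fill
padded positions with negative infinity are assumptions made for
illustration. Padding with zeros, as in the TensorFlow snippet above,
gives the same result here only because all inputs are nonnegative.

```python
def padded_max_pool2d(X, pool_size, stride=None, padding=(0, 0)):
    """Maximum pooling over a 2-D nested list with padding and stride.

    Padded positions are filled with negative infinity so that they can
    never be selected as the maximum.
    """
    p_h, p_w = pool_size
    s_h, s_w = stride if stride is not None else pool_size
    pad_h, pad_w = padding
    neg_inf = float('-inf')

    # Build the padded input row by row.
    width = len(X[0]) + 2 * pad_w
    padded = [[neg_inf] * width for _ in range(pad_h)]
    padded += [[neg_inf] * pad_w + list(row) + [neg_inf] * pad_w for row in X]
    padded += [[neg_inf] * width for _ in range(pad_h)]

    # Number of window positions that fit along each axis.
    out_h = (len(padded) - p_h) // s_h + 1
    out_w = (width - p_w) // s_w + 1
    return [[max(padded[i * s_h + a][j * s_w + b]
                 for a in range(p_h) for b in range(p_w))
             for j in range(out_w)]
            for i in range(out_h)]


grid = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # values 0..15

print(padded_max_pool2d(grid, (3, 3)))
# [[10.0]]
print(padded_max_pool2d(grid, (3, 3), stride=(2, 2), padding=(1, 1)))
# [[5.0, 7.0], [13.0, 15.0]]
print(padded_max_pool2d(grid, (2, 3), stride=(2, 3), padding=(0, 1)))
# [[5.0, 7.0], [13.0, 15.0]]
```

The three calls reproduce the values returned by the built-in layers in
this section for the same :math:`4\times 4` input.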

Multiple Channels
-----------------

When processing multi-channel input data, the pooling layer pools each
input channel separately, rather than summing the inputs up over
channels as in a convolutional layer. This means that the number of
output channels for the pooling layer is the same as the number of input
channels. Below, we will concatenate tensors ``X`` and ``X + 1`` on the
channel dimension to construct an input with 2 channels.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-13-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-13-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-13-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-13-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = np.concatenate((X, X + 1), 1)
    X


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[[[ 0.,  1.,  2.,  3.],
             [ 4.,  5.,  6.,  7.],
             [ 8.,  9., 10., 11.],
             [12., 13., 14., 15.]],
    
            [[ 1.,  2.,  3.,  4.],
             [ 5.,  6.,  7.,  8.],
             [ 9., 10., 11., 12.],
             [13., 14., 15., 16.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-13-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    X = torch.cat((X, X + 1), 1)
    X


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[[[ 0.,  1.,  2.,  3.],
              [ 4.,  5.,  6.,  7.],
              [ 8.,  9., 10., 11.],
              [12., 13., 14., 15.]],
    
             [[ 1.,  2.,  3.,  4.],
              [ 5.,  6.,  7.,  8.],
              [ 9., 10., 11., 12.],
              [13., 14., 15., 16.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-13-2">

Note that in TensorFlow this requires concatenating along the last
dimension, because of the channels-last layout.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    # Concatenate along axis 3, the channel axis in the channels-last layout
    X = tf.concat([X, X + 1], 3)


.. raw:: html

     </div>


.. raw:: html

     </div>

As we can see, the number of output channels is still 2 after pooling.


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar code"><a href="#mxnet-15-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-15-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-15-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-15-0">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2D(3, padding=1, strides=2)
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    array([[[[ 5.,  7.],
             [13., 15.]],
    
            [[ 6.,  8.],
             [14., 16.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-15-1">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    pool2d = nn.MaxPool2d(3, padding=1, stride=2)
    pool2d(X)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    tensor([[[[ 5.,  7.],
              [13., 15.]],
    
             [[ 6.,  8.],
              [14., 16.]]]])


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-15-2">

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    paddings = tf.constant([[0, 0], [1, 0], [1, 0], [0, 0]])
    X_padded = tf.pad(X, paddings, "CONSTANT")
    pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid',
                                       strides=2)
    pool2d(X_padded)


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    <tf.Tensor: shape=(1, 2, 2, 2), dtype=float32, numpy=
    array([[[[ 5.,  6.],
             [ 7.,  8.]],
    
            [[13., 14.],
             [15., 16.]]]], dtype=float32)>


Note that the TensorFlow output looks different at first glance, but it
contains the same numbers as the MXNet and PyTorch results. The
difference lies only in the layout: because the channel is the last
dimension, reading the printed output vertically (column by column)
yields the same per-channel results as in the other implementations.
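
The correspondence can be checked by reordering the axes. The sketch
below does this on plain nested lists, with the values copied by hand
from the output above; the helper name ``nhwc_to_nchw`` is hypothetical.

```python
def nhwc_to_nchw(t):
    """Reorder a nested list from (batch, height, width, channel) layout
    to (batch, channel, height, width) layout."""
    n, h, w, c = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[b][i][j][k] for j in range(w)]
              for i in range(h)]
             for k in range(c)]
            for b in range(n)]


# Channels-last values from the printed output, copied by hand.
channels_last = [[[[5., 6.], [7., 8.]],
                  [[13., 14.], [15., 16.]]]]

print(nhwc_to_nchw(channels_last))
# [[[[5.0, 7.0], [13.0, 15.0]], [[6.0, 8.0], [14.0, 16.0]]]]
```

On an actual tensor, ``tf.transpose`` with ``perm=[0, 3, 1, 2]``
performs the same reordering.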


.. raw:: html

     </div>


.. raw:: html

     </div>
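
The per-channel behavior can also be sketched without any framework.
The snippet below assumes a channels-first input stored as nested lists
and a stride of 1; the names ``pool_single_channel`` and
``pool_channels`` are hypothetical helpers written for this example.

```python
def pool_single_channel(X, p_h, p_w):
    """Maximum pooling with stride 1 over one 2-D channel (nested list)."""
    return [[max(X[i + a][j + b] for a in range(p_h) for b in range(p_w))
             for j in range(len(X[0]) - p_w + 1)]
            for i in range(len(X) - p_h + 1)]


def pool_channels(X, p_h, p_w):
    """Pool every channel of a (channel, height, width) input independently.

    Nothing is summed across channels, so the number of output channels
    equals the number of input channels.
    """
    return [pool_single_channel(channel, p_h, p_w) for channel in X]


base = [[4 * i + j for j in range(4)] for i in range(4)]        # values 0..15
two_channels = [base, [[v + 1 for v in row] for row in base]]   # second channel: base + 1

pooled = pool_channels(two_channels, 2, 2)
print(len(pooled))   # 2
print(pooled[0][0])  # [5, 6, 7]
print(pooled[1][0])  # [6, 7, 8]
```

Each output channel depends only on the matching input channel, in
contrast to a convolutional layer, which sums over all input channels.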

Summary
-------

-  Taking the input elements in the pooling window, the maximum pooling
   operation assigns the maximum value as the output and the average
   pooling operation assigns the average value as the output.
-  One of the major benefits of a pooling layer is to alleviate the
   excessive sensitivity of the convolutional layer to location.
-  We can specify the padding and stride for the pooling layer.
-  Maximum pooling, combined with a stride larger than 1, can be used to
   reduce the spatial dimensions (e.g., width and height).
-  The pooling layer's number of output channels is the same as the
   number of input channels.

Exercises
---------

1. Can you implement average pooling as a special case of a
   convolutional layer? If so, do it.
2. Can you implement maximum pooling as a special case of a
   convolutional layer? If so, do it.
3. What is the computational cost of the pooling layer? Assume that the
   input to the pooling layer is of size :math:`c\times h\times w`, the
   pooling window has a shape of :math:`p_h\times p_w` with a padding of
   :math:`(p_h, p_w)` and a stride of :math:`(s_h, s_w)`.
4. Why do you expect maximum pooling and average pooling to work
   differently?
5. Do we need a separate minimum pooling layer? Can you replace it with
   another operation?
6. Is there another operation between average and maximum pooling that
   you could consider (hint: recall the softmax)? Why might it not be so
   popular?


.. raw:: html

     <div class="mdl-tabs mdl-js-tabs mdl-js-ripple-effect"><div class="mdl-tabs__tab-bar text"><a href="#mxnet-17-0" onclick="tagClick('mxnet'); return false;" class="mdl-tabs__tab is-active">mxnet</a><a href="#pytorch-17-1" onclick="tagClick('pytorch'); return false;" class="mdl-tabs__tab ">pytorch</a><a href="#tensorflow-17-2" onclick="tagClick('tensorflow'); return false;" class="mdl-tabs__tab ">tensorflow</a></div>


.. raw:: html

     <div class="mdl-tabs__panel is-active" id="mxnet-17-0">

`Discussions <https://discuss.d2l.ai/t/71>`__


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="pytorch-17-1">

`Discussions <https://discuss.d2l.ai/t/72>`__


.. raw:: html

     </div>


.. raw:: html

     <div class="mdl-tabs__panel " id="tensorflow-17-2">

`Discussions <https://discuss.d2l.ai/t/274>`__


.. raw:: html

     </div>


.. raw:: html

     </div>