.. _chap_modern_rnn:

Modern Recurrent Neural Networks
================================

We have introduced the basics of RNNs, which can better handle sequence data. For demonstration, we implemented RNN-based language models on text data. However, such techniques may not be sufficient for the wide range of sequence learning problems that practitioners face today. For instance, a notable issue in practice is the numerical instability of RNNs. Although we have applied implementation tricks such as gradient clipping, this issue can be alleviated further with more sophisticated designs of sequence models. Specifically, gated RNNs are much more common in practice. We will begin by introducing two such widely used networks, namely *gated recurrent units* (GRUs) and *long short-term memory* (LSTM).

Furthermore, we will go beyond the RNN architecture with a single unidirectional hidden layer that has been discussed so far. We will describe deep architectures with multiple hidden layers, and discuss the bidirectional design with both forward and backward recurrent computations. Such expansions are frequently adopted in modern recurrent networks. When explaining these RNN variants, we continue to consider the same language modeling problem introduced in :numref:`chap_rnn`.

In fact, language modeling reveals only a small fraction of what sequence learning is capable of. In a variety of sequence learning problems, such as automatic speech recognition, text-to-speech, and machine translation, both inputs and outputs are sequences of arbitrary length. To explain how to fit this type of data, we will take machine translation as an example, and introduce the encoder-decoder architecture based on RNNs, together with beam search for sequence generation.

.. toctree::
   :maxdepth: 2

   gru
   lstm
   deep-rnn
   bi-rnn
   machine-translation-and-dataset
   encoder-decoder
   seq2seq
   beam-search
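
As a brief preview of the designs named above, the following minimal sketch assumes PyTorch as the framework and uses hypothetical tensor shapes; it only illustrates that gating, depth, and bidirectionality appear as options of standard recurrent layers, and that gradient clipping (the implementation trick mentioned earlier) remains a one-line utility call. The chapters listed above build these components from scratch rather than relying on such ready-made layers.

.. code-block:: python

   import torch
   from torch import nn

   # Hypothetical shapes for illustration: 10 time steps, a minibatch of 4
   # sequences, 8-dimensional inputs (PyTorch RNN layers are time-major by default).
   X = torch.randn(10, 4, 8)

   # A gated recurrent unit (GRU) layer with 16 hidden units.
   gru = nn.GRU(input_size=8, hidden_size=16)
   outputs, state = gru(X)        # outputs shape: (10, 4, 16)

   # A long short-term memory (LSTM) layer that is both deep (two stacked
   # hidden layers) and bidirectional (forward and backward recurrences).
   lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, bidirectional=True)
   outputs, (h, c) = lstm(X)      # outputs shape: (10, 4, 32), both directions concatenated

   # Gradient clipping, applied between backward() and the optimizer step:
   # torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm=1.0)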