a.k.a. RNN

A family of neural networks for processing sequential data.

e.g. in NLP, when modelling the next word given the observed history, a Recurrent Neural Network compresses the entire history into a fixed-length vector, which lets it capture long-range correlations.

An RNN is a feed-forward network where the hidden state is a function of both the current input x and the previous hidden state h, with an output layer attached to every hidden state.
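
A minimal NumPy sketch of one forward pass, assuming the usual vanilla-RNN parametrisation h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y (the weight names and toy dimensions are illustrative, not from the source):

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over a sequence of input vectors xs."""
    h = h0
    ys = []
    for x in xs:                                # one iteration per time step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # hidden state mixes x_t and h_{t-1}
        ys.append(W_hy @ h + b_y)               # an output is read off every hidden state
    return ys, h

# toy dimensions: 4-dim inputs, 8-dim hidden state, 3-dim outputs, 5 steps
rng = np.random.default_rng(0)
d_in, d_h, d_out, T = 4, 8, 3, 5
xs = [rng.standard_normal(d_in) for _ in range(T)]
params = (np.zeros(d_h),                            # h0
          rng.standard_normal((d_h, d_in)) * 0.1,   # W_xh
          rng.standard_normal((d_h, d_h)) * 0.1,    # W_hh
          rng.standard_normal((d_out, d_h)) * 0.1,  # W_hy
          np.zeros(d_h), np.zeros(d_out))           # b_h, b_y
ys, h_T = rnn_forward(xs, *params)
```

Note the same weights are reused at every step; the loop is exactly the "unrolling" described below.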

The unrolled recurrent network is a directed acyclic computation graph.

Backpropagation Through Time (BPTT): ordinary backpropagation applied to the unrolled graph, with each shared parameter's gradient summed over all time steps.
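
A hedged PyTorch sketch of the common truncated variant of BPTT: unroll the recurrence for a window of k steps, sum a per-step loss, and let one `backward()` call propagate gradients through the window (the dimensions and window length k are assumptions; full BPTT is the special case k = T):

```python
import torch
import torch.nn as nn

d_in, d_h, d_out, T, k = 4, 8, 3, 20, 5
cell = nn.RNNCell(d_in, d_h)             # one step of the recurrence
readout = nn.Linear(d_h, d_out)
opt = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=0.1)
loss_fn = nn.MSELoss()

xs = torch.randn(T, 1, d_in)             # toy sequence (batch size 1)
targets = torch.randn(T, 1, d_out)

h = torch.zeros(1, d_h)
for t0 in range(0, T, k):                # process the sequence in windows of k steps
    h = h.detach()                       # truncate: stop gradients flowing past the window
    loss = 0.0
    for t in range(t0, min(t0 + k, T)):
        h = cell(xs[t], h)               # unroll the recurrence inside the window
        loss = loss + loss_fn(readout(h), targets[t])
    opt.zero_grad()
    loss.backward()                      # backprop through k unrolled steps = truncated BPTT
    opt.step()
```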

Issues with RNNs

  • Lack of parallelizability: the forward and backward passes each involve O(sequence length) inherently sequential operations, since step t cannot start before step t-1 finishes. Enter Transformers (see the contrast sketched below).
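
A rough NumPy contrast of the two dependency structures (the attention here is a bare similarity-weighted average, omitting the learned projections a real Transformer uses; all shapes are illustrative):

```python
import numpy as np

T, d = 6, 8
X = np.random.randn(T, d)
W = np.random.randn(d, d) * 0.1

# RNN: T dependent steps; each iteration must wait for the previous one.
h = np.zeros(d)
for t in range(T):
    h = np.tanh(X[t] + W @ h)

# Self-attention: one batched matmul touches every timestep at once
# (O(1) sequential depth per layer, at the cost of O(T^2) pairwise scores).
scores = X @ X.T / np.sqrt(d)                                    # all-pairs similarities
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
out = weights @ X                                                # every position attends to all others
```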