Scaling Transformers to Handle Over 2 Million Tokens With RMT

The Recurrent Memory Transformer (RMT) retains information across up to 2 million tokens (a token is typically a word or part of a word). Applying Transformers to long texts does not necessarily require large amounts of memory. By employing a recurrent approach and memory, the quadratic complexity of attention can be reduced to linear. Models trained on sufficiently long inputs can extrapolate their abilities to texts orders of magnitude longer. The synthetic tasks explored in this study serve as a first milestone toward enabling RMT to generalize to tasks with unseen properties, including language modelling. In future work, the researchers aim to tailor the recurrent memory approach to the most commonly used Transformers to improve their effective context size.
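
To make the quadratic-versus-linear claim concrete, the sketch below compares the number of pairwise attention scores required for full attention over an entire sequence against an RMT-style pass that attends only within fixed-size segments. The 512-token segment length is a hypothetical value chosen for illustration, not a parameter reported by the researchers.

```python
# Back-of-the-envelope cost comparison (illustrative only; the segment size
# below is a hypothetical choice, not a figure from the paper).

def full_attention_cost(n_tokens: int) -> int:
    # Vanilla self-attention compares every token with every other token.
    return n_tokens ** 2

def recurrent_memory_cost(n_tokens: int, segment_len: int) -> int:
    # RMT-style processing: split the input into segments, run full attention
    # inside each segment, and carry a small memory between segments.
    n_segments = -(-n_tokens // segment_len)   # ceiling division
    return n_segments * segment_len ** 2       # roughly n_tokens * segment_len

n, seg = 2_000_000, 512
print(f"full attention:   {full_attention_cost(n):.3e} pairwise scores")
print(f"recurrent memory: {recurrent_memory_cost(n, seg):.3e} pairwise scores")
```

For a 2-million-token input, the cost grows with sequence length times segment length rather than with the square of the sequence length, which is what makes such long contexts tractable.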

This enables Large Language Models (LLMs) to output entire novel-length texts.

Les Misérables by Victor Hugo has about 545,000 words.
The complete Harry Potter series by J.K. Rowling totals 1,084,625 words.
The Lord of the Rings by J.R.R. Tolkien is about 550,000 words.
The current five novels of the Game of Thrones series total nearly 1.74 million words.
The 14 books of the Wheel of Time by Robert Jordan total almost 4.3 million words.
War and Peace by Tolstoy has 587,287 words.
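
For a rough sense of how these word counts translate into tokens, the snippet below applies the common rule of thumb of about 0.75 English words per token. The conversion factor and the resulting estimates are assumptions for illustration; actual counts depend entirely on the tokenizer and are not figures from the article.

```python
# Rough token estimates for the word counts listed above, using the common
# rule of thumb of ~0.75 words per token (an assumption, not a figure from
# the article); real counts depend on the tokenizer.
WORDS_PER_TOKEN = 0.75

books = {
    "Les Miserables": 545_000,
    "Harry Potter (complete series)": 1_084_625,
    "The Lord of the Rings": 550_000,
    "Game of Thrones (5 novels)": 1_740_000,
    "Wheel of Time (14 books)": 4_300_000,
    "War and Peace": 587_287,
}

for title, words in books.items():
    tokens = round(words / WORDS_PER_TOKEN)
    fits = "fits" if tokens <= 2_000_000 else "exceeds"
    print(f"{title}: ~{tokens:,} tokens ({fits} a 2M-token context)")
```

Under this rough conversion, single novels and the Harry Potter series fit comfortably within a 2-million-token context, while the longest multi-book series would still exceed it.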

By leveraging the Recurrent Memory Transformer architecture, the researchers increased the model's effective context length to an unprecedented two million tokens while maintaining high memory retrieval accuracy. The method stores and processes both local and global information, and uses recurrence to pass information between segments of the input sequence. The approach holds significant potential to improve long-term dependency handling in natural language understanding and generation tasks, as well as to enable large-scale context processing for memory-intensive applications.
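
The sketch below illustrates the general idea of segment-level recurrence with memory tokens: each segment is processed together with a small set of memory vectors, and the updated memory is carried forward to the next segment. This is a minimal, simplified illustration of the mechanism described above, not the authors' implementation; the model dimensions, memory length, and segment length are hypothetical.

```python
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    """Minimal sketch of RMT-style segment-level recurrence (illustrative;
    layer sizes, memory length, and segment length are hypothetical)."""

    def __init__(self, d_model=256, n_heads=4, mem_len=8, seg_len=128):
        super().__init__()
        self.mem_len, self.seg_len = mem_len, seg_len
        # Learnable initial memory tokens shared across inputs.
        self.memory = nn.Parameter(torch.randn(mem_len, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, total_len, d_model) for an already-embedded text.
        batch = embeddings.size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for start in range(0, embeddings.size(1), self.seg_len):
            segment = embeddings[:, start:start + self.seg_len]
            # Prepend the memory tokens so the segment can read global state.
            x = self.encoder(torch.cat([mem, segment], dim=1))
            mem = x[:, :self.mem_len]            # write: updated memory for the next segment
            outputs.append(x[:, self.mem_len:])  # segment representations
        return torch.cat(outputs, dim=1)

# Usage: process a long sequence segment by segment with a constant-size memory.
model = RecurrentMemorySketch()
long_input = torch.randn(1, 1024, 256)  # hypothetical pre-embedded sequence
print(model(long_input).shape)          # torch.Size([1, 1024, 256])
```

Because only a fixed number of memory tokens is carried between segments, the per-segment computation stays constant no matter how long the overall input grows, which is the property the researchers exploit to reach the two-million-token scale.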

Other AI Developments This Past Week