Chapter 4 of The Big Book of Large Language Models is Here!

Chapter 4 of the Big Book of Large Language Models is finally here! That was a difficult chapter to write! Originally, I wanted to cram into it every improvement to the Transformer architecture since the "Attention Is All You Need" paper, but I realized that would be far too long for a single chapter. I ended up focusing only on improvements to the attention layer and deferring things like relative positional encoding and Mixture of Experts to the next chapter. In this chapter, I cover the following improvements:

Sparse Attention Mechanisms

  • The First Sparse Attention: Sparse Transformers
  • Choosing Sparsity Efficiently: Reformer
  • Local vs Global Attention: Longformer and BigBird
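To give a quick flavor of this family (a toy sketch of mine, not code from the book): in Longformer-style local attention, each query only attends to keys within a fixed window around its own position, which cuts the cost from O(N²) toward O(N·w). A minimal NumPy illustration, with all names and shapes chosen purely for the example:

```python
import numpy as np

def sliding_window_attention(Q, K, V, window=2):
    """Toy Longformer-style local attention: each position attends only to
    keys within `window` positions of itself. Q, K, V: (seq_len, d)."""
    seq_len, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                   # dense here only for readability
    idx = np.arange(seq_len)
    local = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(local, scores, -np.inf)       # forbid attention outside the band
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
print(sliding_window_attention(Q, K, V, window=2).shape)  # (8, 4)
```

Note that the sketch still builds the dense score matrix for readability; real implementations only compute the banded entries (plus a few global tokens in Longformer and BigBird).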

Linear Attention Mechanisms

  • Low-Rank Projection of Attention Matrices: Linformer
  • Recurrent Attention Equivalence: The Linear Transformer
  • Kernel Approximation: Performers
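As an illustration of the linear-attention reformulation (again my own toy sketch, not the book's code): if we replace the softmax with a feature map φ, attention becomes φ(Q)(φ(K)ᵀV) up to a normalizer, and the φ(K)ᵀV summary can be computed once, so the cost grows linearly in sequence length. The elu(x)+1 feature map below is the one proposed for the Linear Transformer; Performers instead use random features to approximate the softmax kernel.

```python
import numpy as np

def elu_plus_one(x):
    # Feature map phi(x) = elu(x) + 1 (always positive), as in the Linear Transformer.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: softmax(QK^T)V is replaced by
    phi(Q)(phi(K)^T V) / (phi(Q) sum_j phi(k_j)), computed in O(N d^2)."""
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)   # (N, d)
    KV = Kf.T @ V                               # (d, d): one summary of all keys/values
    Z = Qf @ Kf.sum(axis=0)                     # (N,): per-query normalizer
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```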

Memory-Efficient Attention

  • Self-attention Does Not Need O(N^2) Memory
  • FlashAttention
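The common trick behind both of these (sketched below with function names and a chunk size of my own choosing) is to process keys and values chunk by chunk while maintaining a running softmax normalizer, so the full N×N score matrix is never materialized; FlashAttention additionally organizes the computation to stay in fast GPU SRAM.

```python
import numpy as np

def chunked_attention(Q, K, V, chunk=4):
    """Exact attention computed key-chunk by key-chunk with an online softmax,
    so the N x N score matrix is never stored. Q, K, V: (N, d)."""
    N, d = Q.shape
    out = np.zeros_like(V, dtype=float)
    running_max = np.full(N, -np.inf)
    running_sum = np.zeros(N)
    for start in range(0, K.shape[0], chunk):
        Kc, Vc = K[start:start + chunk], V[start:start + chunk]
        s = Q @ Kc.T / np.sqrt(d)                    # scores for this chunk only
        new_max = np.maximum(running_max, s.max(axis=-1))
        scale = np.exp(running_max - new_max)        # rescale old accumulators
        p = np.exp(s - new_max[:, None])
        out = out * scale[:, None] + p @ Vc
        running_sum = running_sum * scale + p.sum(axis=-1)
        running_max = new_max
    return out / running_sum[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
print(chunked_attention(Q, K, V).shape)  # (8, 4), identical to dense softmax attention
```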

Faster Decoding Attention Mechanisms

  • Multi-Query Attention
  • Grouped-Query Attention
  • Multi-Head Latent Attention
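The idea these variants share (here a toy sketch of Grouped-Query Attention with illustrative shapes, not the book's code) is to shrink the key/value cache kept around during autoregressive decoding: several query heads reuse one key/value head, with Multi-Query Attention being the extreme case of a single shared KV head.

```python
import numpy as np

def grouped_query_attention(Q, K, V, n_q_heads=8, n_kv_heads=2):
    """Toy GQA: Q has n_q_heads heads, K/V only n_kv_heads heads, and each
    group of n_q_heads // n_kv_heads query heads reuses the same K/V head.
    Q: (n_q_heads, N, d), K and V: (n_kv_heads, N, d)."""
    group = n_q_heads // n_kv_heads
    d = Q.shape[-1]
    outs = []
    for h in range(n_q_heads):
        kv = h // group                              # which shared K/V head this query head uses
        scores = Q[h] @ K[kv].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        outs.append(w @ V[kv])
    return np.stack(outs)                            # (n_q_heads, N, d)

rng = np.random.default_rng(0)
N, d = 6, 4
Q = rng.normal(size=(8, N, d)); K = rng.normal(size=(2, N, d)); V = rng.normal(size=(2, N, d))
print(grouped_query_attention(Q, K, V).shape)  # (8, 6, 4)
```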

Long Sequence Attentions

  • Transformer-XL
  • Memorizing Transformers
  • Infini-Attention
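These approaches extend attention beyond a single segment. As a hedged sketch of the recurrence idea popularized by Transformer-XL (function and variable names are mine), keys and values from the previous segment are cached and concatenated with the current segment's, so queries can look back past the segment boundary:

```python
import numpy as np

def attend_with_memory(Q, K, V, mem_K=None, mem_V=None):
    """Toy segment-level recurrence: the current segment's queries attend over
    cached keys/values from the previous segment plus the current ones.
    Returns the attention output and the cache to reuse for the next segment."""
    if mem_K is not None:
        K = np.concatenate([mem_K, K], axis=0)
        V = np.concatenate([mem_V, V], axis=0)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    seg_len = Q.shape[0]
    return w @ V, (K[-seg_len:], V[-seg_len:])       # new memory = current segment's K, V

rng = np.random.default_rng(0)
segments = [rng.normal(size=(4, 4)) for _ in range(3)]   # three segments of 4 tokens
mem_K = mem_V = None
for x in segments:
    out, (mem_K, mem_V) = attend_with_memory(x, x, x, mem_K, mem_V)
print(out.shape)  # (4, 4)
```

The real mechanisms are richer (Memorizing Transformers retrieve from a large external memory with a kNN lookup, and Infini-Attention uses a compressive memory), but the cached-state idea is the common thread.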

Obviously, I could not include everything that has ever been invented around the attention layer, but I believe these topics capture well the different research routes that have been explored since the original paper. I think it is a very important chapter, as most materials available online focus on vanilla self-attention, which is starting to look outdated by today's standards. I also found that trying to understand how to improve self-attention is a very good way to understand what we are trying to improve in the first place! Self-attention may appear odd at first, but diving into the inner workings of the layer in order to improve it gives us a level of understanding far beyond what we can get by only looking at the original formulation. I hope you will enjoy it!

Looking for corporate training or consulting services for your AI/ML endeavors? Just send me an email: damienb@theaiedge.io


