Chapter 4 of The Big Book of Large Language Models is Here!
Chapter 4 of the Big Book of Large Language Models is finally here! It was a difficult chapter to write! Originally, I wanted to cram into it every improvement made to the Transformer architecture since the "Attention Is All You Need" paper, but I realized that would be too long for one chapter. I ended up focusing only on improvements to the attention layer, deferring topics like relative positional encoding and Mixture of Experts to the next chapter. In this chapter, I cover the following improvements:
Sparse Attention Mechanisms
- The First Sparse Attention: Sparse Transformers
- Choosing Sparsity Efficiently: Reformer
- Local vs Global Attention: Longformer and BigBird
Linear Attention Mechanisms
- Low-Rank Projection of Attention Matrices: Linformer
- Recurrent Attention Equivalence: The Linear Transformer (see the first sketch after this list)
- Kernel Approximation: Performers
Memory-Efficient Attention
- Self-attention Does Not Need O(n²) Memory
- FlashAttention
Faster Decoding Attention Mechanisms
- Multi-Query Attention
- Grouped-Query Attention (see the second sketch after this list)
- Multi-Head Latent Attention
Long-Sequence Attention Mechanisms
- Transformer-XL
- Memorizing Transformers
- Infini-Attention
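To give a taste of the kind of insight I mean, here is a minimal sketch (mine, not the book's) of the trick behind the Linear Transformer: once the softmax is replaced by a kernel feature map applied to queries and keys, the attention product can be reassociated so the N×N matrix is never formed. The sketch uses identity feature maps and drops normalization to keep the algebra visible; the function names are mine.

```python
import torch

def quadratic_attention(Q, K, V):
    # Materializes the (N, N) attention matrix:
    # O(N^2) time and memory in sequence length N.
    return (Q @ K.T) @ V

def linear_attention(Q, K, V):
    # Same product, reassociated: only a (d, d) summary of K and V
    # is ever built, so cost is O(N d^2) time and O(d^2) memory.
    return Q @ (K.T @ V)

# float64 so the two association orders agree to tight tolerance.
N, d = 1024, 64
Q = torch.randn(N, d, dtype=torch.float64)
K = torch.randn(N, d, dtype=torch.float64)
V = torch.randn(N, d, dtype=torch.float64)
print(torch.allclose(quadratic_attention(Q, K, V),
                     linear_attention(Q, K, V)))  # True
```

In the actual Linear Transformer, the feature map is elu(x) + 1 and the numerator and denominator are accumulated recurrently, which is what makes autoregressive decoding O(1) per token.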
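And here is a minimal sketch of grouped-query attention, again with hypothetical names of my own: queries keep all their heads, while keys and values share a smaller number of heads, which shrinks the KV cache that must be read at decoding time. Multi-query attention is the special case n_kv_heads=1, and standard multi-head attention is n_kv_heads=n_heads.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    B, N, D = x.shape
    head_dim = D // n_heads
    group = n_heads // n_kv_heads  # query heads per shared KV head

    # Queries keep n_heads heads; keys/values only get n_kv_heads.
    q = (x @ wq).view(B, N, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, N, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, N, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each KV head to the `group` query heads it serves.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(B, N, D)

# 8 query heads sharing 2 KV heads: the KV cache is 4x smaller.
B, N, D, n_heads, n_kv_heads = 1, 16, 64, 8, 2
head_dim = D // n_heads
x = torch.randn(B, N, D)
wq = torch.randn(D, D)
wk = torch.randn(D, n_kv_heads * head_dim)
wv = torch.randn(D, n_kv_heads * head_dim)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)
# torch.Size([1, 16, 64])
```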
Obviously, I could not include everything that has ever been invented around the attention layer, but I believe these topics capture well the different research routes explored since the original paper. I believe it is a very important chapter, as most materials available online tend to focus on vanilla self-attention, which is becoming outdated by today's standards. I also found that trying to understand how to improve self-attention is a very good way to understand what we are trying to improve in the first place! Self-attention may appear odd at first, but diving into the inner workings of the layer in order to improve it gives us a level of understanding far beyond what we can learn by looking only at the original formulation. I hope you will enjoy it!
Looking for corporate training or consulting services for your AI/ML endeavors? Just send me an email: damienb@theaiedge.io