SESSION
Pioneering a Hybrid SSM Transformer Architecture
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Breakout |
| TRACK | Generative AI |
| TECHNOLOGIES | AI/Machine Learning, GenAI/LLMs |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
AI21 Labs presents its latest foundation model, Jamba, built on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. The session dives deep into the decision process that led AI21 to develop a hybrid architecture and walks through a breakdown of how the model is structured from its SSM, Transformer, and MoE layers. This flexible architecture allows resource- and objective-specific configurations. With unprecedented throughput and the largest context window in its size class (256K tokens, with up to 140K fitting on a single GPU), Jamba introduces a paradigm shift in how large language model builders can think about developing new models.
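To make the layer interleaving concrete, here is a minimal sketch, not AI21's implementation, of how attention, SSM-style, and MoE layers can be stacked in a hybrid architecture. The layer ratio, sizes, the simplified `ToySSMMixer` recurrence, and the `TopKMoE` router are all hypothetical placeholders chosen for illustration.

```python
# Illustrative sketch of a hybrid SSM / Transformer / MoE block stack.
# All names, ratios, and dimensions are hypothetical, not Jamba's actual configuration.
import torch
import torch.nn as nn


class ToySSMMixer(nn.Module):
    """Stand-in for a selective state-space (Mamba-style) layer:
    a simple gated-free linear recurrence, used only to mark where the SSM sits."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_state)
        self.out_proj = nn.Linear(d_state, d_model)
        self.decay = nn.Parameter(torch.full((d_state,), 0.9))

    def forward(self, x):                      # x: (batch, seq, d_model)
        u = self.in_proj(x)
        h = torch.zeros(u.size(0), u.size(-1), device=u.device)
        outs = []
        for t in range(u.size(1)):             # sequential scan over the sequence
            h = self.decay * h + u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))


class TopKMoE(nn.Module):
    """Token-wise top-k mixture-of-experts MLP: a router picks k expert FFNs per token."""
    def __init__(self, d_model: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):
        logits = self.router(x)                          # (batch, seq, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                       # combine the selected experts
            for e, expert in enumerate(self.experts):
                mask = (idx[..., slot] == e).unsqueeze(-1)
                out = out + mask * weights[..., slot:slot + 1] * expert(x)
        return out


class HybridBlock(nn.Module):
    """One block: an attention mixer or an SSM mixer, followed by an MoE MLP."""
    def __init__(self, d_model: int, use_attention: bool):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_attention = use_attention
        if use_attention:
            self.mixer = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        else:
            self.mixer = ToySSMMixer(d_model)
        self.moe = TopKMoE(d_model)

    def forward(self, x):
        h = self.norm1(x)
        if self.use_attention:
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        x = x + h                                        # residual around the mixer
        return x + self.moe(self.norm2(x))               # residual around the MoE MLP


# Hypothetical layout: one attention block for every four blocks, the rest SSM blocks.
model = nn.Sequential(*[HybridBlock(d_model=64, use_attention=(i % 4 == 0))
                        for i in range(8)])
tokens = torch.randn(2, 32, 64)                          # (batch, seq, d_model)
print(model(tokens).shape)                               # torch.Size([2, 32, 64])
```

The design point the sketch illustrates is the one described in the abstract: most blocks use an SSM mixer whose state is constant in sequence length, attention appears only in a fraction of the blocks, and MoE MLPs add capacity without activating every parameter per token.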
SESSION SPEAKERS
Chen Wang
Lead Solution Architect
AI21 Labs