SESSION

Pioneering a Hybrid SSM Transformer Architecture

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Generative AI
TECHNOLOGIES: AI/Machine Learning, GenAI/LLMs
SKILL LEVEL: Intermediate
DURATION: 40 min

AI21 Labs presents their latest foundation model, Jamba, based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Lenz will take a deep dive into the decision process that led them to develop a hybrid architecture, and will walk us through a breakdown of how the architecture is structured across its SSM, Transformer, and MoE layers. This flexible architecture allows resource- and objective-specific configurations. With unprecedented throughput and the largest context window in its size class, 256K tokens, of which 140K fit on a single GPU, Jamba introduces a paradigm shift in how large language model builders can think about developing new models.
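As a rough illustration of the interleaving idea the session describes, the sketch below enumerates a hypothetical layer schedule for a hybrid stack. This is not AI21's code; the ratios (one attention layer per 8, MoE MLP on every other layer) are assumptions drawn from the published Jamba paper, and the function name is invented for illustration.

```python
# Hypothetical sketch of a hybrid Transformer-Mamba layer schedule.
# Assumed ratios (per the Jamba paper, not this session): 1 attention
# layer per 8 layers (1:7 attention:Mamba), MoE MLP every other layer.

def jamba_layer_schedule(n_layers: int = 32,
                         attn_every: int = 8,
                         moe_every: int = 2):
    """Return a list of (mixer, mlp) tags, one per layer."""
    schedule = []
    for i in range(n_layers):
        # The token-mixing sublayer is usually Mamba (SSM); attention
        # appears once per `attn_every` layers.
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        # The MLP sublayer alternates between a dense MLP and an MoE MLP.
        mlp = "moe" if i % moe_every == 1 else "dense"
        schedule.append((mixer, mlp))
    return schedule

if __name__ == "__main__":
    for i, (mixer, mlp) in enumerate(jamba_layer_schedule(8)):
        print(f"layer {i:2d}: {mixer:9s} + {mlp} MLP")
```

Varying `attn_every` and `moe_every` is one way to realize the "resource- and objective-specific configurations" the abstract mentions: more Mamba layers favor throughput and long context, more attention layers favor recall-heavy tasks.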

SESSION SPEAKERS

Chen Wang

Lead Solution Architect
AI21 Labs