Mosaic AI Foundation Model Serving
Access and query state-of-the-art open foundation models, and use them to quickly build applications on a high-quality generative AI model without maintaining your own model deployment.
Foundation Model Serving DBU rates and throughput
Model | Pay-per-token: DBU / 1M input tokens (global) | Pay-per-token: DBU / 1M output tokens (global) | Provisioned Throughput¹: DBU / hour (global) | Provisioned Throughput: throughput band² (max tokens / sec)³ |
---|---|---|---|---|
Current models | | | | |
Llama 3.1 405B | 35.714 | 142.857 | 600.000 | 3,400 |
Llama 3.1 70B | 7.143 | 21.429 | 342.857 | 9,500 |
Llama 3.1 8B | n/a | n/a | 106.000 | 19,000 |
Llama 3.2 3B | n/a | n/a | 92.857 | 22,000 |
Llama 3.2 1B | n/a | n/a | 85.714 | 35,000 |
DBRX | 10.714 | 32.143 | 171.429 | 650 |
Mixtral 8x7B | 7.143 | 14.286 | 290.857 | 5,000 |
GTE | 1.857 | n/a | 20.000 | 9,450 |
Legacy models | | | | |
Llama 3 70B | n/a | n/a | 212.143 | 1,000 |
Llama 3 8B | n/a | n/a | 106.000 | 3,000 |
Llama 2 70B | n/a | n/a | 290.800 | 1,200 |
Llama 2 13B | n/a | n/a | 112.000 | 980 |
MPT 30B | n/a | n/a | 112.000 | 450 |
MPT 7B | n/a | n/a | 20.000 | 2,450 |
BGE Large | 1.429 | n/a | 24.000 | 11,800 |
¹ Throughput figures are examples based on a typical real-time use case with an input/output shape of 3,500/300 tokens. Actual throughput varies with the use case, query shape, and other factors. Input/output ratios do not apply to embedding models.
² The throughput band is the model-specific maximum throughput (tokens per second) provided at the per-hour price above. With Provisioned Throughput Serving, throughput is provisioned in increments of the model's throughput band; higher throughput requires configuring a multiple of the band, which is charged at the same multiple of the per-hour price.
³ Figures shown are for serving on AWS. Some figures differ on Azure, where they are charged at a different price.
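Footnote 2 implies that a target throughput is met by rounding up to a whole number of throughput bands, each billed at the model's full per-hour rate. The Python sketch below assumes the AWS band sizes and DBU/hour figures from the table above and that partial bands cannot be provisioned; the helper names `bands_needed` and `hourly_dbu` are hypothetical, not part of any Databricks API. Because band sizes are example figures for a 3,500/300-token query shape, treat the result as an estimate.

```python
import math

# (DBU/hour, throughput band in max tokens/sec) per model, copied from the
# rate table above (AWS figures). Only a few models are included here.
PROVISIONED_RATES = {
    "Llama 3.1 405B": (600.000, 3_400),
    "Llama 3.1 70B": (342.857, 9_500),
    "Llama 3.1 8B": (106.000, 19_000),
    "DBRX": (171.429, 650),
}


def bands_needed(model: str, target_tokens_per_sec: float) -> int:
    """Hypothetical helper: smallest whole number of bands covering the target."""
    if target_tokens_per_sec <= 0:
        raise ValueError("target throughput must be positive")
    _, band = PROVISIONED_RATES[model]
    # Assumption: throughput can only be provisioned in whole-band increments.
    return math.ceil(target_tokens_per_sec / band)


def hourly_dbu(model: str, bands: int) -> float:
    """Hypothetical helper: DBU/hour charged for a multiple of the band."""
    dbu_per_hour, _ = PROVISIONED_RATES[model]
    return dbu_per_hour * bands


if __name__ == "__main__":
    n = bands_needed("Llama 3.1 70B", 20_000)  # 20,000 / 9,500 -> 3 bands
    print(n, round(hourly_dbu("Llama 3.1 70B", n), 3))
```

Rounding up rather than to the nearest band means the provisioned capacity always meets or exceeds the target, at the cost of paying for some unused headroom.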
Pay-Per-Token Serving Pricing Examples
Model | Input tokens | Output tokens | Region | Unit price ($ / DBU) | Total price |
---|---|---|---|---|---|
Llama 3.1 405B | 4,000,000 | 1,000,000 | US East | $0.070 | $20.00 |
Llama 3.1 70B | 4,000,000 | 1,000,000 | US East | $0.070 | $3.50 |
DBRX | 4,000,000 | 1,000,000 | Europe (Ireland) | $0.077 | $5.78 |
Mixtral 8x7B | 4,000,000 | 1,000,000 | AP (Sydney) | $0.088 | $3.77 |
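Each pay-per-token total is the input and output token counts, each scaled to millions and multiplied by its DBU rate, summed, then multiplied by the regional $/DBU unit price. The Python sketch below shows that arithmetic using the rates from the table above; `pay_per_token_price` is a hypothetical helper name. The DBU rates are published rounded to three decimals, so a total computed from them can differ from a quoted total by a cent.

```python
# Pay-per-token DBU rates (DBU per 1M input tokens, DBU per 1M output tokens),
# copied from the rate table above (AWS figures).
PAY_PER_TOKEN_RATES = {
    "Llama 3.1 405B": (35.714, 142.857),
    "Llama 3.1 70B": (7.143, 21.429),
    "DBRX": (10.714, 32.143),
    "Mixtral 8x7B": (7.143, 14.286),
}


def pay_per_token_price(model: str, input_tokens: int, output_tokens: int,
                        usd_per_dbu: float) -> float:
    """Hypothetical helper: dollar cost of a batch of pay-per-token usage."""
    in_rate, out_rate = PAY_PER_TOKEN_RATES[model]
    # Convert token counts to millions, then to DBUs at each direction's rate.
    dbus = (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate
    return dbus * usd_per_dbu


if __name__ == "__main__":
    # Mixtral 8x7B row: 4M input + 1M output tokens at $0.088/DBU (AP Sydney).
    price = pay_per_token_price("Mixtral 8x7B", 4_000_000, 1_000_000, 0.088)
    print(f"${price:.2f}")  # prints $3.77
```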
Provisioned Throughput Serving Pricing Examples
Model | Throughput bands | Hours / month | Region | Unit price ($ / DBU) | Monthly price |
---|---|---|---|---|---|
Llama 3.1 405B | 1 | 720 | US East | $0.070 | $30,240 |
Llama 3.1 70B | 1 | 720 | US East | $0.070 | $17,280 |
DBRX | 1 | 720 | US East | $0.070 | $8,640 |
Mixtral 8x7B | 2 | 720 | Europe (Ireland) | $0.077 | $32,250 |
Llama 3.1 8B | 4 | 720 | AP (Sydney) | $0.088 | $26,865 |
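A Provisioned Throughput monthly price is the model's DBU/hour rate multiplied by the number of throughput bands, the hours per month, and the regional $/DBU unit price. The Python sketch below assumes continuous provisioning and no committed-use discount; `provisioned_monthly_price` is a hypothetical helper name. It reproduces the DBRX and Llama 3.1 8B rows above.

```python
# Provisioned Throughput DBU/hour per throughput band, copied from the
# rate table above (AWS figures).
PROVISIONED_DBU_PER_HOUR = {
    "Llama 3.1 405B": 600.000,
    "Llama 3.1 70B": 342.857,
    "DBRX": 171.429,
    "Mixtral 8x7B": 290.857,
    "Llama 3.1 8B": 106.000,
}


def provisioned_monthly_price(model: str, bands: int, hours: float,
                              usd_per_dbu: float) -> float:
    """Hypothetical helper: cost of keeping `bands` bands provisioned for `hours`."""
    if bands < 1:
        raise ValueError("at least one throughput band is required")
    # Each band is charged at the full per-hour DBU rate (footnote 2).
    dbus = PROVISIONED_DBU_PER_HOUR[model] * bands * hours
    return dbus * usd_per_dbu


if __name__ == "__main__":
    # Llama 3.1 8B row: 4 bands for 720 hours at $0.088/DBU (AP Sydney).
    price = provisioned_monthly_price("Llama 3.1 8B", 4, 720, 0.088)
    print(f"${price:,.0f}")  # prints $26,865
```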
Pay as you go with a 14-day free trial, or contact us for committed-use discounts or custom requirements.