
Mosaic AI Foundation Model Serving

Access and query state-of-the-art open foundation models, and use them to quickly build applications on a high-quality generative AI model without maintaining your own model deployment.


* For regional availability: AWS, Azure

Foundation Model Serving DBU rates and Throughput

| Model | Pay-Per-Token: DBU / 1M input tokens (Global) | Pay-Per-Token: DBU / 1M output tokens (Global) | Provisioned Throughput¹: DBU / hour (Global) | Provisioned Throughput¹: Throughput band² (max tokens / sec)³ |
|---|---|---|---|---|
| Current Models | | | | |
| Llama 3.1 405B | 35.714 | 142.857 | 600.000 | 3,400 |
| Llama 3.1 70B | 7.143 | 21.429 | 342.857 | 9,500 |
| Llama 3.1 8B | n/a | n/a | 106.000 | 19,000 |
| Llama 3.2 3B | n/a | n/a | 92.857 | 22,000 |
| Llama 3.2 1B | n/a | n/a | 85.714 | 35,000 |
| DBRX | 10.714 | 32.143 | 171.429 | 650 |
| Mixtral 8x7B | 7.143 | 14.286 | 290.857 | 5,000 |
| GTE | 1.857 | n/a | 20.000 | 9,450 |
| Legacy Models | | | | |
| Llama 3 70B | n/a | n/a | 212.143 | 1,000 |
| Llama 3 8B | n/a | n/a | 106.000 | 3,000 |
| Llama 2 70B | n/a | n/a | 290.800 | 1,200 |
| Llama 2 13B | n/a | n/a | 112.000 | 980 |
| MPT 30B | n/a | n/a | 112.000 | 450 |
| MPT 7B | n/a | n/a | 20.000 | 2,450 |
| BGE Large | 1.429 | n/a | 24.000 | 11,800 |

1: Throughput shown is an example based on a typical real-time use case with input / output of 3500 / 300 tokens. Actual throughput will vary, depending on the use case, query shape and other factors. Input/output ratios do not apply to embedding models.

2: The throughput band is a model-specific maximum throughput (tokens per second) provided at the per-hour price above. With Provisioned Throughput Serving, model throughput is provided in increments of the model's throughput band. For higher throughput, the customer sets an appropriate multiple of the throughput band, which is then charged at the same multiple of the per-hour price above.

3: Shown for serving on AWS.  Some numbers are different on Azure when charged at a different price.
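
As a rough illustration of footnote 2, the sketch below estimates how many throughput bands a workload would need and the resulting hourly DBU consumption. It assumes bands can only be purchased in whole multiples, so the count is rounded up. The function names and the 1,500 tokens/sec target are hypothetical; the DBRX figures (650 tokens/sec per band, 171.429 DBU/hour) come from the rate table above.

```python
import math


def bands_needed(required_tokens_per_sec: float, band_tokens_per_sec: float) -> int:
    """Smallest whole number of throughput bands covering the required rate.

    Assumption: bands are sold only in whole multiples, so we round up.
    """
    if required_tokens_per_sec <= 0 or band_tokens_per_sec <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(required_tokens_per_sec / band_tokens_per_sec)


def hourly_dbu(bands: int, dbu_per_hour_per_band: float) -> float:
    """Hourly DBU consumption: the per-band hourly rate times the band count."""
    return bands * dbu_per_hour_per_band


# Hypothetical workload: 1,500 tokens/sec on DBRX (650 tokens/sec per band).
n = bands_needed(1500, 650)
print(f"bands: {n}, DBU/hour: {hourly_dbu(n, 171.429):.3f}")
```

Actual throughput depends on query shape (see footnote 1), so a real sizing exercise would start from measured traffic rather than a single target number.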

Pay-Per-Token Serving Pricing Examples

| Model | Input tokens | Output tokens | Region | Unit price ($ / DBU) | Total price |
|---|---|---|---|---|---|
| Llama 3.1 405B | 4,000,000 | 1,000,000 | US East | $0.070 | $35.00 |
| Llama 3.1 70B | 4,000,000 | 1,000,000 | US East | $0.070 | $7.00 |
| DBRX | 4,000,000 | 1,000,000 | Europe (Ireland) | $0.077 | $5.78 |
| Mixtral 8x7B | 4,000,000 | 1,000,000 | AP (Sydney) | $0.088 | $3.77 |
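
The arithmetic behind a pay-per-token estimate can be sketched as follows. This assumes the total is (input tokens / 1M × input DBU rate + output tokens / 1M × output DBU rate) × unit price per DBU, which is an inference from the tables on this page rather than a documented formula. The function name is illustrative. Because the published DBU rates are rounded to three decimals, results can differ from the listed totals by about a cent.

```python
def pay_per_token_price(
    input_tokens: int,
    output_tokens: int,
    dbu_per_1m_input: float,
    dbu_per_1m_output: float,
    usd_per_dbu: float,
) -> float:
    """Estimated pay-per-token cost in USD (assumed formula, see lead-in)."""
    dbus = (
        input_tokens / 1_000_000 * dbu_per_1m_input
        + output_tokens / 1_000_000 * dbu_per_1m_output
    )
    return dbus * usd_per_dbu


# Mixtral 8x7B, 4M input / 1M output tokens, AP (Sydney) at $0.088 per DBU.
mixtral = pay_per_token_price(4_000_000, 1_000_000, 7.143, 14.286, 0.088)
print(f"Mixtral 8x7B: ${mixtral:.2f}")  # prints "Mixtral 8x7B: $3.77"
```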

Provisioned Throughput Serving Pricing Examples

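| Model | Throughput bands | Hours / month | Region | Unit price ($ / DBU) | Monthly price |
|---|---|---|---|---|---|
| Llama 3.1 405B | 1 | 720 | US East | $0.070 | $35,280 |
| Llama 3.1 70B | 1 | 720 | US East | $0.070 | $21,384 |
| DBRX | 1 | 720 | US East | $0.070 | $8,640 |
| Mixtral 8x7B | 2 | 720 | Europe (Ireland) | $0.077 | $17,424 |
| Llama 3.1 8B | 4 | 720 | AP (Sydney) | $0.088 | $26,865 |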
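
A provisioned throughput estimate can be sketched the same way, assuming the monthly price is throughput bands × hours per month × DBU per hour × unit price per DBU. That formula is inferred from the tables on this page, and the function name is illustrative. The DBU/hour values are taken from the rate table (171.429 for DBRX, 106.000 for Llama 3.1 8B).

```python
def provisioned_monthly_price(
    bands: int,
    hours_per_month: float,
    dbu_per_hour: float,
    usd_per_dbu: float,
) -> float:
    """Estimated monthly provisioned throughput cost in USD (assumed formula)."""
    return bands * hours_per_month * dbu_per_hour * usd_per_dbu


# DBRX: 1 band, 720 hours, US East at $0.070 per DBU.
dbrx = provisioned_monthly_price(1, 720, 171.429, 0.070)
# Llama 3.1 8B: 4 bands, 720 hours, AP (Sydney) at $0.088 per DBU.
llama_8b = provisioned_monthly_price(4, 720, 106.000, 0.088)

print(f"DBRX: ${dbrx:,.0f}")              # prints "DBRX: $8,640"
print(f"Llama 3.1 8B: ${llama_8b:,.0f}")  # prints "Llama 3.1 8B: $26,865"
```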

Pay as you go with a 14-day free trial or contact us for committed-use discounts or custom requirements.

Mosaic AI Foundation Model Serving FAQ