Julian Quevedo

Julian Quevedo's posts

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

Mosaic Research

January 30, 2024/7 min read

LLM Inference Performance Engineering: Best Practices

Mosaic Research

October 12, 2023/15 min read
