Julian Quevedo

Julian Quevedo's posts

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

AI Research

January 31, 2024 / 7 min read

LLM Inference Performance Engineering: Best Practices

AI Research

October 12, 2023 / 15 min read