Daya Khudia

Daya Khudia's posts

Mosaic Research

March 20, 2024/14 min read

Fast, Secure and Reliable: Enterprise-grade LLM Inference

Mosaic Research

January 30, 2024/7 min read

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

Mosaic Research

January 4, 2024/15 min read

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

Mosaic Research

December 21, 2023/3 min read

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

Engineering

December 21, 2023/5 min read

Introducing Mixtral 8x7B with Databricks Model Serving

Mosaic Research

October 12, 2023/15 min read

LLM Inference Performance Engineering: Best Practices

Mosaic Research

August 24, 2023/12 min read

Introducing Llama2-70B-Chat with MosaicML Inference

Mosaic Research

April 27, 2023/7 min read

Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)

Mosaic Research

November 9, 2022/3 min read

MosaicML Delivers Leading NLP Performance in MLPerf v2.1

Mosaic Research

June 29, 2022/4 min read

MosaicML Satisfies the Need for Speed with MLPerf Results