Generative AI

Fast, Secure and Reliable: Enterprise-grade LLM Inference

After a whirlwind year of developments in 2023, many enterprises are eager to adopt increasingly capable generative AI models to supercharge their...
Generative AI

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more...
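The core idea behind quantization can be sketched with a minimal int8 example. This is an illustrative sketch only, not the method from the post (`quantize_int8` and `dequantize` are hypothetical helper names): each float32 weight is mapped to an 8-bit integer plus a shared scale, cutting storage 4x at a small accuracy cost.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor quantization: map floats onto int8 in [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; rounding error is bounded
# by half a quantization step (scale / 2).
print(q.dtype, w_hat.dtype)  # int8 float32
```

Production serving stacks typically go further (per-channel scales, activation quantization, calibration), but the storage/accuracy trade-off shown here is the same one the post measures.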
Generative AI

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

January 4, 2024 by Abhi Venigalla and Daya Khudia in Mosaic Research
At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or...
Generative AI

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface to integrate with a web server for fast, efficient inference performance with LLMs. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important.
Engineering blog

Introducing Mixtral 8x7B with Databricks Model Serving

Today, Databricks is excited to announce support for Mixtral 8x7B in Model Serving. Mixtral 8x7B is a sparse Mixture of Experts (MoE)...
Generative AI

LLM Inference Performance Engineering: Best Practices

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)...
Generative AI

Introducing Llama2-70B-Chat with MosaicML Inference

Llama2-70B-Chat is a leading AI model for text completion, comparable with ChatGPT in terms of quality. Today, organizations can leverage this state-of-the-art model...
Generative AI

Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)

April 27, 2023 by Daya Khudia and Vitaliy Chiley in Mosaic Research
The research and engineering teams here at MosaicML collaborated with CoreWeave, one of...
Generative AI

MosaicML Delivers Leading NLP Performance in MLPerf v2.1

MosaicML leads the MLPerf NLP results, delivering a score of 7.9 minutes on 8x NVIDIA A100 GPUs in the Open Division, thanks to...
Generative AI

MosaicML Satisfies the Need for Speed with MLPerf Results

MosaicML’s Open Division submission to the MLPerf Image Classification benchmark delivers a score of 23.8 minutes (4.5x speed-up relative to our baseline) on...