Matei Zaharia

Matei is the CTO and co-founder of Databricks and an Associate Professor of Computer Science at UC Berkeley. He started the Apache Spark project during his Ph.D. program at UC Berkeley in 2009 and has worked on other widely used data and AI software, including MLflow, Delta Lake, and DBRX. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award and the U.S. Presidential Early Career Award for Scientists and Engineers (PECASE).

Matei Zaharia's posts

Mosaic Research

October 8, 2024/10 min read

The Long Context RAG Capabilities of OpenAI o1 and Google Gemini

Data Engineering

October 2, 2024/10 min read

Generating Coding Tests for LLMs: A Focus on Spark SQL

Mosaic Research

August 12, 2024/19 min read

Long Context RAG Performance of LLMs

Generative AI

July 22, 2024/7 min read

Enhancing LLM-as-a-Judge with Grading Notes

Product

June 13, 2024/10 min read

Open Sourcing Unity Catalog

Product

June 11, 2024/9 min read

Introducing AI/BI: Intelligent Analytics for Real-World Data

Platform & Products & Announcements

April 24, 2024/5 min read

Unity Catalog Lakeguard: Industry-first and only data governance for multi-user Apache Spark™ clusters

Mosaic Research

April 8, 2024/6 min read

DSPy on Databricks

Company

March 27, 2024/4 min read

Announcing DBRX: A new standard for efficient open source LLMs

News

March 19, 2024/3 min read

Lilac Joins Databricks to Simplify Unstructured Data Evaluation for Generative AI

Announcements

May 14, 2025/4 min read

Databricks + Neon

Mosaic Research

April 8, 2025/9 min read

The Power of Fine-Tuning on Your Data: Quick Fixing Bugs with LLMs via Never Ending Learning (NEL)
