
OLMo Is Here, Powered by Databricks

by Jonathan Frankle

February 1, 2024 in Mosaic AI Research



As Chief Scientist (Neural Networks) at Databricks, I lead our research team toward the goal of giving everyone the ability to build and fine-tune AI models with their own data. In 2020, I was part of a small group of machine learning academics and industry veterans who founded MosaicML. We have always been committed to supporting open scientific inquiry, both by sharing our knowledge and by providing tools to the community. Since joining Databricks, which shares similar academic roots, we have only deepened that commitment.

 

In that spirit, we have been collaborating with scientists from the nonprofit Allen Institute for AI (AI2) on everything from technical knowledge-sharing to today’s big announcement: OLMo. In my opinion, AI2 is one of the best NLP labs in the world, even more so because they conduct their cutting-edge research with the unrestrained creativity, commitment to integrity, and resources of a nonprofit. We’ve found common ground in a belief in openness, a passion for doing rigorous science, and a love of building artifacts that we put into the hands of the community.

 

Today AI2 is releasing OLMo 7B, an open source, state-of-the-art large language model. Databricks is proud to have supported their work: OLMo (short for Open Language Model) was trained using our Mosaic AI Model Training Platform. The AI2 team is also sharing the pre-training data and the training code used to develop this model; the training code is a derivative of the MosaicML LLM Foundry.

 

We’re thrilled to have played a part in the success of the OLMo project, but I want to give credit where credit is due. We shared our tools, but the AI2 team did the hard work of building the models. Pete Walsh, Senior Software Engineer at AI2, said, "Mosaic was a game-changer for developing OLMo. Their platform allowed us to effortlessly scale up training and ablations when needed, while their command-line interface lets us iterate quickly by launching multi-node jobs right from our laptops." AI2’s seamless experience using our training platform validated the work we’ve done to make building and fine-tuning large models as straightforward as possible. To learn more about the OLMo 7B model and its variants, check out AI2’s blog post or the model card on Hugging Face.