Bringing Models and Data Closer Together

Databricks Feature Store Now Built Into AutoML
Lu Wang
Jimmy Xu
Wenfei Yan
Justin Wei

January 25, 2023 in Platform Blog

We are excited to announce a new AutoML capability to quickly and easily use Feature Store data to improve model outcomes. AutoML users can now simply join Feature Store tables to AutoML data sets to improve model quality. As Machine Learning (ML) gets faster and easier, customers are able to apply this transformational technology to an increasing variety of use cases. This allows customers to find more ways to grow their revenues or reduce their costs using ML. We have already seen many customers using AutoML to solve critical business challenges. Some customers use AutoML to extend their ML expertise while others use it to help accelerate their outcomes. With today's announcement, AutoML is now fully integrated with the Databricks Feature Store.

What is a Feature Store?

A feature store is a centralized data repository that enables data scientists to store, find, and share features. The feature store ensures that the same code used to compute the feature values is used for both model training and inference. This creates a curated set of data that modelers can rely on both to train and to deploy their models. Many companies report significant accelerations in experimentation and deployment when using the Feature Store. For example, the Director of Data Engineering at Anheuser-Busch InBev said, "It [the Feature Store] has been instrumental in helping us quickly scale our data science capabilities as well as in uniting data engineers and analysts alike with a common source of feature engineering and data transformations."

Getting started with a feature store is easy: any Delta table with a primary key and a timestamp can be used in the feature store. You can learn more about the Databricks Feature Store here: AWS, Azure, GCP.

How will this integration accelerate ML outcomes?

Databricks AutoML (AWS, Azure, GCP) was designed to help customers at all levels of technical expertise build and train ML models. AutoML not only provides a high-quality candidate model, but also gives the customer all of the model code in a notebook, should the customer want to further tune the model's performance.

In the past, AutoML was able to train a model using a single table as a training set. Now, customers can improve their model quality by augmenting their AutoML training data with data in their feature store. This makes it easier to train an even more accurate model. AutoML models using the Feature Store integration will automatically capture the feature lineage and add the new model to the end-to-end lineage tracking. This lineage accelerates deployment and provides the tooling to help meet your MLOps and compliance needs.

How do I get started?

In the AutoML experiment page, select a cluster with Databricks Runtime 11.3 LTS ML or above. After selecting the problem type, data set and prediction target, you will see a button in the bottom left of the screen.

Selecting this button lets you choose the feature tables to join to your data set, along with the lookup keys that will be used for the joins.
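To make the role of a lookup key concrete, here is a minimal, purely conceptual sketch in plain Python. It is not how the Feature Store is implemented; it only illustrates the idea of a left join that adds feature columns to each training row by matching on a key. The column names (`pickup_zip`, `fare_amount`, `mean_fare_1h`) and values are made up for illustration.

```python
def join_features(rows, feature_table, lookup_key):
    """Left-join feature columns onto each training row by lookup key."""
    # Index the feature table by its key for constant-time lookups.
    index = {feat[lookup_key]: feat for feat in feature_table}
    joined = []
    for row in rows:
        match = index.get(row[lookup_key], {})
        # Copy every feature column except the key itself.
        extra = {k: v for k, v in match.items() if k != lookup_key}
        joined.append({**row, **extra})
    return joined


# Hypothetical training rows and feature table (illustrative values only).
training_rows = [
    {"pickup_zip": "10001", "fare_amount": 12.5},
    {"pickup_zip": "10002", "fare_amount": 8.0},
]
pickup_features = [
    {"pickup_zip": "10001", "mean_fare_1h": 11.2},
]

augmented = join_features(training_rows, pickup_features, "pickup_zip")
print(augmented)
```

In this sketch, a row with no match in the feature table simply keeps its original columns; a real join would typically fill the missing feature values with nulls.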


Once you have identified the tables that you want to join, as well as the lookup keys, you can simply hit the "Start AutoML" button and the service will start creating models with both your input data and the data added from your feature store tables. In this example, augmenting the NYC Yellow Taxi fares data with feature tables brings a 21% improvement to the model fit (i.e., a decrease in RMSE from 3.991 to 3.142).
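The 21% figure follows directly from the two RMSE values quoted above. As a quick check, the relative improvement can be computed as the reduction in RMSE divided by the baseline RMSE. The helper name below is our own, not part of any Databricks API.

```python
def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Fractional reduction in RMSE relative to the baseline (lower RMSE is better)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return (baseline_rmse - new_rmse) / baseline_rmse


# RMSE values quoted in the example above.
improvement = relative_improvement(3.991, 3.142)
print(f"{improvement:.1%}")  # prints 21.3%
```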

This integration is not limited to the AutoML UI: the AutoML API now also supports programmatically augmenting your training data with feature store tables. You can learn more about the API capabilities here (AWS, Azure, GCP).
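As a rough sketch of what the programmatic route can look like, the snippet below builds a list of lookup dictionaries and passes it to `databricks.automl.regress` through its `feature_store_lookups` parameter. The table names, column names, and the `feature_lookup` and `run_regression` helpers are hypothetical and exist only for illustration; consult the API documentation linked above for the authoritative parameter list. The `databricks.automl` import is deferred because the package is only available on Databricks Runtime ML.

```python
from typing import Dict, List, Optional, Union


def feature_lookup(
    table_name: str,
    lookup_key: Union[str, List[str]],
    timestamp_lookup_key: Optional[str] = None,
) -> Dict:
    """Build one lookup entry for the feature_store_lookups parameter."""
    if not table_name:
        raise ValueError("table_name is required")
    lookup = {"table_name": table_name, "lookup_key": lookup_key}
    # Only time series feature tables need a timestamp lookup key.
    if timestamp_lookup_key is not None:
        lookup["timestamp_lookup_key"] = timestamp_lookup_key
    return lookup


# Hypothetical feature tables and columns; replace with your own.
lookups = [
    feature_lookup("my_db.pickup_features", ["pickup_zip"], "pickup_datetime"),
    feature_lookup("my_db.dropoff_features", ["dropoff_zip"]),
]


def run_regression(train_df, target_col: str, lookups: List[Dict]):
    """Start an AutoML regression run with feature store augmentation."""
    # Imported here because it only resolves on Databricks Runtime ML.
    from databricks import automl

    return automl.regress(
        dataset=train_df,
        target_col=target_col,
        feature_store_lookups=lookups,
    )


# On a Databricks ML cluster, with train_df being your training DataFrame:
# summary = run_regression(train_df, "fare_amount", lookups)
```

Keeping the lookup construction separate from the AutoML call makes the configuration easy to inspect and validate before a run is started.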

As we continue to invest in ways of making ML faster and simpler, we are excited to see how customers improve their workflows and look forward to finding more ways we can help teams achieve their ML objectives.
