Deploying dbt on Databricks Just Got Even Simpler
December 6, 2021 in Platform Blog
At Databricks, nothing makes us happier than making our users more productive, which is why we are delighted to announce a native adapter for dbt. It’s now easier than ever to develop robust data pipelines on Databricks using SQL.
dbt is a popular open source tool that lets a new breed of ‘analytics engineer’ build data pipelines using simple SQL. Everything is organized within directories, as plain text, making version control, deployment, and testability simple.
With the new dedicated dbt-databricks adapter available in public preview today, dbt developers can get started by simply running pip install dbt-databricks. This package is open source and builds on the brilliant work led by dbt Labs and the other contributors who made dbt-spark possible. Not only did we streamline the installation by removing any dependency on ODBC drivers, we also embraced dbt’s “convention over configuration” for maximum performance:
- dbt models use the Delta format by default
- Incremental models always leverage Delta Lake’s MERGE statement
- Expensive queries like unique key generation are now accelerated with Photon
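To make the MERGE-based incremental behavior above a little more concrete, here is a minimal conceptual sketch in plain Python. It is not the adapter's code and does not call dbt or Delta Lake; the function name `merge_incremental` and the sample rows are hypothetical, chosen only to illustrate the upsert semantics of a merge on a unique key: rows whose key already exists in the target are updated, and rows with a new key are inserted.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_incremental(target: List[Row], updates: Iterable[Row], unique_key: str) -> List[Row]:
    """Upsert `updates` into `target`, matching rows on `unique_key`.

    Hypothetical helper for illustration only: it mimics the effect of a
    MERGE (update when matched, insert when not matched) on in-memory rows.
    """
    # Index the existing rows by their key; dicts keep insertion order.
    merged = {row[unique_key]: row for row in target}
    for row in updates:
        # Matched key: the existing row is replaced. New key: the row is added.
        merged[row[unique_key]] = row
    return list(merged.values())


# Illustrative data: the existing table and a new batch of rows.
existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
new_batch = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

result = merge_incremental(existing, new_batch, "id")
```

In this sketch, row 1 is left untouched, row 2 is updated, and row 3 is inserted, which is roughly the outcome an incremental model aims for when only new or changed rows are processed on each run.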
More improvements to this adapter are coming as we continue to improve the overall integration between dbt and the Databricks Lakehouse Platform. With record-breaking performance and full support for standard SQL, the Lakehouse is the best place to run data warehousing workloads, including data pipelines built with dbt.
We are also excited about the upcoming addition of dbt Cloud to Partner Connect, Databricks’ one-stop shop for its customers to discover and integrate the best data and AI tools on the market. dbt Cloud is a hosted service made by dbt Labs, which helps data analysts and data engineers collaboratively build and productionize dbt projects. Starting in January, any Databricks customer will be able to start a free trial of dbt Cloud from Partner Connect and automatically integrate the two products. That said, the two products already work great together, and we encourage you to connect dbt Cloud to Databricks today.
Speaking of dbt Labs, we hope to see you at their conference, Coalesce, which begins today! Reynold Xin will have a fireside chat with Drew Banin, CPO for dbt Labs, and Ricardo Portillo will speak about building data pipelines for Financial Services with dbt and Databricks. You should definitely check it out and join the conversation on the dbt Community Slack in #coalesce-databricks. We look forward to your feedback!
Stay tuned for more exciting updates on how Databricks works with dbt, and watch our GitHub repository for new releases.