ETL framework is the first to both automatically manage infrastructure and bring modern software engineering practices to data engineering, allowing data engineers and analysts to focus on transforming data, not managing pipelines
SAN FRANCISCO – April 5, 2022 – Databricks, the Data and AI company and pioneer of the data lakehouse paradigm, today announced the general availability of Delta Live Tables (DLT), the first ETL framework to use a simple declarative approach to build reliable data pipelines and to automatically manage data infrastructure at scale. Turning SQL queries into production ETL pipelines often requires a lot of tedious, complicated operational work. By using modern software engineering practices to automate the most time-consuming parts of data engineering, DLT lets data engineers and analysts concentrate on delivering data rather than on operating and maintaining pipelines.
As companies develop strategies to get the most value out of their data, many will hire expensive, highly skilled data engineers – a resource that is already hard to come by – to avoid delays and failed projects. What is often not well understood is that many of these delays and failed projects are driven by a core issue: it is hard to build reliable data pipelines that work automatically without a lot of operational rigor to keep them up and running. As a result, even at a small scale, the majority of a data practitioner’s time is spent on tooling and managing infrastructure to make sure these data pipelines don’t break.
Delta Live Tables is the first and only ETL framework to solve this problem by combining modern software engineering practices with automatic management of infrastructure, whereas past efforts in the market have tackled only one aspect or the other. It simplifies ETL development by letting engineers describe the outcomes of their data transformations. Delta Live Tables then understands the dependencies across the full data pipeline in real time and automates away virtually all of the manual complexity. It also enables data engineers to treat their data as code and apply modern software engineering best practices like testing, error handling, monitoring, and documentation to deploy reliable pipelines at scale more easily. Delta Live Tables fully supports both Python and SQL and is tailored to work with both streaming and batch workloads.
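To illustrate the declarative idea described above, the following is a minimal sketch in plain Python. It is not the Delta Live Tables API: the `table` and `expect_or_drop` decorators, the `run_pipeline` function, and the sample order data are all hypothetical names invented for this illustration. The sketch assumes Python 3.9 or later (for the standard-library `graphlib` module). Each table is declared as a function whose parameter names identify its upstream tables, and the toy framework, not the author of the pipeline, works out the execution order.

```python
import functools
import inspect
from graphlib import TopologicalSorter

# Hypothetical toy registry of table definitions; NOT the Delta Live Tables API.
_TABLES = {}


def table(func):
    """Register a function as a table; its parameter names are its upstream tables."""
    _TABLES[func.__name__] = func
    return func


def expect_or_drop(predicate):
    """Toy data-quality rule: drop output rows for which `predicate` is false."""
    def decorator(func):
        @functools.wraps(func)  # keeps the name and signature visible to `table`
        def wrapper(*args, **kwargs):
            return [row for row in func(*args, **kwargs) if predicate(row)]
        return wrapper
    return decorator


def run_pipeline():
    """Infer dependencies from parameter names and run tables in dependency order."""
    graph = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in _TABLES.items()
    }
    results = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = _TABLES[name](**inputs)
    return results


# Tables are declared in arbitrary order; only the desired outcome is described.
@table
def order_total(clean_orders):
    return [{"total": sum(row["amount"] for row in clean_orders)}]


@table
@expect_or_drop(lambda row: row["amount"] > 0)
def clean_orders(raw_orders):
    return raw_orders


@table
def raw_orders():
    # Made-up sample data for the illustration.
    return [
        {"order_id": 1, "amount": 20.0},
        {"order_id": 2, "amount": -5.0},
        {"order_id": 3, "amount": 12.5},
    ]


if __name__ == "__main__":
    results = run_pipeline()
    print(results["order_total"])
```

Although `order_total` is declared first, the sketch runs `raw_orders`, then `clean_orders`, then `order_total`, because the order comes from the declared dependencies rather than from the order of the code. The row with a negative amount is dropped by the quality rule before the total is computed.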
Delta Live Tables is already powering production use cases at leading companies around the globe like JLL, Shell, Jumbo, Bread Finance, and ADP. “At ADP, we are migrating our human resource management data to an integrated data store on the lakehouse. Delta Live Tables has helped our team build in quality controls, and because of the declarative APIs, support for batch and real-time using only SQL, it has enabled our team to save time and effort in managing our data,” said Jack Berkowitz, Chief Data Officer, ADP.
“The power of DLT comes from something no one else can do – combine modern software engineering practices and automatically manage infrastructure. It’s game-changing technology that will allow data engineers and analysts to be more productive than ever,” said Ali Ghodsi, CEO and Co-Founder at Databricks. “It also broadens Databricks’ reach; DLT supports any type of data workload with a single API, eliminating the need for advanced data engineering skills.”
Learn more on the Databricks blog.
Databricks is the data and AI company. More than 7,000 organizations worldwide — including Comcast, Condé Nast, H&M, and over 40% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.