
How AI is Transforming Data Engineering


Published: November 3, 2025

Best Practices / 3 min read

Summary

  • From AI agents to real-time business intelligence, data is increasingly fueling corporate operations, but manual ETL hinders enterprise AI ambitions.
  • AI is helping data stewards eliminate much of the bespoke work involved in ETL, creating consistency across workloads that helps organizations accelerate innovation and reduce risk.
  • This post recaps the top takeaways from a recent Databricks webinar on modern data engineering in the age of AI.

Whether it’s releasing AI agents and applications into production, providing real-time intelligence to employees, or optimizing data warehouses for scale and cost, high-quality, reliable data pipelines are fundamental.

In most organizations, in-house engineers and architects are now central to efforts to democratize access to AI and analytics. But while data needs are accelerating, most teams struggle with a shortage of critical skills. This puts more pressure on existing employees and slows the business’s ability to use data and AI to tackle opportunities or solve operational issues.

To move with the speed the business expects, data and engineering teams are turning to AI to eliminate the manual, bespoke work involved in extracting, transforming, and orchestrating data.

Recently, Simon Whitley, Chief Technology Officer at Advancing Analytics, joined Databricks staff and others at the “From Burnout to Breakthrough: A New Approach to Data Engineering” webinar. Here are the top takeaways.

Traditional ETL processes in the agentic AI era

Few engineers follow a consistent pattern for every data request. Each employee uses their preferred ETL toolkit, creating a web of competing frameworks. Individually, the pipelines may work great, but collectively they can become a tangled mess, making it harder to detect and remediate issues.

While this bespoke approach may have worked in the past, today’s demands mean engineers can no longer afford to start from scratch every time. Instead, they have to move even faster to deliver critical assets to an increasingly data- and AI-literate workforce. And that’s just the human side of the operation. AI agents will increasingly build their own data pipelines autonomously, writing their own code and finding their own unique way of solving the problem at hand.

Without the right guardrails, that tangled mess only gets worse. It’s when AI is combined with a unified framework that the transformation begins.

A declarative approach to data engineering

Every data pipeline starts with intention. The engineer needs to establish the format the data should be in, where it needs to land, and more. Previously, this was a heavily manual process.

AI enables engineers to simply declare the pipelines they want in natural language, and the system handles the rest. They don’t have to start from scratch every time or even worry about the underlying tooling; the AI agents handle it. These capabilities should also extend to pipelines created by AI agents themselves. The underlying platform should automatically filter, clean, aggregate, and reshape the data as needed to conform to the standard framework, streamlining the ETL process.
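To make the idea concrete, here is a minimal sketch of what a declarative pipeline definition can look like, written in the style of the Delta Live Tables Python API that underpins Lakeflow Declarative Pipelines. The table names, source path, and data-quality rule are hypothetical assumptions, so treat this as an illustration of the declarative pattern rather than a copy-paste recipe.

    # A minimal, hypothetical sketch of a declarative pipeline definition,
    # in the style of the Delta Live Tables Python API. Table names, the
    # source path, and the quality rule are illustrative assumptions.
    # (The `dlt` module and `spark` session are provided by the pipeline runtime.)
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw orders ingested incrementally from cloud storage")
    def orders_raw():
        # Auto Loader discovers and ingests new files as they land.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/sales/landing/orders/")   # hypothetical path
        )

    @dlt.table(comment="Cleaned orders that conform to the shared framework")
    @dlt.expect_or_drop("valid_amount", "amount > 0")  # declarative quality rule
    def orders_clean():
        return (
            dlt.read_stream("orders_raw")
            .withColumn("order_date", F.to_date("order_ts"))
            .select("order_id", "customer_id", "order_date", "amount")
        )

Note what is absent: there is no orchestration code, no cluster configuration, and no retry logic. The engineer declares what each table should contain and which quality rules it must satisfy, and the platform works out how to build, update, and monitor the pipeline.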

And with every new pipeline created to the same standard, whether by humans or AI agents, companies can mitigate the escalating problem of competing frameworks that lead to disjointed governance, complex IT environments, and poor reliability. This makes it easier for organizations to ultimately deliver the data in a trusted, secure, and compliant way.

Why data platforms are key to modern data engineering

The job of data engineering is changing – and fast. Agility is paramount. And moving fast requires faster access to trusted, accurate data sets. No longer can engineers, architects, and other roles take a bespoke approach to building data pipelines.

Data platforms can increasingly handle the rote work involved in ETL workloads. And with AI agents taking on more of the grunt work, platforms can erect the right guardrails to prevent an already tangled mess of competing frameworks from getting worse.

To learn how Databricks Lakeflow Declarative Pipelines can help simplify your data engineering workloads, watch the full webinar: From Burnout to Breakthrough: A New Approach to Data Engineering.

