Data engineers are increasingly focused on one core problem: using AI to improve ETL and build reliable, production-grade pipelines without introducing new complexity. They need AI that actually delivers, streamlining workflows without adding disconnected tools or stripping away context.
Databricks Lakeflow brings a unified data engineering platform with embedded, secure AI that automates data processing end to end, unlocks more insights, and supports a broader range of business problems. Whether it’s AI-generated pipeline code or orchestrated AI workloads, data engineers who leverage Lakeflow can avoid spending hours on manual glue work and instead focus on the strategic, higher-value work that drives real business impact.
In this blog, we’ll explore how you can productize and scale your AI models by implementing them into your data pipeline to automatically unlock business insights.
Data teams are drowning in unstructured inputs, whether it’s contracts, invoices, transcripts, or reviews. Processing them often means juggling brittle NLP models, rigid rules, or manual cleanup. The result: unreliable outputs, slow turnaround, and valuable insights locked inside documents while engineers burn time on repetitive parsing instead of building impact.
With Databricks Lakeflow, you can solve this by seamlessly incorporating AI-powered transformations into your existing workflows through Databricks Agent Bricks AI Functions. These functions let you integrate high-quality AI directly into your ETL process, automating the extraction, transformation, and classification of both unstructured and structured data at scale.
There are several types of AI functions in Agent Bricks to choose from. Some are task-specific and don’t require prompts, such as:
- ai_extract: Extract specific entities from input text based on the labels you provide (for example: person, location, organization).
- ai_classify: Classify input text according to labels you provide (for example, “urgent” vs. “not urgent,” or topic categories).
- ai_translate: Translate text to a specified target language.

We’re particularly excited about our recently launched AI function ai_parse_document, which can transform any unstructured data into the structured formats you need. Using multimodal foundation models, ai_parse_document allows you to parse text, extract tables, reason over figures, and turn images into AI-generated descriptions. This function opens new possibilities for processing data that was previously nearly impossible to analyze. Learn more here.
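As a quick sketch of how these task-specific functions compose in a single query (the `support_tickets` table and its columns are hypothetical):

```sql
-- Illustrative only: `support_tickets` and its columns are hypothetical.
SELECT
  ticket_id,
  ai_extract(body, array('person', 'organization', 'location')) AS entities,
  ai_classify(body, array('urgent', 'not urgent'))              AS urgency,
  ai_translate(body, 'en')                                      AS body_en
FROM support_tickets;
```

Because these are plain SQL expressions, they slot into existing SELECT statements and pipeline transformations without any model-serving setup.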

We also offer a more general function called ai_query(), powered by our serverless batch inference platform. This function enables you to run AI-driven transformations across large datasets using any LLM of your choice in one go.
To maximize performance over millions of rows, our serverless batch inference engine automatically provisions and scales compute resources and executes workloads in parallel. This removes per-request overhead and delivers significantly faster processing, reducing runtimes from hours to minutes while improving cost efficiency for high-volume AI workloads.
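For instance, a single ai_query call can fan out over an entire table in one batch job; the table, column, and prompt below are illustrative:

```sql
-- Illustrative batch inference: `product_reviews` is a hypothetical table.
SELECT
  review_id,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('In one word, classify the tone of this review: ', review_text)
  ) AS tone
FROM product_reviews;
```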
With Lakeflow, you can easily productionize your AI models and orchestrate them natively in your data engineering solution using Lakeflow Jobs. With AI functions, you can bring more efficiency to your orchestration and unlock more use cases.
Combining Lakeflow and Agent Bricks enables you to run your AI models on a single, unified governed data platform, so your AI - and the insights it extracts - have the right business and enterprise context.
Imagine your sales team needs a reliable way to turn long, unstructured call transcripts into clear, actionable summaries. With hundreds of calls per day - many lasting 45 to 60 minutes - manual review quickly becomes impossible.
With Databricks, you can leverage built-in AI functions to easily and quickly analyze all those transcripts, extract key insights, and generate follow-up recommendations.
Instead of building a separate AI service or managing custom agents, you can simply write a query and run it as part of your orchestration with Lakeflow Jobs. Your AI model then runs directly on a governed, unified data engineering platform, giving you scalable batch processing that stays fully integrated with your existing sales pipeline workflows and keeps the right business and enterprise context.
Let’s walk through how this works in practice. After ingesting call transcripts into your pipeline, you can apply AI functions to convert unstructured text into usable signals:
- ai_analyze_sentiment to surface the overall sentiment of the call (positive, negative, neutral)
- ai_extract to extract key information from the calls, including customer name, company name, job title, phone number, etc.
- ai_classify to categorize the type of call (urgency, topic, etc.)

This gives you a structured foundation for downstream analytics and automation.
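The three steps above could be sketched as a single transformation; the table, column, and label choices here are illustrative:

```sql
-- Hypothetical source table `call_transcripts` with a raw `transcript` column.
CREATE OR REPLACE TABLE call_signals AS
SELECT
  call_id,
  ai_analyze_sentiment(transcript)                                AS sentiment,
  ai_extract(
    transcript,
    array('customer name', 'company name', 'job title', 'phone number')
  )                                                               AS contact_info,
  ai_classify(transcript, array('pricing', 'support', 'renewal')) AS call_topic
FROM call_transcripts;
```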
Next, use ai_query to summarize each call with the AI model of your choice (in our example, the databricks-meta-llama-3-3-70b-instruct model):
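A minimal version of such a query might look like this; the model endpoint is the one named above, while the table and prompt wording are illustrative:

```sql
SELECT
  call_id,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT(
      'Summarize this sales call in three bullet points, ',
      'covering customer needs, objections, and next steps: ',
      transcript
    )
  ) AS call_summary
FROM call_transcripts;  -- hypothetical table of ingested transcripts
```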
This query produces consistent, high-quality summaries that sales and account teams can review at a glance.
You can then generate personalized follow-ups in the same workflow:
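A follow-up step could chain another ai_query call over the summaries; again, the table and prompt are a sketch:

```sql
SELECT
  call_id,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT(
      'Draft a short, personalized follow-up email based on this call summary: ',
      call_summary
    )
  ) AS follow_up_email
FROM call_summaries;  -- hypothetical table of AI-generated call summaries
```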
These notes can then be pushed directly into your CRM or sales tools at scale, so your teams know exactly the right course of action to take shortly after the call ends. You could also share those notes with your BI team to uncover gaps and help improve the overall customer service experience.
Imagine you’re building a claims processing pipeline for an insurance provider that needs faster, more consistent approvals. Today, claims often arrive via email with unstructured attachments, such as scanned documents, photos, and PDFs, making them difficult to ingest and process at scale.
With Agent Bricks and Lakeflow, data engineers can use ai_parse_document and ai_query to automatically extract, normalize, and consolidate data from incoming emails as part of their ETL pipelines. This enables reliable, end-to-end automation that reduces manual review, accelerates decisions, and integrates seamlessly into existing data workflows.
Here’s how that would work:
Using Lakeflow and Agent Bricks, you can ingest your email files into your lakehouse and then extract the data you need with:
- ai_query to read the email body and extract key information (for example: name, date of birth, address, social security number)
- ai_query with a model that can read the type of image coming in, generating text that describes the attached image and extracting its metadata
- ai_parse_document to read any PDF, JPG, or PNG attached to the email

Once the data is extracted, you can use ai_query again to consolidate all the information into a file that can either be reused in another workflow or shared directly with a downstream team (BI analysts, the AI/ML team, etc.), depending on your use case.
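As a sketch of the extraction steps, where the volume path, tables, and prompts are all hypothetical:

```sql
-- Step 1 (illustrative): parse scanned attachments ingested as binary files.
CREATE OR REPLACE TABLE parsed_attachments AS
SELECT
  path,
  ai_parse_document(content) AS parsed
FROM read_files('/Volumes/claims/inbox/attachments/', format => 'binaryFile');

-- Step 2 (illustrative): pull key claim fields out of each email body.
SELECT
  email_id,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT(
      'Extract the claimant name, date of birth, and address from this email: ',
      email_body
    )
  ) AS claim_fields
FROM claim_emails;  -- hypothetical table of ingested email bodies
```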
Below is a DAG example of what that workflow would look like in Lakeflow Jobs:

There’s so much more you can do by combining Lakeflow and Agent Bricks - check out this video to learn how you can turn messy sales data into AI-driven marketing campaigns.
Many Databricks customers and data engineers have successfully addressed various business issues - pricing, customer success, and marketing - using AI and Lakeflow to unlock insights and boost productivity.
Kard, a New York-based fintech company, uses Agent Bricks AI functions to power a scalable, accurate transaction categorization system that replaces manual and inconsistent legacy methods. This modern approach enables Kard to efficiently process billions of transactions, deliver personalized rewards, and provide richer insights that drive loyalty and business value.
The data engineering team at Banco Bradesco, one of Latin America’s largest banks, faced productivity bottlenecks due to lengthy coding, debugging, and documentation processes. By adopting Databricks Assistant, they cut coding time by 50% and empowered both technical and non-technical users to generate and troubleshoot code using natural language — democratizing data access, reducing costs, and speeding up data-driven decisions.
Locala, a global omnichannel advertising platform, used Lakeflow Jobs to orchestrate complex LLM training pipelines, which its previous scheduler, Airflow, could not handle. By streamlining ETL, model training and experimentation, and compute selection, Lakeflow Jobs removed the operational burden of managing complex workflows, allowing a single data scientist to build a GenAI Assistant that became a key sales feature for the ad-tech company.
With Lakeflow, you can easily integrate AI capabilities into your data engineering platform and orchestrate AI workflows, making your data processes more efficient, insights-driven, and accessible. And we have more to come! Soon, you will be able to use Databricks Genie to power your data engineering platform for pipeline authoring and debugging using natural language.