Product descriptions:
Pilot Company delivers fuel every 22 seconds across the U.S. and Canada. At this scale, the company faced major inefficiencies in its fuel billing process, with truck drivers uploading handwritten delivery forms that required time-consuming, error-prone manual data entry. Early attempts to automate the process with OCR and RPA failed to handle the complexity and variability of over 300 document formats. With the Databricks Platform, Pilot built and refined a GenAI-powered solution that now processes these forms with near-perfect accuracy. What was once a slow and error-prone workflow is now a streamlined, AI-driven pipeline, paving the way for the broader use of GenAI across store operations, sales, and beyond.
Handwritten bills and human error created roadblocks in fuel billing
Pilot Company is North America’s largest operator of travel centers and a critical player in the continent’s fuel distribution network. With 850 locations across 44 U.S. states and 6 Canadian provinces, Pilot serves millions of professional drivers and travelers annually. To better serve their environmentally conscious consumers, Pilot is also heavily investing in EV charging infrastructure and hydrogen fueling stations coast-to-coast.
To operate at this scale, Pilot’s data teams manage terabytes of structured and unstructured data, ranging from point-of-sale transactions to handwritten shipping documents — and have shifted toward GenAI to unlock greater automation and insight.
One of Pilot’s most pressing data challenges involved intelligent document processing for bills of lading (BOLs) — shipping receipts that document the pickup and delivery of fuel shipments. These forms varied wildly: over 300 layouts, a mix of typed and handwritten fields, and inconsistent labeling. Historically, truck drivers took cell phone photos of these documents, manually entered details into a mobile app, and were supported by human reviewers who verified the data visually. “We’ve tried traditional robotic process automation (RPA), but the forms varied so much it wasn’t feasible,” said Travis.
Pilot’s GenAI initiative aimed to automate this highly variable and error-prone process. However, early automation attempts — including classic optical character recognition (OCR) — were brittle and unsustainable. As Travis explained, “Previous attempts at automating this process failed. At first, it sounded like we could just use OCR, but some of these forms are handwritten, which makes it very hard to just pull out the text. Some of the fields are labeled differently, so you need to have a semantic understanding of what the form is trying to tell you, along with business knowledge.” Edge cases multiplied quickly, and prompt engineering became unwieldy because there are so many exceptions.”
In parallel, Pilot faced broader challenges in data and AI infrastructure. They needed a more effective way to experiment with and evaluate large language models (LLMs), especially as they moved from text-based to multi-modal approaches. “One of our biggest roadblocks was evaluating models,” explained Travis. “There really isn’t an equivalent product where we could quickly test different models and prompts at scale, and have it tell us how well they’re doing. We were stuck evaluating results manually, which caused a bottleneck in our ability to iterate quickly with confidence in the results.”
Intelligent document processing at 98.6% accuracy with Databricks Agent Bricks
To automate the complex and error-prone task of processing BOL documents, Pilot Company turned to the Databricks Data Intelligence Platform. With Databricks, they built a scalable, collaborative environment for both traditional ML and GenAI experimentation. Their goals: accelerate innovation, reduce human error, and streamline operations across their growing fleet and retail footprint.
The team’s first step was classic OCR, which extracted text from driver-submitted images and attempted to match it to known document structures. Due to the previously described challenges of relying on OCR (mainly image quality issues and inconsistent labeling), the results were underwhelming, with 76.2% accuracy — barely outperforming the human annotators (71%). From there, Pilot moved to few-shot prompting, using business-defined examples and lookups to help the model learn how to extract key values. This approach improved semantic handling, reduced prompt complexity, and achieved 84.7% model accuracy. It was still, however, limited by poor OCR quality and maintainability suffered from a complexity in the pipeline to retrieve few-shot examples.
The breakthrough came in Phase 3, when the team adopted a multi-modal few-shot approach, feeding the model actual photos rather than extracted text. “We moved to a multi-modal model and had controlled access through Databricks. Feeding the model the actual photos of the BOL rather than extracted text boosted our accuracy all the way up to 94.3%,” said Travis. “We think it’s because of the model’s ability to preserve spatial reasoning — the layout of the checkboxes and form sections. These things previously were lost.”
To further close the accuracy gap, Pilot implemented fuzzy matching — a technique that finds similar, but not identical, elements in datasets — using Delta Lake reference tables. This helped correct OCR errors in product names and terminal addresses, using known values to resolve discrepancies. Travis went on to elaborate, “Even if a photo reads 128 Main Street and it should be 123 Main Street, we can match it back because we know there’s no 128 in our database. We've now boosted accuracy to 98.6%.”
Throughout each phase, the Databricks Platform enabled Pilot to rapidly iterate, evaluate models in a structured and flexible way, and seamlessly integrate data, governance and monitoring into a repeatable and scalable workflow — accelerating the path from prototype to production. “Databricks made our lives 100% easier. Model access and the REST API endpoints are very easy in Databricks,” said Travis. But it wasn’t just about ease of access — it was the platform’s ability to support fast, structured experimentation that proved transformational. Using the AI Gateway, Travis’ team seamlessly connected to foundation models hosted in Amazon Bedrock while using MLflow to manage prompt versioning, logging and experimentation in a unified environment. What was once a manual, one-off evaluation process has become a rigorous and repeatable workflow, accelerating iteration and improving model reliability. “Now, with model evaluations in Databricks,” Travis explained, “we can automatically compare outputs, spot mismatches, and diagnose exactly where and why a model fails — even when subtle errors occur, like values appearing out of order.”
To strengthen quality and operational confidence, Pilot adopted MLflow Tracing to track model evaluations and monitor performance over time. Unity Catalog introduced a layer of governance, enabling the team to enforce access controls and maintain the integrity of both model artifacts and the underlying data. Meanwhile, the team used Monte Carlo to monitor the quality of vectorized document data in Delta tables, detecting schema drift and maintaining performance even as bill of lading (BOL) formats evolved. The result: a resilient, self-healing GenAI pipeline powered by tightly integrated observability, governance, and model flexibility.
99% reduction in manual processing leads to faster revenue recognition
By automating document processing, Pilot Company has fundamentally transformed a previously manual and error-prone workflow into a high-accuracy, scalable GenAI system. The impact across operations has been immediate and significant. In fact, improvements in operational efficiency have created a 90% decrease in overall processing costs.
The system has delivered a 99% reduction in manual document processing, eliminating costly data entry backlogs and accelerating turnaround times on fuel billing. With faster processing, the business can generate invoices and recognize revenue more quickly. Our GenAI app, supported by Databricks, is delivering value in the low seven figures through a combination of cost savings and increased revenue.” As they look toward the next evolution — an agentic system that autonomously enriches missing product or terminal data from trusted external sources — Pilot plans to continue scaling within the Databricks ecosystem. According to Travis, “We’re working on the agentic portion now. Eventually, the agent will go out, verify if the information is correct, and enter it into the database for us. Once we prove it’s reliable, we’ll be able to close the loop entirely.”
Looking ahead, Pilot is also exploring a new wave of productivity-focused GenAI use cases designed to empower internal teams across store operations, sales, and logistics. These include AI-powered chatbots that provide natural language across knowledge bases, enabling team members to ask questions, submit service tickets, order supplies, and more. All without having to dig through outdated documentation or escalate to managers.
It’s all about making employees’ jobs easier. “Databricks gives us a structured, end-to-end toolset for quickly iterating on GenAI applications — from testing and evaluating to experiment tracking — all in a single platform,” explained Travis.
