Mirakl

CUSTOMER STORY

Cutting onboarding from 28 days to hours with genAI

0

Ops overhead equal to 5 full-time engineers

<24

Hours for supplier catalog onboarding, down from 28 days

30x

Faster iteration and delivery across data and AI teams

Mirakl, the leading provider of eCommerce software solutions, needed a modern data foundation to securely manage large datasets, bring data science, data engineering, and analytics onto a single platform, and operationalize GenAI use cases. The goal was to scale both analytics and AI without siloing workflows and data sources. By standardizing on the Databricks Data Intelligence Platform, Mirakl consolidated data into a single source of truth and built its flagship Catalog Transformer, a 100% GenAI solution that cuts supplier catalog onboarding time from weeks to less than a day while reducing operational costs and strengthening collaboration across teams.

Avoiding siloed data teams and slow paths to AI value

Mirakl’s eCommerce platform supports over 100,000 merchants worldwide, generating rapidly growing volumes of operational and product data.

As the company scaled, so did the complexity of its data landscape.

Data science, data engineering, and analytics teams wanted to avoid working in disconnected environments with no unified source of truth. They needed to share work, reuse pipelines, and measure performance consistently.

At the same time, Mirakl operated under strict data governance requirements. “We can't send our customers’ data to third parties because of data governance regulations,” said Clément Labrugère, Senior Data Scientist and Machine Learning Engineer at Mirakl. “We need to control where the data resides and what we do with it.” These constraints became especially limiting as Mirakl began exploring large language models (LLMs) for automating supplier catalog onboarding. Traditionally, processing each new vendor catalog took an average of 28 days.

To keep pace with merchant demand and unlock GenAI at scale, Mirakl needed a single, secure platform that could centralize data, support advanced ML and LLM workflows, and give every data team shared, governed access.

Unifying data and powering Catalog Transformer and Mirakl Nexus with Databricks

To break down silos and enable end-to-end AI, Mirakl chose Databricks as the backbone of its data and machine learning infrastructure. The data engineering team first built a streamlined data platform on Databricks to consolidate marketplace data in near real-time, providing internal teams with a centralized environment where they can query all the data they need from a single location. Governance standards for security and user access were also centralized, allowing Mirakl to maintain strict control over sensitive customer data while enabling broad, role-based collaboration.

On this foundation, Mirakl built Catalog Transformer, an end-to-end GenAI solution that fully automates vendor catalog onboarding. All catalog data is stored and managed in Delta tables to ensure reliability, performance, and transactional guarantees. Unity Catalog governs data and ML assets with consistent permissions, lineage, and traceability, while MLflow manages the complete model lifecycle from experiment tracking and registry to deployment.

“Databricks lets us treat models and data as first-class, governed assets,” said Pierre Lourdelet, Data Scientist at Mirakl. “We can experiment quickly, productionize with confidence, and still know exactly which model version and dataset powered each decision.”
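The traceability described in this quote can be illustrated with a minimal sketch. It is not Mirakl's code and does not use the MLflow or Unity Catalog APIs; `DecisionRecord`, `categorize_with_lineage`, and every field name are hypothetical stand-ins, assuming each automated decision is stored together with the model version and dataset version that produced it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DecisionRecord:
    """One automated decision plus the lineage needed to audit it (hypothetical schema)."""
    sku: str
    category: str
    model_name: str
    model_version: str
    dataset_version: int


def categorize_with_lineage(
    sku: str,
    predict: Callable[[str], str],
    model_name: str,
    model_version: str,
    dataset_version: int,
) -> DecisionRecord:
    """Run a prediction and attach the model and dataset versions to the result."""
    return DecisionRecord(
        sku=sku,
        category=predict(sku),
        model_name=model_name,
        model_version=model_version,
        dataset_version=dataset_version,
    )


if __name__ == "__main__":
    # A stub predictor stands in for a registered model.
    record = categorize_with_lineage(
        "SKU-1", lambda s: "apparel", "catalog-categorizer", "3", 42
    )
    print(record)
```

In a real deployment the version values would come from the model registry and the table history rather than being passed in by hand.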

A multi-stage pipeline, orchestrated with Lakeflow Jobs, calls endpoints served through Model Serving, which runs a mix of models (including GPT, Llama, Mistral, and multimodal models like CLIP) to categorize, extract, enrich, and automatically rewrite vendor data. Spark parallelizes processing across large catalogs, allowing the system to handle thousands of SKUs concurrently. Thanks to a layered model approach, the team can switch between providers such as OpenAI, Mistral AI, Anthropic, or fine-tuned open source models with just a few lines of configuration, without rewriting business logic.
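One way to picture the layered model approach is a thin configuration layer between the business logic and the model providers. The sketch below is an assumption about how such a layer could look, not Mirakl's implementation: `PROVIDERS`, `StageConfig`, `CONFIG`, and `run_stage` are hypothetical names, and the provider functions are stubs rather than calls to real served endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


def _stub_provider(name: str) -> Callable[[str], str]:
    """Build a stand-in for a provider client; a real one would call a model endpoint."""
    def complete(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return complete


# Hypothetical registry mapping provider names to completion functions.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "openai": _stub_provider("openai"),
    "mistral": _stub_provider("mistral"),
    "anthropic": _stub_provider("anthropic"),
}


@dataclass(frozen=True)
class StageConfig:
    provider: str  # key into PROVIDERS


# Configuration only: which provider backs each pipeline stage.
CONFIG: Dict[str, StageConfig] = {
    "categorize": StageConfig(provider="openai"),
    "rewrite": StageConfig(provider="mistral"),
}


def run_stage(stage: str, text: str, config: Mapping[str, StageConfig] = CONFIG) -> str:
    """Business logic: build the prompt, then call whichever provider is configured."""
    complete = PROVIDERS[config[stage].provider]
    return complete(f"{stage}: {text}")


if __name__ == "__main__":
    print(run_stage("categorize", "red cotton t-shirt"))
    # Switching providers touches configuration only; run_stage is unchanged.
    swapped = {**CONFIG, "categorize": StageConfig(provider="anthropic")}
    print(run_stage("categorize", "red cotton t-shirt", config=swapped))
```

Because `run_stage` looks up the provider by name at call time, switching a stage from one provider to another is a one-line change to the configuration mapping.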

Extending the same foundation to Mirakl Nexus

Building on the success of Catalog Transformer, Mirakl developed Mirakl Nexus, a neutral intelligence layer that uses the same Databricks foundation to support agentic commerce: autonomous shopper and merchant workflows on AI platforms. Mirakl Nexus leverages governed data, flexible orchestration, and rapid experimentation to enhance core commerce flows, including product discovery, catalog comparison, multi-merchant baskets, transaction execution, and post-purchase support.

Mirakl Nexus uses Agent Bricks to orchestrate production-ready agents that can safely reason over retailer, merchant, and catalog data. Unity Catalog secures all data access, MLflow manages agent and model versions, and Lakeflow Jobs coordinates multi-step agent workflows with full lineage and traceability. A core enabler of both Catalog Transformer and Mirakl Nexus is the native availability of OpenAI GPT models on Databricks, with access, monitoring, and auditability managed centrally through AI Gateway. This allows Mirakl to build agents that interpret product attributes, evaluate policy constraints, extract structured data, ground decisions in retailer rules, and autonomously coordinate actions across workflows.

By combining Databricks’ data intelligence foundation with these agentic capabilities, Mirakl delivers real-time, commerce-specific agents that act on behalf of shoppers and merchants, enabling faster, more intelligent buying and selling experiences.

From 28 days to under 24 hours: tangible AI impact and lower costs

By moving its data and GenAI workloads onto Databricks, Mirakl has significantly increased the speed and quality of value it delivers to customers while optimizing resource use. Catalog Transformer now converts heterogeneous vendor catalogs into enriched, standardized product data in under 24 hours, down from an average of 28 days with manual processes — a roughly 30-fold improvement in turnaround time.
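The turnaround figure can be checked with simple arithmetic. The sketch below assumes the "under 24 hours" result is taken at its upper bound of exactly 24 hours, so the computed factor is a lower bound on the speedup; the variable and function names are illustrative.

```python
HOURS_PER_DAY = 24

before_hours = 28 * HOURS_PER_DAY  # average manual onboarding time: 28 days
after_hours = 24                   # upper bound of "under 24 hours"


def speedup(before: float, after: float) -> float:
    """Return how many times faster the new process is than the old one."""
    return before / after


if __name__ == "__main__":
    print(before_hours)                        # 672
    print(speedup(before_hours, after_hours))  # 28.0
```

A factor of at least 28 is consistent with the "roughly 30-fold" figure, since any run that finishes in less than 24 hours pushes the ratio higher.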

This acceleration enables marketplace operators to launch new sellers more quickly, expand their assortment, and maintain cleaner product data at scale. “What used to be a multi-week, error-prone process has become a one-day, fully traceable pipeline,” said Pierre. “Sellers can reallocate that capacity from manual catalog work to high-value, revenue-generating activities.” Together with OpenAI models running natively on Databricks, Catalog Transformer achieves a 91% reduction in onboarding time and approximately 50% fewer categorization errors, directly improving time-to-revenue for merchants and the shopping experience for end customers.

Operationally, Databricks has reduced the burden of infrastructure management. Data engineering teams keep full control over where data resides and who can access it, while autoscaling clusters and managed services remove the need for manual provisioning and tuning. Senior Data Engineer Nicolas Achereiner estimates they would need five additional engineers to maintain equivalent infrastructure on their own.

The shared platform has also improved collaboration and morale. Data scientists build and deploy models directly where the data sits, analysts trust that metrics are consistent across use cases, and product teams rely on a single, governed source of truth. “Having this common platform and common resource has really helped morale, collaboration, and performance,” said Samuel Baker, Staff Analytics Engineer at Mirakl. “We really feel like we are now part of a big happy data family, working alongside the other teams.”

Mirakl Nexus builds on this success by extending GenAI beyond onboarding into live commerce flows. Agent-driven interactions promise faster conversion, richer product discovery, better data quality across the marketplace ecosystem, and more efficient operations for merchants and retailers. Taken together, Catalog Transformer and Mirakl Nexus show how the Databricks and OpenAI partnership enables Mirakl to scale AI value across teams and products, transforming both internal productivity and customer experience.