Syngenta

CUSTOMER STORY

Protecting crops today to feed billions tomorrow

200: Data products, with more published every week

70%: Faster time to value

50%: Decrease in engineering costs

Syngenta is a global science company that designs, manufactures and sells crop protection products (e.g., herbicides, fungicides, insecticides and biologicals) and high-performing seeds to farmers worldwide. Their offerings help growers protect plants, improve yields and feed a growing global population with less land under challenging climate conditions. To meet these goals, their teams needed to harness the power of data and digital technologies to discover, design and develop novel and differentiated products. Yet, Syngenta’s research and development (R&D) team struggled with siloed data across 70-plus countries, legacy systems and months-long delays to access critical insights. Following the mantra “think big, start small, scale fast,” the data team brought together R&D use cases and diverse data on Databricks, leading to Gaia, the first crop protection R&D data platform for analytics and innovation. As a result, they’ve cut data access times from months to just minutes, accelerating product development and empowering scientists to deliver more robust and differentiated crop protection solutions.

Overcoming fragmented data to protect global crops

Syngenta, a global agrochemical and seed company serving the agriculture industry, has a mission to secure the global food supply by protecting crops from weeds, fungal diseases and insect pests. With 6,000-plus research and development (R&D) employees across approximately 70 countries, they aim to help growers feed nearly 10 billion people despite shrinking resources and increasingly adverse climate conditions. By using digital technologies, R&D aims to deliver better products into the hands of growers, faster, while reducing costs. Their broad and diverse data landscape spans everything from microscopic molecular research to macroscopic drone and satellite imagery collected over the last few decades. R&D aims to leverage connected, AI-ready data across systems, projects and regions to explore, analyze and better understand molecule, crop, product and pest behavior. This data exploration and exploitation will help to establish a strong pipeline of products that meet the needs of diverse growers, food chain stakeholders and regulators across different geographies and conditions.

One way they can achieve this is through the development of a solution that brings together external commercial and internal data on how growers use products in the field. The information gathered provides R&D teams with a clearer picture of usage patterns of their products alongside other products and technologies across crops and regions. With these newfound insights, project and technical managers can better align new product specifications to the realities of grower behavior in various markets, ensuring development efforts target the most relevant needs to reduce costly trial and error. 

While this solution helps researchers understand product behavior at field scale, innovation also depends on what happens in the lab. By integrating Internet of Things (IoT) telemetry from lab instruments, Syngenta gives teams visibility into device availability, instrument activity and maintenance needs, helping ensure experiments run smoothly and are not delayed by downtime. At the molecular level, lab scientists aim to explore and analyze protein targets in pests like weeds, fungi and insects, with data analytics improving early research precision and quality. Equally important is making these findings broadly accessible and reproducible. Through business intelligence (BI) dashboards and data science applications, nontechnical data users (such as scientists, engineers and project managers) can explore and manage data directly, design new data products, decrease their reliance on IT and act more quickly on findings.

The hidden cost of inaccessible data 

Although Syngenta’s R&D teams worked with enormous amounts of data, much of it was locked away in fragmented systems. Decades of legacy applications and technologies — including Oracle, Talend and Vertica — had been built to capture and store information rather than share it, leaving valuable datasets scattered across hundreds of tools and databases. “Even our most experienced analysts struggled just to figure out what data existed and where it was stored, as well as their provenance and ownership. And once they did, they often had to go through lengthy processes for access, meaning scientists and data scientists could wait as long as nine months before they had the information needed to start a project,” Maks Kiamos Shah, Head of Data Design Authority at Syngenta, explained.

Unfortunately, the diversity of data only added to the complexity. Researchers were working with everything from lab results and drone imagery to external contractor reports and commercial data. Bringing those sources together was slow and technically demanding, which dragged out product development. With R&D cycles already stretching eight to 12 years, inefficiencies like these added unnecessary delays at a time when feeding a growing global population demands faster answers. To break through these bottlenecks, Syngenta turned to Databricks as the foundation for Gaia, the first crop protection R&D platform for analytics and innovation. Named after the Greek goddess known as Mother Earth, the new platform has grown to encompass hundreds of robust data pipelines and catalogued data products, hundreds of onboarded self-serve power users and dozens of data science applications.

Transforming crop protection R&D with a shared data foundation

With the goal of unifying fragmented R&D data, democratizing access and enabling data domain teams to explore, prepare, manage and share data more effectively, Syngenta created Gaia, which would become their proprietary data intelligence platform. Building Gaia on Databricks and AWS, the company migrated from their legacy data warehouses and adopted a federated Data Mesh approach as the foundation of their global data strategy. That approach rests on four pillars: value-driven use cases, data domain ownership, self-service and data products. The effort began with Delta Lake, the open source storage layer that makes data lakes reliable for analytics. Serving as Gaia’s foundation, it brought Syngenta’s diverse R&D data into one trusted environment. Whether teams were working with lab results, regulatory submissions, drone imagery or external reports, Delta Lake standardized formats and enforced reliability, making it possible for teams across research sites to collaborate on trusted data. It also provided the flexibility to handle both batch and streaming ingestion. With this change, IoT telemetry and commercial data flowed into the same system, where teams could access historical snapshots of data for reproducibility, audits and long-term projects.
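
The historical snapshots described here correspond to Delta Lake's time travel feature, which lets a query read a table as of an earlier version number or timestamp. As a minimal sketch, a small Python helper that builds such queries might look like the following. The function name and the table name are hypothetical; the story does not describe how Gaia's tables are named or queried.

```python
from datetime import datetime
from typing import Optional


def snapshot_query(table: str,
                   version: Optional[int] = None,
                   as_of: Optional[datetime] = None) -> str:
    """Build a Delta Lake time-travel SELECT for `table`.

    Exactly one of `version` or `as_of` must be supplied.
    """
    if (version is None) == (as_of is None):
        raise ValueError("pass exactly one of version or as_of")
    if version is not None:
        # Read the table as it was at a specific commit version.
        return f"SELECT * FROM {table} VERSION AS OF {version}"
    # Read the table as it was at a point in time.
    stamp = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{stamp}'"


# Hypothetical table name, used only for illustration.
print(snapshot_query("rnd.field_trials.observations", version=12))
```

On a Databricks cluster the resulting string could be passed to `spark.sql(...)`, so an audit or a long-running study can re-read the data exactly as it stood when an analysis was first run.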

All the while, Unity Catalog acted as Gaia’s central governance solution, managing permissions and access controls while making data products discoverable across domains. It provided the lineage and transparency needed for regulatory and scientific traceability, ensuring teams could see where data originated and how it had been processed, and that it met the standards required for sharing and reuse. Building on that governance framework, Databricks SQL became the core engine driving day-to-day work with the data itself. This component of the platform supported pipelines, queries and BI dashboards while giving Syngenta’s data engineers and analysts a consistent environment to build products, run analyses and monitor pipeline health with greater visibility. 
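
To make the permission model more concrete, the sketch below builds the Unity Catalog GRANT statements a domain team might issue to give a group read access to a single data product. The catalog, schema, table and group names are hypothetical, as is the helper function; none of them come from Syngenta's actual setup.

```python
from typing import List


def read_access_grants(catalog: str, schema: str, table: str,
                       principal: str) -> List[str]:
    """Return the SQL statements that let `principal` read one table.

    In Unity Catalog, reading a table requires USE CATALOG and
    USE SCHEMA on its parents in addition to SELECT on the table.
    """
    who = f"`{principal}`"  # backticks allow names with hyphens or spaces
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


# Hypothetical names, used only for illustration.
for statement in read_access_grants("rnd", "lab_iot", "instrument_status",
                                    "field-scientists"):
    print(statement)
```

Granting to a group rather than to individual users keeps access tied to a role, which fits the domain-ownership model described above.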

As more R&D data domains onboarded data and use cases into Gaia, the pipelines, dashboards and BI tools expanded alongside them. Today, Gaia ingests data from more than 70 sources and has already published 200 gold-standard data products, delivered by robust, observable pipelines and data contracts. This growth in both volume and variety meant workloads had to be run more efficiently, requiring infrastructure that could flex with demand. Databricks compute eliminated the overhead of manual resource management, so teams no longer had to think about provisioning or scaling — compute was simply available when they needed it. Just as importantly, it powered the applications and BI dashboards that downstream users depended on. Syngenta has already moved 95% of these workloads onto Databricks serverless, further reducing infrastructure overheads and optimizing costs, while giving teams on-demand access to compute at scale. Scientists and project managers — many of whom had limited background in SQL — could now interact directly with curated data products, reducing reliance on IT and accelerating routine decision-making. The shift also helped retire older bespoke tools and database licenses. 

To enhance accessibility, Syngenta began testing AI/BI Genie, Databricks’ conversational interface that enables users to ask data questions in natural language and receive instant insights. Combined with Databricks Apps, AI/BI Genie offers a path to embed conversational access directly into their data products. As Syngenta expanded their use of Databricks, Databricks Academy played a critical role in upskilling scientists and technical managers. This focus on user enablement and democratization was central to Gaia’s success, turning it into far more than “just a tech platform.”

Finally, Syngenta started exploring how to apply the same principles of governance, consistency and accessibility to their machine learning (ML) efforts. With model development historically split across platforms, like SageMaker, Vertex AI and Dataiku, the Syngenta team began testing Databricks MLflow and MLOps as a path to unification. Teams centralized model training, versioning, deployment and monitoring within Databricks to create a single, sustainable framework for managing machine learning. By unifying data and experimenting with AI/ML efficiently, Syngenta positioned their expansive teams to move faster, reduce costs and deliver more impactful crop protection solutions.

Empowering scientists to innovate without bottlenecks

For Syngenta, Gaia marked a turning point in how their R&D teams worked with data. What had once been siloed, slow and fragmented became accessible, scalable, robust, governed, standardized and actionable, and thus AI-ready, across the organization. As a result, the time to data access dropped from as long as nine months to just a few minutes, while overall time to value was reduced by 70%. Data engineering costs fell by half, thanks to standardized pipelines, fewer standalone tools and the retirement of legacy databases and licenses, which brought additional savings. Gaia now handles more than 2 million queries per month, or roughly 70,000 queries a day, as scientists, engineers and technical managers access insights in real time. Adoption has also spread quickly: Gaia supports about 320 direct users across R&D and powers more than 60 downstream systems and applications that reach over 1,200 data citizens across the business. Collectively, these changes trimmed thousands of days of manual engineering and analysis effort.
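
The monthly and daily query figures can be checked against each other with a quick calculation. The 30-day month is an assumption; the story does not say how the daily average was computed.

```python
DAYS_PER_MONTH = 30  # assumption: the story does not state the averaging window


def monthly_queries(daily_average: int, days: int = DAYS_PER_MONTH) -> int:
    """Scale a daily query average up to a monthly total."""
    return daily_average * days


total = monthly_queries(70_000)
print(f"{total:,} queries per month")  # 2,100,000 queries per month
```

A daily average of 70,000 gives about 2.1 million queries over 30 days, which is consistent with the "more than 2 million" monthly figure.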

Beyond the numbers, Gaia changed the way Syngenta’s staff approached their interactions and work with data. No longer reliant on IT for access, or having to prepare data in isolation, they could tap into a collaborative data environment, explore diverse data instantly and answer questions faster and with greater confidence — advancing a strong data-driven culture. These newfound capabilities and solutions accelerate development cycles and increase the quality of innovation by linking real-world observations and usage patterns with molecular-level information.

“With Gaia powered by Databricks, we have radically liberated and democratized access to our treasure trove of data across R&D and boosted significantly the data culture of ownership, trust and sharing in the organization. Gaia’s unprecedented scale and adoption has already shown tangible benefits in the quality and speed of innovation that ultimately strengthens our pipeline of differentiated products,” George Papadatos, Global Head of Data Strategy at Syngenta and the driving force behind Gaia, emphasized.

Gaia’s impact has also reached beyond R&D, scaling the Data Mesh into the enterprise. The envisioned federated architecture ensures that every business unit can build, manage and share data products tailored to their domain while still conforming to shared governance and standards, thus reducing time to value and increasing the quality and reliability of the data.

Looking toward the future, Syngenta plans to extend the same approach to unstructured data, such as documents and knowledge, with an equivalent platform that will ingest regulatory, safety and project documents and use GenAI to summarize content, pull out key details and enable simple question-and-answer searches. “With Databricks as the foundation, we’ve transformed crop research and development into a faster, more collaborative and data-driven process, empowering our scientists to bring new sustainable crop protection solutions to market more quickly,” Thomas Jung, Chief Data Officer at Syngenta, said. With Databricks at the core, Syngenta moved from chasing data to driving discovery, giving R&D the speed and confidence needed to tackle the challenges of global food security.