
Accelerating sustainable transportation into the future


10% reduction in energy consumption


30% savings in data infrastructure costs

INDUSTRY: Transportation

“Databricks Lakehouse has allowed us to establish a single source of truth and to more easily uncover opportunities to improve the efficiency of our transportation infrastructure to significantly reduce our impact on the environment.”

— Rémi Bihouis, ITNOVEM, IT partner to SNCF

SNCF's mission is to deliver seamless, sustainable passenger and freight transportation, with a focus on reducing carbon emissions and using renewable energy to power the future of mobility. The organization aims to cut CO₂ emissions by 30% by 2030 (compared with 2015) and to reach Net Zero by 2050. However, it struggled to make progress toward these goals because its legacy systems could not process the massive volumes of IoT data generated by its trains and related electrical equipment. Data silos further slowed collaboration and limited its ability to deliver timely insights to the business. With Databricks Lakehouse, SNCF can now unify its data and accelerate processing for business analytics and reporting. With these insights, SNCF can build an accurate picture of train energy usage and advise management on ways to decrease consumption, with the aim of meeting its energy reduction targets and lowering costs.

Tackling large data volumes and siloed operations

In an organization the size of SNCF, with revenues of €33.5 billion, 270,000 employees worldwide and a 30,000 km rail network, cost containment is crucial at every level. The company is also committed to ambitious ESG targets, and sustainability is a key priority for each member of the group. In 2017, SNCF launched EMS (Energy Management Solution), a project designed to monitor and optimize energy consumption across its train network.

With 10,000 to 15,000 trains running each day, SNCF's data systems must ingest huge volumes of data from onboard train sensors, approximately 90 million rows per month. On the organization's legacy platforms, processing was often complex and time-consuming, and data silos meant minimal collaboration between teams. Deployment and maintenance were also key pain points. SNCF's cloud architects identified the need to give all data users (architects, cloud ops, data engineers and analysts) more efficient access to data, and to provide a common solution for ingesting, transforming and loading data from a range of sources. Data pipelines also needed to be robust and easy to scale.

Lakehouse improves data management and performance

To meet these demands, SNCF adopted a lakehouse architecture on Databricks, providing a common data layer for its data processing and BI needs. The lakehouse has simplified SNCF's data architecture and lets engineers and analysts access a single source of data, increasing efficiency and lowering costs. SNCF found the migration to Databricks Lakehouse fast and efficient. It also simplified the analytical architecture by integrating the BI layer directly into the workflow, while the centralized data layer and collaboration capabilities improved operational efficiency and team productivity. Data can now be shared and analyzed directly in Power BI, without duplicating data or maintaining an intermediate presentation layer. Communication and collaboration between teams have improved considerably, as everyone works on the same platform through collaborative notebooks.

With this simplified cloud-based architecture, data is stored in Delta Lake, where it can be ingested, transformed, enriched and exposed from a single, unified view. Delta Lake serves both big data and BI needs and removes the need for duplication, which makes data management more efficient, reduces the number of components, lowers costs and accelerates the development of new use cases. Data is also refreshed far more often: instead of once or twice a month, it is now updated twice a week. Because there is a single source of truth, every team works from the same current data and replication is no longer required. Furthermore, Delta Lake gives SNCF direct SQL access to the data, without intermediary databases.

Performance and scalability are also essential to SNCF's data operations. With auto-scaling and simplified cluster management in Databricks Lakehouse, SNCF's data engineers can now spin up compute clusters on demand to handle ETL workloads and meet the analytical needs of the business.

More efficient train operations for a better future

As a result of the efficiencies brought by the lakehouse architecture, the cost of the data infrastructure supporting SNCF's energy management project fell by 30%. Integration with legacy systems was smooth, with no disruption to existing processes and no loss of data quality. Development time has also fallen considerably, particularly ops time, with less time needed for data engineering and cloud operations; in some cases, it has dropped by 50%. Databricks has also sped up deployment: code can be promoted from development to acceptance testing to production in a single click.

Aymeric François, Data Analyst at SNCF Voyageurs, said, “With 14 million passengers traveling daily, and up to 15,000 trains operating every 24 hours, SNCF required a robust and scalable solution that would handle enormous volumes of data, significantly increase productivity and strengthen the performance of our existing architecture.”

With Databricks Lakehouse, SNCF can precisely measure the energy consumption of every tracker-equipped train in operation, in five-minute intervals, using the data transmitted by its onboard trackers. These measurements are combined with estimated energy usage for trains without trackers, and detailed visual reports can be shared with key stakeholders across the group. This data not only lets the individual businesses gauge their energy consumption but also ensures that SNCF can allocate costs among them accurately. SNCF can also use these insights to identify ways to reduce energy waste, such as shutting down trains at night or modifying driving patterns. To date, it has already reduced unnecessary energy consumption by 10%.
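The aggregation described above (per-train readings rolled up into five-minute intervals, combined with estimates for untracked trains, then used to share costs) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SNCF's actual pipeline, which runs on Databricks: the `(train_id, timestamp, kwh)` record layout, the `estimates` and `owner` mappings, the train and unit names, and the proportional cost-allocation rule are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record layout for one tracker reading: (train_id, timestamp, kwh).
Reading = tuple[str, datetime, float]


def bucket_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def aggregate_by_interval(readings: list[Reading]) -> dict[tuple[str, datetime], float]:
    """Sum measured kWh per (train, five-minute interval)."""
    totals: dict[tuple[str, datetime], float] = defaultdict(float)
    for train_id, ts, kwh in readings:
        totals[(train_id, bucket_start(ts))] += kwh
    return dict(totals)


def energy_by_business(
    readings: list[Reading],
    estimates: dict[str, float],
    owner: dict[str, str],
) -> dict[str, float]:
    """Combine measured kWh (tracked trains) with estimated kWh (untracked trains)
    and total them per business unit. `owner` maps a train to its (hypothetical) unit."""
    per_business: dict[str, float] = defaultdict(float)
    for (train_id, _interval), kwh in aggregate_by_interval(readings).items():
        per_business[owner[train_id]] += kwh
    for train_id, kwh in estimates.items():
        per_business[owner[train_id]] += kwh
    return dict(per_business)


def allocate_costs(per_business: dict[str, float], bill: float) -> dict[str, float]:
    """Split an energy bill in proportion to each unit's kWh (an assumed rule)."""
    total = sum(per_business.values())
    return {unit: bill * kwh / total for unit, kwh in per_business.items()}


# Illustrative data only: two tracked trains and one untracked train.
readings = [
    ("train-A", datetime(2023, 1, 1, 8, 1), 1.5),
    ("train-A", datetime(2023, 1, 1, 8, 4), 2.0),
    ("train-A", datetime(2023, 1, 1, 8, 6), 0.5),
    ("train-B", datetime(2023, 1, 1, 8, 2), 1.0),
]
estimates = {"train-C": 4.0}  # estimated kWh for a train without a tracker
owner = {"train-A": "unit-1", "train-B": "unit-1", "train-C": "unit-2"}

print(allocate_costs(energy_by_business(readings, estimates, owner), bill=900.0))
# {'unit-1': 500.0, 'unit-2': 400.0}
```

Proportional allocation is only one possible rule; the case study does not say how SNCF actually splits costs between businesses, and at production scale this logic would run as a distributed job rather than in a single Python process.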

Aymeric François added, “With a current annual energy bill of around €0.3 billion and 6 TWh consumption equivalent to one nuclear power plant per year, the introduction of Databricks is already supporting our goals to deliver much-needed, precise energy utilization reports across the business. It will also certainly help us identify ways to manage and improve our network’s usage and ultimately accomplish our sustainability targets over the coming years.”

In the near future, one of SNCF’s key goals is to introduce Databricks SQL to optimize code and facilitate even more efficient collaboration between engineers and analysts. This platform will allow them to work concurrently with the same data, provide a daily refresh for reporting, increase analytics performance and help reduce costs further.