Sustainability & Resource Efficiency Reference Architecture for Manufacturing
This architecture helps you measure manufacturing sustainability (carbon footprint) and resource efficiency (electricity, water, chemicals) across your operations while improving profitability.

Data and platform flows
- Factories, energy generation facilities, and data centers integrate data from multiple sources into the Lakehouse to understand consumption of resources such as electricity, water, and materials. Large volumes of streaming event data from infrastructure IoT systems or SCADA systems can be routed through standard services like Kafka and Event Grid, or loaded directly into Databricks via Zerobus. In either case, Structured Streaming or Lakeflow Spark Declarative Pipelines incrementally ingest this data into Bronze tables, an approach designed for low TCO, high performance, and scale. Other operational data, such as maintenance records or regulatory documents from ERP systems, can be ingested via Lakeflow Connect, while SAP data can be exposed via the SAP BDC Connector for Databricks. To complement proprietary data, Databricks Marketplace provides access to third-party data sources such as weather and public financial data, and a large partner ecosystem offers third-party integration tools and connectors to other common enterprise software.
- As data is ingested from different sources, the Medallion Architecture is used to incrementally improve the structure and quality of the data. The raw formats and corresponding metadata land in the bronze layer to maintain a historical archive of the source data, which is especially relevant for telemetry or IoT streaming data. Lakeflow Spark Declarative Pipelines then clean, merge, and model the data with additional logic, such as telemetry resampling and interpolation. The results land in the silver layer, which generally represents clean, transactional single-source datasets. Finally, a gold layer can be developed to join datasets together, aggregate data along key dimensions, and calculate important sustainability metrics like Power Usage Effectiveness (PUE) or Consumption Capacity. The gold layer simplifies reporting and shortens time to insight, enabling key analytical solutions like KPI monitoring or ESG benchmarking.
- With the data now integrated and transformed, operations teams have a unified view of the information needed to run advanced analytics. Here, Databricks SQL is used to monitor resource consumption across multiple facilities and lines and to perform ESG benchmarking against peer firms, while AI and machine learning are used to build models for predictive equipment maintenance and energy demand forecasting. These AI and ML models depend heavily on the clean and trusted Lakehouse data produced by the upstream steps. Without it, predictions would be less accurate and less consistent.
- Operations teams now serve insights to various stakeholders. Importantly, the Databricks platform supports all personas, technical or not. Databricks AI/BI dashboards can be built by SQL experts, or by business users with natural language, to visualize KPIs against energy efficiency or sustainability targets, and Genie Spaces enable end users to interact with the data in natural language. Databricks Apps can be built to track carbon emissions against sustainability goals, or to monitor energy usage and reduce resource wastage, providing a customizable front end for any persona or collaborator to take advantage of the Lakehouse. Using Agent Bricks, stakeholders of all technical skill levels can build agents to serve their needs, for example to retrieve up-to-date information on changing regulatory requirements, or to automate preemptive maintenance schedules.
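
The telemetry resampling and interpolation step mentioned for the bronze-to-silver transformation can be illustrated with a minimal, platform-independent sketch. It is plain Python rather than a Lakeflow pipeline, and the function name `resample_linear`, the `(timestamp, value)` tuple layout, and the sample meter readings are all hypothetical choices made for illustration. It assumes linear interpolation between neighbouring readings is acceptable for the signal in question.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def resample_linear(readings, step):
    """Resample irregular (timestamp, value) readings onto a regular grid.

    Grid timestamps start at the earliest reading and advance by `step`.
    Values between two observed readings are linearly interpolated.
    """
    points = sorted(readings)
    if not points:
        return []
    times = [ts for ts, _ in points]
    resampled = []
    t = times[0]
    while t <= times[-1]:
        j = bisect_right(times, t)  # first index whose timestamp is > t
        if j == len(times):
            # t coincides with the last observed timestamp
            value = points[-1][1]
        else:
            (t0, v0), (t1, v1) = points[j - 1], points[j]
            frac = (t - t0) / (t1 - t0)  # position of t inside the segment
            value = v0 + frac * (v1 - v0)
        resampled.append((t, value))
        t += step
    return resampled


# Hypothetical irregular power readings (kW) from a single meter.
base = datetime(2024, 1, 1, 0, 0)
raw = [
    (base, 100.0),
    (base + timedelta(minutes=10), 200.0),
    (base + timedelta(minutes=30), 100.0),
]
silver = resample_linear(raw, timedelta(minutes=5))
```

In a real pipeline the same logic would typically be expressed with the platform's own windowing and interpolation functions. The point here is only the shape of the transformation: irregular raw events in, a regular time grid out.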

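As an example of a gold-layer sustainability metric, PUE (Power Usage Effectiveness) is conventionally defined as total facility energy divided by the energy consumed by IT equipment. The sketch below aggregates per-facility energy and computes that ratio. The row layout (`facility`, `total_kwh`, `it_kwh`) and the function name `pue_by_facility` are hypothetical, and the sample rows are made up purely for illustration.

```python
from collections import defaultdict


def pue_by_facility(rows):
    """Aggregate energy per facility and compute PUE = total_kwh / it_kwh.

    Returns None for a facility whose IT energy sums to zero,
    since the ratio is undefined in that case.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # facility -> [total_kwh, it_kwh]
    for row in rows:
        acc = totals[row["facility"]]
        acc[0] += row["total_kwh"]
        acc[1] += row["it_kwh"]
    result = {}
    for facility, (total_kwh, it_kwh) in totals.items():
        result[facility] = total_kwh / it_kwh if it_kwh > 0 else None
    return result


# Hypothetical silver-layer rows: hourly energy per facility.
silver_rows = [
    {"facility": "plant_a", "total_kwh": 150.0, "it_kwh": 100.0},
    {"facility": "plant_a", "total_kwh": 150.0, "it_kwh": 100.0},
    {"facility": "plant_b", "total_kwh": 120.0, "it_kwh": 0.0},
]
gold = pue_by_facility(silver_rows)
```

Summing energy before dividing, rather than averaging per-row ratios, keeps the metric weighted by actual consumption, which is usually what a facility-level KPI is meant to reflect.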