Honeywell selects Delta Live Tables for streaming data
Processing billions of IoT data points per day
Honeywell’s solutions and services are used in millions of buildings around the world. Helping its customers create buildings that are safer, more sustainable and more productive can require thousands of sensors per building. Those sensors monitor key factors such as temperature, pressure, humidity and air quality. In addition to the data collected by sensors inside a building, data is also collected from outside, such as weather and pollution data. A third data set describes the buildings themselves: building type, ownership, floor plan, square footage of each floor and square footage of each room.

Combining that data set with the two disparate data streams adds up to a lot of data across multiple structured and unstructured formats, including images, video streams, telemetry data and event data. At peak, Honeywell ingests anywhere from 200 to 1,000 events per second for any building, which equates to billions of data points per day. Honeywell’s existing data infrastructure struggled to meet that demand. It also made it difficult for Honeywell’s data team to query and visualize its disparate data so it could provide customers with fast, high-quality information and analysis.
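A quick back-of-the-envelope check shows how the per-building event rates quoted above compound into billions of daily data points. This sketch assumes the upper bound of the quoted range (1,000 events per second); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Upper bound of the quoted 200–1,000 events/sec range for one building
peak_events_per_second = 1_000

# Daily volume for a single building ingesting at peak rate
events_per_building_per_day = peak_events_per_second * SECONDS_PER_DAY
print(events_per_building_per_day)  # 86,400,000

# Only a dozen buildings at peak rate already cross one billion events/day
buildings_for_one_billion = 1_000_000_000 // events_per_building_per_day + 1
print(buildings_for_one_billion)  # 12
```

With a fleet measured in millions of buildings, even modest average rates keep the daily total firmly in the billions.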
ETL simplified: high-quality, reusable data pipelines
With Delta Live Tables (DLT) on the Databricks Data Intelligence Platform, Honeywell’s data team can now ingest billions of rows of sensor data into Delta Lake and automatically build SQL endpoints for real-time queries and multilayer insights into data at scale. “We didn’t have to do anything to get DLT to scale,” says Dr. Chris Inkpen, Global Solutions Architect at Honeywell Energy and Environmental Solutions. “We give the system more data, and it copes. Out of the box, it’s given us the confidence that it will handle whatever we throw at it.”
Honeywell credits the Databricks Data Intelligence Platform for helping it unify its vast and varied data — batch, streaming, structured and unstructured — into one platform. “We have many different data types. The Databricks Data Intelligence Platform allows us to use things like Apache Kafka and Auto Loader to load and process multiple types of data and treat everything as a stream of data, which is awesome. Once we’ve got structured data from unstructured data, we can write standardized pipelines.”
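As an illustration of the pattern Inkpen describes — Auto Loader ingestion feeding a standardized, quality-gated DLT table — a minimal pipeline might look like the following. This is a sketch, not Honeywell’s actual code: it runs only inside a Databricks Delta Live Tables pipeline (where `spark` and the `dlt` module are provided), and the landing path, table names, and schema columns are hypothetical.

```python
# Illustrative only: requires the Databricks Delta Live Tables runtime.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw building telemetry ingested incrementally with Auto Loader")
def raw_telemetry():
    # Auto Loader ("cloudFiles") incrementally picks up new files as they land.
    # The landing path below is a hypothetical placeholder.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/telemetry/")
    )

@dlt.table(comment="Telemetry with a basic quality gate applied")
@dlt.expect_or_drop("valid_reading", "temperature IS NOT NULL")
def clean_telemetry():
    # Downstream tables read the upstream table as a stream, so the same
    # pipeline definition serves both batch and streaming execution.
    return dlt.read_stream("raw_telemetry").select(
        col("building_id"), col("sensor_id"), col("temperature"), col("event_time")
    )
```

Because each table is just a decorated function, teams can keep common transformations in shared folders and spin off new pipelines from them, which matches the collaboration pattern described below.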
Honeywell data engineers can now build and reuse their own ETL pipelines with Delta Live Tables and gain insights and analytics quickly. ETL pipelines can be reused regardless of the environment, and data can run in batches or streams. DLT has also helped Honeywell’s data team scale from a small group into a larger one. “When we wrote our first few pipelines before DLT existed, only one person could work in one part of the functionality. Now that we’ve got DLT and the ability to have folders with common functionality, we’ve got a really good platform where we can easily spin off different pipelines.”
DLT also helped Honeywell establish standard log files to monitor and cost-justify its product pipelines. “Utilizing DLT, we can analyze which parts of our pipeline need optimization,” says Inkpen. “With standard pipelines, that was much more chaotic.”
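The standard logs Inkpen refers to come from the structured event log that every DLT pipeline emits, which can be queried like any other table. A hedged sketch of that kind of query, assuming a Databricks workspace with Unity Catalog, the `event_log` table-valued function, and a hypothetical pipeline table name:

```python
# Illustrative only: runs in a Databricks workspace with access to the
# pipeline's event log. `live.clean_telemetry` is a hypothetical table name.
flow_events = spark.sql("""
    SELECT timestamp, event_type, details
    FROM event_log(TABLE(live.clean_telemetry))
    WHERE event_type = 'flow_progress'
    ORDER BY timestamp DESC
""")
flow_events.show(truncate=False)
```

Per-flow progress events like these are what make it possible to see which parts of a pipeline need optimization and to attribute cost to individual product pipelines.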
Enabling ease, simplicity and scalability across the infrastructure
Delta Live Tables has helped Honeywell’s data team consistently query complex data while offering simplicity of scale. It also enables end-to-end data visualization of Honeywell’s data streams as they flow into its infrastructure, are transformed, and then flow out. “Ninety percent of our ETL is now captured in diagrams, so that’s helped considerably and improves data governance. DLT encourages — and almost enforces — good design,” says Inkpen.
Using the lakehouse as a shared workspace has helped promote teamwork and collaboration at Honeywell. “The team collaborates beautifully now, working together every day to divvy up the pipeline into their own stories and workloads,” says Inkpen.
Meanwhile, the ability to manage streaming data with low latency and better throughput has improved accuracy and reduced costs. “Once we’ve designed something using DLT, we’re pretty safe from scalability issues — certainly a hundred times better than if we hadn’t written it in DLT,” says Inkpen. “We can then go back and look at how we can take a traditional job and make it more performant and less costly. We’re in a much better position to try and do that from DLT.”
Using Databricks and DLT also helps the Honeywell team perform with greater agility, which allows them to innovate faster while empowering developers to respond to user requirements almost immediately. “Our previous architecture made it impossible to know what bottlenecks we had and what we needed to scale. Now we can do data science in near real-time.”
Ultimately, Honeywell can now more quickly provide its customers with the data and analysis they need to make their buildings more efficient, healthier and safer for occupants. “I’m continuously looking for ways to improve our lifecycles, time to market, and data quality,” says Inkpen. “Databricks helps us pull together many different data sources, do aggregations, and bring the significant amount of data we collect from our buildings under control so we can provide customers value.”
Ready to get started? Learn more about Delta Live Tables here.