Honeywell selects Delta Live Tables for streaming data
Processing billions of IoT data points per day
Honeywell’s solutions and services are used in millions of buildings around the world. Helping its customers create buildings that are safer, more sustainable and more productive can require thousands of sensors per building. Those sensors monitor key factors such as temperature, pressure, humidity and air quality. In addition to the data collected by sensors inside a building, data is also collected from outside, such as weather and pollution data. A third data set describes the buildings themselves: building type, ownership, floor plan, square footage of each floor and square footage of each room.

Combining that data set with the two disparate data streams adds up to a lot of data across multiple structured and unstructured formats, including images, video streams, telemetry data and event data. At peak, Honeywell ingests anywhere from 200 to 1,000 events per second for any building, which equates to billions of data points per day. Honeywell’s existing data infrastructure struggled to meet that demand. It also made it difficult for Honeywell’s data team to query and visualize its disparate data so it could provide customers with fast, high-quality information and analysis.
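A quick back-of-the-envelope check shows how the per-building event rates quoted above compound into billions of daily data points. This sketch assumes the upper bound of the quoted range (1,000 events per second); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Upper bound of the quoted 200–1,000 events/sec range for one building
peak_events_per_second = 1_000

# Daily volume for a single building ingesting at peak rate
events_per_building_per_day = peak_events_per_second * SECONDS_PER_DAY
print(events_per_building_per_day)  # 86,400,000

# Only a dozen buildings at peak rate already cross one billion events/day
buildings_for_one_billion = 1_000_000_000 // events_per_building_per_day + 1
print(buildings_for_one_billion)  # 12
```

With a fleet measured in millions of buildings, even modest average rates keep the daily total firmly in the billions.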
ETL simplified: high-quality, reusable data pipelines
With Delta Live Tables (DLT) on the Databricks Data Intelligence Platform, Honeywell’s data team can now ingest billions of rows of sensor data into Delta Lake and automatically build SQL endpoints for real-time queries and multilayer insights into data at scale. “We didn’t have to do anything to get DLT to scale,” says Dr. Chris Inkpen, Global Solutions Architect at Honeywell Energy and Environmental Solutions. “We give the system more data, and it copes. Out of the box, it’s given us the confidence that it will handle whatever we throw at it.”
Honeywell credits the Databricks Data Intelligence Platform for helping it unify its vast and varied data — batch, streaming, structured and unstructured — into one platform. “We have many different data types. The Databricks Data Intelligence Platform allows us to use things like Apache Kafka and Auto Loader to load and process multiple types of data and treat everything as a stream of data, which is awesome. Once we’ve got structured data from unstructured data, we can write standardized pipelines.”
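As an illustration of the pattern Inkpen describes — Auto Loader ingestion feeding a standardized, quality-gated DLT table — a minimal pipeline might look like the following. This is a sketch, not Honeywell’s actual code: it runs only inside a Databricks Delta Live Tables pipeline (where `spark` and the `dlt` module are provided), and the landing path, table names, and schema columns are hypothetical.

```python
# Illustrative only: requires the Databricks Delta Live Tables runtime.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw building telemetry ingested incrementally with Auto Loader")
def raw_telemetry():
    # Auto Loader ("cloudFiles") incrementally picks up new files as they land.
    # The landing path below is a hypothetical placeholder.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/telemetry/")
    )

@dlt.table(comment="Telemetry with a basic quality gate applied")
@dlt.expect_or_drop("valid_reading", "temperature IS NOT NULL")
def clean_telemetry():
    # Downstream tables read the upstream table as a stream, so the same
    # pipeline definition serves both batch and streaming execution.
    return dlt.read_stream("raw_telemetry").select(
        col("building_id"), col("sensor_id"), col("temperature"), col("event_time")
    )
```

Because each table is just a decorated function, teams can keep common transformations in shared folders and spin off new pipelines from them, which matches the collaboration pattern described below.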
Honeywell data engineers can now build and reuse their own ETL pipelines with Delta Live Tables and gain insights and analytics quickly. ETL pipelines can be reused regardless of the environment, and data can run in batches or streams. DLT has also helped Honeywell’s data team scale from a small group into a larger one. “When we wrote our first few pipelines before DLT existed, only one person could work in one part of the functionality. Now that we’ve got DLT and the ability to have folders with common functionality, we’ve got a really good platform where we can easily spin off different pipelines.”
DLT also helped Honeywell establish standard log files to monitor and cost-justify its product pipelines. “Utilizing DLT, we can analyze which parts of our pipeline need optimization,” says Inkpen. “With standard pipelines, that was much more chaotic.”
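The standard logs Inkpen refers to come from the structured event log that every DLT pipeline emits, which can be queried like any other table. A hedged sketch of that kind of query, assuming a Databricks workspace with Unity Catalog, the `event_log` table-valued function, and a hypothetical pipeline table name:

```python
# Illustrative only: runs in a Databricks workspace with access to the
# pipeline's event log. `live.clean_telemetry` is a hypothetical table name.
flow_events = spark.sql("""
    SELECT timestamp, event_type, details
    FROM event_log(TABLE(live.clean_telemetry))
    WHERE event_type = 'flow_progress'
    ORDER BY timestamp DESC
""")
flow_events.show(truncate=False)
```

Per-flow progress events like these are what make it possible to see which parts of a pipeline need optimization and to attribute cost to individual product pipelines.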
Enabling ease, simplicity and scalability across the infrastructure
Delta Live Tables has helped Honeywell’s data team consistently query complex data while offering simplicity of scale. It also enables end-to-end data visualization of Honeywell’s data streams as they flow into its infrastructure, are transformed, and then flow out. “Ninety percent of our ETL is now captured in diagrams, so that’s helped considerably and improves data governance. DLT encourages — and almost enforces — good design,” says Inkpen.
Using the lakehouse as a shared workspace has helped promote teamwork and collaboration at Honeywell. “The team collaborates beautifully now, working together every day to divvy up the pipeline into their own stories and workloads,” says Inkpen.
Meanwhile, the ability to manage streaming data with low latency and better throughput has improved accuracy and reduced costs. “Once we’ve designed something using DLT, we’re pretty safe from scalability issues — certainly a hundred times better than if we hadn’t written it in DLT,” says Inkpen. “We can then go back and look at how we can take a traditional job and make it more performant and less costly. We’re in a much better position to try and do that from DLT.”
Using Databricks and DLT also helps the Honeywell team perform with greater agility, which allows them to innovate faster while empowering developers to respond to user requirements almost immediately. “Our previous architecture made it impossible to know what bottlenecks we had and what we needed to scale. Now we can do data science in near real-time.”
Ultimately, Honeywell can now more quickly provide its customers with the data and analysis they need to make their buildings more efficient, healthier and safer for occupants. “I’m continuously looking for ways to improve our lifecycles, time to market, and data quality,” says Inkpen. “Databricks helps us pull together many different data sources, do aggregations, and bring the significant amount of data we collect from our buildings under control so we can provide customers value.”
Ready to get started? Learn more about Delta Live Tables here.