Platform blog

Modern Industrial IoT Analytics on Azure - Part 3

Customers Leverage Azure Databricks for Industrial IoT Analytics

August 20, 2020 in Product

Share this post

In part 2 of this three-part series on Azure data analytics for modern industrial internet of things (IIoT) applications, we ingested real-time IIoT data from field devices into Azure and performed complex time-series processing on Data Lake directly. In this post, we will leverage machine learning for predictive maintenance and to maximize the revenue of a wind turbine while minimizing the opportunity cost of downtime, thereby maximizing profit.

The end result of our model training and visualization will be a Power BI report shown below:

The end-to-end architecture is again shown below.

Machine Learning:  Power Output and Remaining Life Optimization

Optimizing the utility, lifetime, and operational efficiency of industrial assets like wind turbines has numerous revenue and cost benefits. The real-world challenge we explore in this article is maximizing the revenue of a wind turbine while minimizing the opportunity cost of downtime, thereby maximizing our net profit.

Net profit = Power generation revenue - Cost of added strain on equipment

If we push a turbine to a higher RPM, it will generate more energy and therefore more revenue. However, the added strain on the turbine will cause it to fail more often, introducing cost.

To solve this optimization problem, we will create two models:

  1. Predict the power generated of a turbine  given a set of operating conditions
  2. Predict the remaining life of a turbine given a set of operating conditions

We can then produce a profit curve to identify the optimal operating conditions that maximize power revenue while minimizing costs.

Using Azure Databricks with our Gold Delta tables, we will perform feature engineering to extract the fields of interest, train the two models, and finally deploy the models to Azure Machine Learning for hosting.

To calculate the remaining useful lifetime of each Wind Turbine, we can use our maintenance records that indicate when each asset is replaced.

-- Calculate the age of each turbine and the remaining life in days
WITH reading_dates AS (SELECT distinct date, deviceid FROM turbine_power),
	maintenance_dates AS (
	SELECT d.*, datediff(, as datediff_next, datediff(, as datediff_last 
	FROM reading_dates d LEFT JOIN turbine_maintenance nm ON (d.deviceid=nm.deviceid AND ))
SELECT date, deviceid, min(datediff_last) AS age, min(datediff_next) AS remaining_life
FROM maintenance_dates 
GROUP BY deviceid, date;

To predict power output at a six-hour time horizon, we calculate time series shifts using Spark window functions.

SELECT r.*, age, remaining_life,
	-- Calculate the power 6 hours ahead using Spark Windowing and build a feature_table to feed into our ML models
	LEAD(power, 6, power) OVER (PARTITION BY r.deviceid ORDER BY time_interval) as power_6_hours_ahead
FROM gold_readings r 
JOIN turbine_age a ON ( AND r.deviceid=a.deviceid)

  • Ingested real-time IIoT data from field devices into Azure
  • Performed complex time-series processing on Data Lake directly
  • Trained and deployed ML models to optimize the utilization of our Wind Turbine assets
  • Served the data to engineers for operational reporting and data analysts for analytical reporting

The key big data technology that ties everything together is Delta Lake. Delta on ADLS provides reliable streaming data pipelines and highly performant data science and analytics queries on massive volumes of time-series data. Lastly, it enables organizations to truly adopt a Lakehouse pattern by bringing best of breed Azure tools to a write-once, access-often data store.

What’s Next?

Try out the notebook hosted here, learn more about Azure Databricks with this 3-part training series and see how to create modern data architectures on Azure by attending this webinar.

Try Databricks for free

Related posts

See all Product posts