John Deere is a leading manufacturer of agricultural, construction, and forestry machinery, diesel engines, and drivetrains for applications ranging from lawn care to heavy equipment. The company collects large transient engineering datasets from John Deere test vehicles in the field and via telematic data loggers. The goal is to leverage physics- and empirically-based strategies and algorithms to predict life cycles of, and damage to, engine components. This technology has required very little rework of our algorithms, which were built in a MATLAB environment (including all the functionality that MATLAB has robustly built in), so the algorithms and models execute efficiently and accurately on duty cycles that may never have been originally defined in engine dynamometer test cells.
An engine engineer can now spin up Spark-enabled parallel compute environments on demand, automatically, to analyze data coming in from around the world. This is an extraordinary capability that allows domain-specialized engineers, not formally trained as data scientists, to apply their understanding of engineering problems easily and successfully. The heavy lifting needed to enable large-scale data processing on demand in the cloud is drastically simplified. Overall, this may help establish a more well-rounded data science community of engineers. This talk will discuss the challenges and solutions in working with engineering data and applying physics and statistics approaches to tackle our analysis needs.
– Hi, my name’s Meaghan Kosmatka. I work for John Deere, and I’ll be presenting with Arvind Hosagrahara from MathWorks today.
We’ll be talking about enabling physics and empirical-based algorithms with Spark using MATLAB in Databricks.
So I’m a Senior Engineer at John Deere Power Systems within R&D. I have a background in diesel engine calibration tuning and performance, as well as in embedded controls and test and methodology development, and I currently work in engine damage modeling based on mechanical fatigue and wear, in both development and implementation. – Hi, my name is Arvind Hosagrahara. I’m the Chief Solutions Architect for MathWorks and the technical lead of a small team of engineers that focuses on getting our technology working with Spark and the rest of the software stack, with experience in helping clients like Meaghan solve their technical problems.
– So here’s what we’ll be talking about today.
First we’ll be talking about an introduction to John Deere and MathWorks, and then we’ll talk about the objective and problem statements of our project. We’ll give an introduction to predictive damage modeling. We’ll talk about the requirements and the challenges within this project. And we’ll also go into solutions and a demo as pertaining to our project, and we’ll go over some conclusions we have.
So John Deere is a company that’s been around since 1837, and we’re a world leader in providing advanced products and services for those who are committed and linked to the land: those who cultivate, harvest, transform, enrich, and build upon the land to meet the world’s dramatically increasing need for food, fuel, shelter, and infrastructure. We’re a leading solutions provider of heavy-duty off-road equipment in agricultural, construction, and forestry machinery. And John Deere Power Systems manufactures its own heavy-duty diesel engines and drivetrains for both internal and external applications. A component of our fundamental design is our strong emphasis on reliability and durability, as well as the utmost performance for our customers, which is rooted in our deep understanding of our customers’ usage cycles. Using physics and mechanical-fatigue-based damage models, we can understand the durability of our manufactured product.
– MathWorks is the maker of MATLAB and Simulink, our flagship products for expressing technical computing workflows, MATLAB being the textual environment and Simulink the graphical environment for people to build their models, perform their data analysis, and express their analytics in our products, allowing them to stay at a level of abstraction with domain-specific toolboxes. – John Deere is a multi-faceted company, with diesel engines specifically engineered into our equipment of all sizes and varieties.
We’re most well known for our large row-crop tractors and harvesting equipment, but John Deere makes equipment all the way down from lawn and garden equipment to compact and utility tractors. We’re also a big player in construction and forestry, and John Deere Power Systems, as a diesel engine company, has a big presence in OEM engine manufacturing. We develop engines that go into large yachts, military defense equipment, and commercial and industrial generators.
So now I’ll talk about our application in our project.
We’ll talk about our problem statement first. At a business level, John Deere is seeking to understand product usage and design for best-in-class uptime, durability, and performance. We’re also seeking to improve damage model calculations by increasing the agility of their development, migrating onboard predictive maintenance models to the cloud. At a technical level, we want to reduce the time to execution of these damage models, alleviate the demand on the scarce compute resources within the vehicle, enable deep discovery of the data, apply insight back into the design and development process, and maximize efficiency by leveraging cloud compute on telematic fleet data.
So I want to talk a little bit about the two concepts of onboard versus offboard damage modeling, and then I’ll get into damage modeling itself.
Onboard damage modeling is the concept of taking physical conditions and calculating controller actions on an embedded controller to deliver the best performance and emissions. On that embedded controller, you can also calculate damage models for two purposes: one, diagnostic crosschecks, and two, predictive maintenance or prognostics. We would calculate the damage accumulation and damage increment of a specific component on the engine, then send that data back up to the cloud and the vehicle to alert the customer that they may want to come in for maintenance. There are a couple of issues with this model though. One is the rigorous embedded software development process, which drives a longer time to insight: the very thoroughness of that software process slows down how we develop these components within the vehicle.
The second is that we’re limited by the physical resources of the embedded controller, which limits the number of damage models we can put on the controller at once. And the third is the loss of the physical transient data, like engine speed, temperatures, and pressures, that can tell us so much about customer usage cycles, so that we can develop good product that we know will suit the customer’s needs. The alternative is offboard damage modeling. This is the simple concept of collecting data passively with a telematic data logger and putting it into a cloud compute environment.
In that cloud compute environment, we then execute our damage models, which helps us dramatically speed up the time to insight, alleviates stress on the controller, lets us run more damage models, and gives us deeper insight from the transient logged data, so that we can rebuild those cycles within our dyno labs.
So let’s talk about engine damage modeling, the root of predictive maintenance. Damage is generally defined as 1/N, where N is the number of cycles or hours to failure taken from the SN curve. An SN curve is a plot of a stressor versus the number of cycles to failure. Damage modeling is an incremental analysis of stress at each event or cycle, and the increments can then be summed using Miner’s rule. On the right-hand side, you’ll see a failed bellows, which is in alignment with the fatigue-based damage model strategy.
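To make the arithmetic concrete, here is a minimal Python sketch of Miner’s-rule damage summation against a Basquin-form SN curve. The constants C and m are invented placeholders for illustration only, not John Deere values.

```python
# Miner's-rule damage accumulation (illustrative constants only).
# SN curve in Basquin form: N(S) = C * S**(-m), where N is the number of
# cycles to failure at stress amplitude S. C and m are arbitrary placeholders.
C = 1e12
m = 3.0

def cycles_to_failure(stress):
    """Cycles to failure N at a given stress amplitude, from the SN curve."""
    return C * stress ** (-m)

def miner_damage(counted_cycles):
    """Sum the damage increments n/N over (stress, count) pairs (Miner's rule)."""
    return sum(n / cycles_to_failure(s) for s, n in counted_cycles)

# Three stress levels, each seen some number of times in a drive cycle:
damage = miner_damage([(100.0, 200), (150.0, 50), (300.0, 5)])
print(damage)   # failure is predicted once accumulated damage reaches 1.0
```

Note how a few high-stress cycles can contribute as much damage as hundreds of low-stress ones, which is why isolating the most damaging portions of a cycle matters.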
So to do fatigue calculations, you must understand the two concepts of fatigue cycling and rainflow counting.
Many fatigue-based damage models are not based on time or events at a given level; rather, they’re based on the sequence in which the events occurred. Materials have memory. The best way I know to describe this phenomenon is with two paperclip examples: one at a static load level and the other under cyclic load. On the left side, you’ll see that I pick up a paperclip and unwind it a total of 1080 degrees, putting a very high maximum stress on the paperclip. However, I set it down and it should never fail. On the right side, you’ll see that I pick up another paperclip and never unwind it anywhere near 1080 degrees, never putting that high maximum stress on it. What you’ll notice is that I cycle the paperclip, causing a fatigue effect. This is alternating stress, and it will eventually fracture the paperclip. As you can see, it’s starting to bend and fracture, and eventually I’ll be able to pull it apart, which corresponds to a damage value of one: failure.
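In our workflow, the counting is done by MATLAB’s rainflow function; purely to illustrate the idea of sequence-dependent counting, here is a simplified three-point rainflow counter in Python, in the spirit of ASTM E1049, with leftover unclosed reversals counted as half cycles.

```python
# Simplified three-point rainflow counting, in the spirit of ASTM E1049.
# The signal is first reduced to its turning points (peaks and valleys);
# leftover unclosed reversals are counted as half cycles.
from collections import deque

def turning_points(series):
    """Reduce a sampled signal to its local peaks and valleys."""
    pts = [series[0]]
    for x in series[1:]:
        if x == pts[-1]:
            continue
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x      # still rising (or falling): extend the excursion
        else:
            pts.append(x)    # direction changed: new turning point
    return pts

def rainflow(series):
    """Return (range, count) pairs; count is 1.0 (full) or 0.5 (half cycle)."""
    pts, cycles = deque(), []
    for p in turning_points(series):
        pts.append(p)
        while len(pts) >= 3:
            X = abs(pts[-1] - pts[-2])   # most recent range
            Y = abs(pts[-2] - pts[-3])   # previous range
            if X < Y:
                break
            if len(pts) == 3:
                cycles.append((Y, 0.5))  # range touches the start: half cycle
                pts.popleft()
            else:
                cycles.append((Y, 1.0))  # closed cycle: drop its two points
                last = pts.pop()
                pts.pop()
                pts.pop()
                pts.append(last)
    while len(pts) > 1:                  # residue counts as half cycles
        cycles.append((abs(pts[1] - pts[0]), 0.5))
        pts.popleft()
    return cycles

# The classic ASTM E1049 example load history:
print(sorted(rainflow([-2, 1, -3, 5, -1, 3, -4, 4, -2])))
# -> [(3, 0.5), (4, 0.5), (4, 1.0), (6, 0.5), (8, 0.5), (8, 0.5), (9, 0.5)]
```

Each counted range can then be fed through the SN curve and Miner’s rule to produce a damage increment.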
So let’s talk about a real-world predictive maintenance model feature. A really cool one that I like to talk about, and a very simple one at the same time, is turbo speed, or turbo wheel failure, or turbo LCF (low-cycle fatigue). For those that don’t know, a turbo supplies compressed air to the engine’s combustion cylinders, which increases the efficiency of the engine and also increases its power. So if we have a failed turbo compressor wheel, we have loss of compression and loss of power. Now take a farmer out in the field who has only a very finite window in fall to harvest his crops. Many farmers nowadays actually harvest 24/7 and can’t tolerate any downtime, because if they lose that time, they may not be able to get all their crop out of the field, and they risk losing money from lost crop. If we can alert the customer before a failure happens, they can get to the dealer and maintain their vehicle before it ever occurs. So how do we do predictive maintenance? There are five main steps. The first is using a physics-based first-principles equation. As you’ll see on the right, we have turbo wheel stress; the values that go into it are a coefficient, the geometric wheel diameter, and the transient input of turbo speed.
We then calculate virtual stress or strain from the logged telematic data, using the input of turbo speed to compute the output of turbo stress over a drive cycle. We then rainflow-count each of these stress-strain cycles or sequences, so that we can solve for the damage increment in the drive cycle.
After this, we would then calculate the damage accumulation.
And on the right, what you’ll see is that over one drive cycle about 45 minutes long, we have a little less than 400 sequences or cycles occurring. You’ll see the damage ratcheting up and the remaining useful life ratcheting down. Eventually, over time, the damage will equal one, which corresponds to a failed turbo wheel, or a remaining useful life of zero: no more useful life in that turbo wheel.
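As a toy end-to-end sketch of these steps: every constant below is an invented placeholder, the stress model is reduced to "proportional to speed squared," and the counted cycles are made up; the real models and coefficients are proprietary.

```python
# Toy damage-accumulation / remaining-useful-life sketch.
# Every constant below is an invented placeholder, not a real turbo parameter.
K = 2.0e-9         # lumped stress coefficient (absorbs wheel geometry, units)
C, m = 1e10, 4.0   # Basquin SN-curve placeholders: N(S) = C * S**(-m)

def wheel_stress(speed_rpm):
    """Virtual wheel stress, taken here as proportional to speed squared."""
    return K * speed_rpm ** 2

def damage_increment(stress_amplitude, count):
    """Miner's-rule increment n/N for a counted stress cycle."""
    cycles_to_failure = C * stress_amplitude ** (-m)
    return count / cycles_to_failure

# (peak speed in rpm, cycle count) pairs, as rainflow counting of one
# drive cycle's turbo speed trace might produce them:
counted_cycles = [(80000, 300.0), (110000, 40.0), (140000, 2.5)]

damage = sum(damage_increment(wheel_stress(s), n) for s, n in counted_cycles)
rul_fraction = max(0.0, 1.0 - damage)   # remaining useful life; 0 means failed
print(damage, rul_fraction)
```

Summing such small increments drive cycle after drive cycle is exactly the ratcheting behavior shown on the slide.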
Now I’ll talk about the challenges and requirements of this project. First, the complex physical phenomena behind these models are very hard to model using purely reactive statistics, but we’d still like to apply proven statistical techniques to the data. Second, we’d like to keep the domain experts focused on their line of expertise. A lot of engineers and scientists already use MATLAB and Simulink for their great toolboxes; we don’t want to make them recode everything and learn a different language, so we want to keep them within their domain of MATLAB and Simulink. The third is that currently, engineers and scientists have little to no control over the upstream data engineering transforms, and we want to enable self-service solutions for the ETL needs in the existing infrastructure.
The fourth is that we’d like to isolate the data and the most damaging portions of the cycle, so that we can explore those conditions and reintegrate them back into our lab and field tests, and thereby create the most durable product. So we want to enable interactive exploration of the data back in MATLAB. And the fifth is handling the time-dependent or state-based calculations within Spark. We want to enable the push-down of MATLAB analytics, like rainflow counting, at a coarse parallelization level, such as per vehicle, on the cluster.
So let’s talk about the big picture. First I’ll talk about the data, then the architecture of the data pipeline, and then I’ll get into the specific ETL process. Currently, we have terabytes of data growing exponentially year over year, and we’re using an EDL, or Enterprise Data Lake; specifically, we use AWS S3 buckets. Our data arrives encrypted and in proprietary file formats, and we have diverse data file types: structured data, hierarchical metadata, and unstructured log data.
So I’ll talk about the architecture of our data pipeline. The first step is creating the damage model analytics using those great MATLAB and Simulink toolboxes that were robustly engineered for engineers and scientists.
Once we do this, we can compile the code (the damage model algorithms as well as the ETL processes that make the data analysis-ready) using MATLAB Compiler SDK.
We then create Java (.jar) files that can be used on a MATLAB-provisioned Databricks cluster. The next step is to actually flow the data in and compute on it. First, we take the data from a telematic data logger and send it over a cell network, where it lands on an on-prem file server. From that on-prem file server, we use Apache NiFi for our ETL process. Within Apache NiFi, we employ a MATLAB Production Server client call, which runs the ETL algorithm we developed in MATLAB, decoding, cleaning, and enriching our data. After we do this, we ingest the data into the EDL, so that it can be mounted on the Databricks file system.
From this Databricks file system, we can now run our MATLAB analytics for damage modeling and predictive maintenance on a Databricks cluster that is provisioned to run MATLAB code.
Now we can calculate vehicle damage at scale for our test fleet vehicles or customer vehicle population.
Now, I wanna specifically talk about our ETL process or Extract Transform Load process of the data.
The data can be taken either from the EDL (an AWS S3 raw bucket) or from the on-prem server. Then, within NiFi, the MATLAB Production Server client call runs a proprietary decoder to decrypt the contents into a sterile environment, and then processes and enriches the log metadata and the raw text contents of the input files, the transactional transient data.
Then we cleanse the data and perform data anomaly detection. After that, we create the tabular, structured, analysis-ready data that can be used within Databricks or Delta Lake, and we output it as Parquet to our EDL, so that it can be mounted onto DBFS, the Databricks file system, for use within the cloud compute environment.
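To give a flavor of the cleansing step (the production ETL runs compiled MATLAB code inside NiFi; this is only an illustrative Python sketch, and the channel names and limits are hypothetical), a simple physical range check can split records into clean and anomalous sets:

```python
# Illustrative range-check anomaly detection for logged sensor channels.
# The channel names and limits are hypothetical, not real calibration values.
LIMITS = {
    "turbo_speed_rpm": (0.0, 200000.0),
    "coolant_temp_c": (-40.0, 130.0),
}

def flag_anomalies(records):
    """Split records into (clean, anomalous) using simple range checks."""
    clean, bad = [], []
    for rec in records:
        ok = all(lo <= rec[ch] <= hi
                 for ch, (lo, hi) in LIMITS.items() if ch in rec)
        (clean if ok else bad).append(rec)
    return clean, bad

rows = [
    {"turbo_speed_rpm": 95000.0, "coolant_temp_c": 88.0},   # plausible sample
    {"turbo_speed_rpm": -5.0, "coolant_temp_c": 90.0},      # sensor glitch
]
clean, bad = flag_anomalies(rows)
print(len(clean), len(bad))  # -> 1 1
```

Flagged records can be quarantined rather than dropped, so suspect sensor behavior remains available for forensic inspection.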
Now, I’ll turn it over to Arvind for a quick demonstration.
– This is a quick demonstration of the workflows to leverage Databricks from MATLAB. Creating a shared Spark session in MATLAB allows users to specify Spark properties and build a shared Spark context that leverages Databricks Connect to connect to a Databricks cluster running in the cloud. This handle to the Spark session can be used to load data from the cloud-based storage and slice and dice it from within the MATLAB environment. Once the data has loaded, MATLAB offers a lot of features to let users explore their data if necessary, inspecting it through the rich visualizations available in MATLAB and focusing on areas of interest in their stored data, as well as a lot of functions designed to simplify cleansing and fixing the data integrity issues, which is required to make the data ready for analysis. In this particular case, our turbocharger data had missing values, which we filled using the fillmissing function, allowing us to perform cycle counting.
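MATLAB’s fillmissing does this natively; as a language-neutral illustration of the idea, here is a minimal linear gap-filling sketch in Python (missing samples represented as None):

```python
# Illustrative gap-filling for a sampled signal with missing values (None),
# similar in spirit to MATLAB's fillmissing(..., 'linear').
def fill_missing_linear(values):
    """Linearly interpolate interior gaps; hold the nearest value at the edges."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return values[:]
    out = list(values)
    for i in range(len(out)):
        if out[i] is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            out[i] = values[nxt]           # leading gap: back-fill
        elif nxt is None:
            out[i] = values[prev]          # trailing gap: forward-fill
        else:                              # interior gap: linear interpolation
            t = (i - prev) / (nxt - prev)
            out[i] = values[prev] + t * (values[nxt] - values[prev])
    return out

print(fill_missing_linear([1.0, None, None, 4.0, None]))
# -> [1.0, 2.0, 3.0, 4.0, 4.0]
```

Filling gaps before cycle counting matters because a dropout would otherwise register as a spurious reversal.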
To develop a predictive model, we leverage domain-specific toolboxes, in this case the Predictive Maintenance Toolbox, and the rainflow function, to quickly visualize the ranges and averages of the cycles in a given dataset. This lets us very quickly apply models to the data. In this case, a simplified model using the rainflow function in the Predictive Maintenance Toolbox lets us measure increments of damage as it builds up through the drive cycle. The model developed in MATLAB can be pushed down to Databricks Spark using MATLAB Compiler SDK. The resulting Java file from this exercise can be used on Databricks either as a library or as a spark-submit job. In this particular example, we compile it into a spark-submit job and show how you could use the data engineering features to quickly stage the job for execution on Databricks Spark. This Java file is pushed up to the cluster, at which point the user can declaratively define the cluster infrastructure, specifying how many Spark workers are needed. Enabling the MATLAB Runtime makes this a MATLAB-capable cluster. Once you have a MATLAB-capable cluster, that cluster definition, which contains the necessary configuration to execute the job, can be attached to a job object, giving it a name.
By adjusting the properties of the job, one can attach the cluster definition to the job, as well as configure other features of the job, such as enabling job status notifications on start, success, and failure, and specifying what to run as part of the job; in this case, pointing it to our Java file. The ability to set the task allows us to trigger this as part of our execution. And finally, it’s possible to control the job itself by creating the job on the Databricks system, executing it, and refreshing details about the job. On completion, the results can be fetched from the output location and visualized in MATLAB.
And now let’s take a quick look at the tooling and the workflows for expressing what Meaghan was discussing.
For starters, it’s possible to elastically provision MATLAB-based compute on Spark that has access to the underlying cloud data storage layers, and allows users such as Meaghan to perform data engineering automation by creating jobs and scheduling them, or kicking them off on demand, working with the security layers that Databricks offers.
The first step of her journey starts with interactive exploration of her data, where she can get a selection of her data for analysis, slice it and dice it, and bring it into MATLAB for efficient exploration of the data sitting on the cloud. This gives her access to forensic portions of her data, isolating sections of interest, and enables the large variety of MATLAB domain-specific toolboxes and apps, letting her write less and do more.
The last part of the workflow would be to take this compute that is being designed and push it down to run on the cluster.
And this can be done either as spark-submit jobs or as libraries that run on existing clusters, giving MATLAB users multiple semantics to execute their analytics at scale on a Spark cluster. – And so we have a couple of conclusions that we’ve come to within this project. We have come to better insight into product usage in a fraction of the time by leveraging best-in-class tooling: Databricks and MathWorks products. Domain-specialized engineers can now use their existing code and models within a Spark environment with little to no rework. And engineers and scientists can self-serve their upstream ETL infrastructure needs to suit their analytics requirements. And we get the same great insight at scale. So thanks very much for joining our presentation.
Deere & Company
Meaghan is a Senior Engineer at John Deere Power Systems within the Applied Mechanics group and is currently working on development of high-fidelity mechanical damage/product life models and methodologies to further John Deere's understanding of product usage. Meaghan has a vast knowledge of diesel engine development having held positions in engine controller software development, engine and aftertreatment calibration, base engine development and product verification/validation. With her multi-faceted skills, she has helped John Deere to continually increase their capabilities allowing the company to develop a deep understanding of their customer usage and develop robust solutions to meet their needs.
Arvind Hosagrahara leads a team that helps organizations deploy MATLAB algorithms in critical engineering applications, with a focus on integrating MATLAB into the enterprise IT/OT systems. Arvind has extensive hands-on experience developing MATLAB and Simulink applications and integrating them with external technologies. He has helped design the software and workflow for a variety of production applications focusing on robustness, security, scalability, maintainability, usability, and forward compatibility across automotive, energy and production, finance and other industries.