Using Apache Spark for Predicting Degrading and Failing Parts in Aviation


Throughout naval aviation, data lakes provide the raw material for generating insights into predictive maintenance and increasing readiness across many platforms. Successfully leveraging these data lakes can be technically challenging. However, the data they hold can inform maintenance decisions and help fleets improve readiness by revealing detectable conditions prior to component degradation and failure. Civilian and military aviation datasets are extremely large and heterogeneous. The authors are successfully using Spark to help overcome these challenges within ETL pipelines. Spark also facilitates ad-hoc and recurring reporting for aircraft component health checks at scale, which are created in collaboration with in-house engineering departments which flag recorded flights for known issues. Spark ML is used to flag anomalous data by fitting regression models to historical data and comparing model outputs to observed flights. Feature deviation from model output is measured for each new flight, and flights that appear to be anomalously out of expected ranges are flagged for human review.

Apache Spark has enabled a small team to handle a large volume of data spanning hundreds of schemas. The team has used Spark to parallelize aircraft component health scoring algorithms, decreasing the running time of models to hours instead of days or weeks. Because of Spark’s speed and versatility, it has become a major component within an official reporting architecture, and has successfully flagged parts prior to failure. A few shortcomings of Spark have also been encountered, including data visualization, which is still being performed in Pandas. The authors will discuss and elaborate on their team’s successful utilization of these tools, and future directions.

Key Takeaways:
- Civilian and military aviation data is difficult to work with due to volume and variety
- Spark is specifically designed to tackle these issues
- Spark is playing a major role in a small specialized team’s aviation reporting and analysis architecture

Watch more Spark + AI sessions here
Try Databricks for free

Video Transcript

– Good morning everybody. My name is Matt. I’m here with my colleague Chris. We are data scientists in the NAWCTSD enterprise research and data science group. That’s the N.E.R.D.S. group at NAWCTSD. In this presentation, we’re going to talk a little bit about who we are and what we do, and specifically about how we’re using Apache Spark to address component degradation and unscheduled maintenance in the context of aviation systems. So here’s our standard disclaimer: we don’t speak for all of DoD or all of the Navy. Please feel free to read this on your own time. So who are we? We’re a team of about 20 data scientists and engineers based at NAWCTSD. That’s a training command, the Naval Air Warfare Center Training Systems Division. They are in charge of procuring and maintaining training systems for naval aviation platforms, so think of these big, huge flight simulators. Their contracts, all told, total about a billion dollars a year, so it’s the largest modeling and simulation hub in the world. We’re based in Orlando, Florida, right next to the University of Central Florida, the largest undergraduate metropolitan research university in the world. That’s great for our talent pipeline. We take the best and brightest out of UCF and we turn them into civil servants and contractor support personnel in our group, all aligned toward the same goal. Chris and I both came out of that talent pipeline at UCF. In Orlando, we’re also right next to Disney World and Universal Studios. Some of you may have heard of those. And we’re also just a quick drive from the Space Coast, so with the return to space at Cape Kennedy, it’s an exciting place to be. So what are we working on down in Orlando?
Right now we’re working on a tool called HHART, a health assessment and readiness tool. It’s not just us; this is a major collaboration, both in and out of the Navy: public, private and academic. HHART is a real-time dashboard showing the maintenance status of an entire fleet of aircraft. Specifically, we’re interested in understanding the causes of, and being able to predict, unscheduled maintenance actions. An unscheduled maintenance action, as opposed to a scheduled maintenance action, is something that is not expected to occur. It’s not related to a component service life or scheduled inspection or anything like that. Imagine that a particular valve somewhere in the aircraft is not responding the way it should to upstream inputs, and it’s halfway through its expected service life. That would be an example of an unscheduled maintenance action. The HHART dashboard is responsible for showing the indicators that our team and the other teams we’re working with have come up with, showing how each aircraft in the fleet is doing along each of several metrics. We try to roll that up into subsystems and into the ultimate airframe, and then we show that for the entire fleet. We’ll get a little bit more into how we do those roll-up scores, and what’s actually being calculated, a little later in the presentation.

Developing HHART

So first we’re going to discuss what we’re actually going to present: how we’re developing HHART and the process that goes into it. It all starts with our subject matter experts. These are the designers, the engineers and the maintainers on the aircraft, responsible for making sure that it’s capable of executing the mission. Next, we’ll talk about how we set up our data pipelines, our ETL pipelines, to support our use cases. Then we’ll talk about what kind of metrics we’re looking at to determine the health of an aircraft and the health of a subsystem, and ultimately to drill down to the component level. Then we’ll discuss how we’re deploying: what kind of considerations we should take into account when deciding where and how we’re going to deploy, and how we’re going to put all this into motion. And finally, we’ll discuss how we incorporate user feedback into the development of the tool, in terms of the user interface, the user experience and the features that we’re creating, making sure that people are still getting value from this tool and that it can be used effectively when making these kinds of maintenance decisions.

Subject matter expert (SME)

So first of all, I want to describe who our subject matter experts are and what role they play in the process. Subject matter experts, as I mentioned previously, are engineers, designers and maintainers. Here we’ve got three examples of how a subject matter expert might view the types of platforms that we work on, so aviation-related platforms. On the left, we’ve got an SR-71 Blackbird with the propulsion and fuel subsystems highlighted. The callouts relate mostly to components in those two subsystems. In the middle you’ve got an example of the Space Transportation System. Some of you may know it better as the space shuttle. There are a couple of callouts related to airframe, propulsion and aerodynamics. And on the far right, we have a satellite hub, and the callouts there are higher level, so they’re the kind of thing that a systems engineer might consider: the large-scale, line-replaceable components. This would be more how an operations and sustainment style engineer might think about the system. These are just a few of the different types of subject matter experts who are responsible for becoming the data translators for us as data scientists and the work that we do. To get a little bit into how that happens, the key ingredient is collaboration: collaboration among the data scientists and subject matter experts, as well as leadership and the other components that you can see here. What it really takes to make this a success is for all sides to take a generalist-style approach to developing this kind of tool. Everyone has to be willing to learn from one another, both inside and outside of your team. So communication both inside and outside of your team is a key ingredient.
Once you start to get engaged with those external entities, you can start to see how the pieces fit together and better understand which component in which subsystem you’re trying to analyse. And ultimately, you start to educate both inside and outside of your team. You start learning from people and sharing what you’ve learned, and all of that contributes to a foundation of trust, a culture of trust. It’s really important that you have that key ingredient. Why is that important? Well, say a subject matter expert comes to me as a data scientist and asks me a question like, “Whenever we have an aircraft that’s at some number of thousands of feet, and the pilot slams the throttle forward, pulls back, slams it forward again, and then there’s a barrel roll and a loop-de-loop, does my subsystem perform the way it should?” A fuel systems engineer might look at that and ask whether the fuel system is still able to adequately deliver proper fuel pressure and amounts throughout the aircraft. As a data scientist, I’m going to try to identify that condition in the fleet. I’m going to look through all my data to find when that happens, and then I’m going to compare the fuel system behaviour to whatever is expected. Except that I’m not an engineer. I’m not a subject matter expert, so I don’t necessarily know, to start off, how that system is supposed to perform. So I’m going to reach back out to my network of subject matter experts for reference, and they’re probably going to point me to some requirements documents or case studies, or just things that they, as the experts, have incorporated. Without trust, they’re not going to come to me, I’m not going to get to learn that information, and they might not have their questions answered.
So it’s a huge effort, working with industry and other public partners throughout DoD, and private partners and academia as well. It’s a big effort to come together and make this happen. Where the rubber really starts to meet the road is when we say, okay, we’re going to do predictive maintenance on an aircraft. Well, it’s not quite that simple. You have to break it down into its constituent parts. First, you need to break it down into subsystems. Here we have a sample work breakdown for the space shuttle, but you get the idea; you can see some of the subsystems called out there. Breaking it down to a level that can be understood and addressed is the key ingredient here, especially since a lot of the aircraft that are flying today are what you might call legacy systems. They weren’t necessarily designed with a data-science-first approach in mind. They’ve also been around for quite a while, so some of the expertise has moved on. It’s a tricky thing to address, because you’ve also got some tribal knowledge in some groups. Our goal is to pull that together and make sure that knowledge captured anywhere can be applied everywhere, so that we can institutionally apply the expertise to every single flight that we see. While we’re doing this, we’re going to be analysing some data, and we might see some data that doesn’t quite look right. The first thing that I, as a data scientist, am going to do is go out to my subject matter experts and say, hey, I’m looking at this subsystem, and this data point doesn’t quite look right. They might come back to me and say, oh, that’s just a function of the way that particular component works.
And if you analyse it like this, it actually does make sense in the context of the rest of the flight. But if it does look weird to them, then it could just be one of the idiosyncrasies of the data capture platform, or just a function of the complexity of the aircraft. So I’m going to hand it off to my colleague, Chris, and he’ll walk you through some of our data science skills.

– So thanks, Matt, for the handoff. Continuing to mirror what Matt has been saying so far, we’re going to talk about some of the data complexity.

Data complexity

For as complicated as the platform is mechanically, the data is equally complex. Mechanical systems rely on the interactions between various components, which introduce countless confounding effects into our data.

The data is mixed, which makes isolating and explaining behaviours in the data even more difficult. To further stack on top of that, because of software revisions, sensors added after development and different priorities placed on different systems, recording rates can vary wildly, from purely discrete information to thousand-hertz signals and everything in between.

Further additions have been made after development as the rest of the system has advanced through the years. Different software versions are deployed, and these can lead to varying types of data and columns, which may not be consistent across aircraft or even within the same aircraft.

So this brings us to our ETL pipeline. To handle all of this complexity and mixed data, we use some custom Python code in combination with Apache Spark schemas to manage our ETL pipelines. We have two of them. They’re quite similar but serve different purposes and deal with two different types of input data.

ETL pipeline

First is our batch analysis pipeline. This pipeline was built to handle already decoded flight data that’s delivered to us daily and ingested into our on-premises compute cluster.

Once we receive this data, it goes to our staging warehouse, where it’s cleaned and validated before being output to our analysis warehouse.

The analysis warehouse is our working warehouse, where all of our feature development, data analysis and model development is completed. Running in parallel, but on a remote system that we don’t have access to, is our streaming pipeline.

It takes in encoded flight data as it is uploaded from the aircraft. Kafka handles launching the decoding application and then sends the data over to Spark Streaming to go through the same cleaning and validation that our batch analysis uses.

Once Spark has finished with it, the cleaned and validated data is sent over, along with our feature and model outputs, to another analysis warehouse that’s accessible from the dashboard.
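As a rough illustration of how one cleaning and validation step can serve both the batch and the streaming path, here is a minimal sketch in plain Python. It is not the team's code: the target schema, the column aliases and the record layout are all hypothetical, and in the pipelines described above this logic would run inside Spark rather than in a Python loop.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical target schema (column name -> type); not taken from the talk.
TARGET_SCHEMA = {"flight_id": str, "timestamp_s": float, "fuel_flow_pph": float}

# Hypothetical column names used by different software versions.
COLUMN_ALIASES = {
    "ff_pph": "fuel_flow_pph",
    "fuelFlow": "fuel_flow_pph",
    "time": "timestamp_s",
}


def clean_record(raw: dict) -> Optional[dict]:
    """Map a raw record onto TARGET_SCHEMA, or return None if it is invalid."""
    renamed = {COLUMN_ALIASES.get(key, key): value for key, value in raw.items()}
    cleaned = {}
    for name, typ in TARGET_SCHEMA.items():
        if name not in renamed or renamed[name] in (None, ""):
            return None  # a required column is missing or empty
        try:
            cleaned[name] = typ(renamed[name])
        except (TypeError, ValueError):
            return None  # the value cannot be coerced to the expected type
    return cleaned


def clean_and_validate(records: Iterable[dict]) -> Iterator[dict]:
    """Shared step: accepts a finished batch (list) or a live stream (generator)."""
    for raw in records:
        cleaned = clean_record(raw)
        if cleaned is not None:
            yield cleaned


batch = [
    {"flight_id": "F1", "time": "0.0", "ff_pph": "1200"},
    {"flight_id": "F1", "time": "1.0", "fuelFlow": "bad"},  # dropped: not numeric
    {"flight_id": "F2", "timestamp_s": 0.5, "fuel_flow_pph": 900.0},
]
cleaned_batch = list(clean_and_validate(batch))
```

Because `clean_and_validate` only iterates over its input, the same function can be handed a list of decoded daily records or a generator fed by a message consumer, which is the property the two pipelines rely on.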

So this brings us to our data science cycle.

Our data science cycle

While it’s similar to most of the other data science life cycles that are out there, we found some things that we can focus on to make it easier for us. Dealing with this complex mixed data is difficult, and bringing the problem down to the lowest level has been really helpful in dividing the problem up and making it something we can actually tackle.

Learning and understanding the particular subsystem you’re working on is absolutely critical to being able to make progress.

Not developing this understanding means you’re going to be asking the SME more questions. That takes up both of your time and delays progress on the project. Once you’re finally up to speed with the domain for that subsystem, you can continue to work with the SME to define the problem you’re trying to identify, and create some potential indicators to find the component degradation that you’re after.

Once these initial indicators have been analysed and worked through with the SMEs, features can begin to be developed out of them. And once the features are developed, you work with the SME to refine them. False positive rates in our use case need to be practically zero, as the only thing worse than unscheduled maintenance is useless maintenance.

This cycle continues to iterate, as both the SME and the data scientists continue to learn more about the subsystem data with every loop.

And this brings us to feature development. There are two main types of features we develop: logic-based features and deep learning based features, or machine learning based features, whichever you feel like calling them. Logic-based features are made by us essentially augmenting the SMEs’ brains with our data science superpowers.

Feature development

Behaviours the SMEs have learned over the years, unofficial thresholds and other tribal knowledge can be turned into code that operates on nothing but logic to detect degrading components. While error and fault detection exists on the aircraft, adding a new fault or error on board may take a prohibitively long time. It could be the better part of a decade or so, depending on how intense an error it is, to get it through the entire process.

What we can do is enhance existing error detection and add new errors by creating features that measure the exact same parameters, and apply them historically, all in a matter of days for a process that previously took years.

Overall, the logic-based features we’ve developed so far have been moderately predictive in terms of finding component degradation.

On the bottom left, on the graph using the NASA DASHlink data set, we have an example of a feature we may create for something like this. In this case, we’re looking at the rate of change of the fuel flow from one tank to an engine. Let’s say we have a theoretical threshold: a rate of 200 pounds per hour is the fastest that the fuel flow should change.

The potential feature in this case would count the number of times that we exceed that rate of change. That count would then be output to the dashboard as a feature to be viewed.
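A logic-based feature of this kind can be sketched in a few lines. The example below is illustrative only: the talk gives a theoretical limit of 200 pounds per hour but not the time base over which the change is measured, so this sketch assumes the limit applies to the change between consecutive samples. The function name and the sample values are made up.

```python
from typing import Sequence


def count_rate_exceedances(fuel_flow_pph: Sequence[float], max_step: float = 200.0) -> int:
    """Count sample-to-sample changes whose magnitude exceeds max_step.

    Assumption: the threshold is applied per consecutive pair of samples.
    """
    return sum(
        1
        for prev, curr in zip(fuel_flow_pph, fuel_flow_pph[1:])
        if abs(curr - prev) > max_step
    )


# Made-up fuel flow trace for one flight, in pounds per hour.
flight = [1000.0, 1050.0, 1300.0, 1290.0, 1000.0, 1100.0]
exceedances = count_rate_exceedances(flight)  # steps of +250 and -290 exceed the limit
```

The single integer per flight is what would be handed on for scoring and display; because the function only needs the recorded trace, it can be applied to historical flights as easily as to new ones.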

Machine learning or deep learning features, on the other hand, are made to find things that humans simply can’t. We use a combination of the Spark ML library and PyTorch for developing these features. Our main use cases for them are learning what normal behaviour is and detecting complex parameter interactions. Predicting normal behaviour is useful when the behaviour of one component depends on the other components around it, going back to that confounding effect I was speaking to earlier, and logic by itself is not enough to capture the complexity of this interaction.

Of the deep learning features we’ve developed so far, we’ve found them to be highly predictive of component degradation, but they’re not always the most trusted, because they sometimes have unexplainable behaviours, and that’s something we need to minimise by working with the SMEs.

On the bottom right here we have an example of a deep learning feature that may be made, again using the NASA DASHlink data set. In this case, it’s one of our virtual sensor feature models. We’ll discuss those on the next slide. In the meantime, it attempts to predict, in orange, the amount of fuel flow from this tank, given the aircraft’s current state and its recent states. It performs well overall and tracks the actual fuel flow, in blue.

So this brings us to a little more discussion about the deep learning models that we use, which I want to take a minute to discuss in a little further detail. The first type of deep learning model we use is anomaly detection. The goal here is to find unexplainable or errant behaviour in an otherwise normally noisy data signal.

Pictured on the top right is the graph from the previous slide, which shows our virtual sensor model’s fuel flow prediction against the actual fuel flow.

In this case, you can see we highlighted two areas on that graph that appear to be potential anomalies. However, we found them not to be: not only did the virtual sensor model pick up on the behaviour and follow it, but we also looked into that flight manually and determined that it was actually normal behaviour in response to something occurring in the aircraft.

The second type of deep learning model we use is virtual sensors. These models excel in knowing what normal behaviour is for a parameter, and they can pull double duty as anomaly detectors as well.

They have the benefit of being able to model the behaviour of a parameter that otherwise wouldn’t exist, or to help fill in the data dropout that sometimes occurs. For targeting specific interactions between large numbers of parameters, we use information compression. We take the input data and pass it to an autoencoder that compresses it and attempts to reconstruct it on the other side. By looking at the reconstruction error between the model input and output, shown in the bottom right, we can then use image classification to determine what might be going wrong with the data.
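The virtual sensor idea described above (predict a parameter from the rest of the aircraft state, then compare the prediction with what was recorded) can be sketched with a much simpler model than the ones in the talk. The example below swaps the deep learning model for a one-variable least-squares line, purely as a stand-in. The throttle and fuel flow values are invented, and the three-sigma cut-off is an assumption.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(model, x, y):
    """Observed value minus the virtual sensor's prediction, per sample."""
    slope, intercept = model
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


def flag_anomalies(model, sigma, x, y, k=3.0):
    """Indices of samples whose residual is more than k sigma from the model."""
    return [i for i, r in enumerate(residuals(model, x, y)) if abs(r) > k * sigma]


# Invented historical data: throttle position vs. fuel flow (pounds per hour).
hist_throttle = [0.2, 0.4, 0.6, 0.8, 1.0, 0.3, 0.5, 0.7]
hist_fuel = [410.0, 790.0, 1210.0, 1590.0, 2010.0, 590.0, 1010.0, 1390.0]

model = fit_line(hist_throttle, hist_fuel)
sigma = pstdev(residuals(model, hist_throttle, hist_fuel))

# Invented new flight: the middle sample sits far above what the model expects.
new_throttle = [0.2, 0.5, 0.9]
new_fuel = [405.0, 1400.0, 1795.0]
flagged = flag_anomalies(model, sigma, new_throttle, new_fuel)
```

The flagged indices are candidates for human review rather than verdicts: as the highlighted regions on the slide show, a large deviation can still turn out to be normal behaviour once an SME looks at the flight.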

Deep learning models

And this brings us to the next phase of everything. We’ve developed our features, established them, refined them, and worked with the SMEs to learn about them, and now we need to display them back to the SMEs in an explainable and accurate way. Before anything is ever displayed, however, it needs to have some meaning of good and bad defined. These features output some score, but we don’t actually know what the scores mean at the moment. This is where feature score normalisation comes in. We normalise the features using techniques like Johnson transforms and z-scores, and then we work with the SMEs to define a direction of bad for the feature’s performance. This falls into one of three general directions for the distribution: left-tailed, right-tailed and two-tailed. Then we colour code the feature score based on how far from the mean you get. We can see here that within one sigma you’ll mostly be green, a little over one sigma away you’ll be going yellow, and once you cross two sigma away you’ll end up being red. Displaying a few hundred features at all times is kind of unwieldy. You won’t be able to realistically go through them all and pull any kind of meaning out of them quickly. So to do that, we need to aggregate scores. On the bottom left here is a snippet from our dashboard that shows each aircraft’s aggregate score. It gives the SME insight into the aircraft as a whole and the entire fleet all at once, so you’re able to quickly target the aircraft that are showing as the least healthy. The SME can then identify an aircraft they want to look further into, whether it’s one stationed where they are or a particular aircraft that may have had a problem they saw earlier.
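The normalise, pick a direction, colour code and aggregate steps described above can be sketched as follows. This is a simplified illustration, not the dashboard's implementation: it uses plain z-scores only (the Johnson transform is not shown), the one and two sigma cut-offs follow the description in the talk, and the worst-feature roll-up is an assumption, since the talk does not say how scores are aggregated. All sample values are invented.

```python
from statistics import mean, pstdev


def z_score(value, history):
    """Standardise a raw feature value against its historical distribution."""
    mu, sigma = mean(history), pstdev(history)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def badness(z, direction):
    """Distance in sigmas along the SME-defined 'bad' direction."""
    if direction == "right":  # right-tailed: high values are bad
        return max(z, 0.0)
    if direction == "left":  # left-tailed: low values are bad
        return max(-z, 0.0)
    if direction == "two":  # two-tailed: any large deviation is bad
        return abs(z)
    raise ValueError(f"unknown direction: {direction}")


def colour(sigmas):
    """Green within one sigma, yellow up to two, red beyond that."""
    if sigmas <= 1.0:
        return "green"
    if sigmas <= 2.0:
        return "yellow"
    return "red"


def aggregate(feature_sigmas):
    """Assumed roll-up: an aircraft is as unhealthy as its worst feature."""
    return max(feature_sigmas, default=0.0)


history = [10.0, 12.0, 11.0, 9.0, 8.0]  # invented historical feature values
high = badness(z_score(13.0, history), "right")
low = badness(z_score(8.0, history), "left")
typical = badness(z_score(11.0, history), "two")
aircraft_colour = colour(aggregate([high, low, typical]))
```

Taking the maximum keeps a single failing feature from being averaged away by many healthy ones; a mean or weighted sum would be an equally valid choice if a smoother fleet-level score were preferred.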
And then they can drill down further, like in the bottom right image there, where each column is a flight and each row is a feature for that subsystem, with the blue line representing a component replacement. Looking through time from left to right, we can see that this aircraft subsystem’s features started out doing okay. They were pretty green, then we slowly gained more yellow and more orange, and then we started getting intermittent red, before finally getting complete, consistent red, which indicates that the part has failed at this point. Then once we hit that blue line, we see the component was replaced, and everything went back to being okay. Having this dashboard is great, but hosting, securing and updating the dashboard and the tool doesn’t just happen automatically, which is why I’ll turn it back over to Matt to discuss the deployment of HHART.

– Thank you, Chris.

Yeah, that’s right. So for some of the things that we look at when we’re deploying, our main focus is on speed, security and live monitoring.

You probably know these better by their other names: development, security and operations. For speed, we’ve got our batch pipeline and our streaming pipeline. Given where we sit in the organisation, it made sense for our batch processing pipeline, for a particular use case right now, to run on some on-prem hardware.

So we went ahead and set that up, and for the streaming work to be done in the cloud. In both cases, we try to automate as much as we can with infrastructure as code, making sure that our builds are reproducible.

For security, as a defence organisation, that’s our top priority.

So we try to integrate ourselves into the Risk Management Framework, the RMF, and make sure that we’re operating under a cATO, or continuous authority to operate. One of the things that helps a lot in this regard is an effort of the Air Force Chief Software Officer’s office called Platform One. Its goal is to harden COTS products and open source products so they can be deployed to a secure registry, like a secure Docker registry.

So the idea there is you have your code, it gets built, gets tested and scanned, and if your code passes their scans, then it gets pushed up to the secure registry. There’s some work on the back end to make it work.

But that’s the goal, right? Getting code that’s built and tested automatically, scanned using these container and static analysis tools, and then ultimately pushed up to a secure registry, where it can then be deployed on any certified cloud platform.

We primarily deploy to internal Navy clouds, and we’re looking to expand that. Ultimately, the last component there is live monitoring systems. This is our operations dashboard, and it works well for ensuring that your technology is working well.

But one of the things that you also need to make sure is happening is that you’re communicating with your users. It’s difficult to dashboard user happiness, right? So you need to reach out to your users at all stages. We’ve found that the critical ingredient for us is constant communication with our users, at all stages of the development cycle. We have data questions for them. We have data science questions. We have questions about how we should deploy, when and where, what form the tool should really take, what features we should include and what we should not. We bring in our users at all stages. For us, it really helps because our end users are the same people that assisted in the development of the tool, our subject matter experts. So we make sure that we’re incorporating their needs into the tool itself, and that the tool continues to be useful to them in a decision support role when we’re making these kinds of decisions. Another thing that we’ve found to be important is that these features are accurate and explainable. If we’re going to highlight an aircraft and say this aircraft is going to need some scheduled maintenance soon, we need to be able to defend that decision, whether we’re using a logic-based feature or a machine learning or deep learning based feature. It needs to be understood what exactly that feature is calculating, so that we can make sure that our engineers and our end users ultimately understand what they’re looking at with this tool.

Incorporating feedback

So, you know, communication is really critical at all points in this development process. To open up the development cycle real quick, this is our HHART process laid out slightly differently. We’ve got this continuous feedback loop going on. It starts with collaboration with the SMEs, where we and the SMEs are learning from one another, going back and forth. Then when we start to develop a product, we iterate. We say, okay, here’s what it looks like now, what do you think? And they’ll come back to us with some feedback on refining it: weeding out those false positives, making sure we’re not generating any nuisance alerts, sharing our strategies on visualisation, coming up with a deployment strategy, and making sure that we’re able to iterate quickly and fix problems when they start to come up. And we’re capturing feedback at all stages of that user cycle, making sure that we’re incorporating what our users are telling us.

So, with that in mind, we’ve now got a process that we’ve been able to tailor to new subsystems and new components. We’re looking to expand into new platforms as well, ultimately to meet the needs of the subject matter experts that reside in those particular domains.

So what have we learned so far? One of the most important things we’ve learned is that in the data science role, it really helps to have domain expertise, and it’s difficult to get that unless you’re deeply embedded in the organisation. Having subject matter experts right beside us when we’re developing this kind of thing has been invaluable. That’s one of the key takeaways we have found in coming up with this kind of predictive maintenance system for a complex mechanical system: the system experts absolutely have to be a central part of the conversation. We’ve also found that you have to be willing to learn on both sides, learning from those experts and teaching those experts what can be done with such vast quantities of data. We as data scientists had to learn a lot from our SMEs, and we’ve got plenty more to learn. At the same time, we’re teaching them what kind of questions they can ask now that they can leverage the entire set of fleet data. And finally, being such a large-scale effort, and working with such a complex system, especially a legacy system, it’s very difficult to do with just one team. So being able to lean on our data science partners inside and outside the organisation, and to leverage work that’s been done throughout the organisation, is also very helpful. Where do we hope to take HHART in the near future? Well, obviously, by collaborating with partners, we’re looking at starting to include additional subsystems, components and platforms.
So for new aircraft, we’re looking to include those and roll up this cycle and this process that we’ve hammered out over the last few years into a kind of product, defining it and applying it to new platforms. The other thing we want to do is be able to deploy this kind of thing at the edge. If an aircraft flies, and we’re sure we can diagnose a particular degradation, there’s no real reason for us to have to insert ourselves in the middle of the process. We can say, okay, we know how to detect this, so when you fly, you don’t have to send us the data; you can run this right there at the edge, right where the aircraft is. Then, if there is a particular feature that has a high confidence of correctly diagnosing a potential issue, the action can be taken there. But we do have to be careful. It’s important that there’s a formal process in place to review that kind of feature, because as a data scientist, I might get it wrong. We have a great team behind us, and they’re there to make sure that nothing falls through the cracks on my end as well. So while we’re working towards applying this vast enterprise knowledge base to all the flights, there also need to be some sanity checks in place, so that the code that Chris and I write at N.E.R.D.S. is also sanity checked and validated by a review board consisting of engineers, platform designers and what have you. Pushing that out to the edge, and working out all the details on the way there, is also something that we’re very interested in doing. So far we’ve discussed the complex aviation use case, but we also feel this process can be extended to other complex mechanical systems as well.

About Christopher Miller

Data Scientist working on the Naval Enterprise Research and Data Science Team (N.E.R.D.S.) @ NAWCTSD in Orlando. Chris has a Bachelor’s in Computer Engineering from the University of Central Florida. His background includes many years of experience in the DoD space in modeling, simulation, and test and evaluation.

About Matthew Proetsch

U.S. Navy 

Lead Data Scientist on the Naval Enterprise Research and Data Science Team (N.E.R.D.S.) @ NAWCTSD, working on predictive maintenance for aviation platforms. MS from UC Berkeley in Information and Data Science.