Oil and gas companies have thousands of wells in production. These wells frequently require action or maintenance. However, it is impossible for engineers to monitor them all. Thus, a model to determine whether or not a well is online based off real-time data streams would be of significant value in order to keep wells running and producing. Creation and deployment of such a model presents challenges with both people and technology. To be of value, it must be utilized, and ergo must both be accurate enough to provide value and be understandable and trusted by its end users. These necessities require careful balancing of complexity and readability. Additionally, in a world where opening just one additional application, window, or screen can be a roadblock for adoption, integrating with currently used technology is a must. Thus, a method of model creation that gives end users insight and can be deployed directly in existing systems would be ideal.
To accomplish this, Databricks is used to pull well data directly from an OSIsoft PI database. Then, statistical analysis is performed and Python models are created utilizing various techniques including outlier removal, SMOTE (oversampling), hyperparameter tuning, and gradient boosting. Then, utilizing a custom Python function, created models are combed for features and output as code which can be placed directly in PI as an AF Analysis, thereby allowing these models to live in users’ existing systems. This deployment method offers pros and cons but maximizes usability and potential for providing immediate business value. This talk will also include discussion centered around the need to sometimes temper the data science mindset of academic perfection in the pursuit of usability and adoption.
– Hi everyone, I’m Tristan Arbus. I’m a data scientist for Devon Energy Corporation in Oklahoma City.
I’ve titled this presentation “Is This Thing On?”, which in this case refers both to the Well State Model that I’ve created, which is designed to determine whether or not a well is in production or actually running, as well as to non-data scientists working closely with data science models and wondering, is this thing even working right? The goal here is to solve both of these issues: the technical problem and the people problem.
So this is one of my favorite quotes and one that I use to guide my project philosophy: “People ignore design that ignores people.” Frank Chimero is not a data scientist. He is a designer, one who works in media, so think branding and books. He’s the type of guy you might call up to make you a new company logo. So obviously he wasn’t thinking about data science when he penned this, but I think it holds true for anything that’s people facing, including applications in data science and their end users. So in the case of this particular project, we had to take careful steps to make sure our processes, models, and results were digestible and usable by engineers and techs, which I’ll talk a lot more about along the way.
So with that, here’s what I’ll be covering. First, in order to better understand the problem and why we care, I’ll talk a little bit about what production is exactly. Then I’ll discuss how we came up with a solution that was focused on usability and adoption, and I’ll close out by talking about the good, the bad, and the ugly of what we did.
So before all that, this right here is the first thing that you, your end users, clients, so on and so forth, should be asking when starting a data science project of any kind: what is your goal? Your goal should guide how you approach the problem from the beginning. So what’s more important, model accuracy or model readability? Does it need to be understandable? Who is going to use it, and how will they get it? The answers to questions like these could prescribe certain model types, certain processes, or methods of deployment. In the case of this particular project, we needed to make sure we came up with a solution that is understandable, is as accurate as possible, and ideally is immediately usable by engineers and field personnel without additional software and equipment. So keeping those goals in mind, let’s start talking a little bit about exactly what production is and what we’re here to solve.
Production, simply put, is the phase of a well’s life cycle where we’re extracting hydrocarbons. By the time a well gets to production, a huge investment has already been made, and it’s not until we gather our product, essentially, that we begin to profit. It’s also the phase of a well’s life that lasts the longest.
Typically frac wells have some sort of artificial lift system to aid in hydrocarbon extraction. There are many forms of artificial lift, such as gas lift injection or electric submersible pumps. But for this project we decided to initially focus on rod pumps, due to a controller that provides a regularly available supervised learning data set. So basically, in order to extract oil and gas, the beam system rocks back and forth, operating the string of rods below. These work similarly to a piston inside its cylinder, converting the reciprocating motion into vertical fluid movement and thereby lifting oil from the reservoir through the well up to the surface. For the sake of this project it’s not incredibly important to understand exactly how everything in this diagram works. Just a very basic understanding that we’re dealing with certain gas rates and pressures suffices just fine.
So let’s take a look at those. This is an example of just some of the parameters we receive from the field, displayed on a time scale. So here we’re looking at various pressures and gas rates over about a one-month period. Choke Differential Pressure is the difference between the pressure at the pump jack and on the backside of the well. Total Gas Rate is pretty self-explanatory: it’s the rate of gas leaving the wellbore. And Tubing Pressure is the pressure within the production tubing string, which you could think of as the outside shell of the piston. These parameters are going to form the backbone of our model.
In this screenshot we actually have two fairly distinct shutdown events, labeled as periods of “off”. While they look pretty obvious in this context, this is an idealized case. They’re not all that simple or clean, and we do need some sort of model that’s more than just a simple threshold. I’m also gonna talk a little bit about why we want to know when wells are shut down. There are obvious reasons, one being that if your rod pump is offline you want to get it back online. If we’re not making money when we should be, that’s a bad thing. A better understanding of when we’re shut down or not will also help our reporting and give us a better understanding of our rate of production while online. We’d also like to optimize production while online and learn more about predicting shutdown events before they happen. So in order to do this, we first need to know when we’re shut down and when we’re not. This is of course keeping in mind that we have thousands of wells in production at any point in time, and it would be impossible to manually monitor all of them.
Like I mentioned before, thankfully, myself included, we don’t really need to understand exactly what these parameters mean to a production engineer. What we need to do instead is work closely with production subject matter experts, or SMEs, to get a better understanding of what we might expect the parameters to do during a shutdown event. So what you’re looking at here is the initial whiteboarding session during which exactly that was discussed. We were able to hone in on the fact that certain parameters should drop to zero, even if they don’t always; certain parameters should get quiet, which would lead to maybe analyzing the standard deviation or the range; and certain parameters would be rising or falling, which means we may want to analyze the slope. This is all to say that something as simple as a whiteboarding session like this with an SME can lay critical groundwork early on.
The architecture used to get production data from the field and into the hands of our engineers and technical personnel is also a critical factor in how we approach the problem. Each device in the field is attached to a controller, which communicates wirelessly with a supervisory control and data acquisition (SCADA) system. The data is then organized and stored in our PI Asset Framework, or PI AF, which acts as a hierarchical, asset-centric repository for our data. I have it circled here because the PI AF is going to act largely as the linchpin for data access and model deployment for us as well. It has a built-in analytics engine and is also where people already go for all their production data needs, exposing that data via real-time alerts and through surveillance tools and dashboards similar to some of the screenshots I’ve already shown. This is definitely an industry where an additional piece of software, one more window, or just an additional mouse click can act as a barrier to adoption. So the closer we can bring our models to the existing systems, the better.
I mentioned this earlier, but I wanna talk about the controller for a second. Rod pumps are somewhat unique in that they output a run status from the controller that tells us whether we’re running or shut down. What this means for us is that we essentially have a baked-in supervised learning data set: we can tell our model when we’re shut down, for it to learn from, with some level of confidence. So why are we trying to solve a problem that already has a controller telling us the answer? There are a few reasons. Like I mentioned, rod pump is unique. While we have about 750 wells operated with rod pumps, we have closer to 6,000 wells operating other lift types, lift types for which we don’t have a supervised learning data set. So if we can prove the ability to recognize shutdown events, our SMEs can be convinced to manually go through some data, designating shutdown events by hand and creating that supervised learning data set for us for other lift types. Without some sort of proof upfront that it works, it could be difficult to get that sort of time commitment.
So here’s a less idealized example of what our data might look like, as well as some solutions to the issues we see. Because of ratty data, we’ll want to perform some sort of smoothing, in our case done via simple moving averages. We can deal with missing data by defining certain data quality filters that check that the parameters are within expected ranges and are correctly communicating. PI can help us deal with variable data frequency effectively, as it already interpolates: if we ask PI for a data point every 15 minutes, it will automatically give us that, as long as we check for data quality. And lastly, it’s important to recognize that we have multiple parameters we want to analyze, but we don’t want our model to fail if we’re missing just one, which is extremely common. Taking this into consideration, we chose to go ahead and make multiple models, one for each parameter separately, and combine them in the end, depending on how many parameters we have at the time with good data quality.
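To illustrate the idea of combining per-parameter models only when their inputs are trustworthy, here is a minimal Python sketch. It is not the project's code: the function name and the use of `None` to flag a failed data quality check are assumptions, and the simple average mirrors the combined trace described later in the talk.

```python
from typing import Optional, Sequence


def combine_shutdown_confidence(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average the per-parameter model outputs that are currently available.

    Each entry is one model's shutdown confidence (0-100), or None when that
    parameter failed its data quality check (an assumed convention).
    """
    usable = [s for s in scores if s is not None]
    if not usable:
        return None  # no parameter has good data, so make no call at all
    return sum(usable) / len(usable)


# Tubing pressure model says 90, differential pressure model says 80,
# and the total gas rate model is skipped because of bad data quality.
print(combine_shutdown_confidence([90.0, 80.0, None]))  # 85.0
```

Returning `None` instead of a default value keeps a well with no usable data from looking either running or shut down.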
PI System Explorer is where we did a lot of our back-end work. Here we can perform calculations on existing parameters, normally referred to as attributes in PI, and output new attributes using the built-in calculation engine. So here, first we set up new attributes to check for data quality, by checking for bad values and making sure that there have been enough events and that we’re still getting communication for that parameter. Then we calculate the basic statistics, like I mentioned: the moving average, slope, standard deviation, and range. These will now conveniently live within the PI AF, along with all of our other parameters for easy access, and can act as inputs to our models.
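The talk computes these statistics inside PI's calculation engine. As a rough Python equivalent, the sketch below derives the same four features for one window of evenly spaced samples. The function name, the 15-minute default, and the "at least half the samples present" quality rule are assumptions for illustration, not values from the project.

```python
from statistics import mean, pstdev


def window_stats(values, interval_minutes=15.0):
    """Return moving average, slope, standard deviation and range for one window.

    `values` are evenly spaced samples (oldest first); None stands in for a
    bad or missing reading.
    """
    good = [(i * interval_minutes, v) for i, v in enumerate(values) if v is not None]
    # Hypothetical quality rule: at least two samples and half the window present.
    if len(good) < max(2, len(values) // 2):
        return None
    xs = [t for t, _ in good]
    ys = [v for _, v in good]
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope, in parameter units per minute.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in good) / sxx
    return {
        "moving_average": y_bar,
        "slope": slope,
        "std_dev": pstdev(ys),
        "range": max(ys) - min(ys),
    }


stats = window_stats([10.0, 12.0, 14.0, 16.0])
print(stats["moving_average"], stats["range"])  # 13.0 6.0
```

A window that fails the quality rule returns `None`, so a downstream model can skip that parameter rather than score on bad data.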
So at this point, we could very easily make a simple trigger using the same calculation engine that says something to the effect of: I expect my well to be shut down if my differential pressure is quiet, so I’ll take a guess and say that I’m shut down if my standard deviation is less than five. We could easily program that alarm directly into the PI AF, but to me, that’s like sticking your finger to the wind to check for direction. You can probably get close-ish, but for sure there’s a better way to do it. Better tools, better methods, and better ideas exist. So instead of sticking our finger into the pool of data or taking our best guess, what can we do?
Decision trees are the answer. After all, they’re just large sequences of if then statements anyway (mumbles). Hopefully they can hit that sweet spot we’re looking for between accuracy and understandability and we should be able to figure out how to get them back into the Asset Framework.
First we need to get our data out, though. Most people familiar with Seeq utilize it as a visual data analytics processor where you can search, process, cleanse, find patterns, etcetera. It can connect to a number of databases, including in our case the PI AF, and we do use it that way sometimes. However, there’s also a little something called Seeq Data Lab, which allows us to utilize Seeq as a data grabber that can pull data from the PI AF into Databricks. So here we’re just using dbutils to install the seeq library, import the spy module, and use it to connect to Seeq. And from there, it’s as simple as searching for assets along your well’s path, grabbing the assets you want to request, and pulling them. Here I’m pulling a few parameters on a 15-minute interval over the course of about a month and a half. In reality, for modeling I would also be pulling all of our calculated data quality and statistical parameters, and pulling over a much larger timeframe.
With that in mind, here’s our modeling workflow within Databricks. We’ll import our data and filter it based on data quality, like I already mentioned, perform a training and validation split, and oversample the training set. We’ll do some basic hyperparameter tuning before performing a grid search and training our final tree. And then finally, to solve our deployment issues, we’ll export the final tree as code that can be interpreted by the Asset Framework’s analytics engine. We already discussed how we import and filter our data, so let’s jump right into talking about the training and validation split.
I initially tried a basic 80/20 training and validation split, using 80% of the data to train and 20% to validate the model, but ran into issues with generalization. So in an effort to increase our model’s ability to generalize across an entire field of wells, we asked our engineers to choose our validation set for us. That is, we asked them to determine a set of wells such that, if the model works on those wells, it will work everywhere. We asked for the validation set to span everything they typically see, as far as varying data quality and parameter availability goes. This sort of manual validation set selection proved to perform way better than even more, quote unquote, advanced techniques such as k-fold cross-validation. This is simply a case where subject matter knowledge is superior to randomness, which not only helps the model to generalize, but helps the engineers and end users to gain more knowledge, influence, and ownership over the modeling process, which is going to help with overall understanding and adoption.
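A split like this is done by well rather than by row, so no well contributes samples to both sets. The following Python sketch shows one way to express it, assuming each row is a dictionary with a hypothetical `"well"` key; the well names are made up for illustration.

```python
def split_by_well(rows, validation_wells):
    """Split rows into training and validation sets by well, never by row."""
    validation_wells = set(validation_wells)
    train = [r for r in rows if r["well"] not in validation_wells]
    valid = [r for r in rows if r["well"] in validation_wells]
    return train, valid


rows = [
    {"well": "A-1", "tubing_pressure_range": 3.2, "shut_down": 1},
    {"well": "A-1", "tubing_pressure_range": 41.0, "shut_down": 0},
    {"well": "B-7", "tubing_pressure_range": 38.5, "shut_down": 0},
    {"well": "C-3", "tubing_pressure_range": 2.1, "shut_down": 1},
]
# The validation wells would come from the engineers, not from a random draw.
train, valid = split_by_well(rows, validation_wells=["B-7", "C-3"])
print(len(train), len(valid))  # 2 2
```

Holding out whole wells means the validation score reflects performance on wells the model has never seen, which is the situation it faces once deployed across a field.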
One of the biggest hurdles to overcome is that we have an imbalanced data set. In fact, less than 5% of our data represents wells in a shutdown state. Typically, a few methods exist to deal with such an issue, including undersampling, oversampling, and weighted classes. I ended up using the Synthetic Minority Over-sampling Technique, or SMOTE. In case you’re wondering, because I did have to look this up, smote is the past tense of smite. But we’re not talking about divine punishment for our data. The SMOTE process is part of the imbalanced-learn Python package and allows us to create new synthetic data with which to train. What you’re looking at here is an example of how SMOTE works on a three-class classification problem, where the yellow class greatly outnumbers the teal and purple classes in the original data set on the left. By applying SMOTE to the data set, we can achieve something like what’s shown on the right: an oversampled data set with new samples that mimic the distribution of samples in the original data. There are a number of different algorithms within SMOTE, mostly based around which algorithm is used to identify samples to consider during resampling. For the purpose of well state, we ended up using SVM SMOTE, which is a version that uses the support vectors from a support vector machine algorithm to create new samples.
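To make the interpolation idea concrete, here is a small pure-Python sketch of plain SMOTE. It is deliberately not the SVM-based variant the talk settled on (imbalanced-learn provides that as `SVMSMOTE` in `imblearn.over_sampling`). The function name, parameters, and sample points are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical shutdown samples as (pressure range, standard deviation) pairs.
shut_down = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.0)]
new_points = smote(shut_down, n_new=6, k=2, seed=1)
print(len(new_points))  # 6
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies. Only the training set would be oversampled, so validation still reflects the real class balance.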
To find the ideal parameters for our decision tree, we can do a grid search, which we will do, but first we should find some good starting points. One way to do this is by iterating on one parameter while keeping all the other parameters at their defaults and checking your model accuracy. Our chosen accuracy criterion is AUC, which stands for area under the curve. It represents the probability that a random positive sample is ranked more highly than a random negative sample, and a higher AUC means a more accurate model. As mentioned earlier, we’re going to make a different model for each parameter and its statistics, so we’re gonna wanna run this process for every set of inputs. Here we see the results of keeping all variables except for tree depth constant for our differential pressure and total gas rate based models. It looks like the training and testing AUC diverge in the 8 to 12 range for both, which means that in order to prevent overfitting and maximize accuracy, something in that realm would be a good place to start our grid search. And we can repeat this process for all of our other decision tree parameters.
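The ranking interpretation of AUC can be computed directly by comparing every positive sample against every negative one. The sketch below does this in plain Python for illustration. It is quadratic in the number of samples, so a library routine would be the practical choice on real data, and the scores and labels are made up.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win, which matches the usual area-under-the-ROC-curve
    definition.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Label 1 means shut down, label 0 means running.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

Here three of the four positive-negative pairs are ordered correctly, giving 0.75. Computing this on both the training and validation sets at each tree depth shows the depth at which the two values start to diverge.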
So that’s exactly what I did, and then we performed our grid search. Here you can see the best and worst results of what I would consider to be a brute-force grid search performed on our parameters. I iterated on the columns seen here, and shown here are the top and bottom performers. We chose this method of grid search mainly to gain some greater clarity into the benefits of adding depth to our trees, as well as so we could harvest the best parameters for trees at different depths.
The reason for that was because we wanted to start with smaller, more understandable decision trees. So if you handed this image to an engineer who’s familiar with production, it’s easy for them to make sense of the decisions that are being made. To follow these decisions, you just start at the top of the tree and follow it down. In this case, the first choice you make is: is our tubing pressure range less than seven? If it is, we go to the left; if not, we go to the right. Whatever box we end up in tells us if we’re more likely to be shut down or running. So if we were to follow the logic down the left half of the tree, you might say something like: if my range is low, and my standard deviation is low, I’m shut down. The lower my standard deviation, the more confident I am about that. If my standard deviation is a bit higher, but my tubing pressure is low, I’m probably still shut down. But if my tubing pressure is higher, I can’t really be sure. And honestly, that’s really good. The ability of our end users and engineers to understand exactly how a model is making its decisions is extremely important. However, remember our deepest, or sorry, our most accurate trees were something in the realm of seven to 10 splits deep. And we still have to figure out how to get these back to the PI AF. So luckily for us, as mentioned earlier, a decision tree is really just a sequence of if-then statements.
So, this Python function here is written to comb a decision tree for its features and print out code that can be copied and pasted into the PI AF. Essentially, we have written some code to write some code. The function requires only two inputs, the tree object and the feature names. It takes the feature names and performs some text formatting, and then recurses through the tree, printing if statements for each internal node. Once it gets to a leaf node, it utilizes the Gini value of that node to calculate the probability that the well is shut down at that leaf. Thus we’re able to output the exact if-then statements you would follow to make decisions based off that tree. I should note that we could probably write these if-then statements ourselves in about 10 minutes from just looking at the tree. But that’s not where the major benefit of this function comes in.
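The function shown in the talk is not reproduced in this transcript, so the following is only a sketch of the same recursive idea. It takes plain lists laid out like scikit-learn's fitted `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`, with `-1` marking a leaf) instead of the tree object itself. As assumptions, it computes the leaf probability from per-node class counts rather than from the Gini value, and the `If ... Then ... Else` output only approximates PI's expression syntax. The attribute names and numbers are made up.

```python
def tree_to_pi_expression(children_left, children_right, feature, threshold,
                          value, feature_names, node=0, depth=0):
    """Recursively render a binary decision tree as a nested If/Then/Else string."""
    indent = "    " * depth
    if children_left[node] == -1:  # -1 marks a leaf, as in scikit-learn
        running, shut_down = value[node]  # class counts at this leaf
        percent_shut_down = 100.0 * shut_down / (running + shut_down)
        return f"{indent}{percent_shut_down:.1f}"
    name = feature_names[feature[node]]
    # Samples with feature <= threshold follow the left child.
    below = tree_to_pi_expression(children_left, children_right, feature, threshold,
                                  value, feature_names, children_left[node], depth + 1)
    above = tree_to_pi_expression(children_left, children_right, feature, threshold,
                                  value, feature_names, children_right[node], depth + 1)
    return (f"{indent}If '{name}' <= {threshold[node]:g} Then (\n"
            f"{below}\n{indent}) Else (\n{above}\n{indent})")


# A hypothetical two-split tree: node 0 and node 1 are decisions, the rest are leaves.
expression = tree_to_pi_expression(
    children_left=[1, 2, -1, -1, -1],
    children_right=[4, 3, -1, -1, -1],
    feature=[0, 1, -2, -2, -2],
    threshold=[7.0, 2.5, -2.0, -2.0, -2.0],
    value=[[127, 173], [32, 168], [2, 98], [30, 70], [95, 5]],
    feature_names=["Tubing Pressure Range", "Tubing Pressure StdDev"],
)
print(expression)
```

The printed text nests one `If` per internal node and ends each branch in a percent chance of shutdown, so a deep tree produces long but mechanical output that nobody has to type by hand.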
This is where it comes in. So if you remember earlier, the ideal depth again was about 10. So good luck writing code from this guy by hand. With our custom function, we can easily get the code on the right to paste into PI. And something that I wanna stress here is not just the scalability of the code, but the scalability of the idea. We chose decision trees as our model of choice for a few reasons, one of which was to give increased insight and understanding into the model structure and decision-making process. There’s only a relatively small conceptual jump to make from understanding a small readable tree to a larger one. And that understanding of our model fosters additional trust and buy-in, all important things our solution needs to have. It’s also worth mentioning that our increase in accuracy isn’t that high when making a deeper tree, in this case less than 1%, but that’s fine for a few reasons. First, it’s an increase all the same, and according to our hyperparameter tuning, we’re not overfitting, so that’s okay. Second, it offers increased granularity. So while the actual designation of running versus shut down hasn’t increased in accuracy much, those decisions are now spread across many more leaf nodes with a much wider range in our Gini impurity values. This means that when we run the model and view our percent chance of shutdown, we’ll have more insight into the model’s confidence at any point in time. And lastly, in this rare case, I’m okay with the tree being big for the sake of being big, since we’re not sacrificing accuracy or overtraining. So we have an opportunity for our end users to utilize a model that they can understand but can’t necessarily read. And this paves the path to other potential models in the future that may have less understandability, and to increased trust in the data science process as a whole.
So here you can see the code pasted back into our PI environment. And based on the coloring, PI is parsing out the code exactly as we’d like. The only thing we did was add a data quality check on top to make sure we only execute the model when we have appropriate data quality.
So let’s take a look at how our model performs on the shutdown events we looked at earlier. Here we have our parameters and our two shutdown events circled. Again, you can see the run status controller triggers a shutdown. Right below the run status, we have our models: the tubing pressure model in blue, the differential pressure model in purple, the total gas rate model in red, and an average of all of them in yellow. You can see most of the time our combined model floats in the 40 to 60 range, not really exhibiting much confidence in anything, until we get to a shutdown event, where it climbs up dramatically, making it extremely easy to recognize. Below the models, on the bottom trace, we have something called Event Frames, which are a function of PI and consist of some back-end logic running on our models that allows us to boil things down to the event level and recognize continuous periods of downtime that we consider to be a shutdown event. You might notice that the red total gas rate model is often flatlined at 40. Our total gas rate model ended up being less reliable, and therefore we combined it with a heuristic approach, allowing the model to run only sometimes.
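The event-level logic itself lives in PI Event Frames and is not shown in the talk. As a rough Python analogue, the sketch below collapses a series of combined confidence values into events: contiguous runs at or above a threshold that last a minimum number of samples. The threshold of 80 and the minimum of four samples are assumptions for illustration only.

```python
def find_shutdown_events(confidence, threshold=80.0, min_samples=4):
    """Return (start_index, end_index) pairs for sustained high-confidence runs."""
    events = []
    start = None
    for i, c in enumerate(confidence):
        if c is not None and c >= threshold:
            if start is None:
                start = i  # a new candidate event begins here
        else:
            if start is not None and i - start >= min_samples:
                events.append((start, i - 1))
            start = None
    # Close out an event that is still open at the end of the series.
    if start is not None and len(confidence) - start >= min_samples:
        events.append((start, len(confidence) - 1))
    return events


combined = [50, 55, 90, 95, 92, 91, 60, 85, 50, 90, 90, 90, 90]
print(find_shutdown_events(combined))  # [(2, 5), (9, 12)]
```

The single spike at index 7 is ignored because it does not persist, which is the point of working at the event level rather than reacting to every sample.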
This slide is mostly just to show you that I’m not cherry-picking examples. This is a screenshot essentially capturing one moment in time for all of the wells on which we trained our model. On top, they’re sorted by those with the lowest shutdown confidence, and you can see they’re all running according to the controller. Below, we have the wells with the highest shutdown confidence, and most, but not all, of them are shut down according to the controller. So overall, the model is performing well on a macro scale. But let’s talk a little bit more about when the controller is wrong.
Here we have an example where our models actually outperform the controller. Again, I have our shutdown events circled, and it may be a little hard to read, but the run status controller is actually outputting a status of “run”. However, looking at our parameters and going based off of our subject matter experts’ knowledge, we’re clearly shut down. The model recognizes this and very easily creates an event frame that recognizes the shutdown event, showing that even after training on the controller-based supervised learning data set, our models can outperform the controller in some cases. So with this proven out, for the next phases of the project we’ll be moving on to training similar models for other types of lift that don’t have a controller. Now that end users can see and understand how the models work and perform in a production environment, there’s now motivation to put in the work upfront and create a supervised learning data set. And we’re working on those sorts of things now.
As with most data science projects, there’s always room for improvement. Firstly, and largely beyond the scope of this particular project, is data quality and improvement to data collection. The better our data set is up front, the more likely we are to get information out of it. I would have liked to spend a little more time on feature engineering. While I sort of teased about putting our finger to the data wind and choosing random thresholds for a manual approach, that’s kind of what we did with features. We took some base-level statistics and ran with those as inputs into the model. I believe the models would benefit from some additional time exploring and engineering the inputs. And my last two bullets here sort of go hand in hand. There are model types that can outperform decision trees and that I would like to utilize. But our choice of decision trees was based on other factors, such as readability and ease of deployment, and I believe it was the right decision. Now that we’re starting to get that buy-in, we may be able to move on to more complex models and deployment solutions in the future. Because, yes, planting a decision tree as code in our PI system is not very long-term friendly. You run into issues with performance tracking, potential model updates, so on and so forth. So while we have some great stuff in production right now, we’re still figuring out the best ways to support it and transition it into a more future-proof solution.
So just a quick thank you to the engineers, IT support, members of the analytics team, and others who’ve worked on this project with me, to Spark as well for having me, and also thank you to all of you for listening and watching. I’ve really enjoyed working on this project and being part of the conference.
Tristan Arbus graduated from Johns Hopkins University in 2010 with a BS in Physics and a BS in Mechanical Engineering. He has over 8 years of experience in oil and gas as a company man, drilling engineer, and data scientist. He has been with Devon Energy since 2014 and is currently focused on solving complex oil and gas problems through the use of technology and artificial intelligence.