Data + AI Summit 2022

Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning

On Demand


  • Session


  • Hybrid


  • Data Science, Machine Learning and MLOps


  • Healthcare and Life Sciences


  • Intermediate


  • Moscone South | Level 2 | 215


  • 35 min


In our presentation, we walk through a model built to predict repeat admissions to substance abuse treatment centers. The goal is to identify early which patients are at high risk of relapse so that care can be tailored to give them additional focus. We used the Treatment Episode Data Set (TEDS) Admissions data set, which includes every publicly funded substance abuse treatment admission in the US.

Although the data set contains no longitudinal data, we were able to classify admissions as first or repeat with 88% accuracy and an F-score of 0.85. Our solution used a scikit-learn Random Forest model and leveraged MLflow to track model metrics and select the most effective model. Our pipeline tested over 100 models of different types, ranging from Gradient Boosted Trees to Deep Neural Networks in TensorFlow.
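The model-selection loop described above might look roughly like the following sketch. It is not the speakers' pipeline: the data is synthetic (generated with `make_classification`, not TEDS), only two candidate models are compared, the F-score is assumed to be F1, and the helper names `compare_models` and `log_to_mlflow` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def log_to_mlflow(name, metrics):
    """Optional tracker (hypothetical helper): record one candidate as an MLflow run."""
    import mlflow  # imported lazily so the sketch runs without MLflow installed

    with mlflow.start_run(run_name=name):
        mlflow.log_param("model_type", name)
        mlflow.log_metrics(metrics)


def compare_models(candidates, X_train, X_test, y_train, y_test, log_fn=None):
    """Fit each candidate, score it on held-out data, and return all metrics."""
    results = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        metrics = {
            "accuracy": float(accuracy_score(y_test, pred)),
            "f1": float(f1_score(y_test, pred)),  # assumption: F-score means F1
        }
        results[name] = metrics
        if log_fn is not None:
            log_fn(name, metrics)
    return results


# Synthetic stand-in for admissions features (NOT the TEDS data);
# label 1 plays the role of "repeat admission".
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = compare_models(candidates, X_train, X_test, y_train, y_test)
best_name = max(results, key=lambda name: results[name]["f1"])
print(best_name, results[best_name])
```

Passing `log_fn=log_to_mlflow` to `compare_models` would record each candidate as a separate MLflow run, so the runs can be compared side by side in the MLflow UI.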

To improve model interpretability, we used Shapley values to measure which variables were most important for predicting readmission. These model metrics, along with other valuable data, are visualized in an interactive Power BI dashboard designed to help practitioners understand which patients to focus on during treatment. We are in discussions with companies and researchers who may be able to apply this model in substance abuse treatment centers.
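To make the idea of Shapley values concrete, here is a minimal from-scratch sketch that computes exact Shapley values for a single prediction. The session does not say which library the speakers used, and a real project would more likely rely on an approximation library than on this brute-force enumeration. The scoring function `toy_score`, its feature names, and the baseline values are all hypothetical and are not TEDS columns.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n_features, so this only suits small toy examples.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    daily_use, age, employed = features
    return 0.4 * daily_use + 0.01 * age - 0.2 * employed + 0.1 * daily_use * employed


x = [1.0, 45.0, 1.0]         # the admission being explained (made-up values)
baseline = [0.0, 35.0, 0.0]  # reference admission (made-up values)

phi = shapley_values(toy_score, x, baseline)
for name, contribution in zip(["daily_use", "age", "employed"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for `x` and the score for `baseline`, which is the property that makes Shapley values convenient for ranking variables by their effect on a prediction.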

Session Speakers

Kelsey Emnett

Lead Data Scientist, AI Engineering

Kimberly Clark

Jennifer Morizzo

Associate Data Scientist

