Data + AI Summit 2022

Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Science, Machine Learning and MLOps

Industry

  • Health Care and Life Sciences

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 215

Duration

  • 35 min

Overview

In this presentation, we walk through a model built to predict repeat admissions to substance abuse treatment centers. The goal is to identify early which patients are at high risk of relapse so that care can be tailored to give them additional attention. We used the Treatment Episode Data Set (TEDS) Admissions data set, which includes every publicly funded substance abuse treatment admission in the US.

Although the data set contains no longitudinal data, we were able to classify admissions as first or repeat admissions with 88% accuracy and an F-score of 0.85. Our solution used a scikit-learn Random Forest model and leveraged MLflow to track model metrics so that we could choose the most effective model. Our pipeline tested over 100 models of different types, ranging from Gradient Boosted Trees to Deep Neural Networks in TensorFlow.
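As a rough illustration of how accuracy and F-score can be compared across candidate models, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the labels and predictions are toy values (not TEDS data or the session's results), the candidate names are hypothetical, and a plain dictionary stands in for an experiment tracker such as MLflow.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels.

    Here 1 is taken to mean "repeat admission" and 0 "first admission".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Toy ground truth and predictions (illustrative only, not real results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
candidate_predictions = {
    "random_forest":     [1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
    "gradient_boosting": [1, 0, 1, 0, 0, 1, 1, 0, 0, 0],
    "neural_network":    [1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
}

# A dict of "runs" stands in for the metrics an experiment tracker would store.
runs = {name: binary_metrics(y_true, preds)
        for name, preds in candidate_predictions.items()}

# Select the candidate with the highest F1 score.
best_name = max(runs, key=lambda name: runs[name].f1)

for name, m in runs.items():
    print(f"{name}: accuracy={m.accuracy:.2f} f1={m.f1:.2f}")
print("best by F1:", best_name)
```

Selecting on F-score rather than accuracy alone is one reasonable choice when the two classes are not equally costly to misclassify; the session description does not state which criterion was actually used.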

To improve model interpretability, we used Shapley values to measure which variables were most important for predicting readmission. These model metrics, along with other valuable data, are visualized in an interactive Power BI dashboard designed to help practitioners understand which patients to focus on during treatment. We are in discussions with companies and researchers who may be able to apply this model in substance abuse treatment centers in the field.
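To make the idea of Shapley values concrete, the sketch below computes them exactly for a tiny toy model by enumerating every feature coalition. It is not the session's implementation: the scoring function `toy_model`, its three features, and the all-zeros baseline are hypothetical. Exact enumeration is only practical for a handful of features, so real pipelines typically rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function.

    value_fn takes a frozenset of feature indices and returns the model
    output when only those features are "present".
    """
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return 0.2 + 0.3 * x[0] + 0.1 * x[1] + 0.2 * x[0] * x[2]


instance = [1, 1, 1]   # the admission being explained (toy values)
baseline = [0, 0, 0]   # reference values used for "absent" features


def value_fn(coalition):
    # Features in the coalition take the instance value, others the baseline.
    x = [instance[j] if j in coalition else baseline[j]
         for j in range(len(instance))]
    return toy_model(x)


phi = shapley_values(value_fn, len(instance))
print([round(v, 3) for v in phi])
```

In this toy case the contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the efficiency property that makes Shapley values convenient for ranking variables.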

Session Speakers

Kelsey Emnett

Lead Data Scientist, AI Engineering

Kimberly Clark

Jennifer Morizzo

Associate Data Scientist

Maritz
