The goal of this post is to investigate potential features for predicting patient no-shows. Below are the details of the No-Show Prediction Challenge and the Clinic Booking Log Optimization Solution:
Business Pain Definition:
- No-Show Patients are defined as “patients who did not keep nor cancelled scheduled appointments”.
- Academic research has shown that no-show rates in general medicine clinics range from 11% to 31%.
- No-show appointments lead to reduced quality of patient care, lower productivity, financial losses for the hospital, and impaired patient outcomes.
- A clinic with 15K no-shows in one year is looking at approximately $1 million in annual losses, and this is an optimistic scenario.
- With an average of 62 no-shows per day, the annual cost of missed appointments is 3 million USD per hospital.
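To make the two cost figures above easier to compare, here is a back-of-the-envelope calculation of the average cost per missed appointment that each one implies. It is only arithmetic on the numbers quoted in this section. The 365-day year is an assumption, since "per day" could also mean working days.

```python
# Back-of-the-envelope: average cost per missed appointment implied by
# the two scenarios quoted above. DAYS_PER_YEAR is an assumption.

DAYS_PER_YEAR = 365  # assumption: calendar days; use ~250 for working days


def implied_cost_per_no_show(annual_loss_usd: float, annual_no_shows: float) -> float:
    """Average loss attributed to a single no-show."""
    return annual_loss_usd / annual_no_shows


# Scenario 1: a clinic with 15K no-shows a year and ~$1M in annual losses.
clinic_cost = implied_cost_per_no_show(1_000_000, 15_000)

# Scenario 2: a hospital averaging 62 no-shows a day and $3M in annual losses.
hospital_no_shows = 62 * DAYS_PER_YEAR
hospital_cost = implied_cost_per_no_show(3_000_000, hospital_no_shows)

print(f"Clinic:   ~${clinic_cost:,.2f} per no-show")
print(f"Hospital: {hospital_no_shows:,} no-shows/year, ~${hospital_cost:,.2f} per no-show")
```

Under these assumptions the clinic scenario works out to roughly $67 per missed appointment and the hospital scenario to roughly $133 (22,630 no-shows a year). Counting only working days would push the second figure higher.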
The proposed solution is a Machine Learning model that predicts the probability that a patient will not show up. It will be based on patterns in past no-show cases and on patient data such as:
Age, gender, previous admission days, appointment age, appointment type, address, marital status, previous no-show rate, demographic data, date, etc.
In addition, we will examine the effect of factoring in external datasets (such as weather and public transportation).
The ML model scoring results will be part of a smart MVP solution that will optimize clinics’ appointment lists.
We collected electronic medical record (EMR) data and appointment data, which include patient, provider and clinical visit characteristics, over a 6-year period from 12 different clinics:
- Patient Demographic Data (2013–2019)
- Visits (2013–2019)
- No Shows (2013–2019)
- External Data Sources (2013–2019)
High-Level Architecture:
A hybrid architecture that enables optimized appointment listing based on the no-show probability obtained from the real-time prediction machine learning model
The diagram above describes a hybrid architecture that integrates with the existing clinic reservation application. Each time a patient asks to reserve a new slot, the predictive ML model is called through an API, and the returned score is used to find the optimal slot.
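The post does not describe how the API score is turned into a slot choice, so the following is only one possible policy, written as a hypothetical sketch. It treats each booked patient's predicted no-show probability as the chance that the seat stays empty, and offers the new patient the open slot with the lowest expected attendance, allowing a second booking in a slot only while expected attendance stays within one patient. `Slot`, `choose_slot` and the `max_expected_attendance` threshold are illustrative names, not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slot:
    """A bookable time slot (hypothetical schema)."""
    start: str
    capacity: int  # maximum bookings; values > 1 allow overbooking
    booked_no_show_probs: list[float] = field(default_factory=list)

    def expected_attendance(self) -> float:
        # Each booked patient attends with probability 1 - p(no-show).
        return sum(1.0 - p for p in self.booked_no_show_probs)

    def has_room(self) -> bool:
        return len(self.booked_no_show_probs) < self.capacity


def choose_slot(slots: list[Slot], new_patient_no_show_prob: float,
                max_expected_attendance: float = 1.0) -> Slot | None:
    """Return the open slot with the lowest expected attendance, provided that
    booking the new patient keeps expected attendance at or below the limit."""
    new_patient_attendance = 1.0 - new_patient_no_show_prob
    candidates = [
        slot for slot in slots
        if slot.has_room()
        and slot.expected_attendance() + new_patient_attendance <= max_expected_attendance
    ]
    if not candidates:
        return None
    # Prefer the emptiest slot (in expectation); break ties by start time.
    return min(candidates, key=lambda s: (s.expected_attendance(), s.start))


slots = [
    Slot("09:00", capacity=2, booked_no_show_probs=[0.1]),   # likely to attend
    Slot("09:20", capacity=2, booked_no_show_probs=[0.8]),   # likely no-show
    Slot("09:40", capacity=1, booked_no_show_probs=[0.05]),  # already full
]
best = choose_slot(slots, new_patient_no_show_prob=0.4)
print(best.start if best else "no slot available")  # prints "09:20"
```

Here the 09:20 slot is chosen because its booked patient is likely to miss the appointment, so adding the new patient keeps expected attendance at 0.8. An empty slot would always be preferred, so overbooking only happens once no empty slot is left.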
No-Show Classification Model Steps:
- Data Preparation & Data Exploration.
- Feature Engineering.
- Imbalanced Data Techniques.
- Feature Importance & Feature Selection.
- Training & Cross Validation.
- Scoring & Model Deployment.
Data Preparation & Exploration:
In the first step, we combined the three datasets.
Then we extracted several date fields and arranged all the columns into a single dataset.
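A minimal sketch of this combine-and-extract step, assuming a hypothetical schema: demographics keyed by `patient_id`, one visit record per appointment, and the no-show table reduced to a set of `appointment_id`s. In practice this would more likely be a pandas `merge`; plain Python is used here so the example is self-contained. The `lead_time_days` field is an assumption about what "appointment age" might mean.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Toy stand-ins for the three source tables (hypothetical schema).
demographics = {  # keyed by patient_id
    101: {"age": 34, "gender": "F", "marital_status": "married"},
    102: {"age": 61, "gender": "M", "marital_status": "single"},
}
visits = [
    {"appointment_id": 1, "patient_id": 101,
     "scheduled_at": "2019-01-02 08:15", "appointment_at": "2019-01-09 09:00"},
    {"appointment_id": 2, "patient_id": 102,
     "scheduled_at": "2019-01-03 10:00", "appointment_at": "2019-01-04 14:30"},
]
no_show_ids = {2}  # appointment_ids recorded as no-shows


def build_dataset(demographics, visits, no_show_ids):
    """Join visits with demographics, attach the no-show label, and
    extract date fields into one flat row per appointment."""
    rows = []
    for visit in visits:
        scheduled = datetime.strptime(visit["scheduled_at"], FMT)
        appt = datetime.strptime(visit["appointment_at"], FMT)
        rows.append({
            **visit,
            **demographics.get(visit["patient_id"], {}),  # left join on patient_id
            "appt_year": appt.year,
            "appt_month": appt.month,
            "appt_weekday": appt.weekday(),  # 0 = Monday
            "appt_hour": appt.hour,
            # days between booking and appointment (hypothetical "appointment age")
            "lead_time_days": (appt.date() - scheduled.date()).days,
            "no_show": int(visit["appointment_id"] in no_show_ids),
        })
    return rows


rows = build_dataset(demographics, visits, no_show_ids)
for row in rows:
    print(row["appointment_id"], row["marital_status"], row["lead_time_days"], row["no_show"])
```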
Next, we explored the dataset and visualized the aggregated data with column charts.
We can see normal growth in total appointments per month over the 2013–2019 period.
Marital status doesn’t seem to correlate with the no-show rate.
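The aggregations behind charts like these can be sketched in a few lines. The rows below are toy data with hypothetical column names (`appt_year`, `appt_month`, `marital_status`, `no_show`). Comparing the per-group no-show rates with each other and with the overall rate is one simple way to check whether a categorical attribute such as marital status relates to no-shows.

```python
from collections import Counter, defaultdict

# Toy rows in the shape of the combined dataset (hypothetical columns).
rows = [
    {"appt_year": 2019, "appt_month": 1, "marital_status": "married", "no_show": 0},
    {"appt_year": 2019, "appt_month": 1, "marital_status": "single", "no_show": 1},
    {"appt_year": 2019, "appt_month": 1, "marital_status": "married", "no_show": 0},
    {"appt_year": 2019, "appt_month": 2, "marital_status": "married", "no_show": 1},
    {"appt_year": 2019, "appt_month": 2, "marital_status": "single", "no_show": 0},
]


def appointments_per_month(rows):
    """Count appointments per (year, month), i.e. the bars of a monthly column chart."""
    return Counter((r["appt_year"], r["appt_month"]) for r in rows)


def no_show_rate_by(rows, column):
    """No-show rate for each value of `column`."""
    totals, misses = defaultdict(int), defaultdict(int)
    for r in rows:
        key = r.get(column, "unknown")
        totals[key] += 1
        misses[key] += r["no_show"]
    return {key: misses[key] / totals[key] for key in totals}


print(sorted(appointments_per_month(rows).items()))
print(no_show_rate_by(rows, "marital_status"))
```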
Following the exploration step, we are ready to engineer our attributes and look for strongly relevant features (ones that hold information not found in any other feature) and relevant features, such as the patient’s total number of previous no-shows.
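The post does not say exactly how the previous no-show total is computed. A minimal sketch, assuming it should count only appointments *before* the current one (counting the current appointment would leak the label into the feature), and also deriving the "previous no-show rate" listed among the candidate features. Column names are hypothetical.

```python
from collections import defaultdict


def add_previous_no_show_features(rows):
    """Attach each patient's no-show history *before* the current appointment.

    Rows are visited in chronological order, and counters are updated only
    after a row's features are written, so a row never sees its own label.
    The dicts are modified in place and returned in chronological order.
    """
    ordered = sorted(rows, key=lambda r: r["appointment_at"])  # ISO strings sort by time
    seen = defaultdict(int)    # appointments so far, per patient
    missed = defaultdict(int)  # no-shows so far, per patient
    for r in ordered:
        pid = r["patient_id"]
        r["prev_appointments"] = seen[pid]
        r["prev_no_show_total"] = missed[pid]
        r["prev_no_show_rate"] = missed[pid] / seen[pid] if seen[pid] else 0.0
        seen[pid] += 1
        missed[pid] += r["no_show"]
    return ordered


rows = [
    {"patient_id": 101, "appointment_at": "2019-03-01 09:00", "no_show": 0},
    {"patient_id": 101, "appointment_at": "2019-01-01 09:00", "no_show": 1},
    {"patient_id": 101, "appointment_at": "2019-02-01 09:00", "no_show": 1},
    {"patient_id": 102, "appointment_at": "2019-01-15 10:00", "no_show": 1},
]
for r in add_previous_no_show_features(rows):
    print(r["patient_id"], r["appointment_at"], r["prev_no_show_total"], r["prev_no_show_rate"])
```

Patient 101's March appointment gets a previous total of 2, while the earliest appointment of each patient starts at 0, regardless of how that appointment itself turned out.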
“Imbalanced data typically refers to a problem with classification problems where the classes are not represented equally”.
You can read more about tactics to combat imbalanced data in “8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset” (machinelearningmastery.com).
Anyhow, this is the average no-show rate prior to the imbalanced-data fix.
The tactic that I chose was Random Under-Sampling for the majority class (‘show’ records).
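Random under-sampling of the majority class can be sketched as follows: keep every no-show row and draw, without replacement, an equal number of 'show' rows. The function names and the fixed seed are illustrative choices. Libraries such as imbalanced-learn provide the same technique as `RandomUnderSampler`.

```python
import random


def random_undersample(rows, label="no_show", seed=42):
    """Random under-sampling: keep every no-show (minority) row and a random
    subset of 'show' (majority) rows of the same size."""
    no_shows = [r for r in rows if r[label] == 1]
    shows = [r for r in rows if r[label] == 0]
    rng = random.Random(seed)  # fixed seed for reproducibility
    # random.sample draws without replacement; it raises ValueError if there
    # are fewer shows than no-shows, i.e. if 'show' is not the majority class.
    kept_shows = rng.sample(shows, len(no_shows))
    balanced = no_shows + kept_shows
    rng.shuffle(balanced)
    return balanced


def no_show_rate(rows, label="no_show"):
    return sum(r[label] for r in rows) / len(rows)


# Toy data: 16 shows and 4 no-shows (20% no-show rate).
rows = [{"id": i, "no_show": 1 if i < 4 else 0} for i in range(20)]
balanced = random_undersample(rows)
print(f"before: {len(rows)} rows, no-show rate {no_show_rate(rows):.0%}")
print(f"after:  {len(balanced)} rows, no-show rate {no_show_rate(balanced):.0%}")
```

Under-sampling is normally applied to the training split only; the test split keeps the original class ratio so that the evaluation reflects the real no-show rate.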
Training and Scoring:
Confusion Matrix & ROC:
Our receiver operating characteristic (ROC) curve illustrates the performance of our binary classifier by plotting the true positive rate against the false positive rate across classification thresholds:
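For readers who want to see how these plots are derived from model scores, here is a hand-rolled sketch that treats no-show (label 1) as the positive class, an assumption on our part. It computes a confusion matrix at one threshold, the ROC points obtained by sweeping the threshold, and the area under the curve by the trapezoidal rule. scikit-learn's `confusion_matrix`, `roc_curve` and `roc_auc_score` in `sklearn.metrics` compute the same quantities; the tuple order below matches `confusion_matrix(...).ravel()`.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = no-show."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        _, fp, _, tp = confusion_matrix(y_true, y_pred)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]           # 1 = no-show
scores = [0.1, 0.4, 0.35, 0.8]  # predicted no-show probabilities
print(confusion_matrix(y_true, [1 if s >= 0.5 else 0 for s in scores]))  # (2, 0, 1, 1)
print(roc_points(y_true, scores))
print(auc(roc_points(y_true, scores)))  # 0.75
```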
Baseline Comparison (Azure AutoML):