Clients: Guilford County Integrated Data & Services (Alice Mahood, Director of Integrated Data & Services and Shelbi Flynn, Integrated Data and Services Program Manager); Guilford County Emergency Services (Jeff Hutchens, Emergency Services Deputy Director)
Instructors: Michael Fichman and Matthew Harris
Team: Leila Bagenstos, Lauren Choi, and Chloe Robinson
In Guilford County, as in jurisdictions across the United States, a small number of residents account for a disproportionate number of calls to EMS. These residents, known as “super users,” often have complex, co-occurring needs: chronic illness, mobility limitations, housing instability, and limited access to primary care. For many of them, EMS represents the last accessible avenue to seek medical care and social support. But Guilford County paramedics and EMTs, who respond to approximately 85,000 calls per year, face significant time and capacity constraints, and they are not fully equipped to address the underlying social and medical conditions driving repeat calls.
The Adult Resource Team (ART) was created in 2023 to fill that gap, connecting high-frequency, non-emergency EMS users to social services and wraparound support. But the ART’s current case management process depends on ad hoc referrals from first responders, a system that is imprecise and difficult to scale.
This report documents the development and evaluation of a predictive risk model to better identify patients at high risk of becoming “super users.” Using three years of anonymized EMS dispatch records covering roughly 108,000 patients, we built a predictive model that scores patients by their likelihood of becoming super users in the following year, before they reach crisis-level EMS use. The model was trained on visit history, clinical complaint patterns, demographics, and neighborhood-level socioeconomic indicators, and evaluated against a held-out test set using metrics matched to the ART’s real operational constraints.
The selected model, a penalized logistic regression, achieves a precision-recall AUC of 0.255 and identifies true future super users at roughly 40% precision when the top 88 patients are flagged. Put differently: at the ART’s current annual capacity of 88 clients, the model identifies approximately four in ten of all future super users in Guilford County. That is more than double the estimated precision of the current ad hoc referral system.
Finally, a cost-benefit analysis demonstrates that model-guided outreach is cost-positive across a wide range of call-reduction assumptions, and that the program’s return on investment grows substantially as ART staffing expands beyond its current two-worker baseline.
Key findings are summarized in the Cost-Benefit Analysis and Conclusion sections.
High-frequency EMS users, commonly referred to as “super users,” represent a small share of the patient population but account for a disproportionate share of emergency call volume. In Guilford County, as in many jurisdictions, these patients often have complex, co-occurring social and medical needs, such as housing instability, chronic illness, behavioral health conditions, and mobility impairments, that make emergency services an accessible but sub-optimal primary care pathway.
The Guilford County Adult Resource Team (ART) was established to provide coordinated social services case management to high-frequency EMS users. ART case workers conduct home visits, connect patients to community resources, and work to reduce avoidable emergency services use over a typical three-month engagement period. With a current staff of two social workers, the ART can serve approximately 88 patients per year.
The primary challenge this project addresses is identification: the current referral system relies on ad hoc referrals from EMS personnel, which is both capacity-constrained and imprecise. A data-driven model that can score patients by their predicted likelihood of future super-user status would allow the ART to prioritize outreach toward those most likely to benefit and could scale naturally as the program grows.
Prediction design. Rather than predicting a rolling window (which produces artificially low risk scores for patients who appear near the end of the data), we divide the full three-year dataset into three fixed temporal segments: an antecedent window (Year 1) in which patient history is used to build features; a 30-day buffer that prevents feature-to-outcome leakage; and an outcome window (approximately Year 3, ~12 months) in which super-user status is evaluated.
Note on race as a predictor. Individual patient race is intentionally excluded from the model’s input features for client liability reasons. Area-level demographic variables (ZCTA-level percent Black residents, percent in poverty, etc.) are retained as proxies for neighborhood-level social determinants of health. Race is retained in the dataset only to accurately analyze potential disparities in model accuracy and precision.
High-frequency EMS users have been studied extensively in the public health and emergency medicine literature. Key findings from this literature are described below. Guilford County’s “super users” share important features in common with high-frequency EMS users elsewhere, though there are also meaningful points of divergence, noted below.
Who are super users? Studies consistently find that a small share of EMS patients accounts for a disproportionate share of call volume; roughly 5–10% of patients generate 25–40% of calls (Kiser et al., 2019; Ruger et al., 2014). The literature also emphasizes the heterogeneity of this population: frequent ED and EMS users are not a monolithic group, and effective interventions depend on understanding the mix of medical, social, and behavioral needs in a given community (LaCalle & Rabin, 2010).
Bimodal age distribution. Several studies identify a bimodal age distribution among EMS super users, with peaks among younger adults (18–34) and older adults (65+). The younger peak is often associated with behavioral health crises and substance use; the older peak with chronic illness, falls, and mobility limitations. In Guilford County, however, we do not observe a strong bimodal pattern in the dispatch data. Super user status is most concentrated in middle-aged and older adults, with less elevation among the 18–34 age group than the literature would suggest.
Substance use and behavioral health. Multiple studies identify alcohol use and behavioral health crises as drivers of high-frequency EMS use, particularly in urban settings. Los Angeles Fire Department data found that the top 40 EMS super users were predominantly male, homeless, and involved alcohol use (Capp et al., 2016). San Francisco data shows a similar pattern (Weiss et al., 2021). This does not appear to hold in Guilford County: our analysis finds little association between substance use or intoxication complaints and super-user status. The most diagnostically distinctive complaint pattern is “lift assist,” which reflects mobility impairment and physical frailty rather than behavioral crises.
Social determinants. A Canadian study of frequent EMS callers found that loneliness, poverty, and poor quality of life, rather than acute medical need, were the primary drivers of frequent use (Booker et al., 2019). Guilford County data is consistent with this framing: super users are concentrated in ZIP codes with elevated poverty rates and in majority-Black neighborhoods, and the lift-assist complaint pattern suggests patients relying on EMS for basic mobility support rather than emergency care.
Effectiveness of case management interventions. Programs combining proactive outreach, care coordination, and connection to community resources have demonstrated meaningful reductions in EMS utilization. Reference benchmarks from comparable programs include:
These benchmarks inform the Cost-Benefit Analysis below.
Predictive modeling. Several jurisdictions have applied machine learning to EMS super-user prediction, with logistic regression and random forests being the most common approaches. Both are implemented here; the Model Selection section describes the rationale for the final choice.
Guilford County provided us with three years of anonymized EMS dispatch data, spanning 2023 to 2025. The initial dataset included nearly 220,000 dispatches. Variables in the dataset included:
During the initial data cleaning process, this dispatch-level dataset was transformed into a person-level one, with one row per patient and summary tables derived from their full call history. A public insurance flag was created from the insurance name field, identifying patients covered by Medicare or Medicaid. Demographic labels for race, gender, housing status, and age were standardized to ensure consistent categorization. The three clinical description fields were condensed into bigrams (two word phrases) to facilitate text analysis and identify complaint patterns associated with super user status. Finally, patient ZIP codes were joined to American Community Survey (ACS) data to add neighborhood context, including racial composition, household poverty rates, and housing cost burden. The result was a clean dataset of 107,553 unique patients.
Super user status is defined using two complementary thresholds, consistent with both published literature and the ART’s program criteria:
A patient flagged by either criterion is classified as a super user
(super_either), which is the single outcome definition used
throughout this analysis.
annual_counts <- data_ordered %>%
filter(!is.na(dispatch_year)) %>%
count(pt_full_name, dispatch_year, name = "calls_in_year") %>%
group_by(pt_full_name) %>%
summarise(
max_calls_any_year = max(calls_in_year, na.rm = TRUE),
calls_2022 = sum(calls_in_year[dispatch_year == 2022], na.rm = TRUE),
calls_2023 = sum(calls_in_year[dispatch_year == 2023], na.rm = TRUE),
calls_2024 = sum(calls_in_year[dispatch_year == 2024], na.rm = TRUE),
.groups = "drop"
)
data_by_person <- data_by_person %>%
left_join(annual_counts, by = "pt_full_name") %>%
mutate(
max_calls_any_year = replace_na(max_calls_any_year, 0L),
calls_2022 = replace_na(calls_2022, 0L),
calls_2023 = replace_na(calls_2023, 0L),
calls_2024 = replace_na(calls_2024, 0L),
super_total15 = if_else(visits_3y >= 15, 1L, 0L),
super_any_year10 = if_else(max_calls_any_year >= 10, 1L, 0L),
super_either = if_else(super_total15 == 1L | super_any_year10 == 1L, 1L, 0L)
)
data_by_person %>%
summarise(
`Total patients` = n(),
`Super users` = sum(super_either, na.rm = TRUE),
`Percent of patients` = scales::percent(mean(super_either, na.rm = TRUE), accuracy = 0.1)
) %>%
knitr::kable()
| Total patients | Super users | Percent of patients |
|---|---|---|
| 107553 | 1283 | 1.2% |
Super users are defined as patients meeting either threshold: ≥ 15 total EMS visits across the observation period, or ≥ 10 visits within any single calendar year. The table below summarises several key characteristics of super users compared to the broader patient population.
data_by_person %>%
mutate(
group = if_else(super_either == 1L, "Super users", "Non-super users"),
white_flag = if_else(race_clean == "White", 1, 0, missing = NA_real_),
black_flag = if_else(race_clean == "Black", 1, 0, missing = NA_real_),
housing_flag = if_else(homeless2 == "Homeless", 1, 0, missing = NA_real_),
public_insurance_flag = as.numeric(public_insurance),
lift_assist_flag = as.numeric(bigram_lift_assist),
sickle_cell_flag = as.numeric(bigram_sickle_cell)
) %>%
group_by(group) %>%
summarise(
`Patients` = n(),
`Median age` = round(median(age, na.rm = TRUE), 1),
`% Black` = round(mean(black_flag, na.rm = TRUE) * 100, 1),
`% White` = round(mean(white_flag, na.rm = TRUE) * 100, 1),
`% Homeless` = round(mean(housing_flag, na.rm = TRUE) * 100, 1),
`% Public insurance` = round(mean(public_insurance_flag, na.rm = TRUE) * 100, 1),
`% Lift assist` = round(mean(lift_assist_flag, na.rm = TRUE) * 100, 1),
`% Sickle cell` = round(mean(sickle_cell_flag, na.rm = TRUE) * 100, 1),
.groups = "drop"
) %>%
gt() %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
| group | Patients | Median age | % Black | % White | % Homeless | % Public insurance | % Lift assist | % Sickle cell |
|---|---|---|---|---|---|---|---|---|
| Non-super users | 106270 | 52 | 43.3 | 47.3 | 0.8 | 19.8 | 1.5 | 0.1 |
| Super users | 1283 | 63 | 54.3 | 43.6 | 3.4 | 27.4 | 21.8 | 0.8 |
Super users are not evenly distributed across Guilford County. The map below shows concentration in ZIP codes with higher poverty rates and majority-Black populations, consistent with the broader pattern of social vulnerability driving EMS dependence.
Super users are substantially more likely to be Black than non-super users, consistent with the geographic concentration of super users in majority-Black ZIP codes.
Homelessness is substantially elevated among super users, though most super users are not homeless, suggesting that housing instability is a contributing factor but not the primary driver of high frequency EMS use in Guilford County.
Super users skew slightly more male but the gender difference is modest, unlike some urban programs where super users are overwhelmingly male.
Super user status peaks among middle-aged and older adults (50–79) rather than showing the bimodal distribution described in some of the academic literature. This is one indication that high frequency EMS use in Guilford County is driven more by chronic illness, social isolation, and mobility limitations than by behavioral health crises.
Super users are significantly more likely to carry public insurance (Medicare or Medicaid), reflecting both the age profile of the population and its concentration in lower-income households.
The word clouds below show the most common two-word phrases in the
chief complaint, dispatch reason, and impression text fields for super
users (Super user = 1) and non-super users
(Super user = 2. Many of the most common phrases, like
“chest pain”, are prominent among super users and non-super users alike.
Others, like “lift assist,” are far overrepresented among super
users.
“Lift assist” is the single complaint that is most disproportionately associated with calls by super users. It reflects a combination of likely risk factors: mobility limitations, social isolation, age, and unmet social and chronic medical needs.
The full dataset is divided into three non-overlapping periods.
visit_data <- read_csv("model_visit.csv") %>% clean_names()
required_cols <- c(
"pt_full_name", "time_dispatched", "age", "age_group", "gender_clean",
"homeless2", "public_insurance", "zip_code", "dispatch_reason",
"type_of_call", "type_destination", "bigram_lift_assist", "bigram_sickle_cell"
)
missing_cols <- setdiff(required_cols, names(visit_data))
if (length(missing_cols) > 0) {
stop(paste("Missing required columns:", paste(missing_cols, collapse = ", ")))
}
optional_numeric_cols <- c(
"zcta_pct_65_plus", "zcta_pct_black", "zcta_pct_hh_poverty", "zcta_pct_cost_burdened"
)
for (nm in optional_numeric_cols) {
if (!nm %in% names(visit_data)) visit_data[[nm]] <- NA_real_
}
visit_data <- visit_data %>%
mutate(
time_dispatched = coalesce(
suppressWarnings(ymd_hms(time_dispatched, tz = "UTC")),
suppressWarnings(ymd_hm(time_dispatched, tz = "UTC")),
suppressWarnings(mdy_hms(time_dispatched, tz = "UTC")),
suppressWarnings(mdy_hm(time_dispatched, tz = "UTC")),
suppressWarnings(as.POSIXct(time_dispatched, tz = "UTC"))
)
) %>%
filter(!is.na(time_dispatched)) %>%
mutate(dispatch_year = year(time_dispatched)) %>%
arrange(pt_full_name, time_dispatched)
data_start <- min(visit_data$time_dispatched)
data_end <- max(visit_data$time_dispatched)
total_days <- as.numeric(difftime(data_end, data_start, units = "days"))
antecedent_days <- 365L
buffer_days <- 30L
outcome_days <- as.integer(total_days - antecedent_days - buffer_days)
antecedent_end <- data_start + days(antecedent_days)
buffer_end <- antecedent_end + days(buffer_days)
outcome_end <- buffer_end + days(outcome_days)
cat(glue(
"The dataset covers {round(total_days / 365.25, 1)} years of EMS visits ",
"({format(data_start, '%B %d, %Y')} to {format(data_end, '%B %d, %Y')}).\n\n",
"Antecedent window (feature building): ",
"{format(data_start, '%B %d, %Y')} to {format(antecedent_end, '%B %d, %Y')} ",
"({antecedent_days} days)\n",
"Buffer: ",
"{format(antecedent_end, '%B %d, %Y')} to {format(buffer_end, '%B %d, %Y')} ",
"({buffer_days} days)\n",
"Outcome window (super-user label): ",
"{format(buffer_end, '%B %d, %Y')} to {format(outcome_end, '%B %d, %Y')} ",
"({outcome_days} days)\n"
))
## The dataset covers 3 years of EMS visits (January 01, 2023 to December 31, 2025).
##
## Antecedent window (feature building): January 01, 2023 to January 01, 2024 (365 days)
## Buffer: January 01, 2024 to January 31, 2024 (30 days)
## Outcome window (super-user label): January 31, 2024 to December 31, 2025 (700 days)
visit_data <- visit_data %>%
mutate(
visit_window = case_when(
time_dispatched < antecedent_end ~ "antecedent",
time_dispatched < buffer_end ~ "buffer",
time_dispatched <= outcome_end ~ "outcome",
TRUE ~ "after_outcome"
)
)
visit_data %>%
count(visit_window) %>%
filter(visit_window != "after_outcome") %>%
mutate(
`Window` = recode(visit_window,
antecedent = "Antecedent",
buffer = "Buffer",
outcome = "Outcome"),
`Visits` = n,
`Share of total` = scales::percent(n / sum(n), accuracy = 0.1)
) %>%
select(`Window`, `Visits`, `Share of total`) %>%
knitr::kable()
| Window | Visits | Share of total |
|---|---|---|
| Antecedent | 71199 | 32.6% |
| Buffer | 6052 | 2.8% |
| Outcome | 141076 | 64.6% |
For each patient in the antecedent window, we compute a set of visit-history features that serve as predictors: total call volume, how often calls cluster within 7 or 30 days of a prior call, and flags for lift-assist and sickle-cell complaint history. We also compute interaction terms (e.g. “public insurance x patient age,” “lift assist x zip code”).
antecedent_visits <- visit_data %>%
filter(visit_window == "antecedent") %>%
arrange(pt_full_name, time_dispatched) %>%
group_by(pt_full_name) %>%
mutate(
visit_index = row_number(),
prior_visits = visit_index - 1,
prev_visit_time = dplyr::lag(time_dispatched),
days_since_prev_visit = as.numeric(difftime(time_dispatched, prev_visit_time, units = "days")),
repeat_within_7d = if_else(!is.na(days_since_prev_visit) & days_since_prev_visit <= 7, 1L, 0L),
repeat_within_30d = if_else(!is.na(days_since_prev_visit) & days_since_prev_visit <= 30, 1L, 0L),
bigram_lift_assist = replace_na(as.integer(bigram_lift_assist), 0L),
bigram_sickle_cell = replace_na(as.integer(bigram_sickle_cell), 0L),
bigram_lift_assist_count_so_far = cumsum(bigram_lift_assist),
bigram_sickle_cell_count_so_far = cumsum(bigram_sickle_cell),
prior_lift_assist_count = dplyr::lag(bigram_lift_assist_count_so_far, default = 0L),
prior_sickle_cell_count = dplyr::lag(bigram_sickle_cell_count_so_far, default = 0L),
prior_lift_assist_any = if_else(prior_lift_assist_count > 0, 1L, 0L),
prior_sickle_cell_any = if_else(prior_sickle_cell_count > 0, 1L, 0L)
) %>%
ungroup()
patient_features <- antecedent_visits %>%
group_by(pt_full_name) %>%
arrange(time_dispatched, .by_group = TRUE) %>%
summarise(
last_visit_time = max(time_dispatched),
age = last(age),
age_group = last(age_group),
race_clean = last(race_clean),
gender_clean = last(gender_clean),
homeless2 = last(homeless2),
public_insurance = last(public_insurance),
zip_code = last(zip_code),
dispatch_reason = last(dispatch_reason),
type_of_call = last(type_of_call),
type_destination = last(type_destination),
zcta_pct_65_plus = last(zcta_pct_65_plus),
zcta_pct_black = last(zcta_pct_black),
zcta_pct_hh_poverty = last(zcta_pct_hh_poverty),
zcta_pct_cost_burdened = last(zcta_pct_cost_burdened),
total_antecedent_visits = n(),
days_since_prev_visit = last(days_since_prev_visit),
repeat_within_7d_ever = max(repeat_within_7d, na.rm = TRUE),
repeat_within_30d_ever = max(repeat_within_30d, na.rm = TRUE),
repeat_within_7d_count = sum(repeat_within_7d, na.rm = TRUE),
repeat_within_30d_count = sum(repeat_within_30d, na.rm = TRUE),
prior_lift_assist_count = max(bigram_lift_assist_count_so_far, na.rm = TRUE),
prior_sickle_cell_count = max(bigram_sickle_cell_count_so_far, na.rm = TRUE),
prior_lift_assist_any = max(prior_lift_assist_any, na.rm = TRUE),
prior_sickle_cell_any = max(prior_sickle_cell_any, na.rm = TRUE),
.groups = "drop"
)
cat(patient_features %>% nrow(), "unique patients appear in the antecedent window.\n")
## 44369 unique patients appear in the antecedent window.
A patient is labeled as a positive case if they crossed the super-user threshold during the outcome window and had not already crossed it in the antecedent window. This targets patients who are on a trajectory toward high-frequency EMS use and have not yet been identified.
antecedent_totals <- antecedent_visits %>%
group_by(pt_full_name) %>%
summarise(
antecedent_total_visits = n(),
antecedent_max_year_count = max(table(dispatch_year)),
.groups = "drop"
)
outcome_visits_counts <- visit_data %>%
filter(visit_window == "outcome") %>%
group_by(pt_full_name) %>%
summarise(
outcome_total_visits = n(),
outcome_max_year_count = max(table(dispatch_year)),
.groups = "drop"
)
patient_labeled <- patient_features %>%
left_join(antecedent_totals, by = "pt_full_name") %>%
left_join(outcome_visits_counts, by = "pt_full_name") %>%
mutate(
outcome_total_visits = replace_na(outcome_total_visits, 0L),
outcome_max_year_count = replace_na(outcome_max_year_count, 0L),
already_super = as.integer(
antecedent_total_visits >= 15 | antecedent_max_year_count >= 10
),
cumulative_total_visits = antecedent_total_visits + outcome_total_visits,
cumulative_max_year_count = pmax(antecedent_max_year_count, outcome_max_year_count),
became_super_user = as.integer(
already_super == 0 & (cumulative_total_visits >= 15 | cumulative_max_year_count >= 10)
),
became_super_user = factor(became_super_user, levels = c(0, 1), labels = c("no", "yes"))
) %>%
filter(already_super == 0)
patient_labeled %>%
count(became_super_user) %>%
mutate(
Outcome = became_super_user,
Patients = n,
Share = scales::percent(n / sum(n), accuracy = 0.1)
) %>%
select(Outcome, Patients, Share) %>%
knitr::kable()
| Outcome | Patients | Share |
|---|---|---|
| no | 43357 | 98.6% |
| yes | 614 | 1.4% |
We fit two models — penalized logistic regression and random forest — using 5-fold cross-validation to tune hyperparameters. Both models use the same feature set; the logistic regression additionally includes interaction terms between age group, public insurance status, and complaint history, which are expected to have compound effects on risk.
model_df <- patient_labeled %>%
select(
became_super_user, pt_full_name, last_visit_time,
age, age_group, gender_clean, homeless2, public_insurance,
dispatch_reason, type_of_call, type_destination, zip_code,
zcta_pct_65_plus, zcta_pct_black, zcta_pct_hh_poverty, zcta_pct_cost_burdened,
total_antecedent_visits, days_since_prev_visit,
repeat_within_7d_ever, repeat_within_30d_ever,
repeat_within_7d_count, repeat_within_30d_count,
prior_lift_assist_count, prior_sickle_cell_count,
prior_lift_assist_any, prior_sickle_cell_any
)
patient_split <- initial_split(model_df, prop = 0.75, strata = became_super_user)
train_df <- training(patient_split)
test_df <- testing(patient_split)
cat(nrow(train_df), "patients in the training set;", nrow(test_df), "in the held-out test set.\n")
## 32978 patients in the training set; 10993 in the held-out test set.
cv_splits <- vfold_cv(train_df, v = 5, strata = became_super_user)
rec_base <- recipe(became_super_user ~ ., data = train_df) %>%
update_role(pt_full_name, new_role = "id") %>%
update_role(last_visit_time, new_role = "id") %>%
step_mutate(
age_group = as.factor(age_group),
gender_clean = as.factor(gender_clean),
homeless2 = as.factor(homeless2),
public_insurance = as.factor(public_insurance),
dispatch_reason = as.factor(dispatch_reason),
type_of_call = as.factor(type_of_call),
type_destination = as.factor(type_destination),
zip_code = as.factor(zip_code)
) %>%
step_unknown(all_nominal_predictors(), new_level = "missing") %>%
step_novel(all_nominal_predictors()) %>%
step_other(dispatch_reason, threshold = 0.02, other = "other_dispatch") %>%
step_other(type_of_call, threshold = 0.02, other = "other_call") %>%
step_other(type_destination, threshold = 0.02, other = "other_dest") %>%
step_other(zip_code, threshold = 0.02, other = "other_zip") %>%
step_impute_median(all_numeric_predictors()) %>%
step_zv(all_predictors())
rec_log <- rec_base %>%
step_dummy(all_nominal_predictors()) %>%
step_interact(terms = ~ starts_with("age_group"):starts_with("public_insurance")) %>%
step_interact(terms = ~ starts_with("age_group"):starts_with("prior_lift_assist")) %>%
step_interact(terms = ~ starts_with("age_group"):starts_with("prior_sickle_cell")) %>%
step_interact(terms = ~ starts_with("public_insurance"):starts_with("prior_lift_assist")) %>%
step_interact(terms = ~ starts_with("public_insurance"):starts_with("prior_sickle_cell")) %>%
step_normalize(all_numeric_predictors()) %>%
step_zv(all_predictors()) %>%
step_downsample(became_super_user)
rec_rf <- rec_base %>%
step_downsample(became_super_user)
log_spec <- logistic_reg(penalty = tune(), mixture = tune()) %>%
set_engine("glmnet") %>%
set_mode("classification")
rf_spec <- rand_forest(trees = 1500, mtry = tune(), min_n = tune()) %>%
set_engine("ranger", importance = "permutation", probability = TRUE) %>%
set_mode("classification")
log_wf <- workflow() %>% add_recipe(rec_log) %>% add_model(log_spec)
rf_wf <- workflow() %>% add_recipe(rec_rf) %>% add_model(rf_spec)
log_grid <- grid_regular(
penalty(range = c(-8, -2)),
mixture(range = c(0, 0.5)),
levels = c(8, 4)
)
rf_params <- rf_wf %>%
extract_parameter_set_dials() %>%
finalize(train_df) %>%
update(
mtry = mtry(range = c(2L, floor(ncol(train_df) * 0.8))),
min_n = min_n(range = c(5L, 100L))
)
rf_grid <- grid_latin_hypercube(rf_params, size = 25)
cls_metrics <- metric_set(pr_auc, roc_auc)
ctrl <- control_grid(save_pred = TRUE, verbose = TRUE)
log_tuned <- tune_grid(log_wf, resamples = cv_splits, grid = log_grid,
metrics = cls_metrics, control = ctrl)
rf_tuned <- tune_grid(rf_wf, resamples = cv_splits, grid = rf_grid,
metrics = cls_metrics, control = ctrl)
log_best <- select_best(log_tuned, metric = "pr_auc")
rf_best <- select_best(rf_tuned, metric = "pr_auc")
log_final_wf <- finalize_workflow(log_wf, log_best)
rf_final_wf <- finalize_workflow(rf_wf, rf_best)
log_fit_final <- fit(log_final_wf, data = train_df)
rf_fit_final <- fit(rf_final_wf, data = train_df)
Both models are evaluated on the held-out test set. Because super users are rare (just over 1% of patients), we prioritize precision-recall AUC over ROC-AUC, and top-K precision over accuracy.
log_test_preds <- predict(log_fit_final, test_df, type = "prob") %>%
bind_cols(test_df) %>% mutate(model = "log")
rf_test_preds <- predict(rf_fit_final, test_df, type = "prob") %>%
bind_cols(test_df) %>% mutate(model = "rf")
val_preds <- bind_rows(log_test_preds, rf_test_preds) %>%
mutate(model = factor(model, levels = c("log", "rf")))
val_summary <- val_preds %>%
group_by(model) %>%
summarise(
pr_auc = pr_auc_vec(became_super_user, .pred_yes, event_level = "second"),
roc_auc = roc_auc_vec(became_super_user, .pred_yes, event_level = "second"),
brier = mean((as.integer(became_super_user == "yes") - .pred_yes)^2),
.groups = "drop"
)
patient_ranked <- val_preds %>%
group_by(model) %>%
arrange(desc(.pred_yes), .by_group = TRUE) %>%
mutate(patient_rank = row_number()) %>%
ungroup()
topk_summary <- patient_topk_summary(val_preds, top_k_values = c(22, 44, 88, 100, 150, 200))
Cross-validated out-of-fold predictions provide a less optimistic estimate of performance than the test set alone, and are used for threshold analysis below.
log_best_oof <- collect_predictions(log_tuned) %>%
filter(penalty == log_best$penalty, mixture == log_best$mixture) %>%
mutate(model = "log")
rf_best_oof <- collect_predictions(rf_tuned) %>%
filter(mtry == rf_best$mtry, min_n == rf_best$min_n) %>%
mutate(model = "rf")
oof_preds <- bind_rows(log_best_oof, rf_best_oof) %>%
mutate(model = factor(model, levels = c("log", "rf")))
oof_preds %>%
group_by(model) %>%
summarise(
`PR-AUC` = round(pr_auc_vec(became_super_user, .pred_yes, event_level = "second"), 3),
`ROC-AUC` = round(roc_auc_vec(became_super_user, .pred_yes, event_level = "second"), 3),
`Brier score` = round(mean((as.integer(became_super_user == "yes") - .pred_yes)^2), 3),
.groups = "drop"
) %>%
rename(Model = model) %>%
mutate(Model = recode(Model, log = "Logistic regression", rf = "Random forest")) %>%
knitr::kable()
| Model | PR-AUC | ROC-AUC | Brier score |
|---|---|---|---|
| Logistic regression | 0.196 | 0.885 | NA |
| Random forest | 0.153 | 0.892 | 0.127 |
Logistic regression outperforms random forest on PR-AUC and top-K precision at the ART’s operational capacity of 88 patients per year, making it the preferred model. The ROC-AUC scores are nearly identical across both models.
topk_88_precision <- topk_summary %>%
filter(top_k == 88) %>%
select(model, topk_88_precision = topk_precision)
val_summary %>%
left_join(topk_88_precision, by = "model") %>%
mutate(
Model = recode(model, log = "Logistic regression", rf = "Random forest"),
`PR-AUC` = round(pr_auc, 3),
`ROC-AUC` = round(roc_auc, 3),
`Brier score` = round(brier, 3),
`Precision at top 88` = scales::percent(topk_88_precision, accuracy = 0.1)
) %>%
select(Model, `PR-AUC`, `ROC-AUC`, `Brier score`, `Precision at top 88`) %>%
gt() %>%
tab_header(
title = "Model comparison",
subtitle = "PR-AUC and precision at top 88 are the primary criteria given class imbalance."
) %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
tab_options(table.font.size = 13, heading.title.font.size = 14,
heading.subtitle.font.size = 12,
column_labels.border.bottom.width = px(1),
table_body.border.bottom.width = px(1))
| Model comparison | ||||
| PR-AUC and precision at top 88 are the primary criteria given class imbalance. | ||||
| Model | PR-AUC | ROC-AUC | Brier score | Precision at top 88 |
|---|---|---|---|---|
| Logistic regression | 0.255 | 0.919 | NA | 40.9% |
| Random forest | 0.209 | 0.920 | 0.129 | 35.2% |
best_model_label <- val_summary %>%
slice_max(pr_auc, n = 1, with_ties = FALSE) %>%
pull(model) %>% as.character()
best_fit <- if (best_model_label == "rf") rf_fit_final else log_fit_final
cat("Selected model:", ifelse(best_model_label == "rf", "Random forest", "Logistic regression"), "\n")
## Selected model: Logistic regression
At a risk score threshold of 0.30, the model flags roughly as many patients as the ART can screen in a week. The confusion matrix shows the tradeoff between precision and recall at this operating point.
cm_threshold <- 0.30
best_preds_thresh <- val_preds %>%
filter(as.character(model) == best_model_label) %>%
mutate(
.pred_class = factor(
if_else(.pred_yes >= cm_threshold, "yes", "no"),
levels = c("no", "yes")
)
)
cm <- conf_mat(best_preds_thresh, truth = became_super_user, estimate = .pred_class)
autoplot(cm, type = "heatmap") +
scale_fill_gradient(low = "#F5F5F5", high = "#4A7BB7") +
labs(
title = sprintf("Confusion matrix — %s at threshold %.2f",
ifelse(best_model_label == "rf", "Random forest", "Logistic regression"),
cm_threshold),
subtitle = "Rows = actual label; columns = predicted label"
) +
theme(legend.position = "none")
summary(cm, event_level = "second") %>%
filter(.metric %in% c("accuracy", "sens", "spec", "precision", "f_meas")) %>%
mutate(
Metric = recode(.metric, accuracy = "Accuracy", sens = "Recall",
spec = "Specificity", precision = "Precision", f_meas = "F1 score"),
Value = round(.estimate, 3)
) %>%
select(Metric, Value) %>%
gt() %>%
tab_header(
title = sprintf("Classification metrics at threshold %.2f", cm_threshold),
subtitle = "Held-out test set"
) %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
| Classification metrics at threshold 0.30 | |
| Held-out test set | |
| Metric | Value |
|---|---|
| Accuracy | 0.717 |
| Recall | 0.928 |
| Specificity | 0.713 |
| Precision | 0.047 |
| F1 score | 0.090 |
The threshold analysis shows how precision and recall trade off across the full range of possible cutoffs. The ART can adjust the operating threshold depending on whether they want to maximize true positives (lower threshold) or minimize screening effort on false positives (higher threshold).
threshold_grid <- tibble(threshold = seq(0.05, 0.80, by = 0.05))
oof_threshold_perf <- threshold_grid %>%
crossing(oof_preds) %>%
mutate(
pred_class_thresh = factor(
if_else(.pred_yes >= threshold, "yes", "no"), levels = c("no", "yes")
)
) %>%
group_by(model, threshold) %>%
summarise(
accuracy = accuracy_vec(became_super_user, pred_class_thresh),
recall = sens_vec(became_super_user, pred_class_thresh, event_level = "second"),
specificity = spec_vec(became_super_user, pred_class_thresh, event_level = "second"),
precision = precision_vec(became_super_user, pred_class_thresh, event_level = "second"),
.groups = "drop"
)
oof_threshold_perf %>%
pivot_longer(cols = c(accuracy, recall, specificity, precision),
names_to = "metric", values_to = "value") %>%
ggplot(aes(x = threshold, y = value, color = model)) +
geom_line(linewidth = 0.8) +
geom_point(size = 1.2, alpha = 0.7) +
geom_vline(xintercept = cm_threshold, linetype = "dotted",
color = "grey40", linewidth = 0.6) +
scale_color_manual(
values = c(log = "#4A7BB7", rf = "#A64E4E"),
labels = c(log = "Logistic regression", rf = "Random forest")
) +
scale_y_continuous(limits = c(0, 1), labels = percent_format(accuracy = 1)) +
facet_wrap(~ metric, ncol = 2) +
labs(
title = "Performance across decision thresholds (cross-validated)",
subtitle = sprintf("Dotted line = operating threshold (%.2f)", cm_threshold),
x = "Risk score threshold", y = NULL, color = NULL
)
Both models substantially overpredict risk across the score range. This is a deliberate consequence of downsampling the majority class (non-super users) during training, which inflates predicted probabilities in order to improve the model’s ability to rank positive cases. Because the ART uses the model output as a ranked priority list rather than as a literal probability estimate, calibration is less important than ranking quality (PR-AUC and top-K precision). The predicted scores should be interpreted as relative risk scores, not as estimates of the true probability of becoming a super user.
val_preds %>%
mutate(prob_bin = ntile(.pred_yes, 10)) %>%
group_by(model, prob_bin) %>%
summarise(
mean_pred = mean(.pred_yes, na.rm = TRUE),
actual_rate = mean(became_super_user == "yes", na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
ggplot(aes(x = mean_pred, y = actual_rate, color = model)) +
geom_abline(linetype = "dashed", color = "grey60", linewidth = 0.5) +
geom_line(linewidth = 0.7, alpha = 0.8) +
geom_point(aes(size = n), alpha = 0.7) +
scale_color_manual(
values = c(log = "#4A7BB7", rf = "#A64E4E"),
labels = c(log = "Logistic regression", rf = "Random forest")
) +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_size_continuous(range = c(1, 4), guide = "none") +
facet_wrap(~ model) +
labs(
title = "Calibration by predicted risk decile",
subtitle = "Points near the dashed diagonal indicate well-calibrated predictions",
x = "Average predicted risk", y = "Observed event rate", color = NULL
)
The model performs consistently across racial groups with no evidence of systematic over- or under-prediction for Black patients relative to white patients, which is an important equity check given the racial concentration of super users in the Guilford County data.
threshold <- 0.30
val_preds_thresh <- val_preds %>%
left_join(patient_labeled %>% select(pt_full_name, race_clean), by = "pt_full_name") %>%
mutate(
.pred_class_thresh = factor(
if_else(.pred_yes >= threshold, "yes", "no"), levels = c("no", "yes")
)
)
safe_metric <- function(fn, truth, estimate, ...) {
tryCatch(fn(truth, estimate, ...), error = function(e) NA_real_)
}
val_preds_thresh %>%
group_by(model, race_clean) %>%
summarise(
Patients = n(),
`True super users` = sum(became_super_user == "yes", na.rm = TRUE),
Flagged = sum(.pred_class_thresh == "yes", na.rm = TRUE),
`Actual rate` = round(mean(became_super_user == "yes", na.rm = TRUE), 3),
`Predicted rate` = round(mean(.pred_class_thresh == "yes", na.rm = TRUE), 3),
Recall = round(safe_metric(sens_vec, became_super_user, .pred_class_thresh, event_level = "second"), 3),
Precision = round(safe_metric(precision_vec, became_super_user, .pred_class_thresh, event_level = "second"), 3),
Specificity = round(safe_metric(spec_vec, became_super_user, .pred_class_thresh, event_level = "second"), 3),
.groups = "drop"
) %>%
rename(Model = model, Race = race_clean) %>%
mutate(Model = recode(Model, log = "Logistic regression", rf = "Random forest")) %>%
gt() %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
| Model | Race | Patients | True super users | Flagged | Actual rate | Predicted rate | Recall | Precision | Specificity |
|---|---|---|---|---|---|---|---|---|---|
| Logistic regression | American Indian / Alaska Native | 23 | 0 | 7 | 0.000 | 0.304 | NA | 0.000 | 0.696 |
| Logistic regression | Asian | 272 | 0 | 66 | 0.000 | 0.243 | NA | 0.000 | 0.757 |
| Logistic regression | Black | 4763 | 96 | 1483 | 0.020 | 0.311 | 0.927 | 0.060 | 0.701 |
| Logistic regression | Hispanic | 600 | 0 | 81 | 0.000 | 0.135 | NA | 0.000 | 0.865 |
| Logistic regression | Other / Unknown | 10 | 1 | 10 | 0.100 | 1.000 | 1.000 | 0.100 | 0.000 |
| Logistic regression | Pacific Islander | 25 | 0 | 7 | 0.000 | 0.280 | NA | 0.000 | 0.720 |
| Logistic regression | White | 5299 | 69 | 1601 | 0.013 | 0.302 | 0.928 | 0.040 | 0.706 |
| Logistic regression | NA | 1 | 0 | 1 | 0.000 | 1.000 | NA | 0.000 | 0.000 |
| Random forest | American Indian / Alaska Native | 23 | 0 | 4 | 0.000 | 0.174 | NA | 0.000 | 0.826 |
| Random forest | Asian | 272 | 0 | 52 | 0.000 | 0.191 | NA | 0.000 | 0.809 |
| Random forest | Black | 4763 | 96 | 1370 | 0.020 | 0.288 | 0.927 | 0.065 | 0.726 |
| Random forest | Hispanic | 600 | 0 | 80 | 0.000 | 0.133 | NA | 0.000 | 0.867 |
| Random forest | Other / Unknown | 10 | 1 | 8 | 0.100 | 0.800 | 1.000 | 0.125 | 0.222 |
| Random forest | Pacific Islander | 25 | 0 | 7 | 0.000 | 0.280 | NA | 0.000 | 0.720 |
| Random forest | White | 5299 | 69 | 1567 | 0.013 | 0.296 | 0.913 | 0.040 | 0.712 |
| Random forest | NA | 1 | 0 | 0 | 0.000 | 0.000 | NA | NA | 1.000 |
Raw call volume in the antecedent window is the single most powerful predictor; patients who already call frequently are the most likely to continue doing so. When prior call count is excluded, geography (zip code), patient age, and a history of call clustering emerge as the next most important features.
feature_labels <- tribble(
~feature, ~label,
"total_antecedent_visits", "Total EMS calls in prior year",
"age", "Patient age",
"days_since_prev_visit", "Days since last EMS call",
"repeat_within_30d_count", "Calls within 30 days of a prior call (count)",
"repeat_within_7d_count", "Calls within 7 days of a prior call (count)",
"repeat_within_30d_ever", "Ever had two calls within 30 days",
"repeat_within_7d_ever", "Ever had two calls within 7 days",
"prior_lift_assist_count", "Lift-assist complaints (count)",
"prior_sickle_cell_count", "Sickle cell complaints (count)",
"prior_lift_assist_any", "Any prior lift-assist complaint",
"prior_sickle_cell_any", "Any prior sickle cell complaint",
"zcta_pct_65_plus", "ZIP code: % residents 65+",
"zcta_pct_black", "ZIP code: % Black residents",
"zcta_pct_hh_poverty", "ZIP code: % households in poverty",
"zcta_pct_cost_burdened", "ZIP code: % cost-burdened households",
"zip_code_other_zip", "ZIP code",
"age_group_X65.79", "Age group: 65-79",
"age_group_X35.49", "Age group: 35-49",
"gender_clean_Male", "Gender: male",
"dispatch_reason_Traffic.Transportation.Incident", "Dispatch reason: traffic incident",
"dispatch_reason_Unconscious.Fainting.Near.Fainting", "Dispatch reason: unconscious or fainting",
"dispatch_reason_Cardiac.Arrest.Death", "Dispatch reason: cardiac arrest or death"
)
if (best_model_label == "rf") {
imp_df <- extract_fit_engine(rf_fit_final)$variable.importance %>%
enframe(name = "feature", value = "importance")
} else {
imp_df <- tidy(extract_fit_parsnip(log_fit_final)) %>%
filter(term != "(Intercept)") %>%
rename(feature = term, importance = estimate) %>%
mutate(direction = if_else(importance >= 0, "Increases risk", "Decreases risk"),
importance = abs(importance)) %>%
select(feature, importance, direction)
}
imp_df %>%
left_join(feature_labels, by = "feature") %>%
mutate(label = coalesce(label, feature)) %>%
{ if (!"direction" %in% names(.)) mutate(., direction = "Increases risk") else . } %>%
arrange(desc(importance)) %>%
slice_head(n = 10) %>%
mutate(
pct_importance = importance / sum(importance),
pct_label = percent(pct_importance, accuracy = 0.1)
) %>%
arrange(pct_importance) %>%
mutate(label = factor(label, levels = label)) %>%
ggplot(aes(x = pct_importance, y = label, color = direction)) +
geom_point(size = 3.5) +
geom_segment(aes(x = 0, xend = pct_importance, yend = label),
linewidth = 0.5, alpha = 0.3) +
geom_text(aes(label = pct_label), hjust = -0.3, size = 3.2, color = "grey30") +
scale_color_manual(
values = c("Increases risk" = "#A64E4E", "Decreases risk" = "#4A7BB7"),
drop = FALSE
) +
scale_x_continuous(labels = percent_format(accuracy = 1),
expand = expansion(mult = c(0, 0.25))) +
labs(title = "Feature importance: top predictors",
x = "Share of total importance", y = NULL, color = NULL)
val_summary %>%
mutate(
Model = recode(model, log = "Logistic regression", rf = "Random forest"),
`PR-AUC` = round(pr_auc, 3),
`ROC-AUC` = round(roc_auc, 3),
`Brier score` = round(brier, 3)
) %>%
select(Model, `PR-AUC`, `ROC-AUC`, `Brier score`) %>%
gt() %>%
tab_header(
title = "Overall model performance",
subtitle = "Evaluated on the held-out test set (25% of patients)"
) %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
| Overall model performance | |||
| Evaluated on the held-out test set (25% of patients) | |||
| Model | PR-AUC | ROC-AUC | Brier score |
|---|---|---|---|
| Logistic regression | 0.255 | 0.919 | NA |
| Random forest | 0.209 | 0.920 | 0.129 |
topk_summary %>%
filter(top_k %in% c(22, 88, 100, 200)) %>%
mutate(
Model = recode(model, log = "Logistic regression", rf = "Random forest"),
`Patients flagged` = flagged_patients,
`True positives` = true_future_high_users,
Precision = scales::percent(topk_precision, accuracy = 0.1),
`Capture rate` = scales::percent(topk_capture, accuracy = 0.1),
`All positives` = total_future_yes
) %>%
select(Model, `Patients flagged`, `True positives`, Precision, `Capture rate`, `All positives`) %>%
gt() %>%
tab_header(
title = "Model performance at operational capacity thresholds",
subtitle = "22 = current simultaneous case load; 88 = annual capacity; 200 = extended outreach"
) %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
| Model performance at operational capacity thresholds | |||||
| 22 = current simultaneous case load; 88 = annual capacity; 200 = extended outreach | |||||
| Model | Patients flagged | True positives | Precision | Capture rate | All positives |
|---|---|---|---|---|---|
| Logistic regression | 22 | 9 | 40.9% | 5.4% | 166 |
| Logistic regression | 88 | 36 | 40.9% | 21.7% | 166 |
| Logistic regression | 100 | 40 | 40.0% | 24.1% | 166 |
| Logistic regression | 200 | 64 | 32.0% | 38.6% | 166 |
| Random forest | 22 | 4 | 18.2% | 2.4% | 166 |
| Random forest | 88 | 31 | 35.2% | 18.7% | 166 |
| Random forest | 100 | 36 | 36.0% | 21.7% | 166 |
| Random forest | 200 | 59 | 29.5% | 35.5% | 166 |
At the ART’s current annual capacity of 88 patients, the logistic regression model identifies a meaningfully higher share of true future super users than the random forest and substantially outperforms the current ad hoc nomination system (estimated at roughly 20% precision). At increased capacity levels (i.e. if the Emergency Services Department hires additional ART case workers), the model’s advantage increases because it can flag more at-risk individuals than individual paramedics and EMTs can identify on an ad hoc basis.
workers_current <- 2L
caseload_per_sw <- 11L
case_duration_mo <- 3L
months_per_year <- 12L
patients_per_sw_year <- caseload_per_sw * (months_per_year / case_duration_mo)
sw_annual_salary <- 37426
cost_per_call <- 1500
benchmarks <- tibble(
program = c("Onslow County", "San Diego", "Wausau, WI"),
patients_served = c(213, 51, 13),
reduction_rate = c(2/3, 1/3, 2/3),
total_savings = c(1200000, 314000, 100000),
savings_per_pt = total_savings / patients_served
)
outcome_call_volume <- val_preds %>%
filter(as.character(model) == best_model_label) %>%
left_join(
outcome_visits_counts %>% rename(outcome_calls = outcome_total_visits),
by = "pt_full_name"
) %>%
mutate(outcome_calls = replace_na(outcome_calls, 0L)) %>%
arrange(desc(.pred_yes)) %>%
mutate(patient_rank = row_number())
# Use the full dataset super user count as the cap, not just those
# who appear in the modeling test set.
max_superusers <- sum(data_by_person$super_either, na.rm = TRUE)
cohort_size <- patients_per_sw_year
cohort_avg_calls <- outcome_call_volume %>%
filter(patient_rank <= max_superusers) %>%
mutate(cohort = ceiling(patient_rank / cohort_size)) %>%
group_by(cohort) %>%
summarise(
cohort_start = min(patient_rank),
cohort_end = max(patient_rank),
n_patients = n(),
avg_calls_in_window = mean(outcome_calls, na.rm = TRUE),
avg_calls_annual = avg_calls_in_window * (365 / outcome_days),
.groups = "drop"
)
benchmark_calls_per_pt <- c("Onslow County" = 20, "San Diego" = 18, "Wausau, WI" = 22)
benchmarks <- benchmarks %>%
mutate(
calls_per_pt_yr = benchmark_calls_per_pt[program],
calls_averted_total = patients_served * calls_per_pt_yr * reduction_rate,
implied_cost_per_call = total_savings / calls_averted_total
)
implied_cost_per_call_wtd <- weighted.mean(
benchmarks$implied_cost_per_call, w = benchmarks$patients_served
)
The table below summarizes outcomes from three comparable community paramedicine programs. These programs serve as benchmarks for what Guilford County’s ART might achieve at scale. Call reduction estimates range from one-third (San Diego) to two-thirds (Onslow County, Wausau), and per-patient savings range from roughly $5,600 to $7,700. The Guilford County analysis uses a conservative $1,500 per-call cost estimate and models three reduction scenarios bracketed by these benchmarks.
| Reference programs: comparable community paramedicine interventions | |||||||
| Patient-weighted average implied cost per call: $538. Guilford County analysis uses $1500 per call. | |||||||
| Program | Patients | Call reduction | Calls per patient per year | Calls averted | Total savings | Implied cost per call | Savings per patient |
|---|---|---|---|---|---|---|---|
| Onslow County | 213 | 67% | 20 | 2840 | $1,200,000 | $423 | $5,634 |
| San Diego | 51 | 33% | 18 | 306 | $314,000 | $1,026 | $6,157 |
| Wausau, WI | 13 | 67% | 22 | 191 | $100,000 | $524 | $7,692 |
The table below projects gross savings, staffing costs, and net savings across staffing levels under the mid-range scenario (50% call reduction). The analysis assumes each case worker carries 11 simultaneous cases on a 3-month rotation, for an annual caseload of 44 patients per worker. Gross savings are computed from the actual outcome-window call volumes of the top-ranked patients identified by the model, so the savings estimates reflect the real expected call volume of the patients the ART would serve, not a uniform average. The marginal net savings column shows the incremental return from each additional hire; if this figure is negative, it indicates that the cost of the new hire exceeds what the program can be expected to save. At a call reduction of 50%, the model suggests the program remains consistently cost-positive up to approximately 15 social workers.
reduction_scenarios <- c(low = 1/3, mid = 1/2, high = 2/3)
max_workers <- ceiling(max_superusers / patients_per_sw_year) + 3L
cba_table <- crossing(
n_workers = seq(workers_current, max_workers),
scenario = names(reduction_scenarios)
) %>%
mutate(
reduction_rate = reduction_scenarios[scenario],
patients_per_year = pmin(n_workers * patients_per_sw_year, max_superusers),
staffing_cost = n_workers * sw_annual_salary
) %>%
rowwise() %>%
mutate(
gross_savings = {
n_full <- floor(patients_per_year / cohort_size)
n_remain <- patients_per_year %% cohort_size
full_s <- cohort_avg_calls %>%
filter(cohort <= n_full) %>%
summarise(s = sum(n_patients * avg_calls_annual * reduction_rate * cost_per_call)) %>%
pull(s)
part_s <- 0
if (n_remain > 0 && n_full < nrow(cohort_avg_calls)) {
nc <- cohort_avg_calls %>% filter(cohort == n_full + 1) %>% pull(avg_calls_annual)
if (length(nc) > 0) part_s <- n_remain * nc * reduction_rate * cost_per_call
}
full_s + part_s
},
net_savings = gross_savings - staffing_cost
) %>%
ungroup() %>%
group_by(scenario) %>%
arrange(n_workers, .by_group = TRUE) %>%
mutate(marginal_net_savings = net_savings - dplyr::lag(net_savings, default = NA_real_)) %>%
ungroup()
marginal_threshold <- cba_table %>%
filter(!is.na(marginal_net_savings)) %>%
group_by(scenario) %>%
summarise(
last_worthwhile_hire = if_else(
any(marginal_net_savings > 0, na.rm = TRUE),
max(n_workers[marginal_net_savings > 0], na.rm = TRUE),
NA_integer_
),
.groups = "drop"
)
cba_table %>%
filter(scenario == "mid") %>%
mutate(
Workers = n_workers,
`Patients per year` = patients_per_year,
`Calls averted` = round(gross_savings / (cost_per_call * reduction_rate)),
`Gross savings` = scales::dollar(gross_savings, accuracy = 1000),
`Staffing cost` = scales::dollar(staffing_cost, accuracy = 1000),
`Net savings` = scales::dollar(net_savings, accuracy = 1000),
`Marginal net savings` = if_else(is.na(marginal_net_savings), "—",
scales::dollar(marginal_net_savings, accuracy = 1000))
) %>%
select(Workers, `Patients per year`, `Calls averted`,
`Gross savings`, `Staffing cost`, `Net savings`, `Marginal net savings`) %>%
gt() %>%
tab_header(
title = "Projected costs and savings — mid scenario (50% call reduction)",
subtitle = paste0(
"$", format(cost_per_call, big.mark = ","), " per EMS call | ",
"$", format(sw_annual_salary, big.mark = ","), " annual salary | ",
caseload_per_sw, " patients per worker | ",
case_duration_mo, "-month case duration"
)
) %>%
tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
tab_style(
style = cell_fill(color = "#EEF4FB"),
locations = cells_body(rows = Workers == workers_current)
)
| Projected costs and savings — mid scenario (50% call reduction) | ||||||
| $1,500 per EMS call | $37,426 annual salary | 11 patients per worker | 3-month case duration | ||||||
| Workers | Patients per year | Calls averted | Gross savings | Staffing cost | Net savings | Marginal net savings |
|---|---|---|---|---|---|---|
| 2 | 88 | 309 | $232,000 | $75,000 | $157,000 | — |
| 3 | 132 | 408 | $306,000 | $112,000 | $194,000 | $37,000 |
| 4 | 176 | 512 | $384,000 | $150,000 | $234,000 | $40,000 |
| 5 | 220 | 660 | $495,000 | $187,000 | $308,000 | $74,000 |
| 6 | 264 | 781 | $586,000 | $225,000 | $361,000 | $53,000 |
| 7 | 308 | 869 | $652,000 | $262,000 | $390,000 | $29,000 |
| 8 | 352 | 985 | $739,000 | $299,000 | $439,000 | $49,000 |
| 9 | 396 | 1074 | $805,000 | $337,000 | $468,000 | $29,000 |
| 10 | 440 | 1138 | $853,000 | $374,000 | $479,000 | $11,000 |
| 11 | 484 | 1206 | $905,000 | $412,000 | $493,000 | $14,000 |
| 12 | 528 | 1292 | $969,000 | $449,000 | $520,000 | $27,000 |
| 13 | 572 | 1382 | $1,036,000 | $487,000 | $550,000 | $30,000 |
| 14 | 616 | 1463 | $1,097,000 | $524,000 | $573,000 | $23,000 |
| 15 | 660 | 1508 | $1,131,000 | $561,000 | $570,000 | -$3,000 |
| 16 | 704 | 1611 | $1,208,000 | $599,000 | $609,000 | $40,000 |
| 17 | 748 | 1662 | $1,247,000 | $636,000 | $610,000 | $1,000 |
| 18 | 792 | 1726 | $1,295,000 | $674,000 | $621,000 | $11,000 |
| 19 | 836 | 1777 | $1,332,000 | $711,000 | $621,000 | $0 |
| 20 | 880 | 1805 | $1,354,000 | $749,000 | $605,000 | -$16,000 |
| 21 | 924 | 1848 | $1,386,000 | $786,000 | $600,000 | -$5,000 |
| 22 | 968 | 1894 | $1,420,000 | $823,000 | $597,000 | -$3,000 |
| 23 | 1012 | 1946 | $1,460,000 | $861,000 | $599,000 | $2,000 |
| 24 | 1056 | 1987 | $1,490,000 | $898,000 | $592,000 | -$7,000 |
| 25 | 1100 | 2026 | $1,520,000 | $936,000 | $584,000 | -$8,000 |
| 26 | 1144 | 2067 | $1,551,000 | $973,000 | $578,000 | -$7,000 |
| 27 | 1188 | 2115 | $1,586,000 | $1,011,000 | $576,000 | -$2,000 |
| 28 | 1232 | 2166 | $1,625,000 | $1,048,000 | $577,000 | $1,000 |
| 29 | 1276 | 2196 | $1,647,000 | $1,085,000 | $561,000 | -$15,000 |
| 30 | 1283 | 2204 | $1,653,000 | $1,123,000 | $530,000 | -$32,000 |
| 31 | 1283 | 2204 | $1,653,000 | $1,160,000 | $492,000 | -$37,000 |
| 32 | 1283 | 2204 | $1,653,000 | $1,198,000 | $455,000 | -$37,000 |
| 33 | 1283 | 2204 | $1,653,000 | $1,235,000 | $418,000 | -$37,000 |
At the current staffing level of two social workers, the program generates meaningful positive net savings under all three reduction scenarios. Net savings grow with additional case worker hire until 15.
ggplot(cba_table, aes(x = n_workers, y = net_savings, color = scenario, group = scenario)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
geom_line(linewidth = 0.9) +
geom_point(size = 2.5) +
geom_vline(
data = marginal_threshold %>% filter(!is.na(last_worthwhile_hire)),
aes(xintercept = last_worthwhile_hire, color = scenario),
linetype = "dotted", linewidth = 0.7, show.legend = FALSE
) +
scale_x_continuous(breaks = seq(workers_current, max_workers, 1)) +
scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "k", accuracy = 1)) +
scale_color_manual(
values = c(low = "#B5C8E0", mid = "#4A7BB7", high = "#2A4F7A"),
labels = c(low = "Conservative (1/3 reduction)",
mid = "Mid-range (1/2 reduction)",
high = "Optimistic (2/3 reduction)")
) +
labs(
title = "Projected net savings by staffing level",
subtitle = paste0("Capped at ", max_superusers,
" total super users. Dotted lines = marginal utility threshold."),
x = "Number of social workers", y = "Annual net savings", color = "Reduction scenario"
)
The marginal savings chart shows the incremental return from each additional hire. Each successive hire serves a lower-risk cohort of patients (because the model has already prioritized the highest-risk cases), so marginal savings decline with each hire, though they remain positive well beyond the current two-worker team
cba_table %>%
filter(!is.na(marginal_net_savings)) %>%
ggplot(aes(x = n_workers, y = marginal_net_savings, color = scenario, group = scenario)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
geom_line(linewidth = 0.9) +
geom_point(size = 2.5) +
scale_x_continuous(breaks = seq(workers_current + 1, max_workers, 1)) +
scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "k", accuracy = 1)) +
scale_color_manual(
values = c(low = "#B5C8E0", mid = "#4A7BB7", high = "#2A4F7A"),
labels = c(low = "Conservative (1/3 reduction)",
mid = "Mid-range (1/2 reduction)",
high = "Optimistic (2/3 reduction)")
) +
labs(
title = "Marginal net savings per additional case worker hired",
subtitle = "Lines crossing zero mark the point where the next hire no longer pays for itself",
x = "Total social workers on staff",
y = "Incremental annual net savings", color = "Reduction scenario"
)
This project demonstrates that a machine learning model can substantially improve the Guilford County ART’s ability to identify patients at risk of becoming high-frequency EMS users before they reach crisis-level utilization. The key takeaways are:
The logistic regression model achieves a PR-AUC of 0.255 and identifies true future super users at roughly 40% precision when the top 88 patients are flagged, more than double the estimated precision of the current ad hoc referral system. In other words, at the ART’s current annual capacity of 88 clients per year, the model identifies four in ten of all future super users in the county.
The Guilford County super user population is distinctive. Unlike the academic literature on super users suggests, Guilford County’s super users are not primarily driven by substance use or behavioral health crises. The dominant pattern is physical frailty and mobility impairment — “lift assist” is the single most diagnostically distinctive complaint — combined with social vulnerability (poverty, public insurance, majority-Black ZIP codes). Interventions calibrated to this profile will likely look different from programs designed for younger populations with high prevalence of substance-use.
The program is cost effective and should grow. At the current two-worker staffing level, the ART already generates substantial net savings under all three call-reduction scenarios. The marginal utility analysis suggests the program remains cost-positive with up to approximately 15 social workers, representing a roughly fourfold expansion from current capacity. The model enables this growth by scaling referral precision as caseload increases.
Equity monitoring is essential. The model shows consistent performance across racial groups in the held-out test set, but this should be monitored prospectively as the model is deployed. False-positive rates, which translate to screening burden on patients who are not enrolled, should be tracked by race and reviewed regularly.
Next steps for deployment:
CARES: A Community-wide Collaboration Identifies Super-utilizers and Reduces Their 9-1-1 Call, Emergency Department, and Hospital Visit Rates: Prehospital Emergency Care: Vol 21 , No 6 (June 2017). https://doi.org/10.1080/10903127.2017.1335820
Sanko, S. G., & Eckstein, M. (2013). Characteristics of the Most Frequent “Super-Users” of Emergency Medical Services. Annals of Emergency Medicine, 62(4), S145. https://doi.org/10.1016/j.annemergmed.2013.07.230
Community Paramedicine Models for Reducing Use of Emergency Resources—RHIhub Toolkit. (n.d.). Retrieved May 7, 2026, from https://www.ruralhealthinfo.org/toolkits/community-paramedicine/2/reducing-use-of-emergency-resources
Hall, M. K., Raven, M. C., Hall, J., Yeh, C., Allen, E., Rodriguez, R. M., Tangherlini, N. L., Sporer, K. A., & Brown, J. F. (2015). EMS-STARS: Emergency Medical Services “Superuser” Transport Associations: An Adult Retrospective Study. Prehospital Emergency Care, 19(1), 61–67. https://doi.org/10.3109/10903127.2014.936630
LaCalle, E., & Rabin, E. (2010). Frequent users of emergency departments: The myths, the data, and the policy implications. Annals of Emergency Medicine, 56(1), 42–48. https://doi.org/10.1016/j.annemergmed.2010.01.032
MacKinney, A. C., & Stamy, C. (2021). Serving High Need/High Cost Patients in the Emergency Department.
Agarwal, G., Lee, J., McLeod, B., Mahmuda, S., Howard, M., Cockrell, K., & Angeles, R. (2019). Social factors in frequent callers: A description of isolation, poverty and quality of life in those calling emergency medical services frequently. BMC Public Health, 19(1), 684. https://doi.org/10.1186/s12889-019-6964-1
The HOME Team: Evaluating the Effect of an EMS-based Outreach Team to Decrease the Frequency of 911 Use Among High Utilizers of EMS | Prehospital and Disaster Medicine | Cambridge Core. (n.d.). Retrieved May 7, 2026, from https://www.cambridge.org/core/journals/prehospital-and-disaster-medicine/article/abs/home-team-evaluating-the-effect-of-an-emsbased-outreach-team-to-decrease-the-frequency-of-911-use-among-high-utilizers-of-ems