Team

Clients: Guilford County Integrated Data & Services (Alice Mahood, Director of Integrated Data & Services and Shelbi Flynn, Integrated Data and Services Program Manager); Guilford County Emergency Services (Jeff Hutchens, Emergency Services Deputy Director)

Instructors: Michael Fichman and Matthew Harris

Team: Leila Bagenstos, Lauren Choi, and Chloe Robinson


Executive Summary

In Guilford County, as in jurisdictions across the United States, a small number of residents account for a disproportionate number of calls to EMS. These residents, known as “super users,” often have complex, co-occurring needs: chronic illness, mobility limitations, housing instability, and limited access to primary care. For many of them, EMS represents the last accessible avenue to seek medical care and social support. But Guilford County paramedics and EMTs, who respond to approximately 85,000 calls per year, face significant time and capacity constraints, and they are not fully equipped to address the underlying social and medical conditions driving repeat calls.

The Adult Resource Team (ART) was created in 2023 to fill that gap, connecting high-frequency, non-emergency EMS users to social services and wraparound support. But the ART’s current case management process depends on ad hoc referrals from first responders, a system that is imprecise and difficult to scale.

This report documents the development and evaluation of a predictive risk model to better identify patients at high risk of becoming “super users.” Using three years of anonymized EMS dispatch records covering roughly 108,000 patients, we built a predictive model that scores patients by their likelihood of becoming super users in the following year, before they reach crisis-level EMS use. The model was trained on visit history, clinical complaint patterns, demographics, and neighborhood-level socioeconomic indicators, and evaluated against a held-out test set using metrics matched to the ART’s real operational constraints.

The selected model, a penalized logistic regression, achieves a precision-recall AUC of 0.255 and identifies true future super users at roughly 40% precision when the top 88 patients are flagged. Put differently: at the ART’s current annual capacity of 88 clients, the model identifies approximately four in ten of all future super users in Guilford County. That is more than double the estimated precision of the current ad hoc referral system.

Finally, a cost-benefit analysis demonstrates that model-guided outreach is cost-positive across a wide range of call-reduction assumptions, and that the program’s return on investment grows substantially as ART staffing expands beyond its current two-worker baseline.

Key findings are summarized in the Cost-Benefit Analysis and Conclusion sections.


Introduction

High-frequency EMS users, commonly referred to as “super users,” represent a small share of the patient population but account for a disproportionate share of emergency call volume. In Guilford County, as in many jurisdictions, these patients often have complex, co-occurring social and medical needs, such as housing instability, chronic illness, behavioral health conditions, and mobility impairments, that make emergency services an accessible but sub-optimal primary care pathway.

The Guilford County Adult Resource Team (ART) was established to provide coordinated social services case management to high-frequency EMS users. ART case workers conduct home visits, connect patients to community resources, and work to reduce avoidable emergency services use over a typical three-month engagement period. With a current staff of two social workers, the ART can serve approximately 88 patients per year.

The primary challenge this project addresses is identification: the current referral system relies on ad hoc referrals from EMS personnel, which is both capacity-constrained and imprecise. A data-driven model that can score patients by their predicted likelihood of future super-user status would allow the ART to prioritize outreach toward those most likely to benefit and could scale naturally as the program grows.

Prediction design. Rather than predicting a rolling window (which produces artificially low risk scores for patients who appear near the end of the data), we divide the full three-year dataset into three fixed temporal segments: an antecedent window (Year 1) in which patient history is used to build features; a 30-day buffer that prevents feature-to-outcome leakage; and an outcome window (approximately Year 3, ~12 months) in which super-user status is evaluated.

Note on race as a predictor. Individual patient race is intentionally excluded from the model’s input features for client liability reasons. Area-level demographic variables (ZCTA-level percent Black residents, percent in poverty, etc.) are retained as proxies for neighborhood-level social determinants of health. Race is retained in the dataset only to accurately analyze potential disparities in model accuracy and precision.


Literature Review

High-frequency EMS users have been studied extensively in the public health and emergency medicine literature. Key findings from this literature are described below. Guilford County’s “super users” share important features in common with high-frequency EMS users elsewhere, though there are also meaningful points of divergence, noted below.

Who are super users? Studies consistently find that a small share of EMS patients accounts for a disproportionate share of call volume; roughly 5–10% of patients generate 25–40% of calls (Kiser et al., 2019; Ruger et al., 2014). The literature also emphasizes the heterogeneity of this population: frequent ED and EMS users are not a monolithic group, and effective interventions depend on understanding the mix of medical, social, and behavioral needs in a given community (LaCalle & Rabin, 2010).

Bimodal age distribution. Several studies identify a bimodal age distribution among EMS super users, with peaks among younger adults (18–34) and older adults (65+). The younger peak is often associated with behavioral health crises and substance use; the older peak with chronic illness, falls, and mobility limitations. In Guilford County, however, we do not observe a strong bimodal pattern in the dispatch data. Super user status is most concentrated in middle-aged and older adults, with less elevation among the 18–34 age group than the literature would suggest.

Substance use and behavioral health. Multiple studies identify alcohol use and behavioral health crises as drivers of high-frequency EMS use, particularly in urban settings. Los Angeles Fire Department data found that the top 40 EMS super users were predominantly male, homeless, and involved alcohol use (Capp et al., 2016). San Francisco data shows a similar pattern (Weiss et al., 2021). This does not appear to hold in Guilford County: our analysis finds little association between substance use or intoxication complaints and super-user status. The most diagnostically distinctive complaint pattern is “lift assist,” which reflects mobility impairment and physical frailty rather than behavioral crises.

Social determinants. A Canadian study of frequent EMS callers found that loneliness, poverty, and poor quality of life, rather than acute medical need, were the primary drivers of frequent use (Booker et al., 2019). Guilford County data is consistent with this framing: super users are concentrated in ZIP codes with elevated poverty rates and in majority-Black neighborhoods, and the lift-assist complaint pattern suggests patients relying on EMS for basic mobility support rather than emergency care.

Effectiveness of case management interventions. Programs combining proactive outreach, care coordination, and connection to community resources have demonstrated meaningful reductions in EMS utilization. Reference benchmarks from comparable programs include:

  • Onslow County, NC: 213 patients served; approximately 2/3 reduction in EMS call volume; $1.2M in estimated savings (~$5,600 per patient)
  • San Diego “TREAT” Program: 51 patients; approximately 1/3 reduction; $314K in savings (~$6,200 per patient)
  • Wausau, WI: 13 patients; approximately 2/3 reduction; $100K in savings (~$7,700 per patient)
  • Colorado Springs CARES Program: A community-wide collaboration that reduced 911 call, emergency department, and hospital visit rates among identified super users through coordinated case management (Kiser et al., 2017)
  • San Francisco HOME Team: An EMS-based outreach team that demonstrated super users respond well to intervention by specially trained paramedics, successfully redirecting them to more appropriate services (Nevin & Sohoni, 2016)

These benchmarks inform the Cost-Benefit Analysis below.

Predictive modeling. Several jurisdictions have applied machine learning to EMS super-user prediction, with logistic regression and random forests being the most common approaches. Both are implemented here; the Model Selection section describes the rationale for the final choice.


Data Preparation

Guilford County provided us with three years of anonymized EMS dispatch data, spanning 2023 to 2025. The initial dataset included nearly 220,000 dispatches. Variables in the dataset included:

  1. Use patterns: Time of call, total number of EMS calls associated with each patient, and calls within past 48 hours.
  2. Clinical descriptions: Three fields – dispatch reason, chief complaint, and impressions – detailing patients’ reasons for calling EMS, from the perspective of the dispatcher, patient, and first responder respectively.
  3. Demographics: Race, age, and gender.
  4. Socioeconomic and health access indicators: Insurance and homelessness status.
  5. Geography: Zip codes.

During the initial data cleaning process, this dispatch-level dataset was transformed into a person-level one, with one row per patient and summary tables derived from their full call history. A public insurance flag was created from the insurance name field, identifying patients covered by Medicare or Medicaid. Demographic labels for race, gender, housing status, and age were standardized to ensure consistent categorization. The three clinical description fields were condensed into bigrams (two word phrases) to facilitate text analysis and identify complaint patterns associated with super user status. Finally, patient ZIP codes were joined to American Community Survey (ACS) data to add neighborhood context, including racial composition, household poverty rates, and housing cost burden. The result was a clean dataset of 107,553 unique patients.

Super User Flag Derivation

Super user status is defined using two complementary thresholds, consistent with both published literature and the ART’s program criteria:

  • Total visits threshold: ≥ 15 EMS visits across the full three-year observation window
  • Annual visits threshold: ≥ 10 EMS visits within any single calendar year

A patient flagged by either criterion is classified as a super user (super_either), which is the single outcome definition used throughout this analysis.

annual_counts <- data_ordered %>%
  filter(!is.na(dispatch_year)) %>%
  count(pt_full_name, dispatch_year, name = "calls_in_year") %>%
  group_by(pt_full_name) %>%
  summarise(
    max_calls_any_year = max(calls_in_year, na.rm = TRUE),
    calls_2022 = sum(calls_in_year[dispatch_year == 2022], na.rm = TRUE),
    calls_2023 = sum(calls_in_year[dispatch_year == 2023], na.rm = TRUE),
    calls_2024 = sum(calls_in_year[dispatch_year == 2024], na.rm = TRUE),
    .groups = "drop"
  )

data_by_person <- data_by_person %>%
  left_join(annual_counts, by = "pt_full_name") %>%
  mutate(
    max_calls_any_year = replace_na(max_calls_any_year, 0L),
    calls_2022         = replace_na(calls_2022, 0L),
    calls_2023         = replace_na(calls_2023, 0L),
    calls_2024         = replace_na(calls_2024, 0L),
    super_total15      = if_else(visits_3y >= 15,          1L, 0L),
    super_any_year10   = if_else(max_calls_any_year >= 10, 1L, 0L),
    super_either       = if_else(super_total15 == 1L | super_any_year10 == 1L, 1L, 0L)
  )

data_by_person %>%
  summarise(
    `Total patients`      = n(),
    `Super users`         = sum(super_either, na.rm = TRUE),
    `Percent of patients` = scales::percent(mean(super_either, na.rm = TRUE), accuracy = 0.1)
  ) %>%
  knitr::kable()
Total patients Super users Percent of patients
107553 1283 1.2%

Exploratory Analysis

Super User Overview

Super users are defined as patients meeting either threshold: ≥ 15 total EMS visits across the observation period, or ≥ 10 visits within any single calendar year. The table below summarises several key characteristics of super users compared to the broader patient population.

data_by_person %>%
  mutate(
    group                 = if_else(super_either == 1L, "Super users", "Non-super users"),
    white_flag            = if_else(race_clean == "White",    1, 0, missing = NA_real_),
    black_flag            = if_else(race_clean == "Black",    1, 0, missing = NA_real_),
    housing_flag          = if_else(homeless2  == "Homeless", 1, 0, missing = NA_real_),
    public_insurance_flag = as.numeric(public_insurance),
    lift_assist_flag      = as.numeric(bigram_lift_assist),
    sickle_cell_flag      = as.numeric(bigram_sickle_cell)
  ) %>%
  group_by(group) %>%
  summarise(
    `Patients`           = n(),
    `Median age`         = round(median(age, na.rm = TRUE), 1),
    `% Black`            = round(mean(black_flag,            na.rm = TRUE) * 100, 1),
    `% White`            = round(mean(white_flag,            na.rm = TRUE) * 100, 1),
    `% Homeless`         = round(mean(housing_flag,          na.rm = TRUE) * 100, 1),
    `% Public insurance` = round(mean(public_insurance_flag, na.rm = TRUE) * 100, 1),
    `% Lift assist`      = round(mean(lift_assist_flag,      na.rm = TRUE) * 100, 1),
    `% Sickle cell`      = round(mean(sickle_cell_flag,      na.rm = TRUE) * 100, 1),
    .groups = "drop"
  ) %>%
  gt() %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
group Patients Median age % Black % White % Homeless % Public insurance % Lift assist % Sickle cell
Non-super users 106270 52 43.3 47.3 0.8 19.8 1.5 0.1
Super users 1283 63 54.3 43.6 3.4 27.4 21.8 0.8

Super Users by ZIP Code

Super users are not evenly distributed across Guilford County. The map below shows concentration in ZIP codes with higher poverty rates and majority-Black populations, consistent with the broader pattern of social vulnerability driving EMS dependence.

Demographic Distributions

Super users are substantially more likely to be Black than non-super users, consistent with the geographic concentration of super users in majority-Black ZIP codes.

Homelessness is substantially elevated among super users, though most super users are not homeless, suggesting that housing instability is a contributing factor but not the primary driver of high frequency EMS use in Guilford County.

Super users skew slightly more male but the gender difference is modest, unlike some urban programs where super users are overwhelmingly male.

Super user status peaks among middle-aged and older adults (50–79) rather than showing the bimodal distribution described in some of the academic literature. This is one indication that high frequency EMS use in Guilford County is driven more by chronic illness, social isolation, and mobility limitations than by behavioral health crises.

Super users are significantly more likely to carry public insurance (Medicare or Medicaid), reflecting both the age profile of the population and its concentration in lower-income households.

Clinical Signal Word Clouds

The word clouds below show the most common two-word phrases in the chief complaint, dispatch reason, and impression text fields for super users (Super user = 1) and non-super users (Super user = 2. Many of the most common phrases, like “chest pain”, are prominent among super users and non-super users alike. Others, like “lift assist,” are far overrepresented among super users.

“Lift assist” is the single complaint that is most disproportionately associated with calls by super users. It reflects a combination of likely risk factors: mobility limitations, social isolation, age, and unmet social and chronic medical needs.


Modeling

Temporal Window Definition

The full dataset is divided into three non-overlapping periods.

visit_data <- read_csv("model_visit.csv") %>% clean_names()

required_cols <- c(
  "pt_full_name", "time_dispatched", "age", "age_group", "gender_clean",
  "homeless2", "public_insurance", "zip_code", "dispatch_reason",
  "type_of_call", "type_destination", "bigram_lift_assist", "bigram_sickle_cell"
)

missing_cols <- setdiff(required_cols, names(visit_data))
if (length(missing_cols) > 0) {
  stop(paste("Missing required columns:", paste(missing_cols, collapse = ", ")))
}

optional_numeric_cols <- c(
  "zcta_pct_65_plus", "zcta_pct_black", "zcta_pct_hh_poverty", "zcta_pct_cost_burdened"
)
for (nm in optional_numeric_cols) {
  if (!nm %in% names(visit_data)) visit_data[[nm]] <- NA_real_
}

visit_data <- visit_data %>%
  mutate(
    time_dispatched = coalesce(
      suppressWarnings(ymd_hms(time_dispatched, tz = "UTC")),
      suppressWarnings(ymd_hm(time_dispatched,  tz = "UTC")),
      suppressWarnings(mdy_hms(time_dispatched, tz = "UTC")),
      suppressWarnings(mdy_hm(time_dispatched,  tz = "UTC")),
      suppressWarnings(as.POSIXct(time_dispatched, tz = "UTC"))
    )
  ) %>%
  filter(!is.na(time_dispatched)) %>%
  mutate(dispatch_year = year(time_dispatched)) %>%
  arrange(pt_full_name, time_dispatched)
data_start <- min(visit_data$time_dispatched)
data_end   <- max(visit_data$time_dispatched)
total_days <- as.numeric(difftime(data_end, data_start, units = "days"))

antecedent_days <- 365L
buffer_days     <- 30L
outcome_days    <- as.integer(total_days - antecedent_days - buffer_days)

antecedent_end <- data_start + days(antecedent_days)
buffer_end     <- antecedent_end + days(buffer_days)
outcome_end    <- buffer_end + days(outcome_days)

cat(glue(
  "The dataset covers {round(total_days / 365.25, 1)} years of EMS visits ",
  "({format(data_start, '%B %d, %Y')} to {format(data_end, '%B %d, %Y')}).\n\n",
  "Antecedent window (feature building): ",
  "{format(data_start, '%B %d, %Y')} to {format(antecedent_end, '%B %d, %Y')} ",
  "({antecedent_days} days)\n",
  "Buffer: ",
  "{format(antecedent_end, '%B %d, %Y')} to {format(buffer_end, '%B %d, %Y')} ",
  "({buffer_days} days)\n",
  "Outcome window (super-user label): ",
  "{format(buffer_end, '%B %d, %Y')} to {format(outcome_end, '%B %d, %Y')} ",
  "({outcome_days} days)\n"
))
## The dataset covers 3 years of EMS visits (January 01, 2023 to December 31, 2025).
## 
## Antecedent window (feature building): January 01, 2023 to January 01, 2024 (365 days)
## Buffer: January 01, 2024 to January 31, 2024 (30 days)
## Outcome window (super-user label): January 31, 2024 to December 31, 2025 (700 days)
visit_data <- visit_data %>%
  mutate(
    visit_window = case_when(
      time_dispatched <  antecedent_end ~ "antecedent",
      time_dispatched <  buffer_end     ~ "buffer",
      time_dispatched <= outcome_end    ~ "outcome",
      TRUE                              ~ "after_outcome"
    )
  )

visit_data %>%
  count(visit_window) %>%
  filter(visit_window != "after_outcome") %>%
  mutate(
    `Window`         = recode(visit_window,
                              antecedent = "Antecedent",
                              buffer     = "Buffer",
                              outcome    = "Outcome"),
    `Visits`         = n,
    `Share of total` = scales::percent(n / sum(n), accuracy = 0.1)
  ) %>%
  select(`Window`, `Visits`, `Share of total`) %>%
  knitr::kable()
Window Visits Share of total
Antecedent 71199 32.6%
Buffer 6052 2.8%
Outcome 141076 64.6%

Feature Engineering

For each patient in the antecedent window, we compute a set of visit-history features that serve as predictors: total call volume, how often calls cluster within 7 or 30 days of a prior call, and flags for lift-assist and sickle-cell complaint history. We also compute interaction terms (e.g. “public insurance x patient age,” “lift assist x zip code”).

antecedent_visits <- visit_data %>%
  filter(visit_window == "antecedent") %>%
  arrange(pt_full_name, time_dispatched) %>%
  group_by(pt_full_name) %>%
  mutate(
    visit_index              = row_number(),
    prior_visits             = visit_index - 1,
    prev_visit_time          = dplyr::lag(time_dispatched),
    days_since_prev_visit    = as.numeric(difftime(time_dispatched, prev_visit_time, units = "days")),
    repeat_within_7d         = if_else(!is.na(days_since_prev_visit) & days_since_prev_visit <= 7,  1L, 0L),
    repeat_within_30d        = if_else(!is.na(days_since_prev_visit) & days_since_prev_visit <= 30, 1L, 0L),
    bigram_lift_assist       = replace_na(as.integer(bigram_lift_assist), 0L),
    bigram_sickle_cell       = replace_na(as.integer(bigram_sickle_cell), 0L),
    bigram_lift_assist_count_so_far = cumsum(bigram_lift_assist),
    bigram_sickle_cell_count_so_far = cumsum(bigram_sickle_cell),
    prior_lift_assist_count  = dplyr::lag(bigram_lift_assist_count_so_far, default = 0L),
    prior_sickle_cell_count  = dplyr::lag(bigram_sickle_cell_count_so_far, default = 0L),
    prior_lift_assist_any    = if_else(prior_lift_assist_count > 0, 1L, 0L),
    prior_sickle_cell_any    = if_else(prior_sickle_cell_count > 0, 1L, 0L)
  ) %>%
  ungroup()

patient_features <- antecedent_visits %>%
  group_by(pt_full_name) %>%
  arrange(time_dispatched, .by_group = TRUE) %>%
  summarise(
    last_visit_time          = max(time_dispatched),
    age                      = last(age),
    age_group                = last(age_group),
    race_clean               = last(race_clean),
    gender_clean             = last(gender_clean),
    homeless2                = last(homeless2),
    public_insurance         = last(public_insurance),
    zip_code                 = last(zip_code),
    dispatch_reason          = last(dispatch_reason),
    type_of_call             = last(type_of_call),
    type_destination         = last(type_destination),
    zcta_pct_65_plus         = last(zcta_pct_65_plus),
    zcta_pct_black           = last(zcta_pct_black),
    zcta_pct_hh_poverty      = last(zcta_pct_hh_poverty),
    zcta_pct_cost_burdened   = last(zcta_pct_cost_burdened),
    total_antecedent_visits  = n(),
    days_since_prev_visit    = last(days_since_prev_visit),
    repeat_within_7d_ever    = max(repeat_within_7d,  na.rm = TRUE),
    repeat_within_30d_ever   = max(repeat_within_30d, na.rm = TRUE),
    repeat_within_7d_count   = sum(repeat_within_7d,  na.rm = TRUE),
    repeat_within_30d_count  = sum(repeat_within_30d, na.rm = TRUE),
    prior_lift_assist_count  = max(bigram_lift_assist_count_so_far, na.rm = TRUE),
    prior_sickle_cell_count  = max(bigram_sickle_cell_count_so_far, na.rm = TRUE),
    prior_lift_assist_any    = max(prior_lift_assist_any, na.rm = TRUE),
    prior_sickle_cell_any    = max(prior_sickle_cell_any, na.rm = TRUE),
    .groups = "drop"
  )

cat(patient_features %>% nrow(), "unique patients appear in the antecedent window.\n")
## 44369 unique patients appear in the antecedent window.

Outcome Label

A patient is labeled as a positive case if they crossed the super-user threshold during the outcome window and had not already crossed it in the antecedent window. This targets patients who are on a trajectory toward high-frequency EMS use and have not yet been identified.

antecedent_totals <- antecedent_visits %>%
  group_by(pt_full_name) %>%
  summarise(
    antecedent_total_visits   = n(),
    antecedent_max_year_count = max(table(dispatch_year)),
    .groups = "drop"
  )

outcome_visits_counts <- visit_data %>%
  filter(visit_window == "outcome") %>%
  group_by(pt_full_name) %>%
  summarise(
    outcome_total_visits   = n(),
    outcome_max_year_count = max(table(dispatch_year)),
    .groups = "drop"
  )

patient_labeled <- patient_features %>%
  left_join(antecedent_totals,     by = "pt_full_name") %>%
  left_join(outcome_visits_counts, by = "pt_full_name") %>%
  mutate(
    outcome_total_visits   = replace_na(outcome_total_visits,   0L),
    outcome_max_year_count = replace_na(outcome_max_year_count, 0L),
    already_super = as.integer(
      antecedent_total_visits >= 15 | antecedent_max_year_count >= 10
    ),
    cumulative_total_visits   = antecedent_total_visits + outcome_total_visits,
    cumulative_max_year_count = pmax(antecedent_max_year_count, outcome_max_year_count),
    became_super_user = as.integer(
      already_super == 0 & (cumulative_total_visits >= 15 | cumulative_max_year_count >= 10)
    ),
    became_super_user = factor(became_super_user, levels = c(0, 1), labels = c("no", "yes"))
  ) %>%
  filter(already_super == 0)

patient_labeled %>%
  count(became_super_user) %>%
  mutate(
    Outcome  = became_super_user,
    Patients = n,
    Share    = scales::percent(n / sum(n), accuracy = 0.1)
  ) %>%
  select(Outcome, Patients, Share) %>%
  knitr::kable()
Outcome Patients Share
no 43357 98.6%
yes 614 1.4%

Model Setup and Training

We fit two models — penalized logistic regression and random forest — using 5-fold cross-validation to tune hyperparameters. Both models use the same feature set; the logistic regression additionally includes interaction terms between age group, public insurance status, and complaint history, which are expected to have compound effects on risk.

model_df <- patient_labeled %>%
  select(
    became_super_user, pt_full_name, last_visit_time,
    age, age_group, gender_clean, homeless2, public_insurance,
    dispatch_reason, type_of_call, type_destination, zip_code,
    zcta_pct_65_plus, zcta_pct_black, zcta_pct_hh_poverty, zcta_pct_cost_burdened,
    total_antecedent_visits, days_since_prev_visit,
    repeat_within_7d_ever, repeat_within_30d_ever,
    repeat_within_7d_count, repeat_within_30d_count,
    prior_lift_assist_count, prior_sickle_cell_count,
    prior_lift_assist_any, prior_sickle_cell_any
  )

patient_split <- initial_split(model_df, prop = 0.75, strata = became_super_user)
train_df      <- training(patient_split)
test_df       <- testing(patient_split)

cat(nrow(train_df), "patients in the training set;", nrow(test_df), "in the held-out test set.\n")
## 32978 patients in the training set; 10993 in the held-out test set.
cv_splits <- vfold_cv(train_df, v = 5, strata = became_super_user)

rec_base <- recipe(became_super_user ~ ., data = train_df) %>%
  update_role(pt_full_name,    new_role = "id") %>%
  update_role(last_visit_time, new_role = "id") %>%
  step_mutate(
    age_group        = as.factor(age_group),
    gender_clean     = as.factor(gender_clean),
    homeless2        = as.factor(homeless2),
    public_insurance = as.factor(public_insurance),
    dispatch_reason  = as.factor(dispatch_reason),
    type_of_call     = as.factor(type_of_call),
    type_destination = as.factor(type_destination),
    zip_code         = as.factor(zip_code)
  ) %>%
  step_unknown(all_nominal_predictors(), new_level = "missing") %>%
  step_novel(all_nominal_predictors()) %>%
  step_other(dispatch_reason,  threshold = 0.02, other = "other_dispatch") %>%
  step_other(type_of_call,     threshold = 0.02, other = "other_call") %>%
  step_other(type_destination, threshold = 0.02, other = "other_dest") %>%
  step_other(zip_code,         threshold = 0.02, other = "other_zip") %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_zv(all_predictors())

rec_log <- rec_base %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(terms = ~ starts_with("age_group"):starts_with("public_insurance")) %>%
  step_interact(terms = ~ starts_with("age_group"):starts_with("prior_lift_assist")) %>%
  step_interact(terms = ~ starts_with("age_group"):starts_with("prior_sickle_cell")) %>%
  step_interact(terms = ~ starts_with("public_insurance"):starts_with("prior_lift_assist")) %>%
  step_interact(terms = ~ starts_with("public_insurance"):starts_with("prior_sickle_cell")) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_downsample(became_super_user)

rec_rf <- rec_base %>%
  step_downsample(became_super_user)

log_spec <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet") %>%
  set_mode("classification")

rf_spec <- rand_forest(trees = 1500, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger", importance = "permutation", probability = TRUE) %>%
  set_mode("classification")

log_wf <- workflow() %>% add_recipe(rec_log) %>% add_model(log_spec)
rf_wf  <- workflow() %>% add_recipe(rec_rf)  %>% add_model(rf_spec)

log_grid <- grid_regular(
  penalty(range = c(-8, -2)),
  mixture(range = c(0, 0.5)),
  levels = c(8, 4)
)

rf_params <- rf_wf %>%
  extract_parameter_set_dials() %>%
  finalize(train_df) %>%
  update(
    mtry  = mtry(range  = c(2L, floor(ncol(train_df) * 0.8))),
    min_n = min_n(range = c(5L, 100L))
  )
rf_grid <- grid_latin_hypercube(rf_params, size = 25)

cls_metrics <- metric_set(pr_auc, roc_auc)
ctrl        <- control_grid(save_pred = TRUE, verbose = TRUE)

log_tuned <- tune_grid(log_wf, resamples = cv_splits, grid = log_grid,
                       metrics = cls_metrics, control = ctrl)
rf_tuned  <- tune_grid(rf_wf,  resamples = cv_splits, grid = rf_grid,
                       metrics = cls_metrics, control = ctrl)

log_best <- select_best(log_tuned, metric = "pr_auc")
rf_best  <- select_best(rf_tuned,  metric = "pr_auc")

log_final_wf  <- finalize_workflow(log_wf, log_best)
rf_final_wf   <- finalize_workflow(rf_wf,  rf_best)
log_fit_final <- fit(log_final_wf, data = train_df)
rf_fit_final  <- fit(rf_final_wf,  data = train_df)

Predictions and Evaluation

Both models are evaluated on the held-out test set. Because super users are rare (just over 1% of patients), we prioritize precision-recall AUC over ROC-AUC, and top-K precision over accuracy.

log_test_preds <- predict(log_fit_final, test_df, type = "prob") %>%
  bind_cols(test_df) %>% mutate(model = "log")

rf_test_preds <- predict(rf_fit_final, test_df, type = "prob") %>%
  bind_cols(test_df) %>% mutate(model = "rf")

val_preds <- bind_rows(log_test_preds, rf_test_preds) %>%
  mutate(model = factor(model, levels = c("log", "rf")))

val_summary <- val_preds %>%
  group_by(model) %>%
  summarise(
    pr_auc  = pr_auc_vec(became_super_user,  .pred_yes, event_level = "second"),
    roc_auc = roc_auc_vec(became_super_user, .pred_yes, event_level = "second"),
    brier   = mean((as.integer(became_super_user == "yes") - .pred_yes)^2),
    .groups = "drop"
  )

patient_ranked <- val_preds %>%
  group_by(model) %>%
  arrange(desc(.pred_yes), .by_group = TRUE) %>%
  mutate(patient_rank = row_number()) %>%
  ungroup()

topk_summary <- patient_topk_summary(val_preds, top_k_values = c(22, 44, 88, 100, 150, 200))

Cross-Validated Performance

Cross-validated out-of-fold predictions provide a less optimistic estimate of performance than the test set alone, and are used for threshold analysis below.

log_best_oof <- collect_predictions(log_tuned) %>%
  filter(penalty == log_best$penalty, mixture == log_best$mixture) %>%
  mutate(model = "log")

rf_best_oof <- collect_predictions(rf_tuned) %>%
  filter(mtry == rf_best$mtry, min_n == rf_best$min_n) %>%
  mutate(model = "rf")

oof_preds <- bind_rows(log_best_oof, rf_best_oof) %>%
  mutate(model = factor(model, levels = c("log", "rf")))

oof_preds %>%
  group_by(model) %>%
  summarise(
    `PR-AUC`      = round(pr_auc_vec(became_super_user,  .pred_yes, event_level = "second"), 3),
    `ROC-AUC`     = round(roc_auc_vec(became_super_user, .pred_yes, event_level = "second"), 3),
    `Brier score` = round(mean((as.integer(became_super_user == "yes") - .pred_yes)^2), 3),
    .groups = "drop"
  ) %>%
  rename(Model = model) %>%
  mutate(Model = recode(Model, log = "Logistic regression", rf = "Random forest")) %>%
  knitr::kable()
Model PR-AUC ROC-AUC Brier score
Logistic regression 0.196 0.885 NA
Random forest 0.153 0.892 0.127

Model Selection

Logistic regression outperforms random forest on PR-AUC and top-K precision at the ART’s operational capacity of 88 patients per year, making it the preferred model. The ROC-AUC scores are nearly identical across both models.

topk_88_precision <- topk_summary %>%
  filter(top_k == 88) %>%
  select(model, topk_88_precision = topk_precision)

val_summary %>%
  left_join(topk_88_precision, by = "model") %>%
  mutate(
    Model                = recode(model, log = "Logistic regression", rf = "Random forest"),
    `PR-AUC`             = round(pr_auc,  3),
    `ROC-AUC`            = round(roc_auc, 3),
    `Brier score`        = round(brier,   3),
    `Precision at top 88` = scales::percent(topk_88_precision, accuracy = 0.1)
  ) %>%
  select(Model, `PR-AUC`, `ROC-AUC`, `Brier score`, `Precision at top 88`) %>%
  gt() %>%
  tab_header(
    title    = "Model comparison",
    subtitle = "PR-AUC and precision at top 88 are the primary criteria given class imbalance."
  ) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_options(table.font.size = 13, heading.title.font.size = 14,
              heading.subtitle.font.size = 12,
              column_labels.border.bottom.width = px(1),
              table_body.border.bottom.width    = px(1))
Model comparison
PR-AUC and precision at top 88 are the primary criteria given class imbalance.
Model PR-AUC ROC-AUC Brier score Precision at top 88
Logistic regression 0.255 0.919 NA 40.9%
Random forest 0.209 0.920 0.129 35.2%
best_model_label <- val_summary %>%
  slice_max(pr_auc, n = 1, with_ties = FALSE) %>%
  pull(model) %>% as.character()

best_fit <- if (best_model_label == "rf") rf_fit_final else log_fit_final

cat("Selected model:", ifelse(best_model_label == "rf", "Random forest", "Logistic regression"), "\n")
## Selected model: Logistic regression

Confusion Matrix and Threshold Analysis

At a risk score threshold of 0.30, the model flags roughly as many patients as the ART can screen in a week. The confusion matrix shows the tradeoff between precision and recall at this operating point.

cm_threshold <- 0.30

best_preds_thresh <- val_preds %>%
  filter(as.character(model) == best_model_label) %>%
  mutate(
    .pred_class = factor(
      if_else(.pred_yes >= cm_threshold, "yes", "no"),
      levels = c("no", "yes")
    )
  )

cm <- conf_mat(best_preds_thresh, truth = became_super_user, estimate = .pred_class)

autoplot(cm, type = "heatmap") +
  scale_fill_gradient(low = "#F5F5F5", high = "#4A7BB7") +
  labs(
    title    = sprintf("Confusion matrix — %s at threshold %.2f",
                       ifelse(best_model_label == "rf", "Random forest", "Logistic regression"),
                       cm_threshold),
    subtitle = "Rows = actual label; columns = predicted label"
  ) +
  theme(legend.position = "none")

summary(cm, event_level = "second") %>%
  filter(.metric %in% c("accuracy", "sens", "spec", "precision", "f_meas")) %>%
  mutate(
    Metric = recode(.metric, accuracy = "Accuracy", sens = "Recall",
                    spec = "Specificity", precision = "Precision", f_meas = "F1 score"),
    Value  = round(.estimate, 3)
  ) %>%
  select(Metric, Value) %>%
  gt() %>%
  tab_header(
    title    = sprintf("Classification metrics at threshold %.2f", cm_threshold),
    subtitle = "Held-out test set"
  ) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
Classification metrics at threshold 0.30
Held-out test set
Metric Value
Accuracy 0.717
Recall 0.928
Specificity 0.713
Precision 0.047
F1 score 0.090

The threshold analysis shows how precision and recall trade off across the full range of possible cutoffs. The ART can adjust the operating threshold depending on whether they want to maximize true positives (lower threshold) or minimize screening effort on false positives (higher threshold).

threshold_grid <- tibble(threshold = seq(0.05, 0.80, by = 0.05))

oof_threshold_perf <- threshold_grid %>%
  crossing(oof_preds) %>%
  mutate(
    pred_class_thresh = factor(
      if_else(.pred_yes >= threshold, "yes", "no"), levels = c("no", "yes")
    )
  ) %>%
  group_by(model, threshold) %>%
  summarise(
    accuracy    = accuracy_vec(became_super_user,  pred_class_thresh),
    recall      = sens_vec(became_super_user,      pred_class_thresh, event_level = "second"),
    specificity = spec_vec(became_super_user,      pred_class_thresh, event_level = "second"),
    precision   = precision_vec(became_super_user, pred_class_thresh, event_level = "second"),
    .groups     = "drop"
  )

oof_threshold_perf %>%
  pivot_longer(cols = c(accuracy, recall, specificity, precision),
               names_to = "metric", values_to = "value") %>%
  ggplot(aes(x = threshold, y = value, color = model)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.2, alpha = 0.7) +
  geom_vline(xintercept = cm_threshold, linetype = "dotted",
             color = "grey40", linewidth = 0.6) +
  scale_color_manual(
    values = c(log = "#4A7BB7", rf = "#A64E4E"),
    labels = c(log = "Logistic regression", rf = "Random forest")
  ) +
  scale_y_continuous(limits = c(0, 1), labels = percent_format(accuracy = 1)) +
  facet_wrap(~ metric, ncol = 2) +
  labs(
    title    = "Performance across decision thresholds (cross-validated)",
    subtitle = sprintf("Dotted line = operating threshold (%.2f)", cm_threshold),
    x = "Risk score threshold", y = NULL, color = NULL
  )

Calibration

Both models substantially overpredict risk across the score range. This is a deliberate consequence of downsampling the majority class (non-super users) during training, which inflates predicted probabilities in order to improve the model’s ability to rank positive cases. Because the ART uses the model output as a ranked priority list rather than as a literal probability estimate, calibration is less important than ranking quality (PR-AUC and top-K precision). The predicted scores should be interpreted as relative risk scores, not as estimates of the true probability of becoming a super user.

val_preds %>%
  mutate(prob_bin = ntile(.pred_yes, 10)) %>%
  group_by(model, prob_bin) %>%
  summarise(
    mean_pred   = mean(.pred_yes, na.rm = TRUE),
    actual_rate = mean(became_super_user == "yes", na.rm = TRUE),
    n           = n(),
    .groups     = "drop"
  ) %>%
  ggplot(aes(x = mean_pred, y = actual_rate, color = model)) +
  geom_abline(linetype = "dashed", color = "grey60", linewidth = 0.5) +
  geom_line(linewidth = 0.7, alpha = 0.8) +
  geom_point(aes(size = n), alpha = 0.7) +
  scale_color_manual(
    values = c(log = "#4A7BB7", rf = "#A64E4E"),
    labels = c(log = "Logistic regression", rf = "Random forest")
  ) +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_size_continuous(range = c(1, 4), guide = "none") +
  facet_wrap(~ model) +
  labs(
    title    = "Calibration by predicted risk decile",
    subtitle = "Points near the dashed diagonal indicate well-calibrated predictions",
    x = "Average predicted risk", y = "Observed event rate", color = NULL
  )

Equity and Subgroup Analysis

The model performs consistently across racial groups with no evidence of systematic over- or under-prediction for Black patients relative to white patients, which is an important equity check given the racial concentration of super users in the Guilford County data.

threshold <- 0.30

val_preds_thresh <- val_preds %>%
  left_join(patient_labeled %>% select(pt_full_name, race_clean), by = "pt_full_name") %>%
  mutate(
    .pred_class_thresh = factor(
      if_else(.pred_yes >= threshold, "yes", "no"), levels = c("no", "yes")
    )
  )

safe_metric <- function(fn, truth, estimate, ...) {
  tryCatch(fn(truth, estimate, ...), error = function(e) NA_real_)
}

val_preds_thresh %>%
  group_by(model, race_clean) %>%
  summarise(
    Patients         = n(),
    `True super users` = sum(became_super_user == "yes",   na.rm = TRUE),
    Flagged          = sum(.pred_class_thresh == "yes",    na.rm = TRUE),
    `Actual rate`    = round(mean(became_super_user == "yes",  na.rm = TRUE), 3),
    `Predicted rate` = round(mean(.pred_class_thresh == "yes", na.rm = TRUE), 3),
    Recall           = round(safe_metric(sens_vec,      became_super_user, .pred_class_thresh, event_level = "second"), 3),
    Precision        = round(safe_metric(precision_vec, became_super_user, .pred_class_thresh, event_level = "second"), 3),
    Specificity      = round(safe_metric(spec_vec,      became_super_user, .pred_class_thresh, event_level = "second"), 3),
    .groups = "drop"
  ) %>%
  rename(Model = model, Race = race_clean) %>%
  mutate(Model = recode(Model, log = "Logistic regression", rf = "Random forest")) %>%
  gt() %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
Model Race Patients True super users Flagged Actual rate Predicted rate Recall Precision Specificity
Logistic regression American Indian / Alaska Native 23 0 7 0.000 0.304 NA 0.000 0.696
Logistic regression Asian 272 0 66 0.000 0.243 NA 0.000 0.757
Logistic regression Black 4763 96 1483 0.020 0.311 0.927 0.060 0.701
Logistic regression Hispanic 600 0 81 0.000 0.135 NA 0.000 0.865
Logistic regression Other / Unknown 10 1 10 0.100 1.000 1.000 0.100 0.000
Logistic regression Pacific Islander 25 0 7 0.000 0.280 NA 0.000 0.720
Logistic regression White 5299 69 1601 0.013 0.302 0.928 0.040 0.706
Logistic regression NA 1 0 1 0.000 1.000 NA 0.000 0.000
Random forest American Indian / Alaska Native 23 0 4 0.000 0.174 NA 0.000 0.826
Random forest Asian 272 0 52 0.000 0.191 NA 0.000 0.809
Random forest Black 4763 96 1370 0.020 0.288 0.927 0.065 0.726
Random forest Hispanic 600 0 80 0.000 0.133 NA 0.000 0.867
Random forest Other / Unknown 10 1 8 0.100 0.800 1.000 0.125 0.222
Random forest Pacific Islander 25 0 7 0.000 0.280 NA 0.000 0.720
Random forest White 5299 69 1567 0.013 0.296 0.913 0.040 0.712
Random forest NA 1 0 0 0.000 0.000 NA NA 1.000

Feature Importance

Raw call volume in the antecedent window is the single most powerful predictor; patients who already call frequently are the most likely to continue doing so. When prior call count is excluded, geography (zip code), patient age, and a history of call clustering emerge as the next most important features.

feature_labels <- tribble(
  ~feature,                                                  ~label,
  "total_antecedent_visits",                                 "Total EMS calls in prior year",
  "age",                                                     "Patient age",
  "days_since_prev_visit",                                   "Days since last EMS call",
  "repeat_within_30d_count",                                 "Calls within 30 days of a prior call (count)",
  "repeat_within_7d_count",                                  "Calls within 7 days of a prior call (count)",
  "repeat_within_30d_ever",                                  "Ever had two calls within 30 days",
  "repeat_within_7d_ever",                                   "Ever had two calls within 7 days",
  "prior_lift_assist_count",                                 "Lift-assist complaints (count)",
  "prior_sickle_cell_count",                                 "Sickle cell complaints (count)",
  "prior_lift_assist_any",                                   "Any prior lift-assist complaint",
  "prior_sickle_cell_any",                                   "Any prior sickle cell complaint",
  "zcta_pct_65_plus",                                        "ZIP code: % residents 65+",
  "zcta_pct_black",                                          "ZIP code: % Black residents",
  "zcta_pct_hh_poverty",                                     "ZIP code: % households in poverty",
  "zcta_pct_cost_burdened",                                  "ZIP code: % cost-burdened households",
  "zip_code_other_zip",                                      "ZIP code",
  "age_group_X65.79",                                        "Age group: 65-79",
  "age_group_X35.49",                                        "Age group: 35-49",
  "gender_clean_Male",                                       "Gender: male",
  "dispatch_reason_Traffic.Transportation.Incident",         "Dispatch reason: traffic incident",
  "dispatch_reason_Unconscious.Fainting.Near.Fainting",      "Dispatch reason: unconscious or fainting",
  "dispatch_reason_Cardiac.Arrest.Death",                    "Dispatch reason: cardiac arrest or death"
)

if (best_model_label == "rf") {
  imp_df <- extract_fit_engine(rf_fit_final)$variable.importance %>%
    enframe(name = "feature", value = "importance")
} else {
  imp_df <- tidy(extract_fit_parsnip(log_fit_final)) %>%
    filter(term != "(Intercept)") %>%
    rename(feature = term, importance = estimate) %>%
    mutate(direction = if_else(importance >= 0, "Increases risk", "Decreases risk"),
           importance = abs(importance)) %>%
    select(feature, importance, direction)
}

imp_df %>%
  left_join(feature_labels, by = "feature") %>%
  mutate(label = coalesce(label, feature)) %>%
  { if (!"direction" %in% names(.)) mutate(., direction = "Increases risk") else . } %>%
  arrange(desc(importance)) %>%
  slice_head(n = 10) %>%
  mutate(
    pct_importance = importance / sum(importance),
    pct_label      = percent(pct_importance, accuracy = 0.1)
  ) %>%
  arrange(pct_importance) %>%
  mutate(label = factor(label, levels = label)) %>%
  ggplot(aes(x = pct_importance, y = label, color = direction)) +
  geom_point(size = 3.5) +
  geom_segment(aes(x = 0, xend = pct_importance, yend = label),
               linewidth = 0.5, alpha = 0.3) +
  geom_text(aes(label = pct_label), hjust = -0.3, size = 3.2, color = "grey30") +
  scale_color_manual(
    values = c("Increases risk" = "#A64E4E", "Decreases risk" = "#4A7BB7"),
    drop = FALSE
  ) +
  scale_x_continuous(labels = percent_format(accuracy = 1),
                     expand = expansion(mult = c(0, 0.25))) +
  labs(title = "Feature importance: top predictors",
       x = "Share of total importance", y = NULL, color = NULL)

Model Performance Summary

val_summary %>%
  mutate(
    Model         = recode(model, log = "Logistic regression", rf = "Random forest"),
    `PR-AUC`      = round(pr_auc,  3),
    `ROC-AUC`     = round(roc_auc, 3),
    `Brier score` = round(brier,   3)
  ) %>%
  select(Model, `PR-AUC`, `ROC-AUC`, `Brier score`) %>%
  gt() %>%
  tab_header(
    title    = "Overall model performance",
    subtitle = "Evaluated on the held-out test set (25% of patients)"
  ) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
Overall model performance
Evaluated on the held-out test set (25% of patients)
Model PR-AUC ROC-AUC Brier score
Logistic regression 0.255 0.919 NA
Random forest 0.209 0.920 0.129
topk_summary %>%
  filter(top_k %in% c(22, 88, 100, 200)) %>%
  mutate(
    Model              = recode(model, log = "Logistic regression", rf = "Random forest"),
    `Patients flagged` = flagged_patients,
    `True positives`   = true_future_high_users,
    Precision          = scales::percent(topk_precision, accuracy = 0.1),
    `Capture rate`     = scales::percent(topk_capture,   accuracy = 0.1),
    `All positives`    = total_future_yes
  ) %>%
  select(Model, `Patients flagged`, `True positives`, Precision, `Capture rate`, `All positives`) %>%
  gt() %>%
  tab_header(
    title    = "Model performance at operational capacity thresholds",
    subtitle = "22 = current simultaneous case load; 88 = annual capacity; 200 = extended outreach"
  ) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels())
Model performance at operational capacity thresholds
22 = current simultaneous case load; 88 = annual capacity; 200 = extended outreach
Model Patients flagged True positives Precision Capture rate All positives
Logistic regression 22 9 40.9% 5.4% 166
Logistic regression 88 36 40.9% 21.7% 166
Logistic regression 100 40 40.0% 24.1% 166
Logistic regression 200 64 32.0% 38.6% 166
Random forest 22 4 18.2% 2.4% 166
Random forest 88 31 35.2% 18.7% 166
Random forest 100 36 36.0% 21.7% 166
Random forest 200 59 29.5% 35.5% 166

At the ART’s current annual capacity of 88 patients, the logistic regression model identifies a meaningfully higher share of true future super users than the random forest and substantially outperforms the current ad hoc nomination system (estimated at roughly 20% precision). At increased capacity levels (i.e. if the Emergency Services Department hires additional ART case workers), the model’s advantage increases because it can flag more at-risk individuals than individual paramedics and EMTs can identify on an ad hoc basis.


Cost-Benefit Analysis

Reference Benchmarks

workers_current  <- 2L
caseload_per_sw  <- 11L
case_duration_mo <- 3L
months_per_year  <- 12L
patients_per_sw_year <- caseload_per_sw * (months_per_year / case_duration_mo)

sw_annual_salary <- 37426
cost_per_call    <- 1500

benchmarks <- tibble(
  program         = c("Onslow County", "San Diego", "Wausau, WI"),
  patients_served = c(213, 51, 13),
  reduction_rate  = c(2/3, 1/3, 2/3),
  total_savings   = c(1200000, 314000, 100000),
  savings_per_pt  = total_savings / patients_served
)
outcome_call_volume <- val_preds %>%
  filter(as.character(model) == best_model_label) %>%
  left_join(
    outcome_visits_counts %>% rename(outcome_calls = outcome_total_visits),
    by = "pt_full_name"
  ) %>%
  mutate(outcome_calls = replace_na(outcome_calls, 0L)) %>%
  arrange(desc(.pred_yes)) %>%
  mutate(patient_rank = row_number())

# Use the full dataset super user count as the cap, not just those
# who appear in the modeling test set.
max_superusers <- sum(data_by_person$super_either, na.rm = TRUE)

cohort_size <- patients_per_sw_year

cohort_avg_calls <- outcome_call_volume %>%
  filter(patient_rank <= max_superusers) %>%
  mutate(cohort = ceiling(patient_rank / cohort_size)) %>%
  group_by(cohort) %>%
  summarise(
    cohort_start        = min(patient_rank),
    cohort_end          = max(patient_rank),
    n_patients          = n(),
    avg_calls_in_window = mean(outcome_calls, na.rm = TRUE),
    avg_calls_annual    = avg_calls_in_window * (365 / outcome_days),
    .groups = "drop"
  )

benchmark_calls_per_pt <- c("Onslow County" = 20, "San Diego" = 18, "Wausau, WI" = 22)

benchmarks <- benchmarks %>%
  mutate(
    calls_per_pt_yr       = benchmark_calls_per_pt[program],
    calls_averted_total   = patients_served * calls_per_pt_yr * reduction_rate,
    implied_cost_per_call = total_savings / calls_averted_total
  )

implied_cost_per_call_wtd <- weighted.mean(
  benchmarks$implied_cost_per_call, w = benchmarks$patients_served
)

The table below summarizes outcomes from three comparable community paramedicine programs. These programs serve as benchmarks for what Guilford County’s ART might achieve at scale. Call reduction estimates range from one-third (San Diego) to two-thirds (Onslow County, Wausau), and per-patient savings range from roughly $5,600 to $7,700. The Guilford County analysis uses a conservative $1,500 per-call cost estimate and models three reduction scenarios bracketed by these benchmarks.

Reference programs: comparable community paramedicine interventions
Patient-weighted average implied cost per call: $538. Guilford County analysis uses $1500 per call.
Program Patients Call reduction Calls per patient per year Calls averted Total savings Implied cost per call Savings per patient
Onslow County 213 67% 20 2840 $1,200,000 $423 $5,634
San Diego 51 33% 18 306 $314,000 $1,026 $6,157
Wausau, WI 13 67% 22 191 $100,000 $524 $7,692

Staffing and Savings Projections

The table below projects gross savings, staffing costs, and net savings across staffing levels under the mid-range scenario (50% call reduction). The analysis assumes each case worker carries 11 simultaneous cases on a 3-month rotation, for an annual caseload of 44 patients per worker. Gross savings are computed from the actual outcome-window call volumes of the top-ranked patients identified by the model, so the savings estimates reflect the real expected call volume of the patients the ART would serve, not a uniform average. The marginal net savings column shows the incremental return from each additional hire; if this figure is negative, it indicates that the cost of the new hire exceeds what the program can be expected to save. At a call reduction of 50%, the model suggests the program remains consistently cost-positive up to approximately 15 social workers.

reduction_scenarios <- c(low = 1/3, mid = 1/2, high = 2/3)
max_workers <- ceiling(max_superusers / patients_per_sw_year) + 3L

cba_table <- crossing(
  n_workers = seq(workers_current, max_workers),
  scenario  = names(reduction_scenarios)
) %>%
  mutate(
    reduction_rate    = reduction_scenarios[scenario],
    patients_per_year = pmin(n_workers * patients_per_sw_year, max_superusers),
    staffing_cost     = n_workers * sw_annual_salary
  ) %>%
  rowwise() %>%
  mutate(
    gross_savings = {
      n_full   <- floor(patients_per_year / cohort_size)
      n_remain <- patients_per_year %% cohort_size
      full_s   <- cohort_avg_calls %>%
        filter(cohort <= n_full) %>%
        summarise(s = sum(n_patients * avg_calls_annual * reduction_rate * cost_per_call)) %>%
        pull(s)
      part_s   <- 0
      if (n_remain > 0 && n_full < nrow(cohort_avg_calls)) {
        nc <- cohort_avg_calls %>% filter(cohort == n_full + 1) %>% pull(avg_calls_annual)
        if (length(nc) > 0) part_s <- n_remain * nc * reduction_rate * cost_per_call
      }
      full_s + part_s
    },
    net_savings = gross_savings - staffing_cost
  ) %>%
  ungroup() %>%
  group_by(scenario) %>%
  arrange(n_workers, .by_group = TRUE) %>%
  mutate(marginal_net_savings = net_savings - dplyr::lag(net_savings, default = NA_real_)) %>%
  ungroup()

marginal_threshold <- cba_table %>%
  filter(!is.na(marginal_net_savings)) %>%
  group_by(scenario) %>%
  summarise(
    last_worthwhile_hire = if_else(
      any(marginal_net_savings > 0, na.rm = TRUE),
      max(n_workers[marginal_net_savings > 0], na.rm = TRUE),
      NA_integer_
    ),
    .groups = "drop"
  )

cba_table %>%
  filter(scenario == "mid") %>%
  mutate(
    Workers              = n_workers,
    `Patients per year`  = patients_per_year,
    `Calls averted`      = round(gross_savings / (cost_per_call * reduction_rate)),
    `Gross savings`      = scales::dollar(gross_savings, accuracy = 1000),
    `Staffing cost`      = scales::dollar(staffing_cost, accuracy = 1000),
    `Net savings`        = scales::dollar(net_savings,   accuracy = 1000),
    `Marginal net savings` = if_else(is.na(marginal_net_savings), "—",
                                     scales::dollar(marginal_net_savings, accuracy = 1000))
  ) %>%
  select(Workers, `Patients per year`, `Calls averted`,
         `Gross savings`, `Staffing cost`, `Net savings`, `Marginal net savings`) %>%
  gt() %>%
  tab_header(
    title    = "Projected costs and savings — mid scenario (50% call reduction)",
    subtitle = paste0(
      "$", format(cost_per_call, big.mark = ","), " per EMS call  |  ",
      "$", format(sw_annual_salary, big.mark = ","), " annual salary  |  ",
      caseload_per_sw, " patients per worker  |  ",
      case_duration_mo, "-month case duration"
    )
  ) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_style(
    style     = cell_fill(color = "#EEF4FB"),
    locations = cells_body(rows = Workers == workers_current)
  )
Projected costs and savings — mid scenario (50% call reduction)
$1,500 per EMS call | $37,426 annual salary | 11 patients per worker | 3-month case duration
Workers Patients per year Calls averted Gross savings Staffing cost Net savings Marginal net savings
2 88 309 $232,000 $75,000 $157,000
3 132 408 $306,000 $112,000 $194,000 $37,000
4 176 512 $384,000 $150,000 $234,000 $40,000
5 220 660 $495,000 $187,000 $308,000 $74,000
6 264 781 $586,000 $225,000 $361,000 $53,000
7 308 869 $652,000 $262,000 $390,000 $29,000
8 352 985 $739,000 $299,000 $439,000 $49,000
9 396 1074 $805,000 $337,000 $468,000 $29,000
10 440 1138 $853,000 $374,000 $479,000 $11,000
11 484 1206 $905,000 $412,000 $493,000 $14,000
12 528 1292 $969,000 $449,000 $520,000 $27,000
13 572 1382 $1,036,000 $487,000 $550,000 $30,000
14 616 1463 $1,097,000 $524,000 $573,000 $23,000
15 660 1508 $1,131,000 $561,000 $570,000 -$3,000
16 704 1611 $1,208,000 $599,000 $609,000 $40,000
17 748 1662 $1,247,000 $636,000 $610,000 $1,000
18 792 1726 $1,295,000 $674,000 $621,000 $11,000
19 836 1777 $1,332,000 $711,000 $621,000 $0
20 880 1805 $1,354,000 $749,000 $605,000 -$16,000
21 924 1848 $1,386,000 $786,000 $600,000 -$5,000
22 968 1894 $1,420,000 $823,000 $597,000 -$3,000
23 1012 1946 $1,460,000 $861,000 $599,000 $2,000
24 1056 1987 $1,490,000 $898,000 $592,000 -$7,000
25 1100 2026 $1,520,000 $936,000 $584,000 -$8,000
26 1144 2067 $1,551,000 $973,000 $578,000 -$7,000
27 1188 2115 $1,586,000 $1,011,000 $576,000 -$2,000
28 1232 2166 $1,625,000 $1,048,000 $577,000 $1,000
29 1276 2196 $1,647,000 $1,085,000 $561,000 -$15,000
30 1283 2204 $1,653,000 $1,123,000 $530,000 -$32,000
31 1283 2204 $1,653,000 $1,160,000 $492,000 -$37,000
32 1283 2204 $1,653,000 $1,198,000 $455,000 -$37,000
33 1283 2204 $1,653,000 $1,235,000 $418,000 -$37,000

Net Savings Plots

At the current staffing level of two social workers, the program generates meaningful positive net savings under all three reduction scenarios. Net savings grow with additional case worker hire until 15.

ggplot(cba_table, aes(x = n_workers, y = net_savings, color = scenario, group = scenario)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
  geom_line(linewidth = 0.9) +
  geom_point(size = 2.5) +
  geom_vline(
    data = marginal_threshold %>% filter(!is.na(last_worthwhile_hire)),
    aes(xintercept = last_worthwhile_hire, color = scenario),
    linetype = "dotted", linewidth = 0.7, show.legend = FALSE
  ) +
  scale_x_continuous(breaks = seq(workers_current, max_workers, 1)) +
  scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "k", accuracy = 1)) +
  scale_color_manual(
    values = c(low = "#B5C8E0", mid = "#4A7BB7", high = "#2A4F7A"),
    labels = c(low  = "Conservative (1/3 reduction)",
               mid  = "Mid-range (1/2 reduction)",
               high = "Optimistic (2/3 reduction)")
  ) +
  labs(
    title    = "Projected net savings by staffing level",
    subtitle = paste0("Capped at ", max_superusers,
                      " total super users. Dotted lines = marginal utility threshold."),
    x = "Number of social workers", y = "Annual net savings", color = "Reduction scenario"
  )

The marginal savings chart shows the incremental return from each additional hire. Each successive hire serves a lower-risk cohort of patients (because the model has already prioritized the highest-risk cases), so marginal savings decline with each hire, though they remain positive well beyond the current two-worker team

cba_table %>%
  filter(!is.na(marginal_net_savings)) %>%
  ggplot(aes(x = n_workers, y = marginal_net_savings, color = scenario, group = scenario)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
  geom_line(linewidth = 0.9) +
  geom_point(size = 2.5) +
  scale_x_continuous(breaks = seq(workers_current + 1, max_workers, 1)) +
  scale_y_continuous(labels = label_dollar(scale = 1e-3, suffix = "k", accuracy = 1)) +
  scale_color_manual(
    values = c(low = "#B5C8E0", mid = "#4A7BB7", high = "#2A4F7A"),
    labels = c(low  = "Conservative (1/3 reduction)",
               mid  = "Mid-range (1/2 reduction)",
               high = "Optimistic (2/3 reduction)")
  ) +
  labs(
    title    = "Marginal net savings per additional case worker hired",
    subtitle = "Lines crossing zero mark the point where the next hire no longer pays for itself",
    x = "Total social workers on staff",
    y = "Incremental annual net savings", color = "Reduction scenario"
  )


Conclusion

This project demonstrates that a machine learning model can substantially improve the Guilford County ART’s ability to identify patients at risk of becoming high-frequency EMS users before they reach crisis-level utilization. The key takeaways are:

The logistic regression model achieves a PR-AUC of 0.255 and identifies true future super users at roughly 40% precision when the top 88 patients are flagged, more than double the estimated precision of the current ad hoc referral system. In other words, at the ART’s current annual capacity of 88 clients per year, the model identifies four in ten of all future super users in the county.

The Guilford County super user population is distinctive. Unlike the academic literature on super users suggests, Guilford County’s super users are not primarily driven by substance use or behavioral health crises. The dominant pattern is physical frailty and mobility impairment — “lift assist” is the single most diagnostically distinctive complaint — combined with social vulnerability (poverty, public insurance, majority-Black ZIP codes). Interventions calibrated to this profile will likely look different from programs designed for younger populations with high prevalence of substance-use.

The program is cost effective and should grow. At the current two-worker staffing level, the ART already generates substantial net savings under all three call-reduction scenarios. The marginal utility analysis suggests the program remains cost-positive with up to approximately 15 social workers, representing a roughly fourfold expansion from current capacity. The model enables this growth by scaling referral precision as caseload increases.

Equity monitoring is essential. The model shows consistent performance across racial groups in the held-out test set, but this should be monitored prospectively as the model is deployed. False-positive rates, which translate to screening burden on patients who are not enrolled, should be tracked by race and reviewed regularly.

Next steps for deployment:

  1. Prospective validation. Generate scores from data available as of a fixed date and compare to observed super-user outcomes 12 months later, independent of training data.
  2. ART enrollment integration. Exclude currently enrolled patients from the scoring list to avoid duplicate referrals.
  3. Cost assumption refinement. Obtain additional Guilford County cost-per-call data (transport, hospital diversion, administrative overhead) for more precise savings estimates.
  4. Complaint feature expansion. Chief complaint text analysis suggests additional candidate bigrams beyond lift-assist and sickle cell; these should be evaluated as additional model features.
  5. Re-escalation modeling. Patients who were already super users at the start of the antecedent window are currently excluded from the model. A separate model for re-escalation risk among program graduates would complete the ART’s referral pipeline.
  6. Population projection integration. Guilford County’s population is aging, and a growing number of its residents are at risk of becoming future users in the coming years. Population projections could be incorporated into a future version of this cost-benefit analysis, to determine evolving staffing needs and program costs as the vulnerability of Guilford County residents increases.

Citations

CARES: A Community-wide Collaboration Identifies Super-utilizers and Reduces Their 9-1-1 Call, Emergency Department, and Hospital Visit Rates: Prehospital Emergency Care: Vol 21 , No 6 (June 2017). https://doi.org/10.1080/10903127.2017.1335820

Sanko, S. G., & Eckstein, M. (2013). Characteristics of the Most Frequent “Super-Users” of Emergency Medical Services. Annals of Emergency Medicine, 62(4), S145. https://doi.org/10.1016/j.annemergmed.2013.07.230

Community Paramedicine Models for Reducing Use of Emergency Resources—RHIhub Toolkit. (n.d.). Retrieved May 7, 2026, from https://www.ruralhealthinfo.org/toolkits/community-paramedicine/2/reducing-use-of-emergency-resources

Hall, M. K., Raven, M. C., Hall, J., Yeh, C., Allen, E., Rodriguez, R. M., Tangherlini, N. L., Sporer, K. A., & Brown, J. F. (2015). EMS-STARS: Emergency Medical Services “Superuser” Transport Associations: An Adult Retrospective Study. Prehospital Emergency Care, 19(1), 61–67. https://doi.org/10.3109/10903127.2014.936630

LaCalle, E., & Rabin, E. (2010). Frequent users of emergency departments: The myths, the data, and the policy implications. Annals of Emergency Medicine, 56(1), 42–48. https://doi.org/10.1016/j.annemergmed.2010.01.032

MacKinney, A. C., & Stamy, C. (2021). Serving High Need/High Cost Patients in the Emergency Department.

Agarwal, G., Lee, J., McLeod, B., Mahmuda, S., Howard, M., Cockrell, K., & Angeles, R. (2019). Social factors in frequent callers: A description of isolation, poverty and quality of life in those calling emergency medical services frequently. BMC Public Health, 19(1), 684. https://doi.org/10.1186/s12889-019-6964-1

The HOME Team: Evaluating the Effect of an EMS-based Outreach Team to Decrease the Frequency of 911 Use Among High Utilizers of EMS | Prehospital and Disaster Medicine | Cambridge Core. (n.d.). Retrieved May 7, 2026, from https://www.cambridge.org/core/journals/prehospital-and-disaster-medicine/article/abs/home-team-evaluating-the-effect-of-an-emsbased-outreach-team-to-decrease-the-frequency-of-911-use-among-high-utilizers-of-ems