This project was produced as part of the University of Pennsylvania Master of Urban Spatial Analytics Spring 2018 Practicum (MUSA 801), instructed by Ken Steif, Michael Fichman, and Karl Dailey. This document begins with a case study of predicting spatial risk of opioid overdoses in Providence, Rhode Island and is followed by a series of appendices that discuss data wrangling, data visualization, data sources, feature engineering, and model results. Navigate through the document either by using the panel at the left, or by clicking the hyperlinks throughout the document.
This project seeks to build a spatial risk model of opioid overdose events for the City of Providence, Rhode Island by examining current overdose locations, community protective resources, risk factors, and neighborhood characteristics. Assigning a level of risk to each area of the city can assist Providence and local stakeholders in strategically allocating resources in a way that will achieve the greatest impact. As of January 2018, Providence is implementing a Safe Stations program, where people struggling with substance abuse can come to any of the City’s 12 fire stations to be connected with supportive services. The spatial risk model will help Providence’s Department of Healthy Communities determine other areas at high risk of overdose events where the City could site additional interventions or supplement their communications efforts.
Opioid overdose events have skyrocketed in recent years, fueling a decline in the United States life expectancy for the second year in a row and leading President Trump to declare the crisis a public health emergency. A Centers for Disease Control Report released in March 2018 found that emergency room visits for suspected opioid overdoses increased 30% between July 2016 and September 2017 (CDC, 2018). More than 115 Americans die every day from opioid overdoses, and Rhode Island has been hit especially hard by the crisis, with the rate of overdose deaths per 100,000 persons significantly higher than the national average.
Figure 1.3a
The crisis has also escalated much faster in Rhode Island compared to the nation, with the rate of overdose deaths per 100,000 persons increasing nearly six-fold between 1999 and 2016.
Figure 1.3b
Since 2014, over 1,000 people in Rhode Island have died from opioid overdoses. Over one quarter of these deaths have occurred in Providence.
Figure 1.3c
In 2017, almost 700 overdose events were recorded in Providence. In the face of such an overwhelming public health crisis, one of the tools that the City of Providence is using to address the opioid crisis is its new Safe Stations program. Launched in January 2018, the program turns Providence’s 12 fire stations into 24/7 centers where trained staff connect individuals with supportive services. Based on a program created in Manchester, New Hampshire, Safe Stations demonstrate how a network of public, nonprofit, and private actors are working together to address the mounting problem of opioid overdoses.
The premise of Safe Stations is that they are a community-based response that reduces barriers to accessing resources and provides a safe community for people struggling with substance abuse. As can be seen below, because of their nature as an emergency response facility, Providence’s fire stations are relatively dispersed throughout the city’s neighborhoods.
Figure 1.3d
This analysis seeks to determine whether Safe Stations are the assets best located to provide services for those struggling with substance abuse. Assigning an overdose risk level to each area of Providence can be a valuable tool. Knowledge of risk could allow the City and other stakeholders to assess the cost of providing Safe Stations in these locations relative to the potential benefits of opening stations in additional city facilities in high-risk areas, to determine high-risk areas where Safe Stations should be more heavily advertised, and to allocate overdose prevention and harm reduction resources, such as Narcan trainings, needle exchange centers, or safe injection sites, in areas at the highest risk.
Providing citywide spatial information regarding the risk of opioid overdose events empowers the city and nonprofits to continue to make their public health interventions more robust and could serve as a model for cities around the country that are interested in how to implement similar initiatives with the greatest impact.
The spatial risk model built for this analysis differs from kernel density maps, a tool commonly used to allocate resources. Kernel density maps, or hotspot maps, interpolate past events across space, suggesting that resources are best allocated in areas where many overdoses have occurred in the past. A kernel density map of Providence’s 2017 overdoses can be seen below:
Figure 1.4a
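As a rough illustration of how a hotspot surface is computed, the sketch below evaluates a Gaussian kernel density at a few grid cell centers. The kernel shape, the 250-foot bandwidth, and the coordinates are assumptions made for illustration; the report does not state which kernel or bandwidth produced the map above.

```python
import math

def kernel_density(points, centers, bandwidth):
    """Gaussian kernel density at each center, in events per square foot."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    surface = []
    for cx, cy in centers:
        total = 0.0
        for px, py in points:
            d2 = (px - cx) ** 2 + (py - cy) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        surface.append(norm * total)
    return surface

# Hypothetical past events (x, y in feet): three clustered, one isolated.
events = [(100.0, 100.0), (150.0, 120.0), (120.0, 160.0), (2000.0, 2000.0)]
# Hypothetical cell centers: inside the cluster, in empty space, on the isolated event.
centers = [(125.0, 125.0), (1000.0, 1000.0), (2000.0, 2000.0)]
density = kernel_density(events, centers, bandwidth=250.0)
```

The surface is highest inside the cluster and lowest in the empty area between events, which is why a hotspot map can only point back to where events have already occurred.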
In contrast, the spatial risk model predicts overdose events as a function of environmental risk and protective factors that vary across space. The unit of analysis for this model breaks down Providence into a set of 500-foot by 500-foot grid cells or a “fishnet” (shown in the map at the bottom left). The count of overdoses per grid cell serves as the dependent variable in the model (shown in the map to the bottom right).
Figure 1.4b
At the fishnet level, it is clear that while the number of overdoses has been increasing in Providence, they remain a relatively rare event in terms of how many grid cells contain overdoses. The approximately 700 overdoses recorded in 2017 fall within only 18% of grid cells. This distribution of overdoses is taken into account throughout the modeling process. As the intent of this analysis is to identify high-risk locations that are actionable from a policy standpoint, grid cells allow for the determination of risk levels specific to an area the size of a city block. At the same time, the grid cells are large enough that they do not provide personally identifiable information or skew the distribution of overdoses within the fishnet.
Figure 1.4c
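The fishnet aggregation described above can be sketched as follows. This is a minimal example that assumes projected coordinates in feet and a rectangular grid; the actual fishnet is clipped to the city boundary, and the function names and sample points here are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_FT = 500.0

def cell_index(x, y, x_min, y_min, cell_size=CELL_SIZE_FT):
    """Return the (column, row) of the grid cell containing point (x, y)."""
    return (int(math.floor((x - x_min) / cell_size)),
            int(math.floor((y - y_min) / cell_size)))

def fishnet_counts(points, x_min, y_min, n_cols, n_rows, cell_size=CELL_SIZE_FT):
    """Count events per cell; cells without events get an explicit zero.

    Points that fall outside the grid are ignored.
    """
    counts = Counter(cell_index(x, y, x_min, y_min, cell_size) for x, y in points)
    return {(c, r): counts.get((c, r), 0)
            for c in range(n_cols) for r in range(n_rows)}

# Hypothetical event locations on a 3 x 3 grid of 500-foot cells.
events = [(120.0, 80.0), (130.0, 450.0), (760.0, 90.0), (1499.0, 1499.0)]
counts = fishnet_counts(events, 0.0, 0.0, n_cols=3, n_rows=3)

# Share of cells containing at least one event (the report's figure is 18%).
share_nonzero = sum(1 for v in counts.values() if v > 0) / len(counts)
```

Keeping the zero-count cells in the output matters, because the count per cell is the dependent variable and most cells contain no overdoses.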
Independent variables are also included at the fishnet level. Therefore, the final dataset includes a series of characteristics about each 500-foot-square area of the city, along with a count of how many overdoses have occurred there. For the purposes of this analysis, the independent variables were split into three categories, with the hypothesis that each would have a different relationship with overdoses: first, community protective resources, such as hospitals or parks, which are amenities where other public health interventions could possibly be sited; second, risk factors, such as crime or 311 complaints; third, neighborhood-level characteristics, such as neighborhood boundaries or census information. The relationship of each of these three categories with overdose events is explored in depth in the exploratory analysis and model building sections of this document.
To create the spatial risk model, we began with an extensive data wrangling process (described further in the data wrangling appendix). The overdose location data used in the model was provided through a tremendous effort by the City of Providence. The data was pulled from records of emergency medical services responses, which did not contain an indicator of whether or not the event was an overdose. Therefore, clinicians examined data from 2017 and pulled together a list of approximately 700 events that they considered to be overdoses. This created a one-of-a-kind dataset exclusively for this analysis. As this dataset contains health information, overdoses are always shown at an aggregated level to ensure the protection of personal privacy.
As with any data, there is potential for error in terms of whether all overdose events were captured and whether all events recorded after the fact were overdoses. Additionally, there were several address entry errors that had to be manually geolocated in order to ensure accuracy, which is a source of additional potential error. Furthermore, it is worth mentioning that this dataset is small, due to both the size of Providence (the city’s population is 179,000) and the fact that this data was collected only for 2017. While this analysis is worth conducting, the sample size precludes certain tests of model generalizability, such as testing across years. These limitations will be discussed throughout the analysis as relevant, but are necessary to mention at the outset.
In addition to overdose events, the data wrangling process included the collection of a wide variety of data regarding community protective resources, risk factors, and neighborhood characteristics in the City of Providence (listed in the data sources appendix). Following the data collection, we undertook a feature engineering process to extract different types of predictive power from the variables, and all of the features are listed in the feature engineering appendix. Once the entire dataset of overdoses and features had been created, we conducted an exploratory analysis process to determine which variables were correlated with overdoses.
With an understanding of which variables were correlated with overdose events, we constructed different predictive models, informed by machine learning techniques. We then conducted a validation process to ensure that the final model was generalizable across the different areas of Providence. The modeling process and model results are described in detail in the model building section and the model results appendix.
In addition to the overdose count dependent variable, several data sources were used to determine risk factors and protective assets in Providence, including:
The appendices below include a data dictionary with a full list of variables and their sources; a data wrangling appendix that explains how data was compiled, cleaned, and feature engineered; a list of the final features; a data visualization appendix that explains how to make basic graphics for this analysis; and a table that lists the results from each model.
To begin the exploratory analysis process, we constructed a dataset through an extensive data wrangling and feature engineering process (described in the data wrangling appendix). Each of the Community Protective Resource and Risk Factor variables was feature engineered to extract different levels of explanatory power. An example of this can be seen below in Figure 2.1a, which shows drug crimes in the past 180 days, distance to each drug crime, distance to the three nearest drug crimes (or “nearest neighbor”), and density of drug crime. The distance, nearest neighbor, and density variables are each included in the fishnet grid cells as potential independent variables in the modeling process. While each variable measures the same drug crime points, they measure spatial distribution differently. Distance, density, and nearest neighbor maps will be seen throughout the exploratory analysis.
Figure 2.1a
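To make the difference between the three feature types concrete, the sketch below computes each of them for a single grid cell center. It assumes straight-line distances in feet and uses a count within 1,000 feet as a simple stand-in for density; the report's own density measure and radius are not specified here, and the crime locations are made up.

```python
import math

def sorted_distances(origin, points):
    """Straight-line distances from origin to every point, nearest first."""
    ox, oy = origin
    return sorted(math.hypot(px - ox, py - oy) for px, py in points)

def nearest_distance(origin, points):
    """Distance feature: how far away is the single closest point?"""
    return sorted_distances(origin, points)[0]

def mean_knn_distance(origin, points, k=3):
    """Nearest neighbor feature: average distance to the k closest points."""
    nearest = sorted_distances(origin, points)[:k]
    return sum(nearest) / len(nearest)

def count_within(origin, points, radius):
    """Density stand-in (assumption): number of points within a radius."""
    return sum(1 for d in sorted_distances(origin, points) if d <= radius)

# Hypothetical drug crime locations (feet) and one grid cell center.
crimes = [(0.0, 300.0), (400.0, 0.0), (0.0, -500.0), (3000.0, 4000.0)]
center = (0.0, 0.0)

features = {
    "dist_nearest": nearest_distance(center, crimes),
    "dist_nn3": mean_knn_distance(center, crimes, k=3),
    "count_1000ft": count_within(center, crimes, 1000.0),
}
```

Note that the distance features shrink as exposure grows while the density feature rises, so the sign of a correlation with overdoses depends on which form of the variable is used.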
The Safe Stations program has been implemented at Providence’s 12 fire stations. The central question of this analysis is whether these Safe Stations are the most efficient locations to connect those struggling with substance abuse to supportive services, or if this program should be implemented at additional city facilities. In order to examine this question, we first evaluated existing Safe Stations’ proximity to overdose locations. The below map shows where overdoses are occurring within a quarter-mile distance from a fire station, where people could walk less than five minutes to the station to receive aid.
Figure 2.2a
As can be seen in the below bar chart, there are stations that are experiencing considerably more overdoses within a quarter-mile and others that experience almost no overdoses within a quarter-mile. Engine 8, located in West End, had over 20 overdoses occur within a quarter-mile in 2017. In contrast, neighboring Engine 11 near the Reservoir neighborhood experienced one overdose within a quarter-mile. This metric can inform which stations may be in need of additional resources.
Figure 2.2b
Counting the number of overdoses within a quarter-mile of fire stations demonstrates which Safe Stations may experience the most activity. However, it does not show whether or not fire stations are the best location for Safe Stations within the entire city of Providence. We defined “best location” as being in a place where a relatively high number of opioid overdoses are occurring. We developed a hypothesis test that asks whether the average distance from fire stations to nearby overdoses is shorter than one would expect due to random chance alone. Figure 2.2c below visualizes the distribution of 999 randomly generated sets of 12 points (to simulate a set of fire stations) and their corresponding distances to the nearest 10 observed overdose events. The vertical line marks the actual distance between Providence’s 12 fire stations and their 10 nearest overdose events. 63.7% of the randomly generated sets were closer to overdoses than the actual fire station locations, suggesting that Providence could locate Safe Stations, or supplementary efforts, closer to where overdoses are actually occurring.
Figure 2.2c
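The logic of this test can be sketched as below. Several details are assumptions for illustration: random sites are drawn uniformly from a bounding box rather than from within the city boundary, the statistic is the average over sites of the mean distance to the 10 nearest events, and all coordinates are synthetic.

```python
import math
import random

def mean_knn_distance(site, events, k):
    """Mean distance from one site to its k nearest events."""
    d = sorted(math.hypot(ex - site[0], ey - site[1]) for ex, ey in events)[:k]
    return sum(d) / len(d)

def set_statistic(sites, events, k=10):
    """Average, over all sites in a set, of the mean k-nearest distance."""
    return sum(mean_knn_distance(s, events, k) for s in sites) / len(sites)

def share_random_sets_closer(sites, events, bounds, n_sets=999, k=10, seed=0):
    """Share of random site sets that sit closer to events than the real sites."""
    rng = random.Random(seed)
    x_min, y_min, x_max, y_max = bounds
    observed = set_statistic(sites, events, k)
    closer = 0
    for _ in range(n_sets):
        random_sites = [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
                        for _ in sites]
        if set_statistic(random_sites, events, k) < observed:
            closer += 1
    return observed, closer / n_sets

# Synthetic events clustered around (2000, 2000) inside a 10,000-foot square.
rng = random.Random(1)
events = [(rng.gauss(2000.0, 300.0), rng.gauss(2000.0, 300.0)) for _ in range(40)]
bounds = (0.0, 0.0, 10000.0, 10000.0)

sites_near = [(2000.0 + 10.0 * i, 2000.0) for i in range(12)]  # inside the cluster
sites_far = [(9500.0, 9500.0 - 10.0 * i) for i in range(12)]   # opposite corner

_, share_near = share_random_sets_closer(sites_near, events, bounds)
_, share_far = share_random_sets_closer(sites_far, events, bounds)
```

A share near zero means the real sites are closer to events than almost any random placement, while a share above one half, like the 63.7% reported above, means the real sites are no better placed than a typical random set.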
As mentioned above, the independent variables in this analysis were divided into three categories. The first of these categories is Community Protective Resources, which are considered to be neighborhood amenities, many of which could support additional public health interventions. These resources include locations such as hospitals, libraries, schools, and bus stops. The full list of amenities can be found in the feature appendix below. The hypothesis is that areas that contain higher numbers of these community protective resources would have a lower number of opioid overdoses. However, this analysis demonstrates that, in reality, the strength of the relationship varies depending on the resource, and distance to these resources tends to be negatively correlated with overdoses: overdose density is generally higher closer to them.
This section will discuss variables that ultimately proved to be useful in the model building process. While the dependent variable for this analysis is the count of overdose events, it is easier to explore and visualize correlation patterns when count has been recoded to a spatial density and, therefore, correlation plots below use the logged density of overdoses as a proxy for overdose counts.
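The sketch below shows how such a correlation can be computed and how to read its sign. The distance and density values are invented, and the density is assumed to be strictly positive so that its logarithm is defined.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical grid cells: distance to the nearest resource (feet)
# and overdose density, which falls off with distance.
distance_ft = [100.0, 400.0, 900.0, 1600.0, 2500.0]
overdose_density = [8.0, 5.0, 2.5, 1.0, 0.2]

log_density = [math.log(d) for d in overdose_density]
r = pearson(distance_ft, log_density)
```

Here `r` is negative because density falls as distance grows. For a distance-based feature, a negative correlation therefore means that overdoses are more concentrated close to the resource.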
One community resource that this analysis examines is education facilities, specifically the correlation between proximity to different types of schools and overdose events. As can be seen below, both private schools and colleges are clustered in College Hill, as well as the Downtown area of the city. The below scatter plots demonstrate that proximity to all three types of schools (public schools, private schools, and colleges) is correlated with a higher density of overdoses, meaning that overdose density rises closer to an educational facility. However, public school proximity has a slightly higher correlation with overdoses.
Figure 2.3a
Proximity to public transit is another variable that was considered in the model. In this case, public transit is represented by RIPTA bus stop locations. As can be seen below, distance to bus stops is also negatively correlated with overdoses, meaning that overdose density is higher closer to bus stops.
Figure 2.3b
Another community resource to note is food vendors. We wanted to examine whether there was a difference in correlation with density of overdoses between supermarkets and SNAP vendors, with the hypothesis that areas that are less well-served by supermarkets would have a higher number of overdoses. As can be seen below, there are far fewer supermarkets in Providence than SNAP vendors, and they are primarily concentrated in the northern portion of the City. SNAP vendors, in comparison, appear to be spread throughout the city, with a strong presence along commercial corridors in the central and southern portions where there is an absence of supermarkets. Examining the correlations of these amenities with overdoses shows that both have a negative correlation between distance to the amenity and the density of overdoses. However, SNAP vendors have a more negative correlation, indicating that proximity to a SNAP vendor tends to be an indicator of a higher number of overdoses compared to supermarkets.
Figure 2.3c
The correlation between distance to parks and overdose incidents was also explored. Given that parks serve as one of the best measures of public gathering spaces, the hypothesis was that proximity to parks may be associated with a higher number of overdose events. Correlation tests supported this hypothesis; as you get closer to a park in Providence, the density of overdoses increases. However, this correlation was smaller than that of any other community resource mentioned in this section.
Figure 2.3d
While the hypothesis at the outset of the exploratory analysis process was that areas with a greater number of community resources would have experienced fewer overdose events in 2017, that is not actually the case. While the correlation between each community resource and overdoses varies, proximity to all community resources tends to suggest a higher density of overdoses.
The second category of variables that this analysis examines is risk factors, with the hypothesis that areas containing greater numbers of risk factors would experience higher numbers of overdoses. To explore this hypothesis, we looked at data related to three categories of risk factors: alcohol & tobacco vendors, data from 311 complaint calls, and crime data. Our hypothesis was largely confirmed throughout the risk factors we examined, and risk factors tended to have a stronger correlation with density of overdoses than community resources.
Based on past research indicating that alcohol and tobacco vendors are associated with higher rates of crime, we wanted to explore if proximity to these vendors in Providence was associated with higher overdose density. Using business license data, we pulled the locations of tobacco vendors and liquor licenses in Providence. The below maps show that tobacco vendors tend to cluster more in the southwest area of the city and liquor licenses are very clustered in the center of the city. Scatterplots illustrate a similar relationship for each of these variables; higher numbers of overdoses occur in close proximity to both tobacco vendors and liquor licenses and decrease in number farther away.
Figure 2.4.1a
The City provided data regarding 311 complaints that served as the basis for several risk factor categories, such as vacant buildings, abandoned vehicles, street light outages, graffiti, trash, and overgrown lots. These variables largely serve as an indicator of neighborhood conditions and levels of investment in an area. Therefore, the hypothesis was that areas with a higher number of complaints would experience a greater number of overdoses. Due to the extensive nature of the 311 call dataset from the City’s open data site (shown below), we grouped this data into broader categories to better extract meaning from the data (listed in the feature appendix).
Figure 2.4.2a
In Figure 2.4.2b below, we have highlighted four variables that ultimately proved to be useful in the modeling process due to their correlation with overdose incidents: abandoned buildings, abandoned cars, code violations, and overgrowth. The positive relationships between the density of these variables and overdoses suggest that greater numbers of overdose events are happening in areas with greater numbers of 311 complaints. The strongest correlations with overdoses are for code violation complaints and abandoned building complaints, suggesting that disinvestment in the built environment has a strong relationship with overdose events.
Figure 2.4.2b
Figure 2.4.2c below highlights these same four variables and the difference between their density and proximity to the three nearest neighbors. Throughout the modeling process, the spatial differences in these features were considered.
Figure 2.4.2c
A third category of risk factors that we examined was crime data, with the hypothesis that overdoses would be strongly correlated with areas where other crimes were occurring. Specifically, we evaluated drug crimes (removing low-level, marijuana-related crimes) and assaults. The density maps below show that these crimes tend to be clustered toward the center of the city. Correlation tests confirmed our hypothesis that for both drug crimes and assaults, as the density of crime events increases, so does the number of overdose events. Furthermore, the correlation between these types of crime and overdose density was the highest by far of any variable that we examined.
Figure 2.4.3a
Figure 2.5.1a
The third category of data explored in this analysis related to broader neighborhood characteristics, such as geographic boundaries and demographic information. We first wanted to explore if overdoses are spatially concentrated in particular neighborhoods. The above bar plot shows that overdose events are not evenly distributed throughout the city. Certain neighborhoods, such as West End, Downtown, and Wanskuck see more overdoses than other areas of the city. However, overdoses are not isolated to a single part of the city and are taking place across many neighborhoods. The inclusion of neighborhood variables in the final model was important in order to capture some of the variation in overdose events across space.
Figure 2.5.1b
Another measure of neighborhood characteristics is demographic data from the Census, which we included at the block group level. The below maps illustrate the citywide distribution of population density, median household income, unemployment rate, and poverty rate. The accompanying scatterplots show some association between count of overdoses and these demographic variables; more overdoses occur in areas with higher population density, lower median household income, higher unemployment rates, and higher percentages of families living in poverty. However, population density and median household income have a much higher correlation with overdose events than the other two variables. The equity implications of using economic-based variables were considered throughout the model building process.
Figure 2.4.2a
The extensive data wrangling, feature engineering, and exploratory analysis processes informed the creation of a model to predict the risk of overdose events across the city of Providence. The following section will outline the process of selecting features and constructing models, validating models, and converting predictions into actionable intelligence.
Just as we divided the exploratory analysis into three categories of variables that tell different stories, we structured our model building process in a similar manner. We wanted to explore how well each of these ‘stories’ predicted overdose risk on its own. By building different spatial risk models for community protective resources, risk factors, and neighborhood characteristics, we sought to capture the predictive power of each of these stories separately before combining them into a final ensemble model that harnesses the predictive power of all three.
To account for the fact that overdoses are rare events, and therefore have not occurred in the majority of grid cells in the city, Poisson and Negative Binomial models were explored for each variable category. The variable selection process for each of the three models was informed by machine learning processes that identified the variables with the greatest predictive power across many iterations. The results of the top performing models are explored in further detail below. This process, and the results of all tested models, are further detailed in the modeling appendix.
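One common way to see why both count models are worth exploring is to compare the variance of the counts with their mean: a Poisson model assumes the two are equal, while a Negative Binomial model adds a dispersion parameter that allows the variance to exceed the mean. The sketch below runs this check on synthetic counts that mimic the shape described in this report (82% of cells at zero, a maximum of 22). The values are invented, and the check itself is an illustration rather than the selection procedure used in this analysis.

```python
from statistics import mean, pvariance

# Synthetic overdose counts for 100 grid cells (not the Providence data).
counts = [0] * 82 + [1] * 10 + [2] * 4 + [5] * 2 + [9, 22]

count_mean = mean(counts)
count_variance = pvariance(counts)

# A ratio well above 1 indicates overdispersion relative to a Poisson model.
dispersion_ratio = count_variance / count_mean

# Share of grid cells with no overdoses at all.
share_zero = counts.count(0) / len(counts)
```

With counts this skewed, the variance is roughly ten times the mean, the kind of pattern that typically motivates trying a Negative Binomial model alongside a Poisson one.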
The community protective resources model demonstrated that SNAP vendors and bus stops provided the strongest predictive power within this category of variables. The below graph shows the variables included in the final community protective resources model, in order of their relative predictive importance in the model.
Figure 3.1.1a
The model containing the variables shown above was informed by a machine learning process called XGBoost that determined the relative importance of each community protective resource variable across many iterations. For variables with multiple feature engineered forms (distance, density, and nearest neighbor), only the nearest neighbor features were included to avoid duplication. The final model was then honed to only include features that were demonstrated to be important in the machine learning process.
Figure 3.1.1b
Each model created in this analysis utilizes a random subset of 60% of the city’s fishnet grid cells (a “training set”) to create the model, allowing the model’s accuracy to be tested on the 40% of grid cells that were set aside (the “test set”). The “residuals” are the differences between the observed number of overdoses in a test set grid cell and the number of overdoses that the model has predicted in that grid cell, and the “mean absolute error” is the average of the absolute values of these residuals. The below graph shows the distribution of residuals for the community protective resources model, which are largely clustered between 0 and 1 overdose event. The mean absolute error for this model is 0.4234 overdose events.
Figure 3.1.1c
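The split and the error measure can be sketched as follows. The cell identifiers, observed counts, and predictions are hypothetical, and the random seed is arbitrary.

```python
import random

def split_cells(cell_ids, train_share=0.6, seed=0):
    """Randomly split grid cell ids into a training set and a test set."""
    ids = list(cell_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_share * len(ids))
    return ids[:cut], ids[cut:]

def mean_absolute_error(observed, predicted):
    """Average absolute residual, where a residual is observed minus predicted."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return sum(abs(r) for r in residuals) / len(residuals)

# Ten hypothetical grid cells split 60/40.
train_ids, test_ids = split_cells(range(10))

# Hypothetical observed counts and model predictions for the four test cells.
observed = [0, 0, 1, 3]
predicted = [0.2, 0.1, 0.6, 1.1]
mae = mean_absolute_error(observed, predicted)
```

In this toy case the residuals are -0.2, -0.1, 0.4, and 1.9, giving a mean absolute error of 0.65 overdose events. Most of the error comes from the single high-count cell.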
In the final risk factors model, assaults provide the strongest predictive power, followed by 311 code violation complaints, and drug crimes.
Figure 3.1.2a
The machine learning process also showed assaults to have, by far, the strongest predictive power.
Figure 3.1.2b
Similar to the community protective resources model, the residuals for the risk factors model are clustered between 0 and 1. The mean absolute error for this model is 0.4155 overdose events, lower than that of the community protective resources model.
Figure 3.1.2c
Assessing the variable importance of the neighborhood characteristics model demonstrated that the West End, Valley, and Silver Lake had, by far, the most importance in the model. The remainder of the variables in this model were either insignificant, or contained much less weight in the model relative to the top three neighborhoods.
Figure 3.1.3a
Machine learning shows that, on its own, price per acre provides the strongest predictive power. This pattern does not emerge in the full neighborhood characteristics model, likely because holding neighborhood constant within the model controls for much of the variation in census-based and land value variables.
Figure 3.1.3b
Residuals for the neighborhood model are also clustered between 0 and 1. The model shows a mean absolute error of 0.4486, higher than the errors for the community protective resources and risk factor models.
Figure 3.1.3c
Since each of the 3 models described above tells a different story about predicting overdose risk in Providence, we combined them into a single model through an ensembling process aimed at reducing the overall error. This section describes the process of comparing the predictions and errors of the 3 different stories and using them to create a final ensemble model.
The outcome of each of the three models described above was a predicted risk map of overdose events across the City of Providence, shown below. In the prediction maps, it is clear how the variables in each model strongly inform its predictions. For example, in the community protective resources model, the bus routes are evident and in the neighborhood model there are hard boundaries between the different neighborhoods. Furthermore, the predictions for all models range between 0 and roughly 4 overdose events per grid cell, while the actual number of overdoses per grid cell in the city ranges from 0 to 22. However, risk predictions are better considered on a relative scale, indicating high- and low-risk areas rather than predicting a specific number of events.
Figure 3.1.4a
As described above, each model was trained on 60% of the fishnet grid cells, allowing for its prediction accuracy to be tested on the remaining 40% of the grid cells. The below set of maps shows the residuals, or errors, that each model produced when used to predict on the test set. Since these maps only include the 40% test set of the fishnet grid cells, the full city is not represented. The residuals range from 0 to roughly 20, a much greater range than the predictions themselves. However, the graphs above of the distribution of residuals across all models show that very few grid cells have errors over 3. Similar to the prediction maps, there are variations in errors across the different models; the risk factors model has lower errors in the city’s outer neighborhoods, and is closely followed by the community protective resources model. In contrast, the neighborhood characteristics model has much higher errors in the city’s outer neighborhoods and in the northern half of the city, likely because it contains neighborhood boundaries as a variable, which can be arbitrary at times.
Figure 3.1.4b
The discrepancies in the above maps demonstrate that each of these three models is capturing different spatial risk patterns. Combining these models into a single ensemble model improved their predictive power and decreased the model’s overall error. To create the ensemble model, we utilized a process that ran each model 100 times with randomized training and test sets, allowing for an average prediction for each grid cell. Averaging the results of many versions of the model allows for the reduction of errors within each grid cell. Then, we built a model containing these averaged predictions for each of the three models. Ultimately, the process minimizes the error from each of the three models, and runs them as a larger, combined model.
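The averaging step can be sketched as below. This is an illustration under stated assumptions: a one-variable least squares line stands in for each count model, predictions are collected only for cells that fall in the test set of a given run, and the data are synthetic. The final step, in which the averaged predictions from the three models become the inputs to the combined model, is not shown.

```python
import random

def fit_line(xs, ys):
    """Least squares fit of y = a + b * x (a stand-in for a count model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def averaged_predictions(xs, ys, n_runs=100, train_share=0.6, seed=0):
    """Average each cell's held-out prediction over many random splits."""
    rng = random.Random(seed)
    n = len(xs)
    totals, hits = [0.0] * n, [0] * n
    for _ in range(n_runs):
        ids = list(range(n))
        rng.shuffle(ids)
        cut = round(train_share * n)
        train, test = ids[:cut], ids[cut:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            totals[i] += a + b * xs[i]
            hits[i] += 1
    # A cell that never landed in a test set has no averaged prediction.
    return [t / h if h else None for t, h in zip(totals, hits)]

# Synthetic data: 20 cells whose outcome is an exact linear function of x.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
avg = averaged_predictions(xs, ys)
```

Because every prediction comes from a run in which the cell was held out, the averaged values are out-of-sample estimates, which keeps the combined model from simply relearning each input model's training fit.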
A successful ensemble model requires that the input models not be strongly correlated with one another. Therefore, each model’s variables were culled in order to reduce correlation of variables within the three different models. The final ensemble model contains the following variables:
As can be seen in the graph below, the risk factors model provides the strongest predictive power to the final model, followed by the community protective resources model. The neighborhood boundaries model contributed the least predictive power to the model, which is consistent with the higher level of errors that the model contained on its own.
Figure 3.1.4c
Combining the stories reduced the model’s overall error, lowering the mean absolute error to 0.415. Looking at a map of the predictions for the ensemble model shows that the combined model captures more of the nuance in the higher risk area toward the center, with predictions ranging from 0-8 compared to 0-3 for the individual stories. The map of residuals, again shown on the test set containing 40% of the data, shows that the range of errors has decreased, and is now 0-17 compared with a high of 20 in previous models.
Figure 3.1.4d
We validated this model through several measures of goodness of fit, which are explained in this section. First, we compared the results of the spatial risk model to a traditional kernel density map. Then, we conducted different cross-validation and spatial cross-validation tests to ensure that the model is generalizable, both on different fishnet test and training sets and across different areas of the city. The process that we used to conduct these tests is further detailed in the modeling appendix below.
As described earlier, kernel density maps, or hotspot maps, identify higher-risk areas of the city based on where overdose events have occurred in the past. It is important to evaluate if our model performs better than this traditional approach so that it can be used to inform and improve the way resources are allocated to combat the opioid crisis. The below graphs compare each of the models described above to a traditional kernel density approach by putting their predictions on the same scale and applying “risk levels” to that scale. Then, actual overdoses that occurred within the test set are counted based on which risk level they fall into. For this use case, the aim is to capture points in both of the top two risk levels; if all of the test set points were in the 90-100% risk category, the model would be overfit on the limited data that Providence has for 2017.
As can be seen in the below graphs, while the first three models predict only marginally better than the kernel density approach, the final ensemble model performs noticeably better in both the 70-89% and 90-100% risk categories.
Figure 3.2.1a
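The comparison can be sketched as follows: predictions are converted to percentile ranks so that any two surfaces share one scale, ranks are binned into risk levels, and the share of observed test set overdoses in each level is tallied. The 70-89% and 90-100% levels come from the text above; the lower break points, the function names, and the sample values are assumptions.

```python
from bisect import bisect_right

LABELS = ["0-29%", "30-49%", "50-69%", "70-89%", "90-100%"]
BREAKS = [30.0, 50.0, 70.0, 90.0]

def percentile_ranks(values):
    """Percentile rank of each value: share of values less than or equal to it."""
    ordered = sorted(values)
    n = len(values)
    return [100.0 * bisect_right(ordered, v) / n for v in values]

def share_of_events_by_level(predictions, observed_counts):
    """Share of observed events falling in each predicted risk level."""
    totals = {label: 0 for label in LABELS}
    for rank, count in zip(percentile_ranks(predictions), observed_counts):
        totals[LABELS[bisect_right(BREAKS, rank)]] += count
    n_events = sum(observed_counts)
    return {label: totals[label] / n_events for label in LABELS}

# Ten hypothetical test cells: predicted risk and observed overdose counts.
predicted_risk = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
observed = [0, 0, 0, 1, 0, 1, 2, 3, 4, 9]
shares = share_of_events_by_level(predicted_risk, observed)
```

Running the same function on kernel density values for the same cells gives a like-for-like comparison: the surface that places a larger share of test set overdoses in the top two levels is the better guide for allocating resources.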
Cross-validation is important because it determines whether the model predicts well on many different subsets of the data, not just the initial test set. The cross-validation process involved splitting the fishnet into 100 random test sets and measuring average error for each model. The below histograms show the distribution of errors across these 100 random test sets for each of the four models. For all models, the error clusters below 0.5 overdose events. As with the distribution of errors shown for each model above, there are a few significantly larger outliers, likely accounting for the discrepancy between the prediction values and the areas with higher counts of overdoses. The histogram for the ensemble model shows that it has a higher number of test sets with a mean absolute error below 0.5 overdoses, and the value of its outlier errors is lower than that of the other models. This distribution of error indicates that across many subsets of the fishnet grid cells, the ensemble model performs with the least amount of error.
Figure 3.2.2a
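A minimal sketch of this repeated random cross-validation is shown below. To keep the example short, the "model" is a placeholder that predicts the training set mean for every test cell, and the counts are synthetic; only the resampling logic is meant to mirror the process described above.

```python
import random
from statistics import mean

def repeated_split_mae(counts, n_splits=100, train_share=0.6, seed=0):
    """Mean absolute error on the test set for each of many random splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_splits):
        ids = list(range(len(counts)))
        rng.shuffle(ids)
        cut = round(train_share * len(ids))
        train, test = ids[:cut], ids[cut:]
        # Placeholder model (assumption): predict the training mean everywhere.
        baseline = mean(counts[i] for i in train)
        errors.append(mean(abs(counts[i] - baseline) for i in test))
    return errors

# Synthetic overdose counts for 100 grid cells (not the Providence data).
counts = [0] * 82 + [1] * 10 + [2] * 4 + [5] * 2 + [9, 22]
errors = repeated_split_mae(counts)
```

Plotting `errors` as a histogram gives the kind of distribution shown above. Splits in which the few high-count cells land in the test set produce the larger outlying errors.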
Throughout the modeling process, the four models have been trained on random subsets of 60% of the city’s fishnet grid cells and tested on the remaining 40% of grid cells. However, testing the model on randomly selected grid cells in Providence is not enough to determine the generalizability of the model; the model also must be tested in different parts of the city, a process called spatial cross-validation. How well does the model predict risk in parts of Providence that have different characteristics? If there are discrepancies between neighborhoods, it is important to be aware of them when implementing any policies based on the model. The spatial cross-validation process involved running the models several times with each of Providence’s census tracts as a test set, predicting on that tract, and calculating the prediction error for each tract. The prediction error in this case was calculated by normalizing the average error for each census tract by the mean count of overdoses per grid cell in the tract. This was determined to be the best measure of spatial autocorrelation of errors for this use case and the process is further described in the modeling appendix.
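The leave-one-tract-out logic can be sketched as follows. The placeholder model again predicts the training mean, the tract labels and counts are invented, and returning no value for tracts without overdoses is an assumption, since the normalized error is undefined when the tract's mean count is zero.

```python
from collections import defaultdict
from statistics import mean

def fit_mean(train_cells):
    """Placeholder model (assumption): predict the training mean for any cell."""
    level = mean(c["count"] for c in train_cells)
    return lambda cell: level

def leave_one_tract_out(cells, fit):
    """Hold out each tract in turn and return its normalized prediction error."""
    by_tract = defaultdict(list)
    for cell in cells:
        by_tract[cell["tract"]].append(cell)
    results = {}
    for tract, held_out in by_tract.items():
        train = [c for c in cells if c["tract"] != tract]
        predict = fit(train)
        mae = mean(abs(c["count"] - predict(c)) for c in held_out)
        tract_mean = mean(c["count"] for c in held_out)
        # Normalize by the tract's mean count per cell; undefined if it is zero.
        results[tract] = mae / tract_mean if tract_mean > 0 else None
    return results

# Hypothetical grid cells in three tracts.
cells = [
    {"tract": "A", "count": 2}, {"tract": "A", "count": 2},
    {"tract": "B", "count": 0}, {"tract": "B", "count": 0},
    {"tract": "C", "count": 4}, {"tract": "C", "count": 4},
]
errors_by_tract = leave_one_tract_out(cells, fit_mean)
```

Normalizing by the tract's own mean count makes errors comparable between tracts with many overdoses and tracts with few, but it also inflates the error in tracts where overdoses are rare, which is worth keeping in mind when reading the maps.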
The maps below show the outcome of the spatial cross-validation process. While the errors across the models are somewhat clustered, there are clear differences between the four models. The ensemble model shows the fewest tracts with high overdose prediction error. While there are higher errors in the southern portion of the city, this area includes neighborhoods such as Reservoir and South Elmwood that did not experience high numbers of overdoses in 2017. Regardless, this distribution of errors must be kept in mind when utilizing the risk predictions.