Across the United States, many cities have committed to reducing the number of automobile collisions on their streets. Unfortunately, finding ways to prevent these collisions is a challenge that many cities struggle to overcome. While a great deal of academic attention has been devoted to explaining the nature and causes of different types of automobile collisions, predictive modeling offers a different approach. Our aim is to provide insight into the degree to which built-environment characteristics of roads, and their interaction in space, can be used as predictors of collision-risk. If successful, we hope to arm transportation professionals and policy-makers with a new tool for informing and targeting their built-environment interventions.
Figure 1.1 shows a map of overall collision locations in Louisville. Note that our analysis only considers roads that are within the municipal boundary of Louisville, as outlined in black in the figure.
For this analysis, we have partnered with the city of Louisville, Kentucky. In Louisville, traffic collisions are of the utmost concern. From 2013 to 2016, collisions in Louisville increased by 37.5% in aggregate. Louisville also experienced 34% more traffic-related deaths than the national average.
In its annual “Traffic Collision Facts” report, the Kentucky State Police describes the spatial distribution of collisions within the city, and the road types which seem to suffer the most collisions:
Figure 1.1.1 shows two areas in the city (Downtown and Bardstown Rd) that exhibited some of the highest concentrations of collisions in 2017, according to the Kentucky State Police.
It is reasonable to expect that roads and intersections with heavy traffic volume suffer the highest share of collisions. However, our analysis aims to explore whether additional factors play a role. In particular, we seek to learn why, even along the same road (whose segments likely have comparable traffic exposure), collision-risk varies notably from one segment to the next (see Figure 1.2.1), and which factors shape this observed variability.
As noted above, many of the factors that contribute to collision-risk are straightforward. Most would expect that busy, high-speed roads endure a disproportionate number of automobile collisions. What is not straightforward, however, is that the segments that make up these busy, important roads demonstrate substantial, currently unexplained, variability in collision-risk (Figure 1.2.1).
It is our hypothesis that road characteristics and built-environment features play an important role in influencing collision-risk. To test this hypothesis, we encode the network’s physical features, as well as Louisville’s demographic and spatial attributes, into our dataset. This allows us to model the differences in the road network and the degree to which these differences are associated with collision-risk across the city.
Presently, policy officials pursue interventions based on observing the density of collisions after the fact, without a systematic understanding of why these density differences exist. Our approach seeks to predict where automobile collisions are likely to occur at the segment level based on the characteristics of each segment, thereby giving policymakers the means to be more proactive in their interventions and future designs. Additionally, by knowing how strongly individual features, and categories of features, are associated with collision-risk, policymakers can streamline their intervention strategies and allocate resources more effectively.
Our analysis begins with Louisville’s road network (street centerlines). The first step in the modeling process is to break this continuous shapefile into segments. Each segment is treated as an observation in the dataset. Specifically, we broke the network into two distinct segment types: intersections and roads. All additional data that we collect are attributed to these road and intersection segments as features in the dataset.
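As a rough illustration of this segmentation step, the sketch below splits a toy network into road and intersection observations. It is not the procedure used in this analysis: it assumes the centerlines are already cut wherever streets meet, and it treats any node where at least three centerline ends meet as an intersection. The function name `split_network`, the `min_legs` threshold, and the record fields are hypothetical.

```python
from collections import Counter

def split_network(centerlines, min_legs=3):
    """Return (observations, intersections) for a list of centerlines.

    Each centerline is a list of (x, y) vertices. Assumption: lines are
    already split at junctions, so junctions appear only as endpoints.
    """
    # Count how many centerline ends touch each node.
    endpoint_counts = Counter()
    for line in centerlines:
        endpoint_counts[line[0]] += 1
        endpoint_counts[line[-1]] += 1

    # Assumed rule: a node with min_legs or more line ends is an intersection.
    intersections = sorted(
        node for node, legs in endpoint_counts.items() if legs >= min_legs
    )

    # One observation per road segment and one per intersection.
    observations = [
        {"segment_id": f"road_{i}", "type": "road", "geometry": line}
        for i, line in enumerate(centerlines)
    ]
    observations += [
        {"segment_id": f"int_{i}", "type": "intersection", "geometry": node}
        for i, node in enumerate(intersections)
    ]
    return observations, intersections

# Toy network: four legs meeting at (0, 0), plus one road continuing east.
centerlines = [
    [(-1, 0), (0, 0)],
    [(0, 0), (1, 0)],
    [(0, -1), (0, 0)],
    [(0, 0), (0, 1)],
    [(1, 0), (2, 0)],
]
observations, intersections = split_network(centerlines)
```

In this toy network the node at (1, 0) joins only two line ends, so it is treated as a continuation of the road rather than as an intersection.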
The dependent variable that we predict for each segment is the count of collisions per year. To generate this dependent variable, we used ArcGIS to spatially join the segments and the 2017 crash locations, thereby attributing each crash to its nearest segment and giving each segment a count of the crashes assigned to it in that year. Please see Section 2, “Data Collection and Wrangling,” for additional details.
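The logic of this nearest-segment join can be sketched in plain Python. This is only an illustration of the idea, not the ArcGIS workflow itself: it assumes planar (projected) coordinates, represents each segment as a single straight line between two points, and uses made-up segment IDs and crash locations. The function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def count_crashes_per_segment(segments, crashes):
    """Assign each crash to its nearest segment and count crashes per segment."""
    counts = {seg_id: 0 for seg_id in segments}  # segments with no crashes keep 0
    for crash in crashes:
        nearest = min(
            segments,
            key=lambda seg_id: point_segment_distance(crash, *segments[seg_id]),
        )
        counts[nearest] += 1
    return counts

# Illustrative data only: three segments and four crash locations.
segments = {
    "A": ((0, 0), (10, 0)),
    "B": ((0, 5), (10, 5)),
    "C": ((20, 0), (20, 10)),
}
crashes = [(1, 1), (2, 4), (9, 0.5), (5, 6)]
collision_counts = count_crashes_per_segment(segments, crashes)
```

Segments that attract no crashes keep a count of zero, so they still appear as observations in the dataset. Comparing every crash with every segment is slow on a city-scale network, where a spatial index would normally be used.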