This project was produced as part of the University of Pennsylvania’s Master of Urban Spatial Analytics Spring 2019 Practicum (MUSA 801) taught by Ken Steif, Michael Fichman, and Matt Harris. We would like to thank the Philadelphia Fire Department for providing useful information and data.
This document is intended to enable others to replicate the methodology of a study of structural fires and hydrant inspections. We first introduce the context of this project, followed by the data, methodology, modeling, and guiding appendices. A table of contents is provided for easy navigation, along with select hyperlinks throughout the document.
According to data from the Philadelphia Fire Department, the number of recorded fires in Philadelphia has increased sharply since 2017. Although this spike resulted from a change in the procedure for recording fires (more information can be found in this article), inspecting fire hydrants more efficiently and managing fire risk has become a new challenge for the fire department.
Without prioritization, a hydrant that is most in need of inspection could be last on the list, and firefighters must spend time and energy inspecting every hydrant. To address this, we propose using past data on fires and factors related to fires to identify areas in Philadelphia where fires are more likely. Specifically, we build a latent risk model. We then extend the fire model into a prioritization scheme, using social, hydrant, and land use factors to identify which hydrants most urgently need maintenance. Taking into account both fire risk and these other priorities, we aim to generate a list for local engines to inspect hydrants and for the Philadelphia Water Department (PWD) to then maintain them. This would help ensure that, in each engine district, most hydrants are in good condition most of the time, and would allow firefighters to focus on a few hydrants rather than inspecting all of them.
To summarize, the use case is:
Prioritize inspections for local fire engine districts
Ensure that at all times, especially in areas at high risk of fire, most hydrants are in good condition
Improve the inspection efficiency and reduce the need to inspect good hydrants
Our goal is to provide decision-makers with insight into the built-environment and intrinsic characteristics that contribute to the priority of fire hydrant inspections, together with the distribution of fire risk over the city. Therefore, besides fire and hydrant information, our data selection considers factors that may help explain the causes of fires, including public safety, demographics, properties, facilities, and environment. For our analysis, we primarily use:
Data from the Philadelphia Fire Department and the Philadelphia Water Department, so that we work with the most accurate and up-to-date information on fire incidents and fire hydrants;
An open data source (OpenDataPhilly), so that other municipalities with similar open data repositories can refer to or even reproduce our analysis.
The data was also “wrangled” before being explored in the following section. This process included various transformations of the data to improve the predictive ability of each variable. For details on this procedure, please see the Feature Engineering section. Through exploratory analysis and the modeling process, the final dataset was narrowed down to a set of best predictors. For results on how the data ultimately performed in the models, please see Model Building.
A hydrant should have a higher priority of being inspected if
there is a latent risk of fire in the vicinity;
the hydrant is close to social impact factors, such as vulnerable populations and certain facilities. We rely on these proximity measures because it is not feasible to model hydrant conditions.
Based on the above, we built a latent risk model to predict areas with higher fire risk. A latent fire risk model is an alternative approach that uses neighborhood, environmental, and demographic variables to help explain the distribution of fire counts; it does not assume that past fires are the best indication of future ones, and therefore allows risk to vary across the city. To account for this variation, we divided the city into 800 x 800 ft grid cells, roughly a city-block-level unit of analysis. We chose this size because the code of the National Fire Protection Association states that the maximum distance between hydrants should not exceed 800 ft for detached one- and two-family dwellings, and 500 ft for other buildings.
Together, the grid cells form a “fishnet.” Mapping fires to the fishnet conveys the rarity of such an event, with 43 percent of grid cells containing at least one fire. This observation subsequently shaped our modeling approach. Beyond fires, we also added other variables to the fishnet, so that each grid cell contains risk factors, built environment data, and demographic characteristics, as well as space-time characteristics.
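The fishnet step described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes point coordinates are already in a projected system measured in feet, it lays the grid over a rectangular extent rather than clipping to the city boundary, and the function names and fire locations are made up for the example.

```python
import math

CELL_FT = 800  # cell edge length in feet, as stated in the text


def cell_index(x, y, min_x, min_y, cell=CELL_FT):
    """Return the (column, row) of the grid cell containing point (x, y)."""
    return (math.floor((x - min_x) / cell), math.floor((y - min_y) / cell))


def fishnet_counts(points, min_x, min_y, max_x, max_y, cell=CELL_FT):
    """Count points per cell over a rectangular extent; empty cells get 0."""
    n_cols = math.ceil((max_x - min_x) / cell)
    n_rows = math.ceil((max_y - min_y) / cell)
    counts = {(c, r): 0 for c in range(n_cols) for r in range(n_rows)}
    for x, y in points:
        idx = cell_index(x, y, min_x, min_y, cell)
        if idx in counts:  # ignore points that fall outside the extent
            counts[idx] += 1
    return counts


# Hypothetical fire locations in projected feet (illustrative values only).
fires = [(100, 100), (750, 790), (900, 100), (1700, 1700)]
counts = fishnet_counts(fires, 0, 0, 2400, 2400)

# Share of grid cells containing at least one fire.
share_with_fire = sum(1 for v in counts.values() if v > 0) / len(counts)
```

The same dictionary of cells can then carry additional columns (risk factors, demographics, and so on), one value per cell.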
In addition to the predicted result of the fire risk model, three extra factors that we considered important are taken into account when determining the final priority of fire hydrants in each engine district: Industrial Area, Social Impact, and Hydrant Age. The calculation method is introduced in part 6.
In order to predict fire risk at each location citywide, we need to account for the count of fires as well as measures of exposure to risk factors associated with fire, such as blight. Feature engineering is the process of measuring this exposure.
To best capture the relationship between features and fires, we “engineered” features to include in our fishnet. Many of these variables have more explanatory power when expressed as different spatial measurements. In aggregating and engineering features, we adopted three measures:
1. Kernel Density: the average spatial density of a given feature per grid cell.
2. Count: the number of occurrences of a given feature per grid cell.
3. Nearest Neighbor Distance: the average distance from each grid cell to a set number of its nearest features of a given type.
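The three measures above can be illustrated with a small sketch. Several choices here are assumptions made for the example and are not taken from the study: distances and densities are evaluated at the cell centroid, the kernel is Gaussian with a 1,000 ft bandwidth, and the function names and feature locations are hypothetical.

```python
import math


def count_in_cell(points, x0, y0, cell=800):
    """Number of points inside the cell whose lower-left corner is (x0, y0)."""
    return sum(1 for x, y in points if x0 <= x < x0 + cell and y0 <= y < y0 + cell)


def knn_mean_distance(cx, cy, points, k=3):
    """Mean distance from (cx, cy) to its k nearest points."""
    if not points:
        raise ValueError("at least one point is required")
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in points)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def kernel_density(cx, cy, points, bandwidth=1000.0):
    """Gaussian kernel density at (cx, cy), in points per square foot."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    return norm * sum(
        math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * bandwidth ** 2))
        for x, y in points
    )


# Hypothetical feature locations (for example, blight reports) in feet.
features = [(400, 400), (700, 400), (400, 800), (2000, 2000)]

# Cell with lower-left corner (0, 0); its centroid is (400, 400).
cell_count = count_in_cell(features, 0, 0)
cell_nn = knn_mean_distance(400, 400, features, k=2)
cell_density = kernel_density(400, 400, features)
```

Which measure suits a given variable is an empirical question; the text defers that choice to the data dictionary.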
The features used in the model were classified into four categories: built environment factors, risk factors, demographic factors, and time-spatial factors. For an explanation of which measurements were used for each variable, please see Appendix: Data Dictionary.
In 1803, Frederick Graff, Sr. designed a stand-pipe hydrant for Philadelphia, intended to remain permanently in position and to be constantly charged with water. After that, the hydrant network in Philadelphia continued to expand as the city grew. In the 1900s, fire hydrants were mainly located in the city center; in the 1950s, they gradually began to spread across the city; by the 2000s, they covered essentially the whole city, with a greater hydrant density downtown than in the surrounding areas.
Today, hydrant density is highest in Center City and appears to decrease outward from it. This could mean that firefighters spend more time inspecting hydrants in Center City because of the higher concentration. The high hydrant density there may also be related to a high frequency of fires in the downtown area.
Before jumping into feature engineering, it is worth studying the spatial correlation of fire incidents to understand whether fires tend to cluster in Philadelphia. We explored the spatial distribution of fires (2018 data) using the Local Moran’s I statistic for spatial autocorrelation. The GIF shows the fire clusters identified at different p-value thresholds. Does the presence of one fire event correspond with a higher volume of others nearby? If so, we would expect to observe clusters of fires, as opposed to a random distribution of these events across Philadelphia, and the map below shows that this is indeed the case. In fact, the amount of clustering is statistically significant.
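For readers unfamiliar with the statistic, the sketch below computes Local Moran’s I on a small grid of fire counts. It is an illustration under stated assumptions rather than the analysis behind the map: it uses queen contiguity with row-standardized weights, conditional permutation for pseudo p-values, and a made-up 5 x 5 grid with one cluster of high counts. All function names are hypothetical.

```python
import random


def queen_neighbors(n_rows, n_cols):
    """Map each cell index to the cells sharing an edge or corner with it."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs[r * n_cols + c] = [
                rr * n_cols + cc
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
    return nbrs


def local_morans_i(values, nbrs):
    """Local Moran's I per cell: (z_i / m2) times the mean of neighboring z_j."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    return [
        z[i] / m2 * (sum(z[j] for j in nbrs[i]) / len(nbrs[i]))
        for i in range(n)
    ]


def pseudo_p_values(values, nbrs, n_perm=499, seed=0):
    """Conditional-permutation pseudo p-values, folded to the smaller tail."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    observed = local_morans_i(values, nbrs)
    p_values = []
    for i in range(n):
        others = z[:i] + z[i + 1:]  # hold cell i fixed, shuffle the rest
        k = len(nbrs[i])
        at_least = 0
        for _ in range(n_perm):
            simulated = z[i] / m2 * (sum(rng.sample(others, k)) / k)
            if simulated >= observed[i]:
                at_least += 1
        extreme = min(at_least, n_perm - at_least)
        p_values.append((extreme + 1) / (n_perm + 1))
    return p_values


# Made-up fire counts on a 5 x 5 grid: a 2 x 2 hot spot in one corner.
n_rows, n_cols = 5, 5
fire_counts = [0] * (n_rows * n_cols)
for cell in (0, 1, 5, 6):
    fire_counts[cell] = 10

nbrs = queen_neighbors(n_rows, n_cols)
local_i = local_morans_i(fire_counts, nbrs)
p_vals = pseudo_p_values(fire_counts, nbrs)
```

A large positive value with a small pseudo p-value, as for the corner cell of the hot spot here, indicates a cell whose count resembles its neighbors' more than chance would suggest. Varying the p-value cutoff changes which cells are flagged as clusters.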