Photo Credit: Google Street View

1. Introduction

1.1. Abstract

This project is part of the Spring 2023 Master of Urban Spatial Analytics Practicum course at the University of Pennsylvania taught by Michael Fichman, and Matthew Harris.

Our project explores the experiences of properties and neighborhoods after fires occur with the purpose of developing a predictive model that can inform the Philadelphia Fire Department (PFD) on the likelihood of redevelopment or vacancies for fire impacted structures. This intelligence will allow PFD and partner agencies to understand when pro-active expertise and services might have their highest impact, and what targeted interventions after a fire will provide the fastest way to recovery.

1.2. Motivation & Use Case

In 2022, there were 1.2 million structure fires in the countrythat led to 2,500 deaths ā€” 276 of them children. Last year major cities like Philadelphia and New York grappled with severe and deadly structure fires. In Philadelphia specifically, 41 lost thier lives from fires, while nearly 200 were injured, and thousands were displaced.

As it stands, the PFD has very limited knowledge about what happens after their job is complete. There are currently no programmed set of economic development interventions that are used by public agencies in Philadelphia as a response to fire incidents. The PFD expressed their desire to better understand and predict consequences of a fire in order to better understand recovery patterns in the city.

Through a storytelling lens that will contextualize our research at the incident level, this project development will be the first to provide the PFD with after fire predictions for each property by fire severity, and visualize it to allow them to study and gain understanding of aftermath trends. Our application will be be used as an interactive tool to assist PFD and partner agencies to understand when pro & post-active expertise and services might have their highest impact.

1.3. Understanding Fire Incidents

With the help of the PFD, we were given access to proprietary data that consist of rich and extensive fire data collected by the department dating back to 2009.

To better understand the outcomes of fire impacted fires, we first conducted preliminary literature review research on why fires occur in the first place. The following model, originally developed by Charles Jennings(1996), is a conceptualized model that represents the interrelationships between environmental, structural, and human factors as they relate to fire. We find this model useful as a way to develop more powerful predictors of the incidence of fire and nuanced model to determine their social and economic impacts in the future.

2. Exploratory Analysis

2.2. Exploring Fire Incidents

To begin our exploratory analysis, we first tried to understand the complex nature of fire incidents in the City, This exploratory analysis answered many preliminary questions we had about the types of fires incidents occurring, where they were occurring and their severity. We first examined the fire departmentā€™s data to find any clear signals of whether a property was going to be repaired, experience vacancy or be sold. Their datasheet includes fire incidents, with locations, time, investigated causes, the severity.

2.2.1. How Many Fires Occur Per Address?

There have been at least one incident of a fire for over 15,000 structures in Philadelphia.

2.2.2. Fire Frequency over Time

There is little unique patterns to the data, and it appears that fires have been relatively consistent over the last 15 years, with little to no seasonal variation. - Note: fires dip from July to September

2.2.3. Property Use

Focusing on residential count

Non-residential buildings have very different types of fires and consequently very different post fire outcomes than residential buildings (supercategory 4).For this reason, we omitted these structures and were still left with rich data on residential properties.

Variable Property Use Supercategory Share (%)
Property_Use_SuperCat 4 84.57
Property_Use_SuperCat 5 6.17
Property_Use_SuperCat 2 2.18
Property_Use_SuperCat 1 1.87
Property_Use_SuperCat 0 1.59
Property_Use_SuperCat 8 1.36
Property_Use_SuperCat 3 1.08
Property_Use_SuperCat 7 0.57
Property_Use_SuperCat 9 0.39
Property_Use_SuperCat 6 0.17
Property_Use_SuperCat U 0.05
Property_Use_SuperCat N 0.01

2.3. Measures of Fire Severity

When measuring outcomes of buildings that experience a fire, the obvious question is how severe the fire was. There are multiple possible determinants in the data. We ended up using the first one, the ordinal variable labeled ā€œfire spreadā€. Hereā€™s how we made our decision:

  • Disruption of fire spread within he fire spread severity. SO using fire spread is ideal

  • Incident type: weighted to one cateogry, not as useful

  • No record for a lot of fire incidents

2.3.1. Fire Spread

In our research, time had a big correlation with damage. The longer fire burns close to metal and concrete, the hotter those critical infrastructure pieces get, and the less able they are to hold weight.

Source: https://www.homego.com/blog/house-fire-damage/

This was the simplest and most normally-distributed feature as compared to the three others we considered.

2.3.2. Floor Damage Counts

Originally these counts seemed useful as direct measures of damage, but the large amount of ā€œno recordā€ instances means that it canā€™t function as a reliable metric.

2.3.3. CAD Code Description

We have more specific categories, like CAD (Computer Aided Dispatch) nature code description to help us better understand the types of fire occurring, their frequency and to what extent the fire spread.

2.3.4. Incident Type

Incident type measures whether the interior or exterior of the structure has collapsed. Considering outcomes like demolition, vacancy, and major construction for homes affected by a fire, then collapse seems like a strong candidate for collelation.

However, the vast majority of cases are type 1110: no collapse. This doesnā€™t make it a very helpful measure for severity, especially when we can see fire spread varying so heavily inside these categories.

As a result, we picked fire spread as our measure of severity.

2.3.5. Outliers

We will classify an outlier as an observation more than 3 standard deviations away from the population mean.

The mean number of fires per location is 1.115, weighted by the large amount of places where only 1 fire has occurred. The standard deviation of this fire population is 0.521. The result, 2.678, means any of the 293 locations with three or more fires will be classified as an outlier.

Apart from centers of population, there is not an obvious geographic distribution of the outliers. More research could be useful with ACS data to determine correlations.

2.3.6. Understanding Fire Incident by Property Types

There is a difference among property use, in that 215, the designation for ā€œschools, high/junior/middleā€ is now in the top 3. Code 500, the designation for ā€œmercantile, business, otherā€, also has a greater share than in the complete data set.

Regardless of the cause, these school and commercial fires are very different in their causes and outcomes than a residential fire. This outlier analysis supports our decision to remove the non-residential fires from our research in order to focus what narratives we can observe.We also believe that there needs to be a different policy and programming approach to addressing commercial and school fires.

2.3.7. Understanding Fire Incident by Owner Type

Having established a likelihood of causality, we looked further into different variables to inform our predictions. This is the same type of graph, but only features properties that had fires, and separates the counts by whether the units was occupied by its owner, by a renter, or couldnā€™t be measured. Often large condo buildings and multi-family units couldnā€™t be effectively measured. The most striking observation we had is that owner-occupied units had many more permits in the year following the fire as compared to the renter units - see the difference in those four bars around the center. On the other hand, renter units experienced more vacancies, as shown in this graph.

This categorization of ā€œownerā€, ā€œrenterā€, and ā€œNAā€ was made by comparing the ownerā€™s address in the OPA dataset to the propertyā€™s actual address. If they werenā€™t

On the other hand, renter units experienced more vacancies, as shown in the graph below.

2.3.8. Limitations of OPA Category Code

We attempted to extend our analysis of property types by joining addresses in the Fire Departmentā€™s dataset with their OPA property database counterparts, and sorting based on OPAā€™s ā€œcategory codeā€ field. This field was promising because it labels properties as the broad but useful categories like mixed-use, multi-family, and single-family. However, we noticed that large, multi-unit condo buildings that would experience fires similarly to multi-family units were in fact labeled as single family. This is perhaps because they are more often occupied by the single owner of the condo? Unfortunately, sorting out the OPA categories was a time investment we couldnā€™t make with our schedule, and we had to move forward. We included this note to advise anyone attempting to join such data with their cityā€™s property data set.

3. Data Wrangling: Panel Data Analysis

A key element to the exploratory analysis is understanding how fires relate to properties and other relevant data that will help us better predict post fire impacted properties. To start, we have decided to work with 311 complaints, permit data, and property assessment data to further explore and craft different post fire scenarios.

3.1. Fire Panel - Initial, Count, and Final Panel

The fire panel is created as the base template containing the key information related to fire incidents. The dataset first undergoes numerous operations to refine the information.

The current structure of the initial dataset contains information regarding fire incidents from January 2009 - December 2022. Despite having the same address, there will be a new record of fire incident for that address as each fire has a unique incident number. In order to see the count and severity of properties with and without a fire incident we create a panel to display the observations of every possible address and time combination.

How We Tie Fires to Outcomes

First, we started with every address in Philadelphia (Office of Property Assessment data), then filtered to residential properties only. Next, we Assigned time slots for every quarter of the 14-year time span, 2009-2022. Then we joined this with PFDā€™s structure fire data and then again with Open Data 311 reports and permit reports, formatted similarly. Finally, we tied fires to their outcomes within a certain time period. We currently use ā€œoccurred the same quarter or the next quarterā€, aka 6 months maximum. The diagram below illustrates our panel process.

An important note: though we were able to conveniently parse the large majority of the addresses into a format that would match the OPA property addresses, we had to consolidate fires at multi-family units together under a single address in order to combine them with the data. We could not accommodate different unit numbers. This means that those large properties may have had multiple fires, but they might have occurred at different units within that overall property. More effective matching would make for a better analysis of large multi-family units. Removal of outliers with 3+ fires at a single address happened before this step.

3.2 Combining Datasets to the Panel

311 Data

This bar chart visualizes the first step in the panel process of filtering 311 Data to the appropriate categories. Data is limited to only 2014-2020.

License & Inspections Data

Designations for License & Inspection(L&I) data switch during certain periods, but it does cover the span of all the fire data.

We combine these for better coverage 2014-2020, but we take out the calls that happened during the same time on the same address.

Permit Data

We did not filter further into permit types for multiple reasons. Foremost, the categories used by L&I change over time, so filtering for just one type of permit or a collection of permits will result in lots of data being left out. See the graph below, which filters for a selection of permits:

Also, any official action on the property can be seen as a signal of recovery. Even demolition can be a step toward recovery, as it either paves the way for reconstruction or at least eliminates a vacant and potentially dangerous building from the street. Hereā€™s the full distribution:

Real Estate Transfer Data

3.3 Calculating Outcomes

The objective when calculating outcomes is to know which fires had vacancy complaints, permit requests, or real estate transfers in the months or years following their fire.

Variable Median Years Since Fire
n_permits 0.75
n_transfers 1.75
n_Vacant 1.00

Results

The outcomes here can be used as the dependent variable for our model, when combined back with all the other properties. Our 2-years-after-fire results panel has 14,545 observations, down from the original dataā€™s ~21,000, so we lost about 40% of incidents.

Number of open data matches with fire data

The table below is the number of fires that had a particular outcome. These are not mutually exclusive, so some fires had two or all three outcomes.

Total Outcomes Observed, Up to 2 Years After Fires
Total_Fires Total_Vacant Total_Permit Total_Transfers
13535 1644 3767 4709

3.4 Plotting Measurements

Measuring Time from Fire to Outcomes

Hereā€™s our validation for picking 2 years as a cutoff.We originally picked 6 months as an arbitrary cutoff, but talking with different people about their recovery experiences revealed that we should expand our timeline. One classmate had to stay out of her apartment for 9 months until repairs were complete, and a house we could observe through Google Streetview was not repaired or sold until years after its vacancy complaint.

The graphs below show the distribution of vacancy reports, permits, and property transfers (sales) when measuring their time from the fire.

All seem to occur more frequently closer to the fire, with permits and vacancies most frequently occurring within a year after the fire and decreasing at a logarithmic rate. Transfers has a more linear relationship, starting high and gradually decreasing over time.

In order to see whether this trend is unique among properties with a fire, we needed to compare it to a control group.

3.5 Propensity Match Set

In order to establish whether fires are likely have an impact on these three outcomes, we created a dataset that matched 12,000 properties that had fires to 12,000 individually similar properties that did not. That is to say, ā€œTake two properties of similar qualities. If one experiences a fire, then from that time of the fire, will it have different outcomes than the one that doesnā€™t?ā€

We then measured the time difference between the propertyā€™s fire and the reports of permits, transfers, and vacancies, plotted here. Each of these graphs shows a different outcome.

The bottom axis is the time between when the fire happened and when the outcome was recorded, rounded into quarters per year. If a permit was requested in the same quarter as the fire, itā€™s at 0. If a permit was requested before or after the fire, itā€™s further from 0 based on when. A property that had no fire was given the quarter of its matched counterpart in the data set. The y axis shows the count of permits, transfers, or vacancies recorded at those times.In each of these graphs, both groups of properties experienced similar rates of outcomes before fires, then diverged in the quarter of the fire.

For permits, there was a spike within 6 months that continues until a year and a half. So we see that repairs seem to happen in the first year, but continue at an increased rate for a long time after.

For Transfers, the trend shows a rising rate of sales regardless of fires. However, properties that experienced a fire do show at least 50% more sales in the first year. So we see that these properties are still more likely to be sold or transferred, especially within two years after a fire.

Vacancies are the only outcome where we see increases leading up to fires, which matches research we read that concludes they are a fire hazard. There is still a spike after the fire in vacancy complaints, which levels off two years after the fire, and returns to the same area as no-fire properties after four years.

Spatial Relationships between Fire and No Fire sets

Spatially, there arenā€™t one or two areas in Philadelphia experiencing an abnormal amount of these outcomes beyond where they happen normally, but they still merit inclusion in the model.

Here are two maps that illustrate how our matched properties have the same spatial distribution. Youā€™ll notice how Center City has very few samples in this set due to the difficulty of matching individual units of large multi-family or condo buildings from the fire data set with their OPA dataset counterpart.

For repairs, we know that fires are associated with an increased likelihood of permit requests, especially within the first year. The interesting, though perhaps obvious spatial pattern here is how the permits for fire-stricken properties occur in areas outside of the normal permit donā€™t have permit requests as frequently.

Due to the high volume of transfers and the relatively smaller difference between properties with or without fires, itā€™s hard to draw any conclusion from property transfer maps.

Vacancies follow a similar spatial pattern to permits, in that where there are increased numbers of vacancies, they are usually reported within the first year after the fire, but with a bit more spread into the second and third years.

As such, we concluded that outcomes tend to follow patterns where they normally occur, but when they are outside those areas, they tend to be recorded within the first year. Spatial elements and neighborhood controls still merit inclusion in our model.

The process for creating the set follows the steps in this tutorial. In summary, we create a rough model that estimates the likelihood that a building will have a fire, then match properties with similar propensity-for-fire scores and other variables using a matching algorithm. The initial contrast between the properties that had fires and those that didnā€™t is here:

Propensity Score Estimation

Here are the very rough variables for fire likelihood. We use predictors from the OPA dataset that monitor the quality of the home, the demographics of the area, include the current owner occupancy comparisons we previously engineered, and include the neighborhood name as a categorical variable to control for neighborhood effects. Again, this model could be vastly improved on its own, but the goal here is to simply get the overall sample distributions of these variables to be the same.

# Propensity Score Estimation
#Getting Errors following chunk 
g(opa_sf)
## Rows: 483,260
## Columns: 37
## $ address                   <chr> "934-36 S 9TH ST", "2206 CEDAR ST", "1601 PAā€¦
## $ category_code             <dbl> 3, 1, 3, 3, 3, 2, 3, 2, 1, 3, 3, 3, 1, 1, 3,ā€¦
## $ category_code_description <chr> "MIXED USE", "SINGLE FAMILY", "MIXED USE", "ā€¦
## $ building_code             <chr> "Y50", "V10", "T38", "Y50", "S30", "U50", "Yā€¦
## $ building_code_description <chr> "STR/OFF+APT 3 STY MASONRY", "PRIV GAR 1 STYā€¦
## $ total_area                <dbl> 1840, 776, 1829, 1280, 820, 1600, 2884, 1244ā€¦
## $ total_livable_area        <dbl> 1840, 258, 2247, 2168, 1368, 2540, 2280, 128ā€¦
## $ market_value              <dbl> 292100, 9300, 439000, 364000, 80700, 412200,ā€¦
## $ mailing_street            <chr> "934-36 S 9TH ST", "2353 E SUSQUEHANNA AVE",ā€¦
## $ number_of_bedrooms        <dbl> NA, 0, NA, NA, NA, 0, NA, 0, 3, NA, NA, NA, ā€¦
## $ number_of_bathrooms       <dbl> NA, 0, NA, NA, NA, 0, NA, 0, 1, NA, NA, NA, ā€¦
## $ number_stories            <dbl> NA, 1, NA, NA, NA, 1, NA, 2, 2, NA, NA, NA, ā€¦
## $ year_built                <chr> "1900", "1899", "1950", "1920", "1920", "189ā€¦
## $ geometry                  <POINT [Ā°]> POINT (-75.15825 39.93792), POINT (-75ā€¦
## $ Price_Sqft                <dbl> 158.750000, 11.984536, 240.021870, 284.37500ā€¦
## $ quality_grade_mod         <ord> C, NA, C, C, C, B-, C, C+, C+, C+, C, C, B, ā€¦
## $ condo                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAā€¦
## $ owner_occ                 <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, NA, FALSE, ā€¦
## $ fire                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,ā€¦
## $ GEOID                     <chr> "42101002400", "42101016002", "42101037300",ā€¦
## $ NAME                      <chr> "Census Tract 24, Philadelphia County, Pennsā€¦
## $ Total_Pop                 <dbl> 4892, 4547, 5632, 5247, 6631, 2537, 5100, 40ā€¦
## $ Med_Income                <dbl> 81476, 72857, 81102, 63726, 30958, 18310, 19ā€¦
## $ Total_Units               <dbl> 2723, 2180, 2733, 2340, 2329, 1333, 1635, 19ā€¦
## $ B25002_001M               <dbl> 336, 466, 298, 361, 320, 194, 229, 247, 164,ā€¦
## $ Total_Vacant              <dbl> 277, 156, 95, 92, 314, 306, 364, 59, 427, 32ā€¦
## $ B25002_003M               <dbl> 135, 128, 86, 99, 142, 109, 127, 48, 151, 15ā€¦
## $ Perc_White                <dbl> 0.77473426, 0.90323290, 0.84499290, 0.801410ā€¦
## $ Perc_Vacant               <dbl> 10.172604, 7.155963, 3.476034, 3.931624, 13.ā€¦
## $ name                      <chr> "BELLA_VISTA", "FISHTOWN", "PACKER_PARK", "Lā€¦
## $ listname                  <chr> "Bella Vista", "Fishtown - Lower Kensington"ā€¦
## $ mapname                   <chr> "Bella Vista", "Fishtown - Lower Kensington"ā€¦
## $ shape_leng                <dbl> 8360.054, 25973.396, 21816.018, 12975.343, 3ā€¦
## $ shape_area                <dbl> 4253835, 33605828, 28888967, 10345497, 39024ā€¦
## $ cartodb_id                <int> 105, 87, 154, 110, 73, 82, 85, 96, 143, 85, ā€¦
## $ created_at                <dttm> 2013-03-19 13:41:50, 2013-03-19 13:41:50, 2ā€¦
## $ updated_at                <dttm> 2013-03-19 13:41:50, 2013-03-19 13:41:50, 2ā€¦

m_ps <- glm(fire ~ total_livable_area + number_of_bedrooms + Price_Sqft + Med_Income + Perc_White, 
            family = binomial(), data = opa_sf)

summary(m_ps)
## 
## Call:
## glm(formula = fire ~ total_livable_area + number_of_bedrooms + 
##     Price_Sqft + Med_Income + Perc_White, family = binomial(), 
##     data = opa_sf)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.9111  -0.2798  -0.2298  -0.1691   3.5012  
## 
## Coefficients:
##                         Estimate    Std. Error z value             Pr(>|z|)    
## (Intercept)        -2.5045028196  0.0316303650 -79.180 < 0.0000000000000002 ***
## total_livable_area  0.0000779289  0.0000099151   7.860  0.00000000000000385 ***
## number_of_bedrooms -0.0325684327  0.0065788315  -4.950  0.00000074027266693 ***
## Price_Sqft          0.0001671883  0.0000490197   3.411             0.000648 ***
## Med_Income         -0.0000190141  0.0000006666 -28.526 < 0.0000000000000002 ***
## Perc_White         -0.8161771801  0.0464654065 -17.565 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 116699  on 459483  degrees of freedom
## Residual deviance: 113140  on 459478  degrees of freedom
##   (23776 observations deleted due to missingness)
## AIC: 113152
## 
## Number of Fisher Scoring iterations: 7


m_ps2 <- glm(fire ~ total_livable_area + number_of_bedrooms + Price_Sqft + Med_Income + Perc_White + owner_occ + quality_grade_mod + name, 
            family = binomial(), data = opa_sf)

summary(m_ps2)
## 
## Call:
## glm(formula = fire ~ total_livable_area + number_of_bedrooms + 
##     Price_Sqft + Med_Income + Perc_White + owner_occ + quality_grade_mod + 
##     name, family = binomial(), data = opa_sf)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6095  -0.2708  -0.2086  -0.1536   3.5200  
## 
## Coefficients:
##                               Estimate    Std. Error z value
## (Intercept)               -2.726574071   0.320072297  -8.519
## total_livable_area         0.000065484   0.000017652   3.710
## number_of_bedrooms        -0.006939026   0.008856605  -0.783
## Price_Sqft                -0.000580784   0.000160752  -3.613
## Med_Income                -0.000003468   0.000001236  -2.806
## Perc_White                -1.294826381   0.121695239 -10.640
## owner_occTRUE             -0.401829791   0.020504481 -19.597
## quality_grade_mod.L        0.937387261   0.357311235   2.623
## quality_grade_mod.Q        1.287159272   0.294890872   4.365
## quality_grade_mod.C        0.479742552   0.314512388   1.525
## quality_grade_mod^4       -0.039290798   0.331046972  -0.119
## quality_grade_mod^5       -0.196691158   0.289911432  -0.678
## quality_grade_mod^6       -0.198300099   0.253106661  -0.783
## quality_grade_mod^7        0.284567186   0.278280452   1.023
## quality_grade_mod^8        0.267728359   0.285313202   0.938
## quality_grade_mod^9       -0.187862970   0.294450683  -0.638
## quality_grade_mod^10       0.088987818   0.314363469   0.283
## quality_grade_mod^11      -0.264436206   0.270797457  -0.977
## quality_grade_mod^12       0.099100967   0.205897773   0.481
## quality_grade_mod^13       0.067231606   0.164034056   0.410
## quality_grade_mod^14       0.032619050   0.096262998   0.339
## nameALLEGHENY_WEST         0.079706124   0.316467762   0.252
## nameANDORRA               -0.749405433   0.579655132  -1.293
## nameASTON_WOODBRIDGE       0.072308326   0.377111911   0.192
## nameBARTRAM_VILLAGE        0.033715419   0.380262757   0.089
## nameBELLA_VISTA            0.370763549   0.387218656   0.958
## nameBELMONT                0.710784305   0.331162987   2.146
## nameBREWERYTOWN            0.266395168   0.315260297   0.845
## nameBRIDESBURG             0.600871860   0.338112152   1.777
## nameBURHOLME              -0.794576082   0.536611534  -1.481
## nameBUSTLETON             -0.350899356   0.317467683  -1.105
## nameBYBERRY               -1.063340048   1.043442666  -1.019
## nameCALLOWHILL           -10.230220523 124.577794860  -0.082
## nameCARROLL_PARK           0.329029365   0.312654475   1.052
## nameCEDAR_PARK             0.460544773   0.335103495   1.374
## nameCEDARBROOK            -0.779455611   0.333063699  -2.340
## nameCENTER_CITY          -10.546872534 206.076584964  -0.051
## nameCHESTNUT_HILL         -1.007597208   0.447433297  -2.252
## nameCHINATOWN            -10.482007455 142.546502210  -0.074
## nameCLEARVIEW             -0.055302118   0.365143233  -0.151
## nameCOBBS_CREEK            0.244438593   0.307264936   0.796
## nameCRESCENTVILLE          0.348368417   0.658062913   0.529
## nameCRESTMONT_FARMS       -9.967054054  86.482772945  -0.115
## nameDEARNLEY_PARK         -1.067670174   1.043465199  -1.023
## nameDICKINSON_NARROWS      0.511583043   0.335510943   1.525
## nameDUNLAP                 0.482066451   0.348213415   1.384
## nameEAST_FALLS            -0.163886644   0.372926354  -0.439
## nameEAST_KENSINGTON        0.909274844   0.321407197   2.829
## nameEAST_OAK_LANE         -0.416291813   0.343997101  -1.210
## nameEAST_PARK             -9.795231912 207.600350626  -0.047
## nameEAST_PARKSIDE          0.023324004   0.354709930   0.066
## nameEAST_PASSYUNK         -0.053744599   0.342860800  -0.157
## nameEAST_POPLAR           -0.514429438   0.774125153  -0.665
## nameEASTWICK              -0.582927658   0.387204154  -1.505
## nameELMWOOD                0.214500763   0.311169171   0.689
## nameFAIRHILL               0.905616014   0.316487971   2.861
## nameFAIRMOUNT             -0.501961799   0.389436426  -1.289
## nameFELTONVILLE            0.385865481   0.310785509   1.242
## nameFERN_ROCK              0.000441113   0.348453045   0.001
## nameFISHTOWN               0.582811221   0.310675270   1.876
## nameFITLER_SQUARE          0.159176414   0.546622019   0.291
## nameFOX_CHASE             -0.120424176   0.324234868  -0.371
## nameFRANCISVILLE           0.479178959   0.389811170   1.229
## nameFRANKFORD              0.674237403   0.304615204   2.213
## nameFRANKLINVILLE          0.582620155   0.319115769   1.826
## nameGARDEN_COURT          -0.171356937   0.482710497  -0.355
## nameGERMANTOWN_EAST        0.132341283   0.315956270   0.419
## nameGERMANTOWN_MORTON      0.186419511   0.327780979   0.569
## nameGERMANTOWN_PENN_KNOX  -0.298297834   0.487826950  -0.611
## nameGERMANTOWN_SOUTHWEST   0.343845455   0.318774292   1.079
## nameGERMANTOWN_WEST_CENT   0.043302672   0.354687854   0.122
## nameGERMANTOWN_WESTSIDE    0.551401493   0.351042687   1.571
## nameGERMANY_HILL          -0.066907672   0.503201713  -0.133
## nameGIRARD_ESTATES         0.155934087   0.332692445   0.469
## nameGLENWOOD               0.160394282   0.335328780   0.478
## nameGRADUATE_HOSPITAL      0.277014180   0.344756279   0.804
## nameGRAYS_FERRY            0.232726231   0.309987934   0.751
## nameGREENWICH             -0.290786006   0.416549173  -0.698
## nameHADDINGTON             0.244347250   0.310132785   0.788
## nameHARROWGATE             0.954423441   0.305936010   3.120
## nameHARTRANFT              0.637806381   0.308589949   2.067
## nameHAVERFORD_NORTH        0.517081377   0.371198031   1.393
## nameHAWTHORNE              0.417752219   0.410288007   1.018
## nameHOLMESBURG             0.401455307   0.306036080   1.312
## nameHUNTING_PARK           0.590689907   0.308113811   1.917
## nameJUNIATA_PARK           0.340984203   0.308681106   1.105
## nameKINGSESSING            0.127489568   0.310277026   0.411
## nameLAWNDALE              -0.209064286   0.312872078  -0.668
## nameLEXINGTON_PARK        -0.644981797   0.478531314  -1.348
## nameLOGAN                  0.288736072   0.310628812   0.930
## nameLOGAN_SQUARE           0.559780767   0.516958580   1.083
## nameLOWER_MOYAMENSING      0.061713360   0.313725220   0.197
## nameLUDLOW                 0.172537168   0.423133547   0.408
## nameMANAYUNK               0.245558405   0.340497634   0.721
## nameMANTUA                -0.018437466   0.331537248  -0.056
## nameMAYFAIR               -0.136676335   0.304222559  -0.449
## nameMCGUIRE                0.859021080   0.328967505   2.611
## nameMECHANICSVILLE         2.013701529   0.664927872   3.028
## nameMELROSE_PARK_GARDENS  -0.763696204   0.399119706  -1.913
## nameMILL_CREEK             0.607228085   0.324646249   1.870
## nameMILLBROOK             -0.697873789   0.477679383  -1.461
## nameMODENA                -0.207244720   0.355618058  -0.583
## nameMORRELL_PARK          -0.603447540   0.389873028  -1.548
## nameMOUNT_AIRY_EAST       -0.369363324   0.317154984  -1.165
## nameMOUNT_AIRY_WEST       -0.363947098   0.343970447  -1.058
## nameNEWBOLD                0.385035915   0.325042343   1.185
## nameNICETOWN               0.256968881   0.331318763   0.776
## nameNORMANDY_VILLAGE      -0.051145779   0.503838669  -0.102
## nameNORTH_CENTRAL          0.457325906   0.317524577   1.440
## nameNORTHERN_LIBERTIES     0.555713268   0.369957280   1.502
## nameNORTHWOOD              0.081781765   0.324815065   0.252
## nameOGONTZ                -0.025733958   0.315529177  -0.082
## nameOLD_CITY               0.797237167   0.783741091   1.017
## nameOLD_KENSINGTON         0.502255416   0.373822228   1.344
## nameOLNEY                  0.080938608   0.307032576   0.264
## nameOVERBROOK             -0.055923370   0.308891276  -0.181
## nameOXFORD_CIRCLE         -0.118525352   0.303437804  -0.391
## namePACKER_PARK           -0.047208343   0.420630480  -0.112
## namePARKWOOD_MANOR        -0.391265819   0.340731840  -1.148
## namePASCHALL               0.449275006   0.312843568   1.436
## namePASSYUNK_SQUARE        0.260309536   0.336685406   0.773
## namePENNSPORT              0.330197430   0.338807608   0.975
## namePENNYPACK             -0.569161869   0.375139254  -1.517
## namePENNYPACK_PARK       -10.225224468 160.885738744  -0.064
## namePENNYPACK_WOODS        0.166596447   0.460271240   0.362
## namePENROSE               -0.618618087   0.374085801  -1.654
## namePOINT_BREEZE           0.289825096   0.308959393   0.938
## namePOWELTON              -0.200346272   0.655622129  -0.306
## nameQUEEN_VILLAGE          0.437009777   0.382913293   1.141
## nameRHAWNHURST            -0.509259461   0.321554509  -1.584
## nameRICHMOND               0.720817925   0.300255468   2.401
## nameRITTENHOUSE            0.486223457   0.408995437   1.189
## nameRIVERFRONT             0.805747535   1.054652420   0.764
## nameROXBOROUGH            -0.114364462   0.342755877  -0.334
## nameROXBOROUGH_PARK       -1.297151346   1.043225779  -1.243
## nameSHARSWOOD              0.189058562   0.413750151   0.457
## nameSOCIETY_HILL           0.455819964   0.461374485   0.988
## nameSOMERTON              -0.378153278   0.318756689  -1.186
## nameSOUTHWEST_SCHUYLKILL   0.214029027   0.316062505   0.677
## nameSPRING_GARDEN          0.323917402   0.466769815   0.694
## nameSPRUCE_HILL            0.411284569   0.385805053   1.066
## nameSTADIUM_DISTRICT      -0.091993077   0.363525396  -0.253
## nameSTANTON                0.536179036   0.311882467   1.719
## nameSTRAWBERRY_MANSION     0.264499022   0.311424004   0.849
## nameSUMMERDALE             0.320939411   0.325088947   0.987
## nameTACONY                 0.382893460   0.306817238   1.248
## nameTIOGA                  0.456246540   0.313805667   1.454
## nameTORRESDALE            -0.219927455   0.367771953  -0.598
## nameUNIVERSITY_CITY        0.177307411   0.659587568   0.269
## nameUPPER_KENSINGTON       0.786364249   0.304565662   2.582
## nameUPPER_ROXBOROUGH      -0.523405508   0.379606754  -1.379
## nameWALNUT_HILL            0.381003866   0.352132610   1.082
## nameWASHINGTON_SQUARE      0.373081307   0.447403668   0.834
## nameWEST_KENSINGTON        0.588271537   0.312244385   1.884
## nameWEST_OAK_LANE         -0.359436758   0.310601617  -1.157
## nameWEST_PARKSIDE         -0.742922217   0.657007457  -1.131
## nameWEST_PASSYUNK          0.030623044   0.318407746   0.096
## nameWEST_POPLAR           -0.309267207   0.508574378  -0.608
## nameWEST_POWELTON          0.101340492   0.375006856   0.270
## nameWHITMAN                0.283553238   0.322857019   0.878
## nameWINCHESTER_PARK        0.250236182   0.431443791   0.580
## nameWISSAHICKON           -0.199753753   0.420465774  -0.475
## nameWISSAHICKON_HILLS     -1.602895326   1.042632391  -1.537
## nameWISSAHICKON_PARK     -10.007279396 393.921430019  -0.025
## nameWISSINOMING            0.266081958   0.305516305   0.871
## nameWISTER                 0.280385308   0.335948738   0.835
## nameWOODLAND_TERRACE     -10.421002162 106.663312922  -0.098
## nameWYNNEFIELD            -0.165254059   0.319315551  -0.518
## nameWYNNEFIELD_HEIGHTS    -1.668691669   0.898442886  -1.857
## nameYORKTOWN              -0.278795184   0.416842989  -0.669
##                                      Pr(>|z|)    
## (Intercept)              < 0.0000000000000002 ***
## total_livable_area                   0.000207 ***
## number_of_bedrooms                   0.433342    
## Price_Sqft                           0.000303 ***
## Med_Income                           0.005021 ** 
## Perc_White               < 0.0000000000000002 ***
## owner_occTRUE            < 0.0000000000000002 ***
## quality_grade_mod.L                  0.008704 ** 
## quality_grade_mod.Q                 0.0000127 ***
## quality_grade_mod.C                  0.127171    
## quality_grade_mod^4                  0.905524    
## quality_grade_mod^5                  0.497485    
## quality_grade_mod^6                  0.433354    
## quality_grade_mod^7                  0.306501    
## quality_grade_mod^8                  0.348056    
## quality_grade_mod^9                  0.523466    
## quality_grade_mod^10                 0.777121    
## quality_grade_mod^11                 0.328812    
## quality_grade_mod^12                 0.630295    
## quality_grade_mod^13                 0.681906    
## quality_grade_mod^14                 0.734720    
## nameALLEGHENY_WEST                   0.801148    
## nameANDORRA                          0.196064    
## nameASTON_WOODBRIDGE                 0.847944    
## nameBARTRAM_VILLAGE                  0.929349    
## nameBELLA_VISTA                      0.338313    
## nameBELMONT                          0.031847 *  
## nameBREWERYTOWN                      0.398110    
## nameBRIDESBURG                       0.075546 .  
## nameBURHOLME                         0.138679    
## nameBUSTLETON                        0.269026    
## nameBYBERRY                          0.308170    
## nameCALLOWHILL                       0.934552    
## nameCARROLL_PARK                     0.292628    
## nameCEDAR_PARK                       0.169337    
## nameCEDARBROOK                       0.019270 *  
## nameCENTER_CITY                      0.959183    
## nameCHESTNUT_HILL                    0.024325 *  
## nameCHINATOWN                        0.941381    
## nameCLEARVIEW                        0.879618    
## nameCOBBS_CREEK                      0.426305    
## nameCRESCENTVILLE                    0.596539    
## nameCRESTMONT_FARMS                  0.908248    
## nameDEARNLEY_PARK                    0.306215    
## nameDICKINSON_NARROWS                0.127312    
## nameDUNLAP                           0.166236    
## nameEAST_FALLS                       0.660327    
## nameEAST_KENSINGTON                  0.004669 ** 
## nameEAST_OAK_LANE                    0.226217    
## nameEAST_PARK                        0.962367    
## nameEAST_PARKSIDE                    0.947573    
## nameEAST_PASSYUNK                    0.875439    
## nameEAST_POPLAR                      0.506351    
## nameEASTWICK                         0.132201    
## nameELMWOOD                          0.490611    
## nameFAIRHILL                         0.004217 ** 
## nameFAIRMOUNT                        0.197418    
## nameFELTONVILLE                      0.214391    
## nameFERN_ROCK                        0.998990    
## nameFISHTOWN                         0.060662 .  
## nameFITLER_SQUARE                    0.770898    
## nameFOX_CHASE                        0.710332    
## nameFRANCISVILLE                     0.218975    
## nameFRANKFORD                        0.026870 *  
## nameFRANKLINVILLE                    0.067891 .  
## nameGARDEN_COURT                     0.722598    
## nameGERMANTOWN_EAST                  0.675319    
## nameGERMANTOWN_MORTON                0.569538    
## nameGERMANTOWN_PENN_KNOX             0.540880    
## nameGERMANTOWN_SOUTHWEST             0.280744    
## nameGERMANTOWN_WEST_CENT             0.902830    
## nameGERMANTOWN_WESTSIDE              0.116240    
## nameGERMANY_HILL                     0.894222    
## nameGIRARD_ESTATES                   0.639282    
## nameGLENWOOD                         0.632423    
## nameGRADUATE_HOSPITAL                0.421682    
## nameGRAYS_FERRY                      0.452798    
## nameGREENWICH                        0.485125    
## nameHADDINGTON                       0.430767    
## nameHARROWGATE                       0.001810 ** 
## nameHARTRANFT                        0.038749 *  
## nameHAVERFORD_NORTH                  0.163618    
## nameHAWTHORNE                        0.308586    
## nameHOLMESBURG                       0.189591    
## nameHUNTING_PARK                     0.055223 .  
## nameJUNIATA_PARK                     0.269312    
## nameKINGSESSING                      0.681154    
## nameLAWNDALE                         0.503999    
## nameLEXINGTON_PARK                   0.177711    
## nameLOGAN                            0.352619    
## nameLOGAN_SQUARE                     0.278882    
## nameLOWER_MOYAMENSING                0.844053    
## nameLUDLOW                           0.683449    
## nameMANAYUNK                         0.470802    
## nameMANTUA                           0.955651    
## nameMAYFAIR                          0.653241    
## nameMCGUIRE                          0.009021 ** 
## nameMECHANICSVILLE                   0.002458 ** 
## nameMELROSE_PARK_GARDENS             0.055690 .  
## nameMILL_CREEK                       0.061424 .  
## nameMILLBROOK                        0.144025    
## nameMODENA                           0.560046    
## nameMORRELL_PARK                     0.121669    
## nameMOUNT_AIRY_EAST                  0.244175    
## nameMOUNT_AIRY_WEST                  0.290020    
## nameNEWBOLD                          0.236187    
## nameNICETOWN                         0.437989    
## nameNORMANDY_VILLAGE                 0.919144    
## nameNORTH_CENTRAL                    0.149787    
## nameNORTHERN_LIBERTIES               0.133071    
## nameNORTHWOOD                        0.801212    
## nameOGONTZ                           0.934998    
## nameOLD_CITY                         0.309049    
## nameOLD_KENSINGTON                   0.179088    
## nameOLNEY                            0.792076    
## nameOVERBROOK                        0.856332    
## nameOXFORD_CIRCLE                    0.696087    
## namePACKER_PARK                      0.910639    
## namePARKWOOD_MANOR                   0.250841    
## namePASCHALL                         0.150974    
## namePASSYUNK_SQUARE                  0.439431    
## namePENNSPORT                        0.329765    
## namePENNYPACK                        0.129216    
## namePENNYPACK_PARK                   0.949324    
## namePENNYPACK_WOODS                  0.717387    
## namePENROSE                          0.098193 .  
## namePOINT_BREEZE                     0.348209    
## namePOWELTON                         0.759923    
## nameQUEEN_VILLAGE                    0.253755    
## nameRHAWNHURST                       0.113252    
## nameRICHMOND                         0.016365 *  
## nameRITTENHOUSE                      0.234509    
## nameRIVERFRONT                       0.444871    
## nameROXBOROUGH                       0.738635    
## nameROXBOROUGH_PARK                  0.213719    
## nameSHARSWOOD                        0.647715    
## nameSOCIETY_HILL                     0.323172    
## nameSOMERTON                         0.235489    
## nameSOUTHWEST_SCHUYLKILL             0.498296    
## nameSPRING_GARDEN                    0.487710    
## nameSPRUCE_HILL                      0.286404    
## nameSTADIUM_DISTRICT                 0.800223    
## nameSTANTON                          0.085583 .  
## nameSTRAWBERRY_MANSION               0.395703    
## nameSUMMERDALE                       0.323527    
## nameTACONY                           0.212048    
## nameTIOGA                            0.145970    
## nameTORRESDALE                       0.549840    
## nameUNIVERSITY_CITY                  0.788072    
## nameUPPER_KENSINGTON                 0.009825 ** 
## nameUPPER_ROXBOROUGH                 0.167953    
## nameWALNUT_HILL                      0.279257    
## nameWASHINGTON_SQUARE                0.404348    
## nameWEST_KENSINGTON                  0.059564 .  
## nameWEST_OAK_LANE                    0.247179    
## nameWEST_PARKSIDE                    0.258153    
## nameWEST_PASSYUNK                    0.923381    
## nameWEST_POPLAR                      0.543117    
## nameWEST_POWELTON                    0.786978    
## nameWHITMAN                          0.379801    
## nameWINCHESTER_PARK                  0.561917    
## nameWISSAHICKON                      0.634732    
## nameWISSAHICKON_HILLS                0.124207    
## nameWISSAHICKON_PARK                 0.979733    
## nameWISSINOMING                      0.383795    
## nameWISTER                           0.403939    
## nameWOODLAND_TERRACE                 0.922171    
## nameWYNNEFIELD                       0.604789    
## nameWYNNEFIELD_HEIGHTS               0.063266 .  
## nameYORKTOWN                         0.503607    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101275  on 413591  degrees of freedom
## Residual deviance:  96721  on 413422  degrees of freedom
##   (69668 observations deleted due to missingness)
## AIC: 97061
## 
## Number of Fisher Scoring iterations: 13

Region of Common Support

Matching Algorithm

For the matching algorithm to work, we had to eliminate any observations that did not have the right data available. This cut down our count of fires from the original ~18,000 non-outliers we started with to around 12,000 incidents. Fortunately thatā€™s still a healthy sample.

See how the averages of the variables are much more aligned than before the modeling and matching process? With this set of 24,000 properties, we could move forward with analysis.

# Matching Algorithm
opa_nomiss <- opa_sf %>%  # MatchIt does not allow missing values
  dplyr::select(address, fire, one_of(opa_sf_cov)) %>%
  na.omit()

mod_match <- matchit(fire ~ total_livable_area + number_of_bedrooms + Price_Sqft + Med_Income + Perc_White + owner_occ + quality_grade_mod + name,
                     method = "nearest", data = opa_nomiss)
dta_m <- match.data(mod_match)

Difference in Means

#Difference In Means
dta_m %>%
  st_drop_geometry()%>%
  group_by(fire) %>%
  dplyr::select(one_of(opa_sf_cov)) %>%
  summarize_all(funs(mean))
## # A tibble: 2 Ɨ 9
##   fire  total_livable_area numbeā€¦Ā¹ Priceā€¦Ā² Med_Iā€¦Ā³ ownerā€¦ā“ Perc_ā€¦āµ qualiā€¦ā¶  name
##   <fct>              <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
## 1 0                  1315.    2.66    118.  39805.   0.581   0.239      NA    NA
## 2 1                  1329.    2.65    120.  39984.   0.582   0.241      NA    NA
## # ā€¦ with abbreviated variable names Ā¹ā€‹number_of_bedrooms, Ā²ā€‹Price_Sqft,
## #   Ā³ā€‹Med_Income, ā“ā€‹owner_occ, āµā€‹Perc_White, ā¶ā€‹quality_grade_mod

Where does recovery happen fastest or slowest?

We see that with both vacancies and permits, the falling likelihood of reports between 6 and 18 months levels off to a sustained, slowly decreasing amount. Perhaps these are different trends, in that permits and vacancies that happen within the first year are a natural part of fire recovery, but reports after that point indicate there are obstacles to recovery. One fire repair service lists ā€œlarge fireā€ recovery as taking ā€œa significant amount of time, typically at minimum several weeks, with the likelihood of it taking several months for repairs.ā€ Others have similar timelines.

Knowing that that permits have been delayed from the usual recovery timeline, we were curious if there were spatial or demographic differences among repair speed.

We mapped these permit reports across Philadelphia and separated the data by whether it occurred within a year or between 1-4 years after the fire.

There isnā€™t an obvious spatial with trend other than a slight decrease of repairs in Northeast and the Northern South Philadelphia.On the vacancy end, fire stricken properties appear to have more vacancy reports in areas of already-high vacancy rates, as defined by ACS Data for 2020. Vacant properties tend to linger in these areas as well. The two maps below divide the vacancy reports for the matched-set between reports within 1 year of the fire and reports between 1 year and 4 years after the fire. The blue dots appear to somewhat consolidate into those darker, higher-vacancy areas.
To be more precise, we measured each neighborhoodā€™s rate of its proprieties requests and vacancy records 1-4 years after a fire and compared it to the total total count of each for the fire-stricken properties:

This map shows which areas have the highest rate of vacancies that persist over a year after the fire, which signals obstacles to recovery. The map is incomplete due to the low rate of vacancies in some areas and the fact that we excluded some building types from the analysis that are clustered in Center City.

#g(panel_pm_4YForward)

This analysis does show some difference in these two metrics. Recovery may happen slowest in neighborhoods with the highest lingering vacancies and delayed permits.

Are investors buying fire-stricken properties at a higher rate than normal properties?

With our case study featuring two examples of investor-purchased homes, we were curious if this was a larger trend. In short, the transfer of properties between investors and homeowners is a minority of property transfers as a whole, but fires do affect sales in these categories in two different ways.

Weā€™re defining investors as those who own above 5 properties in the OPA dataset and those that have ā€˜LLCā€™, ā€˜LPā€™, ā€˜REOā€™, ā€˜Investmentsā€™, ā€˜Homesā€™, ā€˜Trustā€™, ā€˜Corpā€™, ā€˜Incā€™ in their name. From this criteria, we created a list of investors to find in the matched set.

Homeowner to Investor, Investor to Homeowner

Hereā€™s the same type of graph as the property sales, but counting sales that we classify as ā€œhomeowner-to-investorā€ based on their name and how many properties they own. The rate of homeowner-to-investor sales is steadily increasing over time, but fires are associated with a nearly 100% increase in the first year, with the trend maintaining over time.

Conversely, if we look at investor-to-homeowner sales, thereā€™s a lag time where investor-owned properties with fires are sold at a lower rate, but then investor sales pick up after six months. This may indicate a flip-sale trend, with a purchase and renovation period in the couple quarters, followed by increased sales from the investors.

3.6 Exploratory Analysis Conclusion

One house, 3 Outcomes

Our exploratory analysis led us to one property in Northern Brewerytown that demonstrated how all three outcomes of vacancy, permits and sales can occur at one property.

Photo 1&2: is the property just one month prior to the fire incident. As you can see, the property next to it is vacant. Photo 2 is the same property a year later. Since that last year, the property had received several 311 complaints about its vacancy to which L&I responded with a vacancy inspection. Property cleans up just so happened to be active when Google Streetcar was driving by.

Photo 3: After the fire, and the L&I inspections, the property, and its neighbors sat vacant for several years displayed in photo 3. According to the Department of Records, it was finally sold in 2020 for $80,000. L&Iā€™s permit data also received many construction permit requests for it during this period.

Photo 4: A local developer bought the house on the left with funding from Jumpstart Philly LLC, a community development fund for small developers. They then sold it for $250,000 in 2021. The other house was bought by a different local developer around the same time as its neighbor, but then sold again to a national-level investor, who then repaired and sold it in 2022 for $270,000.

# Create a table with three columns
img1 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /1.jpg"
img2 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /2.jpg"
img3 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /3.jpg"
img4 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /4.jpg"
tbl <- tibble::tibble(
  Photo1 = paste0("![One month before fire, 2017](", img1, ")"),
  Photo2 = paste0("![One year later, 2018](", img2, ")"),
  Photo3 = paste0("![Vacant, 2019](", img3, ")"),
  Photo4 = paste0("![Sold, 2022](", img4, ")")
)

# Display the table without borders
kable(tbl, format = "html", escape = FALSE) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  column_spec(1:4, width = "25%")
Photo1 Photo2 Photo3 Photo4

One month before fire, 2017

One year later, 2018

Vacant, 2019

Sold, 2022

4. Modeling Strategy

We begin forecasting the probability of the three outcomes of property fires by employing two regression models: generalized linear model (GLM) and random forest (RF). The generalized linear model is able to count for numerous types of data such as binaries, continuous and categorical data, and observes the relationship between the response and predictor variables. The random forest regression model is a supervised learning process that observes the data through a series of decision trees to understand the best predictors.

Our modeling process generated five models in total: three generalized linear models and two random forest models. We began a preliminary process with GLM and noticed that permits and transfers had optimal thresholds that are significantly higher than vacancy. With this in mind, we decided to attempt these outcomes on the random forest model to see how they perform.

The results of the random forest failed to yield better results, which allowed for us to pivot into focusing on GLM and employing different approaches for each outcome.

4.1 Feature Engineering

The variables that went into our model are a combination of proprietary data from the Philadelphia Fire Department, property information from the Philadelphia Office of Property Assessments, census level data, and spatial information from OpenDataPhilly. Upon initial observation, we decided to feature engineer the following variables in order to yield more effective predictive power for our models:

Property Information:

  • Quality Grade: above average and below average
  • Fuel Type: fossil, solid, alternative, other
  • Topography: above street level, street level, below street level, rocky, flood plain, other
  • Electric Heater: yes, no
  • Central Air: yes, no
  • Fireplace: yes, no
  • Number of Bedrooms: low, medium, high
  • Number of Bathrooms: low, medium, high
  • Total Livable Area: low, medium, high
  • Year Built: up to 1950, 1950 to 1999, 2000 to present
  • Exterior Condition: average, below average, vacant, sealed, unknown
  • Market Value: low, medium, high
  • Sales Value: above market, below market
  • Resident Type: owner, renter
  • Value per square foot

Census Information:

  • Is the property in a census tract experiencing poverty?: yes, no
  • What is the average income level of the census tract the property is in?: low, high

Spatial Information:

  • Is the property in a redevelopment certified area?: yes, no
  • Distance of the property to the five nearest public schools.

Generally our feature engineering method proved to provide more powerful results within our modeling process.

4.2 Random Forest Model

As previously mentioned, due to the results of our three preliminary GLMs, we tested permits and transfers on the random forest model. In both cases, the random forest accuracy scores were comparatively low at 27% and 32% for permits and transfers respectively. These scores are lower than the preliminary GLM for permits and transfers, which signified that our GLM approach would yield better predictions.

4.2.1 RF Results

Random Forest Modeling Results
Model Accuracy MAE
Permit 0.2740 0.305900
Transfer 0.3231 0.365959

4.3 GLM Models

Within GLM modeling, we are creating three models, one for each outcome: vacancies, permits, and transfers. For the three models, we are utilizing different processes through a cost benefit approach that will allow us to balance the sensitivity and specificity of each outcome to achieve the optimal threshold for each model in order to create a better modeling experience.

4.4 Model Evaluations

We use a series of metrics to understand the predictive power, accuracy, and error of our models which includes: AUC scores, ROC curves, and distribution density.

AUC Scores

The AUC score is the area under the ROC curve score, which allows us to gain a rough estimate of how our models are performing. Generally, it is ideal to have a score between 0.5 and 1. The chart below indicates the scores for our models perform well at 0.72 for vacant, 0.70 for permits, and 0.67 for transfers.

AUC by Outcome
auc outcome
0.7354691 Vacant
0.6967070 Permit
0.6622864 Transfer

ROC Curves

ROC curves and AUC scores work tangentially, with the ROC curve materializing itself as a visualization of the AUC scores. The diagonal line represents the 0.5 mark, and in this sense it is ideal to avoid having a curve close to a 90 degree angle, representing a value 1. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1-specificity) for different threshold values.

Optimal Thresholds

Due to the nature of the scope of our research, the three outcomes will experience different values in order to find their optimal threshold. The optimal threshold allows us to achieve the most appropriate accuracy scores that maximizes sensitivity and specificity. These metrics for the respective models can be seen in the chart below.

Optimal Threshold by Outcome
outcome threshold sensitivity specificity
Vacant 0.1122426 0.6735113 0.6832591
Permit 0.2771115 0.5707965 0.7118959
Transfer 0.2598390 0.7611191 0.5190656

Distribution Density

The distribution density allows us to see the density of our predicted probabilities for each model. A negative or 0 value means that the outcome did not occur, while a positive or 1 value means that the outcome did occur. Strong models will have a peak closer to 0 for the negatives (no occurrence), and a peak closer to 1 for the positives (occurred).

The plots indicate that for each of the three outcomes, our model is stronger at predicting the probability that the outcome did not occur, while the predictive probability for an outcome actually occurring ranges from 0.08 to 0.25.

4.5 Spatial Cross-Validation

To add an additional layer of evaluation, we employ a spatial cross validation method. This process allows us to understand how our models perform spatially. In this spatial cross validation, we test our models against Philadelphia neighborhoods. Below you will find the outcome of spatial cross validations both mapped and in table format.

Accuracy

The following map visualize how are accurate our model predicted spatially, for each individual outcome. The darker the color, the higher accuracy rate.

Confusion Matrices

Our 3 confusion matrix heatmaps for all three outcomes is a graphical representation of a confusion matrix, where each cell is color-coded to indicate the number of correctly classified instances for each class. In our case the true positive (TP) and true negative (TN) cells are shaded in dark orange and red, while the false positive (FP) and false negative (FN) cells are shaded in light yellow and light orange. The heatmaps provides a quick and intuitive way to identify which classes are being misclassified, and how frequently these errors are occurring.

  • TP = we predicted a property would be repaired, left vacant, or sold, and were correct.
  • TN = we predicted a property would be repaired, left vacant, or sold, and were incorrect.
  • FP = we predicted a property would notbe repaired, left vacant, or sold, and were incorrect.
  • FN = we predicted a property would notbe repaired, left vacant, or sold, and were correct.
Count of Confusion Metrics by Neighborhood: Vacant
Neighborhood True_Positive True_Negative False_Negative False_Positive
Upper Kensington 79 107 4 259
Richmond 60 91 4 175
Cobbs Creek 50 201 5 230
Haddington 46 123 10 149
Frankford 42 109 5 164
Strawberry Mansion 39 93 7 178
Stanton 36 74 4 143
Hunting Park 35 88 2 130
Hartranft 32 68 6 129
Point Breeze 32 82 3 91
Tioga 29 67 3 108
Harrowgate 26 29 1 60
Logan 24 111 10 112
Carroll Park 23 89 1 89
Paschall 23 81 5 105
West Oak Lane 23 135 8 107
Southwest Schuylkill 21 45 5 62
Juniata Park 19 71 1 66
Kingsessing 18 115 12 118
Wissinoming 18 79 6 76
Allegheny West 17 54 3 83
Olney 17 178 4 138
Franklinville 16 37 0 63
Feltonville 15 56 4 74
Mayfair 13 90 6 61
North Central 13 46 1 54
Overbrook 13 128 3 105
Belmont 12 25 0 24
Elmwood 12 130 5 83
Fairhill 12 37 2 65
Germantown - Morton 12 31 4 23
Southwest Germantown 12 40 3 56
Tacony 12 73 4 44
Holmesburg 11 67 6 49
Ogontz 11 57 2 79
Mill Creek 10 29 2 45
Nicetown 10 22 4 27
McGuire 9 19 0 31
Oxford Circle 9 166 3 82
West Passyunk 9 56 6 28
Dickinson Narrows 8 21 2 13
East Germantown 8 46 6 81
East Mount Airy 8 54 3 40
Grays Ferry 8 69 5 49
West Kensington 8 57 2 83
Wynnefield 8 47 3 48
Brewerytown 6 25 0 63
Bridesburg 6 15 2 14
Lawndale 6 56 1 62
Cedarbrook 5 22 3 18
Dunlap 5 11 0 22
Glenwood 5 16 0 40
Mantua 5 26 1 34
Somerton 5 36 3 15
Wister 5 16 0 33
East Kensington 4 31 3 27
East Parkside 4 12 0 21
Fishtown - Lower Kensington 4 39 2 18
Haverford North 4 3 0 19
Newbold 4 38 4 23
West Powelton 4 8 1 8
Cedar Park 3 19 2 18
East Oak Lane 3 18 0 13
Francisville 3 5 2 7
Germantown - Westside 3 7 1 17
Girard Estates 3 30 1 11
Lower Moyamensing 3 39 3 21
Manayunk 3 20 0 10
Walnut Hill 3 12 1 15
Burholme 2 2 0 2
Chestnut Hill 2 5 0 1
Fox Chase 2 29 1 14
Modena 2 19 0 3
Normandy Village 2 2 0 2
Northwood 2 30 2 32
Old Kensington 2 15 0 11
Penrose 2 12 0 6
Queen Village 2 17 3 6
Stadium District 2 15 0 4
West Central Germantown 2 14 0 13
West Mount Airy 2 15 1 14
Aston-Woodbridge 1 9 0 8
Bartram Village 1 2 1 1
Bella Vista 1 14 0 6
Clearview 1 11 0 11
East Falls 1 3 0 0
Eastwick 1 13 0 3
Fern Rock 1 23 0 10
Garden Court 1 5 0 0
Germany Hill 1 3 0 3
Graduate Hospital 1 16 2 5
Greenwich 1 5 0 5
Hawthorne 1 5 0 8
Lexington Park 1 3 1 1
Ludlow 1 5 0 5
Parkwood Manor 1 21 0 8
Pennsport 1 24 0 13
Rhawnhurst 1 35 1 15
Society Hill 1 7 0 5
Upper Roxborough 1 5 0 6
West Parkside 1 2 0 0
Whitman 1 31 2 10
Yorktown 1 7 1 4
Academy Gardens 0 5 0 6
Andorra 0 2 0 2
Bustleton 0 32 4 18
Byberry 0 1 0 0
Chinatown 0 0 0 1
Crescentville 0 1 0 2
Dearnley Park 0 2 0 0
East Passyunk 0 22 1 11
East Poplar 0 2 0 1
Fairmount 0 14 1 3
Fitler Square 0 2 0 1
Germantown - Penn Knox 0 3 0 3
Logan Square 0 3 0 2
Mechanicsville 0 1 0 2
Melrose Park Gardens 0 8 0 4
Millbrook 0 5 0 2
Morrell Park 0 10 0 4
Northern Liberties 0 7 0 4
Old City 0 1 0 0
Packer Park 0 8 0 2
Passyunk Square 0 30 1 15
Pennypack 0 12 0 4
Pennypack Woods 0 4 0 2
Powelton 0 5 0 0
Rittenhouse 0 2 0 5
Riverfront 0 1 1 1
Roxborough 0 18 1 13
Sharswood 0 7 1 6
Spring Garden 0 8 0 1
Spruce Hill 0 13 1 9
Summerdale 0 1 0 1
Torresdale 0 12 0 10
University City 0 3 0 2
Washington Square West 0 14 1 4
West Park 0 1 0 0
West Poplar 0 4 0 4
Winchester Park 0 6 0 2
Wissahickon 0 6 0 5
Wissahickon Hills 0 1 0 0
Wissahickon Park 0 1 0 1
Wynnefield Heights 0 2 0 0
Count of Confusion Metrics by Neighborhood: Permit
Neighborhood True_Positive True_Negative False_Negative False_Positive
Oxford Circle 79 96 17 84
Cobbs Creek 65 266 52 113
Olney 63 157 36 98
Point Breeze 56 58 12 86
West Oak Lane 56 104 29 96
Richmond 50 169 34 78
Overbrook 46 115 29 72
Frankford 40 187 36 63
Mayfair 39 72 21 50
Lawndale 34 43 16 34
Logan 33 141 25 71
Tacony 31 52 7 50
Elmwood 29 147 18 40
Somerton 28 4 3 28
Wissinoming 28 89 19 45
Grays Ferry 27 61 2 50
Ogontz 26 84 7 37
Bustleton 25 11 3 25
East Kensington 24 12 1 30
Holmesburg 23 45 13 56
Hunting Park 23 160 24 50
Juniata Park 23 79 18 41
Paschall 23 120 17 58
Stanton 23 145 42 48
Strawberry Mansion 23 206 29 61
Newbold 22 15 2 34
Upper Kensington 22 341 34 53
Fishtown - Lower Kensington 20 21 3 22
Kingsessing 20 189 22 44
West Passyunk 20 41 11 32
Wynnefield 20 45 14 30
East Germantown 19 73 20 31
Northwood 19 30 4 14
Rhawnhurst 19 9 2 28
Tioga 19 126 24 42
Carroll Park 18 132 20 35
Cedarbrook 18 19 5 12
East Mount Airy 18 58 5 27
Passyunk Square 18 9 5 19
Southwest Schuylkill 18 77 10 28
West Kensington 18 101 12 19
East Passyunk 17 4 0 14
Fox Chase 16 12 2 18
Haddington 16 224 37 54
Hartranft 16 159 24 37
Lower Moyamensing 16 13 2 37
Whitman 16 12 0 22
Feltonville 15 87 18 37
North Central 14 65 7 30
Queen Village 14 4 2 10
Old Kensington 13 10 1 6
Girard Estates 12 16 3 14
Southwest Germantown 12 60 7 33
Dickinson Narrows 11 11 0 22
Parkwood Manor 11 11 2 9
Bridesburg 10 13 1 14
Fairmount 10 2 1 8
Modena 10 4 0 11
Washington Square West 10 1 2 11
Belmont 9 38 4 11
Brewerytown 9 52 15 18
Cedar Park 9 22 4 12
East Oak Lane 9 13 2 13
Harrowgate 9 76 13 18
Manayunk 9 12 2 13
Pennsport 9 10 0 20
West Mount Airy 9 7 1 17
Allegheny West 8 103 22 27
Bella Vista 8 2 1 13
Germantown - Westside 8 10 3 9
Graduate Hospital 8 4 0 14
Spruce Hill 8 4 1 11
Germantown - Morton 7 39 8 18
Hawthorne 7 1 0 6
Mantua 7 31 9 20
Roxborough 7 7 3 15
Stadium District 7 3 0 12
Upper Roxborough 7 0 0 7
Dunlap 6 22 4 6
Fairhill 6 86 11 13
Fern Rock 6 26 5 0
Rittenhouse 6 1 0 4
Society Hill 6 2 0 6
Walnut Hill 6 15 4 8
West Central Germantown 6 12 5 7
West Powelton 6 14 0 2
Chestnut Hill 5 1 1 3
Francisville 5 4 3 5
Mill Creek 5 62 4 16
Nicetown 5 41 10 8
Northern Liberties 5 4 1 5
Academy Gardens 4 4 0 3
Clearview 4 6 4 9
Glenwood 4 45 6 6
Melrose Park Gardens 4 5 1 2
Morrell Park 4 0 0 10
Normandy Village 4 1 0 1
Penrose 4 9 1 6
Winchester Park 4 5 0 1
Wissahickon 4 2 0 5
Wister 4 35 7 9
Burholme 3 2 0 1
East Parkside 3 28 5 1
Ludlow 3 6 1 2
Millbrook 3 0 1 3
Packer Park 3 1 0 7
Pennypack 3 6 0 9
Sharswood 3 9 1 2
Torresdale 3 6 1 14
Andorra 2 2 0 0
Aston-Woodbridge 2 9 6 2
Chinatown 2 0 0 1
Eastwick 2 4 4 7
Garden Court 2 5 0 0
Germany Hill 2 1 2 3
Greenwich 2 3 1 5
Haverford North 2 16 3 5
Lexington Park 2 0 1 3
Logan Square 2 1 0 3
Spring Garden 2 2 2 7
University City 2 2 0 1
West Poplar 2 3 2 1
Yorktown 2 3 1 7
Crescentville 1 1 0 1
Fitler Square 1 2 0 1
Franklinville 1 87 13 15
Pennypack Woods 1 3 1 2
Powelton 1 2 0 2
Riverfront 1 1 0 1
Wissahickon Hills 1 0 0 0
Wissahickon Park 1 1 0 0
Woodland Terrace 1 0 0 0
Bartram Village 0 3 1 1
Byberry 0 1 0 0
Dearnley Park 0 1 1 0
East Falls 0 2 0 2
East Poplar 0 1 1 1
Germantown - Penn Knox 0 5 1 1
McGuire 0 52 3 4
Mechanicsville 0 0 0 3
Old City 0 1 1 0
Pennypack Park 0 1 0 0
Summerdale 0 1 0 1
West Park 0 1 0 0
West Parkside 0 3 0 0
Wynnefield Heights 0 2 0 0
Count of Confusion Metrics by Neighborhood: Transfer
Neighborhood True_Positive True_Negative False_Negative False_Positive
Cobbs Creek 114 205 41 136
Upper Kensington 100 181 41 128
Olney 94 148 30 82
Richmond 92 102 20 117
Frankford 76 122 18 110
West Oak Lane 72 104 32 77
Point Breeze 71 44 6 91
Oxford Circle 66 96 16 98
Overbrook 63 87 27 85
Haddington 62 137 31 101
Strawberry Mansion 62 129 25 103
Hartranft 59 91 19 67
Logan 56 114 14 86
Mayfair 52 53 12 65
Stanton 51 97 22 88
Tioga 50 73 15 73
Carroll Park 49 88 12 56
Hunting Park 47 102 24 84
Paschall 47 78 15 78
Elmwood 44 109 14 67
Wissinoming 44 63 24 50
Kingsessing 42 126 26 81
Ogontz 40 55 19 40
Juniata Park 39 60 12 50
Tacony 39 42 15 44
Grays Ferry 37 39 8 56
East Germantown 36 57 10 40
Wynnefield 36 33 8 32
Feltonville 35 72 14 36
Holmesburg 35 39 13 50
Harrowgate 33 42 6 35
West Kensington 33 63 9 45
Southwest Germantown 32 28 11 41
East Kensington 31 7 2 27
Allegheny West 30 60 24 46
Southwest Schuylkill 30 46 6 51
Lawndale 28 34 11 54
North Central 26 42 6 42
East Mount Airy 25 39 8 36
West Passyunk 24 35 8 37
Brewerytown 23 29 4 38
Fairhill 23 58 12 23
Dickinson Narrows 21 9 1 13
Fox Chase 21 8 2 17
Bustleton 20 5 3 36
Franklinville 20 54 8 34
Germantown - Morton 19 31 8 14
Rhawnhurst 19 11 2 26
Mill Creek 18 42 5 22
West Mount Airy 18 4 0 12
Mantua 17 25 6 19
Girard Estates 16 15 3 11
Belmont 15 24 2 21
Cedar Park 15 14 4 14
Fishtown - Lower Kensington 15 15 3 33
Manayunk 15 6 2 13
Newbold 15 26 3 29
Northwood 15 24 5 23
Somerton 15 8 2 38
Cedarbrook 14 4 3 33
Glenwood 14 19 7 21
Lower Moyamensing 14 13 7 34
Nicetown 13 26 8 17
Passyunk Square 13 12 2 24
Roxborough 13 6 0 13
Pennsport 11 6 7 15
Queen Village 11 2 1 16
Spruce Hill 11 5 0 8
Wister 11 23 2 19
Dunlap 10 15 1 12
East Oak Lane 10 11 5 11
McGuire 10 33 4 12
Clearview 9 7 2 5
East Parkside 9 14 1 13
Germantown - Westside 9 5 2 14
Parkwood Manor 9 5 1 18
Fairmount 8 6 0 7
Graduate Hospital 8 6 0 12
Hawthorne 8 0 0 6
Whitman 8 11 6 25
Bridesburg 7 11 1 19
Modena 7 5 7 6
Pennypack 7 3 0 8
Stadium District 7 5 1 9
Walnut Hill 7 10 1 15
Academy Gardens 6 2 0 3
Bella Vista 6 5 1 12
East Passyunk 6 8 2 19
Francisville 6 4 1 6
Haverford North 6 9 3 8
Northern Liberties 6 4 0 5
Old Kensington 6 10 2 12
Rittenhouse 6 0 0 5
Upper Roxborough 6 1 0 7
West Central Germantown 6 7 2 15
Aston-Woodbridge 5 1 2 11
Fern Rock 5 21 2 9
Washington Square West 5 3 1 15
Germany Hill 4 0 1 3
Melrose Park Gardens 4 1 0 7
Packer Park 4 1 1 5
Society Hill 4 0 0 10
Torresdale 4 6 3 11
Winchester Park 4 4 0 2
Wissahickon 4 1 1 5
Logan Square 3 0 0 3
Ludlow 3 5 0 4
Millbrook 3 2 0 2
Morrell Park 3 4 0 7
Normandy Village 3 2 0 1
Penrose 3 9 3 5
Sharswood 3 6 0 6
Spring Garden 3 1 0 9
West Poplar 3 3 0 2
Yorktown 3 2 1 7
Burholme 2 0 0 4
Chestnut Hill 2 1 0 7
Chinatown 2 1 0 0
Crescentville 2 1 0 0
Eastwick 2 11 1 3
Garden Court 2 3 0 2
Germantown - Penn Knox 2 2 1 2
Greenwich 2 3 0 6
Pennypack Woods 2 3 0 2
Riverfront 2 1 0 0
West Powelton 2 7 3 10
Bartram Village 1 3 0 1
East Falls 1 2 0 1
Fitler Square 1 1 0 2
Lexington Park 1 0 0 5
Powelton 1 1 0 3
Wissahickon Park 1 0 0 1
Wynnefield Heights 1 0 0 1
Andorra 0 0 0 4
Byberry 0 0 0 1
Dearnley Park 0 1 1 0
East Poplar 0 1 0 2
Mechanicsville 0 0 0 3
Old City 0 1 0 1
Pennypack Park 0 1 0 0
Summerdale 0 1 0 1
University City 0 0 0 5
West Park 0 1 0 0
West Parkside 0 2 0 1
Wissahickon Hills 0 0 0 1
Woodland Terrace 0 0 0 1

We predicted particularly well with vacant and permits in the Temple, Kensington areas, and anywhere north of Market street, as you can see in the highlighted tables above.

4.6. Model Predictions

In order to conclude our modeling process, we predict the probability of each outcome occurring at the fire severity levels of 1-5.

Moving Forward

Our three models possess unique characteristics to categorize those models: Vacancy is spatially oriented,permits are people oriented and transfers are property oriented.

To move this research topic forward in the future, it is imperative to pay close attention to these profiles for each outcome to create more impactful results. However, the following variables remained consistent throughout all three models: fire severity, months since fire, owner occupancy, average census tract income.

Additionally, it is crucial for future research to incorporate information related to whether or not residents have housing insurance, information regarding landlords, and qualitative information regarding if residents have a local support network.

5. Conclusion & App

5.1. Reccomendations & Policy Interventions

We separated our recommendations into pre and post fire interventions. These recommendations aim to ensure that residents are protected both before and after a structural fire incident occurs, and that the necessary steps are taken to prevent such incidents from happening in the future.

Pre-Fire Incident Policies:

  • Increase awareness about the importance of homeowner and rentersā€™ insurance and provide education on the available options to renters in Philadelphia, specifically in areas with high rates of vacancy.

  • Incentivize Local Landlord & developers: Our qualitative research found that residents who have landlords and developers who are located outside of Philadelphia, find it challenging to get into contact with them when it comes to upgrades and repairs. This could delay vital repairs needed to prevent fires. Developing a streamlined permit process for property owners to quickly get necessary permits for repairs and renovations could improve fire safety.

  • Singe Room Occupancy Safety: Some of the most vulnerable and marginalized group opt for single room occupancy because of its affordability. However, these spaces tend to be unsafe and in need to be brought up to code. Prioritizing these spaces, where legal,could save lives.

  • Proactive Home Repair: Basic System Repairs program in Philly and Whole Home Repairs Program in the state promises assistance with proactive home repair. Philadelphia has a relatively old housing stock, so proactive repairs can help update wiring to prevent common electrical fires.

Post-Fire Incident Policies:

  • For homes with a low likelihood of repair, lowering the cost of repair or new construction would enable faster recovery. This can be done through:

    • Non-profit variance programs, like flexible financing for community development groups who want to reconstruct homes at affordable rates, but face a gap in their financing because they canā€™t receive the same loans that individual homeowners can.

    • Agreements with construction unions to establish an equitable wage rate, as described in Philadelphiaā€™s 2018 Housing Equity Plan.

    • Expert-led volunteer programs like Habitat for Humanityā€™s sweat equity.

    • Repair assistance programs like the proactive ones mentioned above.

  • If a house is likely to be sold, direct sales to local developers, land banks, and nonprofits through first look programs.

  • To address the likelihood of long term vacancy, lower the investment risk for fire damaged homes through public or philanthropic loans. These homes could offer a low acquisition cost to developers using the low-income housing tax credit, which has been a successful model in the past.

These policy recommendations aim to ensure that residents are protected both before and after a structural fire incident occurs, and that the necessary steps are taken to prevent such incidents from happening in the future. By promoting the availability of renters insurance, improving fire safety in rental properties, and providing emergency assistance and resources to affected renters, these policies can help mitigate the impact of fire incidents and provide the most efficient and equitable outcomes for impacted residents. To view our web application click here

5.2 Acknowledgments

Special thanks to the Philadelphia Fire Department specifically,Commissioner Adam Thiel, GIS Expert,Andrew Newell and Assistant Deputy Commissioner Kathy Matheson for their guidance and support for this project. We also want to give a special thanks to the Red Cross House who have helped us develop our qualitative research and does incredible work in the city of Philadelphia with helping those impacted by fires find resources and housing.

6.Code Appendix

knitr::opts_chunk$set(echo= TRUE, warning = FALSE, message = FALSE)






# Set Up
library(boxr)
library(mapview)
library(sf)
library(tidyverse)
library(knitr)
library(kableExtra)
library(tigris)
library(viridis)
library(dplyr)
library(tidycensus)
library(ggplot2)
library(RSocrata)
library(lubridate)
library(janitor)
library(proxy)
library(FNN)
library(plotROC)
library(pROC)
library(ggcorrplot)
library(plotly)
library(patchwork)
library(sp)
library(MatchIt)
library(installr)
library(rmdformats)
library(gridExtra)
library(magick)





options(scipen = 999)
mapTheme <- theme(plot.title =element_text(size=12),
                  plot.subtitle = element_text(size=8),
                  plot.caption = element_text(size = 6),
                  axis.line=element_blank(),
                  axis.text.x=element_blank(),
                  axis.text.y=element_blank(),
                  axis.ticks=element_blank(),
                  axis.title.x=element_blank(),
                  axis.title.y=element_blank(),
                  panel.background=element_blank(),
                  panel.border=element_blank(),
                  panel.grid.major=element_line(colour = 'transparent'),
                  panel.grid.minor=element_blank(),
                  legend.direction = "vertical", 
                  legend.position = "right",
                  #plot.margin = margin(1, 1, 1, 1, 'cm'),
                  legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))

#Color Palettes

palette2 <- c("#b9cfcf", "#e19825")

palette3_sat <- c("#e19825","#d55816","#7b230b")
palette3_desat <- c("#B19C7D","#7F5F52","#262626")

palette4 <- c("#f1c82b","#e19825","#d55816","#7b230b")
palette4_desat <- c("#B19C7D","#B27D49","#7F5F52","#262626")

palette5_sat <- c("#f1c82b","#e19825","#d55816","#7b230b","#413028")
pallette5_desat <- c("#ead5b7","#d2b190","#b18e6f","#7f5f52","#413028")

palette6_sat <- c("white","#f1c82b","#e19825","#d55816","#7b230b","#413028")
palette7_cats <- c("#b9cfcf","#20b1ae","#b47c49", "#3f3128", "#8f8172")
palette8<- c("#b9cfcf", "#20b1ae", "#8f8172")
palette5_permits <- c("#BAE4B6", "#85C07F", "#5E8C59", "#4A7246", "#2D4A2A")
#Sources for Graphs
creditFire <- "Source: Philadelphia Fire Department"
creditOpen <- "Source: Open Data Philly"

g<-glimpse
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

crs <- "EPSG:4326"
PHL_crs <- 'ESRI:102411' #This measurement-based CRS projection is for the West Coast
#Loading in Fire Data 
## INSERT BOX WORKFLOW HERE ##
box_auth(client_id = "5uqluwfy2fsqfwts7hfofn1yuq5q3snf", 
         client_secret = "y11sGdz8felfrTpT7cSWzhS0QXOZOkKi")
box_setwd(186732420366)
box_getwd()
box_ls()
list <- box_ls() %>% as.data.frame()
structureFire <- box_read_excel(1093000179542) 

dat <- structureFire

# Creating geometry for the fires
dat <- dat %>% drop_na("Longitude", "Latitude") %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = "EPSG:4326")

# Cleaning the column names
dat<-clean_names(dat)

# Address string
dat <-
  dat %>% mutate(street_type = ifelse(street_type == 'AV', "AVE", street_type)) %>% 
  unite(address, c('address_number', 'street_prefix', 'street_name', 'street_type'), sep = " ", remove = FALSE, na.rm=TRUE)

# Extracting quarter
dat <- dat %>% mutate(quarter = floor_date(alarm_date, unit="quarter"))

# Reducing columns
dat <- dat %>%
  dplyr::select(address, quarter, property_use, incident_number, number_of_exposures, incident_type, building_status, fire_spread, no_spread, code_description, geometry, alarm_date, cad_nature_code_description,
                minor_damage, significant_damage, heavy_damage, extreme_damage)

# Removing duplicates
dat <- dat[!duplicated(dat$incident_number),]

#Count the number of Fires per address
nFires_perAddress <- dat%>%
  st_drop_geometry()%>%
  count(address, sort=TRUE)%>%
  left_join(dplyr::select(dat, address), by="address")%>% #removed na.rm=TRUE
  st_as_sf()

#remove duplicates from above
nFires_perAddress <- nFires_perAddress[!duplicated(nFires_perAddress$address),]

#Barplot of Counts of Fires for Each Address
nFires_perAddress_Plot<-nFires_perAddress%>% 
  filter(n < 7)%>%
  ggplot()+
  geom_bar(mapping=aes(x=as.factor(n)), fill="#b9cfcf")+
  labs(title="Number of Fires Per Address",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Count of Fires")+
  ylab("Number of Structures")+
  theme(panel.background = element_rect(fill = "#f3efe0"))
nFires_perAddress_Plot <- ggplotly(nFires_perAddress_Plot)

nFires_perAddress_Plot
#Line plot of Fires per quarter, Min to Max
dat_plot<-dat %>%
  ggplot(aes(x=as_date(quarter))) +
      geom_bar(fill="#2D4A2A")+
      labs(title = "Quarterly Count of Unique Fire Incidents",
           subtitle = "Philadelphia County, 2009 Q1 - 2022 Q4", 
           y = "Number of Fires")+   
    scale_x_date(name = "Year", date_breaks = "1 year")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.background = element_rect(fill = "#f3efe0"))
dat_plot <- ggplotly(dat_plot)

dat_plot

# Render the plot
#Building Use for All Fires
#Plotting the Frequency of Fires based on Property Use

nFires_perAddress_BuildAll <- nFires_perAddress %>%
  left_join(st_drop_geometry(dplyr::select(dat, property_use, address)), by="address")

#Bar plot of property use counts
prop_use_plot <-dplyr::select(nFires_perAddress_BuildAll, -address)%>%
  st_drop_geometry()%>%
  gather(Variable, value, -n)%>%
  count(Variable, value)%>%
  group_by(Variable)%>%
  filter(n > 150)%>%
  ggplot(., aes(value, n))+
      geom_bar(position = "dodge", stat="identity", fill="#b9cfcf") +
      labs(x="Category", y="Frequency",
           title = "Top 10 Property Uses Among All Structure Fires",
           subtitle = "Philadelphia County, 2009-2022")+
          theme(axis.text.x = element_text(angle = 45, hjust = 1),
                
        panel.background = element_rect(fill = "#f3efe0"))
prop_use_plot <- ggplotly(prop_use_plot)

prop_use_plot

#Count Fires by PropUse and Add Super Category column
## Error with lines 203-205
#nFires_perAddress_PropUse <- nFires_perAddress %>%
 #left_join(st_drop_geometry(dplyr::select(structureFire_sf_addressU,`Property Use`, address)), by="address")%>%
  #mutate(Property_Use_SuperCat = substr(`Property Use`, 1, 1))
#Commenting this out for now, fix later!
#Chart of Building Status counts
nFires_perAddress_BuildAll%>%
  mutate(Property_Use_SuperCat = substr(property_use, 1, 1))%>%
  dplyr::select(-address, -property_use)%>%
  st_drop_geometry()%>%
  gather(Variable, value, -n)%>%
  count(Variable, value, sort = TRUE)%>%
  group_by(Variable)%>%
  summarize(`Property Use Supercategory` = value, `Share (%)` = round((n/sum(n)*100), 2))%>%
  kable()%>%
  kable_material_dark()


firespread_plot <-dat %>%
  st_drop_geometry()%>%
  count(fire_spread)%>%
  ggplot()+
    geom_col(mapping=aes(x=as.factor(fire_spread), y=n, fill=fire_spread))+
    labs(title = "Number of Fires by Fire Spread", 
         subtitle = "Philadelphia County, 2009-2022", 
         caption = creditFire,
         x="Fire Spread Code",
         y="Number of Fires")+
    scale_fill_manual(values = palette5_sat,
                      name = "Fire Spread \nConfined To:", 
                      labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
    theme(plot.title = element_text(size=18),
          
        panel.background = element_rect(fill = "#f3efe0"))

firespread_plot <- ggplotly(firespread_plot)

firespread_plot


g(dat)
#How to measure severity in detail beyond 
sFire_severity <- dat%>%
  st_drop_geometry()%>%
  dplyr::select(incident_type, minor_damage, significant_damage, heavy_damage, extreme_damage, fire_spread)%>%
  mutate(Worst_Damage = ifelse(extreme_damage > 0, "Extreme",
                          ifelse(heavy_damage > 0, "Heavy",
                            ifelse(significant_damage > 0, "Significant",
                              ifelse(minor_damage > 0, "Minor", "No Record")))))%>%
  count(Worst_Damage, fire_spread)
damagecount_plot <- ggplot(sFire_severity)+
  geom_col(mapping=aes(x=as.factor(Worst_Damage), y=n, fill=fire_spread))+
  labs(title = "Number of Fires by Worst Recorded Floor Damage", 
       subtitle = "Philadelphia County, 2009-2022", 
       caption = creditFire,
       x="Incident Type Code",
       y="Number of Fires")+      
  scale_fill_manual(values = palette5_sat,
                      name = "Fire Spread \nConfined To:", 
                      labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
  theme(plot.title = element_text(size=18),
        panel.background = element_rect(fill = "#f3efe0"))

damagecount_plot <- ggplotly(damagecount_plot)

damagecount_plot

#Count the unique values for CAD
nFires_CADDescr <- dat %>%
  st_drop_geometry%>%
  group_by(fire_spread)%>%
  count(cad_nature_code_description, sort=TRUE)

#Barplot of counts, by count
cad_plot<-nFires_CADDescr%>%
  filter(n>50)%>%
  ggplot()+
  geom_col(mapping=aes(x=cad_nature_code_description, y=n, fill=fire_spread))+
  labs(title="Frequency of Fire Types",
       subtitle="Philadelphia County, 2009-2022, Above 50 Unique Incidents")+
  xlab("CAD Nature Code Description")+
  ylab("Number of Fires")+
      scale_fill_manual(values = palette5_sat,
                      name = "Fire Spread \nConfined To:", 
                      labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
            theme(axis.text.x = element_text(angle = 45, hjust = 1),
              panel.background = element_rect(fill = "#f3efe0"))

cad_plot<-ggplotly(cad_plot)

cad_plot
IncidentType_plot <-dat %>%
  st_drop_geometry()%>%
  count(incident_type, fire_spread)%>%
  ggplot()+
    geom_col(mapping=aes(x=as.factor(incident_type), y=n, fill=fire_spread))+
    labs(title = "Number of Fires by Incident Type, All Severities", 
         subtitle = "Philadelphia County, 2009-2022", 
         caption = creditFire,
         x="Incident Type Code",
         y="Number of Fires")+
    scale_fill_manual(values = palette5_sat,
                      name = "Fire Spread \nConfined To:", 
                      labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
    theme(plot.title = element_text(size=18),
        panel.background = element_rect(fill = "#f3efe0"))


IncidentType_plot <- ggplotly(IncidentType_plot)

IncidentType_plot
GreaterSev<-dat %>%
  st_drop_geometry()%>%
  count(incident_type, fire_spread)%>%
  filter(incident_type != 111 & incident_type != 1110)%>%
  ggplot()+
    geom_col(mapping=aes(x=as.factor(incident_type), y=n, fill=fire_spread))+
    labs(title = "Number of Fires by Incident Type, Greater Severities", 
         subtitle = "Philadelphia County, 2009-2022", 
         caption = creditFire,
         x="Incident Type Code",
         y="Number of Fires")+
    scale_fill_manual(values = palette5_sat,
                      name = "Fire Spread \nConfined To:", 
                      labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
    theme(plot.title = element_text(size=18),
        panel.background = element_rect(fill = "#f3efe0"))

GreaterSev <- ggplotly(GreaterSev)

GreaterSev


acs_vars <- c("B01001_001E")

acsTractsPHL.2020 <- get_acs(geography = "tract",
                             year = 2020, 
                             variables = acs_vars, 
                             geometry = TRUE, 
                             state = "PA", 
                             county = "Philadelphia", 
                             output = "wide") 

nFires_perAddress_Outliers <- filter(nFires_perAddress, n>2)

outliers<-ggplot()+
  geom_sf(data=acsTractsPHL.2020, fill='#f0efe0', color='dark gray')+
  geom_sf(data=nFires_perAddress_Outliers, aes(color=q5(n)), alpha=0.5)+
    scale_color_manual(values=palette5_sat, labels=qBr(nFires_perAddress_Outliers, "n"))+
  labs(title = "Addresses With 3+ Fires") + mapTheme()
outliers<- ggplotly(outliers)

outliers
#Plotting the Frequency of Fires based on the Outliers' property use

top10<-dplyr::select(nFires_perAddress_BuildAll, -address)%>%
  filter(n>2)%>%
  st_drop_geometry()%>%
  gather(Variable, value, -n)%>%
  count(Variable, value)%>%
  group_by(Variable)%>%
  filter(n>23)%>%
  ggplot(., aes(value, n))+
      geom_bar(position = "dodge", stat="identity", fill="#b9cfcf") +
      labs(x="Category", y="Frequency",
           title = "Top 10 Property Uses Among Addresses with 3+ Fires",
           subtitle = "Philadelphia County, 2009-2022",
           credit = creditFire)+
          theme(axis.text.x = element_text(angle = 45, hjust = 1),
                
        panel.background = element_rect(fill = "#f3efe0"))


top10 <- ggplotly(top10)

top10
# Initial panel 
dat.panel <-
  expand.grid(quarter = unique(dat$quarter), 
              address = unique(dat$address))

# Count Panel
count.panel <- 
  dat %>%
  st_drop_geometry() %>%
  group_by(quarter, address, fire_spread) %>%
  count(address, sort=TRUE)

# Changing address to factor for join purposes later
count.panel$address <-
  as.factor(count.panel$address)
# Final Panel
final.panel <- left_join(dat.panel, count.panel, by=c("address", "quarter")) # Join
final.panel <- final.panel %>% dplyr::select(address, quarter, fire_spread, n) %>% rename(count = n) # Condensing & renaming
final.panel[is.na(final.panel)] <- 0 # Assigning 0 to NA for everything
final.panel$fire_spread <- as.numeric(final.panel$fire_spread) # Making fire_spread numeric

# Calculating the maximum severity score 
final.panel <-
  final.panel %>%
  group_by(address, quarter, count) %>% summarise(severity_index = max(fire_spread))

g(final.panel)

#Data from https://www.opendataphilly.org/dataset/opa-property-assessments

#metadata at: https://metadata.phila.gov/#home/datasetdetails/5543865f20583086178c4ee5/representationdetails/55d624fdad35c7e854cb21a4/?view_287_page=3

# Reading the data
# Kendra
opa_dat <- read_csv("/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/opa_properties_public-2.csv")

# Myron
#opa_dat <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-opa_properties_public.csv")

# Creating geometry for the properties
opa_dat <- opa_dat%>%
  drop_na(lng, lat)%>%
  st_as_sf(coords = c("lng", "lat"),
           crs = "EPSG:4326")

g(opa_dat)
# Reducing columns
opa_dat_small_sf <- opa_dat[!duplicated(opa_dat$location),] %>%
  dplyr::select(location, category_code, category_code_description, building_code, building_code_description, building_code_new, building_code_description_new, total_area, total_livable_area, owner_1, owner_2, market_value, market_value_date, mailing_street, number_of_bedrooms, number_of_bathrooms, number_stories, interior_condition, assessment_date, year_built, year_built_estimate, zoning, quality_grade, central_air, exterior_condition, fireplaces, fuel, taxable_building, topography, type_heater, sale_price, separate_utilities)%>%
  rename(address = location)

# Filtering for residential properties
opa_dat_small_sf <- opa_dat_small_sf %>% filter(category_code == 1 | category_code == 2 | category_code == 3)

# Extracting just the addresses
opa_dat_small <- opa_dat_small_sf%>%
  dplyr::select(address)%>%
  st_drop_geometry()

# Time Panel
quarter <- c("Q1", "Q2", "Q3", "Q4") # Creating quarters
year <- c(2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022) # Creating years

comb <- expand.grid(year = year, quarter = quarter)%>%
  mutate(yq = paste(year, ":", quarter))%>%
  mutate(yqDT = yq(yq))%>%
  arrange(yqDT)

time.panel <- as_tibble(comb)%>%
  dplyr::select(yqDT)
opa.panel <- expand.grid(address = opa_dat_small$address, 
             quarter = time.panel$yqDT)
# Combining OPA and Fire
opa_count.panel <- full_join(opa.panel, final.panel, by=c("address", "quarter")) # Join
opa_count.panel[is.na(opa_count.panel)] <- 0 # Assigning 0 to NA for everything 
#311 Data Upload, downloaded from https://data.phila.gov/visualizations/311-requests/

#All311 <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-311Call#s.csv")

AllLI <- st_read("/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/complaints.geojson")

All311 <- read_csv("/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/OpenDataPhilly_311Calls.csv")

#All311 <- read_csv("/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/OpenDataPhilly_311Calls.csv")

#All311 <- read_csv("OpenDataPhilly_311Calls.csv")

#All311<- OpenDataPhilly_311Calls
#Filtering to only the fire/building-relevant terms

#strictly filtering to vacancy complaints for initial combination
property311 <- filter(All311, 
                        #service_name == "Building Dangerous" |  
                        #service_name == "Dangerous Building Complaint " |  
                        #service_name == "Fire Safety Complaint" | 
                        #service_name == "Maintenance Complaint" |
                        #service_name == "Maintenance Residential or Commercial" |
                        service_name == "Vacant House or Commercial" ) %>%
                        #service_name == "Fire Residential or Commercial" |
                        #service_name == "Complaints against Fire or EMS"
dplyr::select(objectid, service_request_id, status, service_name, service_code, requested_datetime, agency_responsible, address, zipcode, lat, lon)%>%
  drop_na(lat, lon, address)%>%
  st_as_sf(coords = c("lon", "lat"),
           crs = "EPSG:4326")

#Reducing variables and calculating the quarter of the calls
prop311_small <- property311%>%
  dplyr::select(service_name, requested_datetime, address)%>%
  st_drop_geometry()%>%
  mutate(quarter = floor_date(requested_datetime, unit="quarter"))

#counting the calls per address per quarter
vacant311_count <- prop311_small%>%
  group_by(address, quarter)%>%
  count(address, sort=TRUE)%>%
  rename(n_311Vacant = n)

vacant311_count%>%
  group_by(quarter)%>%
  summarize(count = sum(n_311Vacant))%>%
  ggplot(aes(x=quarter, y=count))+
  geom_col(fill="#8F8172")+
    labs(title="311 Vacancy Complaints",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Fires")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#Data is now ready to join with panel

#strictly filtering to vacancy complaints for initial combination
vacantLI <- filter(AllLI, 
                        complaintcodename == "VACANT HOUSE" |
                        complaintcodename == "VACANT HOUSE RESIDENTIAL" |
                        complaintcodename == "SPECIAL VACANT HOUSE" |
                        complaintcodename == "VACANT PROPERTY COMPLAINT" ) %>%
  dplyr::select(address, addressobjectid, complaintdate, complaintcodename, geometry)%>%
  mutate(quarter = as_date(floor_date(complaintdate, unit="quarter")))%>%
  st_set_crs("EPSG:4326")

vacantLI_count <- vacantLI%>%
  st_drop_geometry%>%
  drop_na(address)%>%
  group_by(address, quarter)%>%
  count(address, sort=TRUE)%>%
  rename(n_Vacant = n,
         address = address,
         quarter = quarter)
  
vacantLI_count$address <- as.factor(vacantLI_count$address)  

vacantLI_count%>%
  group_by(quarter)%>%
  summarize(count = sum(n_Vacant))%>%
  ggplot(aes(x=quarter, y=count))+
  geom_col(fill="#B9CFCF")+
    labs(title="L&I Vacancy Complaints",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Fires")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#Data is now ready to join with panel
#Outer join to ensure no date/quarter combos are the same, getting unique values
vacant311_count_Clean <- vacant311_count%>%
  anti_join(vacantLI_count, by=c("address", "quarter"))

#Row bind L&I and Clean 311 together.
vacantLI311 <- rbind(vacantLI_count, vacant311_count_Clean)
vacantLI311%>%
  group_by(quarter)%>%
  summarize(count = sum(n_Vacant))%>%
  ggplot(aes(x=quarter, y=count))+
  geom_col(fill="#A5300F")+
    labs(title="L&I Vacancy Complaints",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Fires")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))

vacant311_count$address <- as.factor(vacant311_count$address) 
#Importing Permit Data
#Data from https://www.opendataphilly.org/dataset/licenses-and-inspections-building-permits

#metadata at: https://metadata.phila.gov/#home/datasetdetails/5543868920583086178c4f8f/representationdetails/5e9a01ac801624001585ca11/

#permits0715 <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-permits_0715.csv")
#permits1623 <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-permits_1623.csv")

#Kendra's File Path
permits0715 <- read_csv("PhilaFireData/OpenDataPhilly-permits_0715.csv")
permits1623 <- read_csv("PhilaFireData/OpenDataPhilly-permits_1623.csv")
                  
permitsAll <- rbind(permits0715, permits1623)

permits_sf <- permitsAll%>%
  drop_na(lng, lat)%>%
  st_as_sf(coords = c("lng", "lat"),
           crs = "EPSG:4326")

permits_sf_res <- permits_sf%>%
  dplyr::select(permittype, permitdescription, permitissuedate, commercialorresidential, address)%>%
 #filter(permitdescription == "DEMOLITION PERMIT" |
       # permitdescription == "GENERAL PERMIT" |
       # permitdescription == "NEW CONTRUCTION PERMIT" |
       # permitdescription == "RESIDENTIAL BUILDING PERMIT" |
        #permitdescription == "FAST FORM BUILDING PERMIT" |
        # permitdescription == "ALTERATION PERMIT")%>%
  #filter(commercialorresidential != "COMMERCIAL")%>%
  mutate(quarter = as_date(floor_date(permitissuedate, unit="quarter")))%>%
  filter(year(quarter) >= 2009)

permits_count <- permits_sf_res %>%
  st_drop_geometry()%>%
  group_by(address, quarter)%>%
  count(address, sort = TRUE)%>%
  rename(n_permits = n)

permits_count%>%
  ggplot(aes(x=quarter, y=n_permits))+
  geom_col(fill="#A5300F")+
    labs(title="Permit Records",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Records")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#transfers_count <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/transfers_count.csv")

transfers_count <- read_csv("PhilaFireData/transfers_count.csv")
#transfers <- read_csv("Data/OpenDataPhilly-transfers.csv")

#Select relevant fields and filter to fire data range
#transfers_data <- transfers%>%
#  dplyr:: select(objectid, recording_date, street_address, document_type)%>%
#  filter(year(recording_date) > 2008,
#         !is.na(street_address))%>%
#  rename(address = street_address)%>%
#  mutate(quarter = as_date(floor_date(recording_date, unit="quarter")))
#transfers_count <- transfers_data %>%
#  group_by(address, quarter)%>%
#  count(address, sort = TRUE)%>%
#  rename(n_transfers = n)

#write.csv(transfers_count, "~/Desktop/Coding/MUSA Practicum/MUSA_Practicum-/Data/transfers_count.csv")
transfers_count%>%
  ggplot(aes(x=quarter, y=n_transfers))+
  geom_col(fill="#b9cfcf")+
    labs(title="Real Estate Transfer Records",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Records")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#panel_OPAFireOpenData <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/panel_OPAFireOpenData.csv")

panel_OPAFireOpenData <- read_csv("PhilaFireData/panel_OPAFireOpenData.csv")
#panel_OPAFire311 <- left_join(opa_count.panel, vacant311_count, by=c("address", "quarter"))
#panel_OPAFire311$n_311Vacant[is.na(panel_OPAFire311$n_311Vacant)] <- 0

#panel_OPAFire311Permit <- left_join(panel_OPAFire311, permits_count, by=c("address", "quarter"))
#panel_OPAFire311Permit$n_permits[is.na(panel_OPAFire311Permit$n_permits)] <- 0

#panel_OPAFireOpenData <- left_join(panel_OPAFire311Permit, transfers_count, by=c("quarter", "address"))
#panel_OPAFireOpenData$n_transfers[is.na(panel_OPAFireOpenData$n_transfers)] <- 0
#filter panel to just outcomes or fires
panel_Positives <- panel_OPAFireOpenData %>%
  filter(count > 0 | n_Vacant > 0 | n_permits > 0 | n_transfers > 0)

#filter to just fires, then combine those addresses with additional outcomes and building categories
panel_FirePositives <- panel_OPAFireOpenData %>%
  filter(count > 0)%>%
  dplyr::select(address)%>%
  distinct(address, .keep_all = TRUE)%>%
  left_join(panel_Positives, by="address")%>%
  left_join(dplyr::select(opa_dat_small_sf, address, category_code_description, mailing_street, building_code_description), by="address")%>%
  mutate(condo = ifelse(grepl("CONDO", building_code_description) == TRUE, TRUE, FALSE),
          owner_occ = ifelse(condo == FALSE & category_code_description != "MULTI FAMILY",
                             ifelse(address == mailing_street, TRUE, FALSE),
                             NA))%>%
  dplyr::select(-mailing_street, -condo, -building_code_description)%>%
  st_drop_geometry()
  #mutate(diff = interval(lag(quarter, n=1),quarter) %/% years(1))  

#Eliminate the data that comes before the fires, as those are not outcomes of the fire
panel_FirePositives <- panel_FirePositives%>%
  group_by(address)%>%
  mutate(f = cumsum(count))%>% #counts the cumulative sum of the number of fires at that address so far.
  filter(f > 0)%>% #if that number is zero, then we don't want the data
  dplyr::select(-f)

# edit here if we wanna play with the lags
#Join to get fire incident number and date
#calculate the difference between the quarter of the incident date and the quarter of the outcome
panel_FirePositivesDiff <- panel_FirePositives%>%
  left_join(st_drop_geometry(dplyr::select(dat, incident_number,address, quarter)), by=c("address"))%>%
  group_by(incident_number)%>% #some addresses have multiple fires, so we use incident number instead
  mutate(mSinceFire = interval(quarter.y, quarter.x) %/% months(1),
         ySinceFire = mSinceFire / 12,
         cat_code = toupper(category_code_description))%>%
  filter(mSinceFire >= -1,#Eliminate entries before fires (occurs because of incident_number group duplicates)
         mSinceFire < 49, #Eliminate entries after four years, as they are irrelevant (arbitrary)
         !(count > 0 & ySinceFire > 0))%>% #For addresses with multiple incidents, take out repeated fire observ's
  dplyr::select(-category_code_description)%>%
  st_as_sf()

#Chart:
#For every outcome, what is the median time since a fire occurred?
panel_FirePositivesDiff%>%
  st_drop_geometry()%>%
  ungroup()%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0)) %>%
  mutate(ySinceFire = mSinceFire / 12)%>%
  dplyr::select(n_Vacant, n_permits, n_transfers, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  filter(value > 0)%>%
  group_by(Variable)%>%
  summarize(`Median Years Since Fire` = median(ySinceFire))%>%
  kable()%>%
  kable_material_dark()
#panel_Results2Y <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/panel_Results2Y.csv")

panel_Results2Y <- read_csv("PhilaFireData/panel_Results2Y.csv")
#six months boolean outcome code
panel_Results2Q <- panel_OPAFireOpenData %>%
    mutate(fireVacant2Q = ifelse(address == lag(address, n=1) & 
                               (count > 0 | lag(count, n=1) > 0) &
                                (n_Vacant > 0) > 0 , 1, 0),
          firePermit2Q = ifelse(address == lag(address, n=1) & 
                               (count > 0 | lag(count, n=1) > 0) &
                               (n_permits > 0) > 0 , 1, 0),
        fireTransfer2Q = ifelse(address == lag(address, n=1) & 
                  (count > 0 | lag(count, n=1) > 0) &
                     (n_transfers > 0) > 0 , 1, 0))

# ear Outcomes for Each Incident
panel_Results2Y <- panel_FirePositivesDiff %>%
    st_drop_geometry()%>%
    dplyr::select(-mSinceFire, -cat_code, -quarter.y)%>%
    filter(., ySinceFire <= 2)%>% # edit here if we wanna play with the lags
    group_by(address, incident_number)%>%
    summarize(count = sum(count),
             severity_index = max(severity_index),
             outcome_vacant = sum(n_Vacant),
             outcome_permit = sum(n_permits),
             outcome_transfer = sum(n_transfers),
             quarter = min(quarter.x))

# Later: Join back to original dataset to get the spatial features
# 5 Feature Engineering,OPA and Fire Dataset 


# OPA - Creating an OPA dataset to get just the variables we want for feature engineering and doing slight feature engineering
# Numeric
opa_dat_small_sf_num <- opa_dat_small_sf %>% 
  dplyr::select(address, total_livable_area, market_value, sale_price, number_of_bedrooms, number_of_bathrooms, number_stories, 
  interior_condition) %>% st_drop_geometry()
opa_dat_small_sf_num[is.na(opa_dat_small_sf_num)] <- 0 # Assigning 0 to NA for everything

# Categorical
opa_dat_small_sf_cat <- opa_dat_small_sf %>% 
  dplyr::select(address, quality_grade, year_built, central_air, exterior_condition, fireplaces, fuel, taxable_building, 
  topography, type_heater)  %>% st_drop_geometry()

opa_dat_small_sf_fe <- left_join(opa_dat_small_sf_num,opa_dat_small_sf_cat, by = "address") # Joining

# OPA - Quality Grade
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(grade = case_when(
    quality_grade == "A+" |quality_grade == "A" | quality_grade == "A-" | quality_grade == "B+" | quality_grade == "B" | 
      quality_grade == "B-"| quality_grade == "C+"| quality_grade == "C" ~ "Average or Better",
    TRUE ~ "Below Average")) 

# OPA - Fuel
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(fuel_type = case_when(
    fuel == "A" | fuel == "B" ~ "Fossil",
    fuel == "D" | fuel == "F" ~ "Solid", 
    fuel == "C" | fuel == "E" ~ "Alternative", 
    TRUE ~ "Other"))

# OPA - Topography
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(topo = case_when(
    topography == "A" ~ "Above Street Level",
    topography == "B" ~ "Below Street Level", 
    topography == "C" ~ "Flood Plain",
    topography == "D" ~ "Rocky",
    topography == "F" ~ "Street Level",
    TRUE ~ "Other"))

# OPA - Heater
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(electric_heater = case_when(
    type_heater == "C" ~ "Yes",
    TRUE ~ "No"))

# OPA - Central Air
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(air_central = case_when(
    central_air == "Y" ~ "Yes",
    TRUE ~ "No"))

# OPA - Fireplace
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(fireplace = case_when(
    fireplaces > 0 ~ "Yes",
    TRUE ~ "No"))

# OPA - Year Built
opa_dat_small_sf_fe$year_built <- as.numeric(as.character(opa_dat_small_sf_fe$year_built))

# OPA - Cleaning
opa_dat_small_sf_fe <- opa_dat_small_sf_fe %>% dplyr::select(-quality_grade, -central_air, -fuel, -topography, -type_heater)
opa_dat_small_sf_fe[is.na(opa_dat_small_sf_fe)] <- 0 # Assigning 0 to NA for everything

# Fire Data - Creating an fire dataset to get just the variables we want for feature engineering
dat_fe <- dat %>% dplyr::select(address, incident_type, number_of_exposures, minor_damage, significant_damage, heavy_damage, extreme_damage)
#Joining panel_Results2Y with Time outcome FE variables

panel_Results2Y_sf <- left_join(panel_Results2Y, opa_dat_small_sf_fe, by="address") # Joining with OPA FE variables
#panel_Results2Y_sf[is.na(panel_Results2Y_sf)] <- 0 # Assigning 0 to NA for everything


# Joining panel_Results2Y with Time outcome FE variables
panel_FirePositivesDiff_fe <- panel_FirePositivesDiff %>% 
  dplyr::select(address, mSinceFire, ySinceFire, cat_code, owner_occ) %>% st_drop_geometry() # Condensing dataframe
fire_panel <- merge(panel_Results2Y_sf,panel_FirePositivesDiff_fe ) # Join
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates
# Turning outcomes into binary
fire_panel <-fire_panel %>%
  mutate(vacant = case_when(
    outcome_vacant > 0 ~ 1,
    TRUE ~ 0
  ))

fire_panel <-fire_panel %>%
  mutate(permit = case_when(
    outcome_permit > 0 ~ 1,
    TRUE ~ 0
  ))

fire_panel <-fire_panel %>%
  mutate(transfer = case_when(
    outcome_transfer > 0 ~ 1,
    TRUE ~ 0
  ))

# Market Value
fire_panel <-
  fire_panel %>%
  mutate(mkt_value = case_when(
    market_value < 25000 ~ "Low", # < $250,000
    market_value > 24999 & market_value < 500000 ~ "Medium", # $250,000-$500,000
    TRUE ~ "High")) # $500,000+

# Number of Bedrooms
fire_panel <-
  fire_panel %>%
  mutate(bedrooms = case_when(
    number_of_bedrooms < 4 ~ "Low", # 1-3 Bedrooms
    number_of_bedrooms > 3 & number_of_bedrooms < 8 ~ "Medium", # 4-7 Bedrooms
    TRUE ~ "High")) # 8+

# Number of Bathrooms
fire_panel <-
  fire_panel %>%
  mutate(bathrooms = case_when(
    number_of_bathrooms < 3 ~ "Low", # 1-2 Bathroom
    number_of_bathrooms > 2 & number_of_bathrooms < 6 ~ "Medium", # 3-5 Bathrooms
    TRUE ~ "High")) # $6+

# Livable Area 
fire_panel <-
  fire_panel %>%
  mutate(area = case_when(
    total_livable_area < 5001 ~ "Low", # < 0-5,000 sqft
    total_livable_area > 4999 & total_livable_area < 10000 ~ "Medium", # 5,001-10,000 sqft
    TRUE ~ "High")) # 10,001+ sqft

# Year Built
fire_panel <-
  fire_panel %>%
  mutate(built_year = case_when(
    year_built < 1951 ~ "Up to 1951", 
    year_built > 1959 & year_built < 2000 ~ "1950 to 1999",
    TRUE ~ "2000 to Present")) 

# Interior Condition when fire happened - Fire in 2018 and condition is how it is today. remove?
fire_panel <-
  fire_panel %>%
  mutate(condition = case_when( 
    interior_condition > 0 & interior_condition < 5 ~ "Average", 
    interior_condition == 5 ~ "Below Average",
    interior_condition == 6 ~ "Vacant",
    interior_condition == 7 ~ "Sealed",
    interior_condition == 0 ~ "Unknown")) 

# Sale vs Market
fire_panel <-
  fire_panel %>%
  mutate(sale_value = case_when( 
    sale_price > market_value ~ "Above Market", 
    TRUE ~ "Below Market")) 

# Sale vs Market
fire_panel <-
  fire_panel %>%
  mutate(resident_type = case_when( 
    owner_occ == "TRUE" ~ "Owner", 
    owner_occ == "FALSE" ~ "Renter", 
    TRUE ~ "Other")) 

dat_geo <- dat %>% dplyr::select(address, geometry) # Geometry
fire_panel <- left_join(fire_panel, dat_geo, by = "address") # Joining to get geometry
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates
#For the 6 month outcome, get all positive entries. Not needed for 2Year code, all entries #are already positive.
panelResults_Positive<- panel_Results2Q%>%
  filter(count > 0 | fireVacant2Q > 0 | firePermit2Q > 0 | fireTransfer2Q > 0)
#Checking how many positive outcomes we have for 2Q
results_summary2Q <- panelResults_Positive%>%
  summarize(Total_Fires = length(unique(address)),
            Total_311Vacant = sum(fireVacant2Q),
           Total_Permit = sum(firePermit2Q),
            Total_Transfers = sum(fireTransfer2Q))
#results_summary2Q%>%
  #kable()%>%
 #kable_styling()
#Checking how many positive outcomes we have for 2Y
results_summary2Y <- panel_Results2Y%>%
  ungroup%>%
  summarize(Total_Fires = length(unique(address)),
           Total_Vacant = sum(outcome_vacant>0),
            Total_Permit = sum(outcome_permit>0),
            Total_Transfers = sum(outcome_transfer>0))
results_summary2Y%>%
  kable(caption = "Total Outcomes Observed, Up to 2 Years After Fires")%>%
 kable_styling()
#The Big Summary
#Distribution of Outcomes, Measuring Time between Fire and Outcome
panel_FirePositivesDiff%>%
  st_drop_geometry%>%
  ungroup()%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0))%>%
  dplyr::select(n_Vacant, n_permits, n_transfers, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  filter(value > 0)%>%
  ggplot(aes(x=ySinceFire, fill=Variable))+
    geom_bar(position="dodge")+
      labs(title="Time Since Fire, by Outcome",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years Since Fire")+
  ylab("Count")+
  scale_fill_manual(values= palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#filter to just fires, then combine those addresses with additional outcomes and building categories

panel_FirePositives1120 <- panel_OPAFireOpenData %>%
  filter(count > 0,
         year(quarter) < 2019,
         year(quarter) > 2011)%>% #filter out fires that occurred before 2011 after 2019, as to not weight too heavily toward ends of the data. Creates four year tails of outcomes without bias.
 dplyr::select(address)%>%
  distinct(address, .keep_all = TRUE)%>%
  left_join(panel_Positives, by="address")%>%
  left_join(dplyr::select(opa_dat_small_sf, address, category_code_description, mailing_street, building_code_description), by="address")%>%
  mutate(condo = ifelse(grepl("CONDO", building_code_description) == TRUE, TRUE, FALSE),
          owner_occ = ifelse(condo == FALSE & category_code_description != "MULTI FAMILY",
                             ifelse(address == mailing_street, "OWNER", "RENTER"),
                             NA))%>%
  dplyr::select(-mailing_street, -condo, -building_code_description)%>%
  st_drop_geometry()


  #mutate(diff = interval(lag(quarter, n=1),quarter) %/% years(1))    
#Join to get fire incident number and date
#calculate the difference between the quarter of the incident date and the quarter of the outcome
panel_FirePositivesDiff_All <- panel_FirePositives1120%>%
  anti_join(nFires_perAddress_Outliers, by="address")%>% #take out outliers of 3+ fires
  left_join(st_drop_geometry(dplyr::select(dat, incident_number,address, quarter)), by=c("address"))%>%
  group_by(incident_number)%>% #some addresses have multiple fires, so we use incident number instead
  mutate(mSinceFire = interval(quarter.y, quarter.x) %/% months(1),
         ySinceFire = mSinceFire / 12,
         cat_code = toupper(category_code_description))%>%
  filter(mSinceFire <= 48 &  mSinceFire >= -48)%>% #limiting outcomes to four years for relevancy

  dplyr::select(-category_code_description)%>%
  st_as_sf()
Plot<-panel_FirePositivesDiff_All%>%
  st_drop_geometry%>%
  ungroup()%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0))%>%
  dplyr::select(n_Vacant, n_permits, n_transfers, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  filter(value > 0)%>%
  ggplot(aes(x=ySinceFire, fill=Variable))+
    geom_bar(position="dodge", just = 1)+
      labs(title="Number of Vacancies, Permits, and Transfers by Time Since Fire",
       subtitle="Philadelphia County, Fires Occuring Between 2012-2018 (Buildings with 1-2 Fires)")+
  xlab("Years Since Fire (By Quarter)")+
  ylab("Quantity of Outcomes")+
  scale_fill_manual(values = palette7_cats )+
  scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  geom_vline(xintercept = 0, color = "#7b230b", size = 1, linetype = "dashed")

Plot


#Measuring Time between fire and Permit Requests, By Ownership
panel_FirePositivesDiff_All%>%
  ungroup()%>%
  st_drop_geometry()%>%
  filter(n_permits > 0)%>%
  dplyr::select(owner_occ, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  ggplot(aes(x=ySinceFire, fill=value))+
    geom_bar(position="dodge", just = 1)+
      labs(title= "How Soon Does a Permit Request Happen After A Fire, by Owner/Renter",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years from Fire (Rounded to Quarter)")+
  ylab("Number of Permit Requests")+
  scale_fill_manual(values=palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))+
   scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  geom_vline(xintercept = 0, color = "#7b230b", size = 1, linetype = "dashed")
#Measuring Time between fire and Permit Requests, By Ownership
panel_FirePositivesDiff_All%>%
  ungroup()%>%
  st_drop_geometry()%>%
  filter(n_Vacant > 0)%>%
  dplyr::select(owner_occ, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  ggplot(aes(x=ySinceFire, fill=value))+
    geom_bar(position="dodge", just = 1)+
      labs(title="Proximity of Vacancy Complaints to Fires, by Owner/Renter",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years from Fire (Rounded to Quarter)")+
  ylab("Frequency")+
  scale_fill_manual(values=palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))+
   scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  geom_vline(xintercept = 0, color = "#7b230b", size = 1, linetype = "dashed")
#Measuring Time between fire and Permit Requests, By Ownership
panel_FirePositivesDiff_All%>%
  ungroup()%>%
  st_drop_geometry()%>%
  filter(n_transfers > 0)%>%
  dplyr::select(owner_occ, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  ggplot(aes(x=ySinceFire, fill=value))+
    geom_bar(position="dodge", just = 1)+
      labs(title="Time Proximity of Sales to Fires, by Owner/Renter",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years from Fire (Rounded to Quarter)")+
  ylab("Frequency")+
  scale_fill_manual(values=palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))+
   scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  geom_vline(xintercept = 0, color = "#7b230b", size = 1, linetype = "dashed")
#Measuring Time between fire and Permit Requests, By Building Type
panel_FirePositivesDiff%>%
  ungroup()%>%
  st_drop_geometry()%>%
  filter(n_permits == 1)%>%
  dplyr::select(cat_code, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  filter(value > 0)%>%
  ggplot(aes(x=ySinceFire, fill=value))+
    geom_bar(position="dodge")+
      labs(title="Time Between Permit Request and Fire, by Type",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years from Fire (Rounded to Quarter)")+
  ylab("Count")+
  scale_fill_manual(values=palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#Measuring Time between fire and Vacancy Complaints, By Building Type
panel_FirePositivesDiff%>%
  ungroup()%>%
  st_drop_geometry()%>%
  filter(n_Vacant == 1)%>%
  dplyr::select(cat_code, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  filter(value > 0)%>%
  ggplot(aes(x=ySinceFire, fill=value))+
    geom_bar(position="dodge")+
      labs(title="Time Between vacancies and Fire, by Type",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years from Fire (Rounded to Quarter)")+
  ylab("Count")+
  scale_fill_manual(values=palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
#Measuring Time between fire and Real Estate Transfers, By Building Type
panel_FirePositivesDiff%>%
  ungroup()%>%
  st_drop_geometry()%>%
  filter(n_transfers == 1)%>% #change this line to switch variables
  dplyr::select(cat_code, ySinceFire)%>%
  gather(Variable, value, -ySinceFire)%>%
  filter(value > 0)%>%
  ggplot(aes(x=ySinceFire, fill=value))+
    geom_bar(position="dodge")+
      labs(title="Time Between Transfer Request and Fire, by Type",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Years from Fire (Rounded to Quarter)")+
  ylab("Count")+
  scale_fill_manual(values=palette7_cats)+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))

## 3.5.2. Mapping Time Between Fire and Outcome


panel_FirePositivesDiff1 <- panel_FirePositivesDiff

panel_FirePositivesDiff1$X <- st_coordinates(panel_FirePositivesDiff1)[, 1]
panel_FirePositivesDiff1$Y <- st_coordinates(panel_FirePositivesDiff1)[, 2]
panel_FirePositivesDiff1 <- panel_FirePositivesDiff1 %>% st_drop_geometry() %>% drop_na()
panel_FirePositivesDiff1 <- st_as_sf(panel_FirePositivesDiff1, coords = c("Y", "X"), crs = "EPSG:4326")
#Transfers 
map_plot<- panel_FirePositivesDiff1%>%
  filter(n_transfers > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=ySinceFire), size=0.9)+
    scale_color_viridis_c()+
    labs(title = "Fires with Property Transfers, by Years since Fire") +
    mapTheme()
 
map_plot

#Permits
map_plot2<- panel_FirePositivesDiff1%>%
  filter(n_permits > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=ySinceFire), size=0.9)+
    scale_color_viridis_c()+
    labs(title = "Fires with Property Permits, by Years since Fire") +
    mapTheme()
 
map_plot2


#Vacancies 
map_plot3<- panel_FirePositivesDiff1%>%
  filter(n_Vacant > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=ySinceFire), size=0.9)+
    scale_color_viridis()+
    labs(title = "Fires with Vacancy Complaints, by Years since Fire") +
    mapTheme()
 
map_plot3
Vacant_Plot<-panelResults_Positive %>%
  filter(count > 0 | fireVacant2Q > 0, year(quarter) > 2013)%>%
  dplyr::select(quarter, count, severity_index, fireVacant2Q)%>%
  group_by(quarter)%>%
  summarize(totalFires = sum(count),
            Vacancy = sum(fireVacant2Q))%>%
  gather(Variable, value, -quarter)%>%
  ggplot(aes(x=quarter, y=value, fill =Variable))+
    geom_bar(stat="identity", position="dodge")+
      labs(title="Fires with Vacancy Within 6 Months",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Fires")+
  scale_fill_manual(values=c("#e19825", "#20b1ae"))+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
 Vacant_Plot 

### **Mapping outcome distribution across PHL(errors on these maps)**{.unlisted .unnumbered}

opa_dat_small_sf_join <- opa_dat_small_sf %>%
  dplyr::select(address)


opaFireVacant_PlotData <- opa_dat_small_sf_join %>%
  left_join(filter(panelResults_Positive, fireVacant2Q > 0), by="address")%>%
  drop_na(fireVacant2Q, geometry)%>%
  st_as_sf%>%
  st_set_crs("EPSG:4326")

opaFireVacant_PlotData <- opa_dat_small_sf_join %>%
  left_join(filter(panelResults_Positive, fireVacant2Q > 0), by="address")%>%
  drop_na(fireVacant2Q, geometry)%>%
  st_as_sf%>%
  st_set_crs("EPSG:4326")

g(opaFireVacant_PlotData)
table(opaFireVacant_PlotData$year)
# getting error for following code chunk
#ggplot()+
  #geom_sf(data=acsTractsPHL.2020, fill='#f0efe0', color='dark gray')+
  #geom_sf(data=opaFireVacant_PlotData, aes(color= year), alpha=0.5)+
  #scale_color_viridis_c()+
  #labs(title = "Fires with Vacancy Complaints within 6 Months") +
 # mapTheme()
 

ggplot()+
  geom_sf(data=acsTractsPHL.2020, fill='#f0efe0', color='dark gray')+
  geom_sf(data=vacantLI, color="#8F8172", alpha=1/10, size=0.2)+
  labs(title = "Fires with Vacancy Complaints within 6 Months") +
  mapTheme()

### Categorical Differences 
#DONT RUN
#plot3<-panelResults_Positive %>%
  filter(count > 0 | firePermit2Q > 0)%>%
  dplyr::select(quarter, count, severity_index, firePermit2Q)%>%
  group_by(quarter)%>%
  summarize(totalFires = sum(count),
            permits = sum(firePermit2Q))%>%
  gather(Variable, value, -quarter)%>%
  ggplot(aes(x=quarter, y=value, fill =Variable))+
    geom_bar(stat="identity", position="dodge")+
      labs(title="Fires with Permit Records Within 6 Months",
       subtitle="Philadelphia County, 2009-2022")+
  xlab("Date, Rounded to Beginning of Quarter")+
  ylab("Number of Fires")+
  scale_fill_manual(values=c("#20b1ae", "#e19825"))+
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "#f3efe0"))
 plot3 
#g(panelResults_Positive)
opaFirePermit_PlotData <- opa_dat_small_sf_join %>%
  left_join(filter(panelResults_Positive, firePermit2Q > 0), by="address")%>%
  drop_na(firePermit2Q, geometry)%>%
  st_as_sf%>%
  st_set_crs("EPSG:4326")

opaFirePermit_PlotData <-
  opaFirePermit_PlotData[acsTractsPHL.2020,]%>%
  mutate(year = year(quarter))

g(opaFirePermit_PlotData)

#Error 
table(opaFirePermit_PlotData$year)

#Error 
ggplot()+
  geom_sf(data=acsTractsPHL.2020, fill='#f0efe0', color='dark gray')+
  geom_sf(data=opaFirePermit_PlotData, aes(color=year), alpha=0.5)+
  scale_color_viridis_c()+
  labs(title = "Fires with L&I Permits within 6 Months") +
  mapTheme()

#dDONT RUN
#g(panelResults_Positive)
#TransferPlot<-panelResults_Positive %>%
  #filter(count > 0 | fireTransfer2Q > 0)%>%
  #dplyr::select(quarter, count, severity_index, fireTransfer2Q)%>%
 # group_by(quarter)%>%
 # summarize(totalFires = sum(count),
            #transfers = sum(fireTransfer2Q))%>%
  #gather(Variable, value, -quarter)%>%
#  ggplot(aes(x=quarter, y=value, fill =Variable))+
  #  geom_bar(stat="identity", position="dodge")+
  #    labs(title="Fires with Real Estate Transfers Within 6 Months",
  #     subtitle="Philadelphia County, 2009-2022")+
 # xlab("Date, Rounded to Beginning of Quarter")+
#  ylab("Number of Fires")+
 # scale_fill_manual(values=c("#e19825", "#20b1ae"))+
 #   theme(axis.text.x = element_text(angle = 45, hjust = 1),
 #   panel.background = element_rect(fill = "#f3efe0"))
# TransferPlot 

#SAME HERE DONT RUN THIS CHUNK
opaFireTransfer_PlotData$X <- st_coordinates(opaFireTransfer_PlotData)[, 1]
opaFireTransfer_PlotData$Y <- st_coordinates(opaFireTransfer_PlotData)[, 2]
opaFireTransfer_PlotData <- opaFireTransfer_PlotData %>% st_drop_geometry() %>% drop_na()
opaFireTransfer_PlotData <- st_as_sf(opaFireTransfer_PlotData, coords = c("Y", "X"), crs = "EPSG:4326")




opaFireTransfer_PlotData <- opa_dat_small_sf_join %>%
  left_join(filter(panelResults_Positive, fireTransfer2Q > 0), by="address")%>%
  drop_na(fireTransfer2Q, geometry)%>%
  st_as_sf%>%
  st_set_crs("EPSG:4326")
 

opaFireTransfer_PlotData <-
  opaFireTransfer_PlotData[acsTractsPHL.2020,]%>%
  mutate(year = year(quarter))

g(opaFireTransfer_PlotData)
table(opaFireTransfer_PlotData$year)

ggplot()+
  geom_sf(data=acsTractsPHL.2020, fill='#20b1ae', color='dark gray')+
  geom_sf(data=opaFireTransfer_PlotData, aes(color=year), alpha=0.5)+
  scale_color_viridis_c()+
  labs(title = "Fires with Real Estate Transfers within 6 Months") +
  mapTheme()







#Before copying Ben;s code 
#prop <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/propMatch_v02.csv") %>% dplyr::select(address, fire)
prop <- read_csv("PhilaFireData/propMatch_v02.csv")

prop_fire_panel <- left_join(prop,panel_Results2Y, by="address")
prop_fire_panel <- prop_fire_panel[!duplicated(prop_fire_panel$incident_number),] # Removing Duplicates

# Removing duplicates
dat <- dat[!duplicated(dat$incident_number),]

dat_boolean <- dat%>%
  st_drop_geometry%>%
  mutate(fire = 1)%>%
  dplyr::select(address, fire)

taney <- dat %>% filter(grepl("N TANEY", address))

#rowhomes <- opa_ps %>% filter(grepl("ROW", building_code_description))

#nrow(rowhomes)/nrow(opa_dat)



census_api_key("3c9540be1434ac4b38e6e55d60e8ee95909f2254", overwrite = TRUE)

#Variables
acs_variable_list.2020 <- load_variables(2020, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

acs_variable_list.2016 <- load_variables(2016, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

acs_vars <- c("B02001_001E", #Total Population
              "B02001_002E", #Total Population White Only
              "B19013_001E", #Median HH Income
              "B25002_001E", #Total Units
              "B25002_003E" #Total Vacant Units
              )

acsTractsPHL.2020 <- get_acs(geography = "tract",
                             year = 2020, 
                             variables = acs_vars, 
                             geometry = TRUE, 
                             state = "PA", 
                             county = "Philadelphia", 
                             output = "wide")%>%
  rename(Total_Pop = B02001_001E,
         Total_Pop_WhiteOnly = B02001_002E,
         Med_Income = B19013_001E,
         Total_Vacant = B25002_003E,
         Total_Units = B25002_001E)%>%
  mutate(Perc_White = Total_Pop_WhiteOnly/Total_Pop,
         Perc_Vacant = Total_Vacant*100/Total_Units)%>%
  dplyr::select(-B02001_001M, -B02001_002M, -B19013_001M, -Total_Pop_WhiteOnly)%>%
  st_set_crs("EPSG:4326")

# Reading the data
opa_dat <- read_csv("PhilaFireData/OpenDataPhilly-opa_properties_public.csv")

# Creating geometry for the properties
opa_dat <- opa_dat%>%
  drop_na(lat, lng)%>%
  st_as_sf(coords = c("lat", "lng"),
           crs = "EPSG:4326")

### Cleaning
# Reducing columns
opa_ps <- opa_dat[!duplicated(opa_dat$location),] %>%
  dplyr::select(location, category_code, category_code_description, building_code, building_code_description, total_area, total_livable_area, market_value, mailing_street, number_of_bedrooms, number_of_bathrooms, number_stories, year_built, quality_grade)%>%
  filter(category_code == 1 | category_code == 2 | category_code == 3)%>%
  mutate(Price_Sqft = ifelse(is.na(total_livable_area) == FALSE & total_livable_area != 0, market_value / total_area, NA),
         quality_grade_mod = case_when(quality_grade == 1 ~ "E",
                                   quality_grade == 2 ~ "D",
                                   quality_grade == 3 ~ "C",
                                   quality_grade == 4 ~ "B",
                                   quality_grade == 5 ~ "A",
                                   quality_grade == 6 ~ "A+", 
                                   TRUE ~ quality_grade,
                                   ))%>%
  rename(address = location)%>%
  mutate(condo = ifelse(grepl("CONDO", building_code_description) == TRUE, TRUE, FALSE),
        owner_occ = ifelse(condo == FALSE & category_code_description != "MULTI FAMILY",
                           ifelse(address == mailing_street, TRUE, FALSE),
                           NA))%>%
  filter(Price_Sqft < 100000)%>%
  dplyr::select(-quality_grade)

opa_ps$quality_grade_mod <- factor(opa_ps$quality_grade_mod, order = TRUE,
                               levels = c("A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "E+", "E", "E-"))

rowhomes <- opa_ps %>% filter(grepl("ROW", building_code_description))

nrow(rowhomes)/nrow(opa_dat)


## Loading in neighborhoods 
#https://github.com/azavea/geo-data

nhoods <- st_read("~/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson")%>%
  st_set_crs("EPSG:4326")

## Combining Data

#Fires
opa_fire <- opa_ps%>%
  left_join(dat_boolean, by="address")

opa_fire$fire <- as.factor(ifelse(is.na(opa_fire$fire), 0, 1))

#ACS and NHoods
opa_sf <- opa_fire%>%
  st_join(acsTractsPHL.2020)%>%
  st_join(nhoods)
# PreTests
g(filter(opa_sf, total_livable_area > 10000))

opa_sf_cov <- c('total_livable_area', 'number_of_bedrooms', 'Price_Sqft', 'Med_Income', 'owner_occ', 'Perc_White', 'quality_grade_mod', 'name')

opa_sf %>%
  filter(total_livable_area < 10000,
       Price_Sqft < 10000)%>%
  st_drop_geometry()%>%
  group_by(fire) %>%
  dplyr::select(one_of(opa_sf_cov)) %>%
  summarise_all(funs(mean(., na.rm = T)))

# Propensity Score Estimation
#Getting Errors following chunk 
g(opa_sf)

m_ps <- glm(fire ~ total_livable_area + number_of_bedrooms + Price_Sqft + Med_Income + Perc_White, 
            family = binomial(), data = opa_sf)

summary(m_ps)


m_ps2 <- glm(fire ~ total_livable_area + number_of_bedrooms + Price_Sqft + Med_Income + Perc_White + owner_occ + quality_grade_mod + name, 
            family = binomial(), data = opa_sf)

summary(m_ps2)



prs_df <- data.frame(pr_score = predict(m_ps2, type = "response"),
                     fire = m_ps2$model$fire)
head(prs_df)

# REgion of common support
labs <- paste("Actual school type attended:", c("Fire", "No Fire"))
prs_df %>%
  mutate(fire = ifelse(fire == 1, labs[1], labs[2])) %>%
  ggplot(aes(x = pr_score)) +
  geom_histogram(color = "white") +
  facet_wrap(~fire) +
  xlab("Probability of Having A Fire") +
  theme_bw()

# Matching Algorithm
opa_nomiss <- opa_sf %>%  # MatchIt does not allow missing values
  dplyr::select(address, fire, one_of(opa_sf_cov)) %>%
  na.omit()

mod_match <- matchit(fire ~ total_livable_area + number_of_bedrooms + Price_Sqft + Med_Income + Perc_White + owner_occ + quality_grade_mod + name,
                     method = "nearest", data = opa_nomiss)


dta_m <- match.data(mod_match)

#Difference In Means
dta_m %>%
  st_drop_geometry()%>%
  group_by(fire) %>%
  dplyr::select(one_of(opa_sf_cov)) %>%
  summarize_all(funs(mean))

# Calculating Differences Between Prop-Matched Structures

## Loading In Data
panel_opa <- read_csv("~/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/panel_OPA_Fire_OpenData.csv")

#panel_opa <- panel_OPA_Fire_OpenData
# Filter Panel By Addresses in Prop-Matched Set

#assign fire time and incident number to one member of the subclass
dta_m_1 <- dta_m %>%
  filter(fire == 1)%>%
  left_join(st_drop_geometry(dplyr::select(dat, incident_number, address, quarter)))

#assign same data to other member of the subclass
dta_m_2 <- dta_m %>%
  filter(fire == 0)%>%
  left_join(st_drop_geometry(dplyr::select(dta_m_1, incident_number, quarter, subclass)), by="subclass")

dta_m_withTime <- rbind(dta_m_1, dta_m_2) #combine sets

panel_pm_4Y <- panel_opa%>%
  inner_join(dta_m_withTime, by="address")%>%
  mutate(mSinceFire = interval(quarter.y, quarter.x) %/% months(1),
         ySinceFire = mSinceFire / 12)%>%
  filter(mSinceFire >= -49,#Eliminate entries before fires (occurs because of incident_number group duplicates)
         mSinceFire < 49, #Eliminate entries after four years, as they are irrelevant (arbitrary)
         !(count > 0 & ySinceFire > 0))%>% #For addresses with multiple incidents, take out repeated fire observ's
  st_as_sf()

panel_pm_4Y$count <- as.factor(panel_pm_4Y$count)


panel_pm_4YForward <- panel_opa%>%
  inner_join(dta_m_withTime, by="address")%>%
  mutate(mSinceFire = interval(quarter.y, quarter.x) %/% months(1),
         ySinceFire = mSinceFire / 12)%>%
  filter(mSinceFire >= -1,#Eliminate entries before fires (occurs because of incident_number group duplicates)
         mSinceFire < 49, #Eliminate entries after four years, as they are irrelevant (arbitrary)
         !(count > 0 & ySinceFire > 0))%>% #For addresses with multiple incidents, take out repeated fire observ's
  st_as_sf()


#2 Year Outcomes for Each Incident
panel_pm_Results2Y <- panel_pm_4YForward %>%
    st_drop_geometry()%>%
    filter(., ySinceFire <= 2)%>%
    group_by(address, incident_number, subclass, fire)%>%
    summarize(count = ifelse(sum(count) > 0, 1, 0),
              severity_index = max(severity_index),
              outcome_vacant = sum(n_Vacant),
              outcome_permit = sum(n_permits),
              outcome_transfer = sum(n_transfers),
              quarter = min(quarter.x))
  

#Graph Outcome Proximity to Fires
#Distribution of Outcomes, Measuring Time between Fire and Outcome
panel_pm_4Y%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  st_drop_geometry%>%
  ungroup()%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0))%>%
  dplyr::select(n_transfers, n_Vacant, n_permits, ySinceFire, fire)%>%
  gather(Variable, value, -ySinceFire, -fire)%>%
  filter(value > 0)%>%
    count(ySinceFire, fire, Variable)%>%
  ggplot(aes(x=ySinceFire, y=n, fill=fire))+
    geom_col(position="dodge", just = 0.5)+
      labs(title="Permits, Vacancies, and Sales By Time From Fire, Propensity-Matched Set",
       subtitle="Philadelphia County, 2013-2018, By Quarter")+
      geom_smooth(se=FALSE, aes(color = fire, group = paste(fire, (ySinceFire >= 0))), size = 1.5)+
  facet_wrap(~Variable) + 
  xlab("Years Since Fire")+
  ylab("Count of Outcomes")+
  scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  scale_fill_manual(values = c("#b18e6f", "#b9cfcf"))+
  scale_color_manual(values = c("#3f3128", "#20b1ae"))+
  geom_vline(xintercept = 0, color = "#d55816", size = 1, linetype = "dashed")


## Map of Same Outcomes (Forward) 
#Fires
panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(fire == 1)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(color = "#20b1ae", size=0.9, alpha = 0.5)+
    labs(title = "Fires with Property Transfers, by Years since Fire") +
    mapTheme()

#No Fires
panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(fire == 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(color = "#3f3128", size=0.9, alpha = 0.5)+
    labs(title = "Fires with Property Transfers, by Years since Fire") +
    mapTheme()

#Distribution of Outcomes, Measuring Time between Fire and Outcome
panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012,
         fire == 1)%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0))%>%
  dplyr::select(n_transfers, n_Vacant, n_permits, ySinceFire, fire)%>%
  gather(Variable, value, -ySinceFire, -fire, -geometry)%>%
  filter(value > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=ySinceFire), size=0.9)+
    scale_color_viridis_c()+
    facet_wrap(~Variable)+
    labs(title = "Fires with Property Transfers, by Years since Fire") +
    mapTheme()
  
#Distribution of Outcomes, Measuring Time between Fire and Outcome
panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012,
         fire == 0)%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0))%>%
  dplyr::select(n_transfers, n_Vacant, n_permits, ySinceFire, fire)%>%
  gather(Variable, value, -ySinceFire, -fire, -geometry)%>%
  filter(value > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=ySinceFire), size=0.9)+
    scale_color_viridis_c()+
    facet_wrap(~Variable)+
    labs(title = "Prop-Matched Set, No Fires, by Years since Fire") +
    mapTheme()
  


#Filter results between 0 and 1 year, 1.25 and 2 years, 2.25 and 3 years, 3.25 and 4 years

panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0))%>%
  mutate(ySinceFire_floor = floor(ySinceFire))%>%
  dplyr::select(n_transfers, n_Vacant, n_permits, fire, ySinceFire_floor)%>%
  gather(Variable, value, -ySinceFire_floor, -fire, -geometry)%>%
  filter(value > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=fire), size=0.75)+
    scale_color_manual(values= c("#3f3128", "#e19825"))+
    facet_wrap(~Variable + ySinceFire_floor, nrow=3, ncol=5)+
    labs(title = "Permits in Properties With Fires, by Years After The Fire") +
    mapTheme()


palette5_permits <- c("#BAE4B6", "#85C07F", "#5E8C59", "#4A7246", "#2D4A2A")

#Divide permits by before 1 and after 1 year

delayed.labs <- c("0-1 Years", "1-4 Years")
names(delayed.labs) <- c(0, 1)

panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_permits == 0))%>%
  mutate(delayed = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_permits, fire, delayed)%>%
  gather(Variable, value, -delayed, -fire, -geometry)%>%
  filter(value > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, aes(fill=q5(Med_Income)), color=NA)+
    scale_fill_manual(values=palette5_permits, labels=qBr(acsTractsPHL.2020, "Med_Income"))+
    geom_sf(aes(color=fire), size=0.50)+
    scale_color_manual(values= c("#20B1AE", "#F1C82B"))+
    facet_wrap(~delayed, nrow=1, labeller = labeller(delayed = delayed.labs))+
    labs(title = "Permits in Properties With Fires, by Years After The Fire") +
    mapTheme()




palette5_vacant <- c("#F0EBDE", "#F9F8EB", "#F0EBDE", "#D1BCA6", "#3A352F")

#Divide vacancies by before 1 and after 1 year

lingering <- c("0-1 Years", "1-4 Years")
names(lingering) <- c(0, 1)

panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_Vacant == 0))%>%
  mutate(lingering = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_Vacant, fire, lingering)%>%
  gather(Variable, value, -lingering, -fire, -geometry)%>%
  filter(value > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, aes(fill=q5(Perc_Vacant)), color=NA)+
        scale_fill_manual(values=palette5_vacant, labels=qBr(acsTractsPHL.2020, "Perc_Vacant"))+
    geom_sf(aes(color=fire), size=0.50)+
    scale_color_manual(values= c("#20B1AE", "#F1C82B"))+
    facet_wrap(~lingering, nrow=1, labeller = labeller(lingering= lingering))+
    labs(title = "Matched-Set Vacancy Reports Cluster in Areas of High Vacancy",
         subtitle = "Philadelphia County, Fires from 2013-2018, Vacancy Reports from 2013-2022",
         credit = "Source: US ACS and Philadelphia Fire Department") +
    mapTheme()


library(gridExtra)

delayedMap <- panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_permits == 0), fire == 1)%>%
  mutate(delayed = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_permits, delayed, name)%>%
  st_drop_geometry()%>%
  group_by(name) %>%
  summarize(delayRate = sum(delayed)/sum(n_permits>0), 
            permits = sum(n_permits))%>%
  arrange(desc(delayRate))%>%
  filter(permits > 1)%>%
  left_join(nhoods, by="name")%>%
  st_as_sf()%>%
  ggplot()+
    geom_sf(data=nhoods, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(fill=delayRate))+
    scale_fill_viridis_c()+
    labs(title = "Rate of Delayed Permits",
         subtitle = "Philadelphia County, 2009-2022") +mapTheme()
#Neighborhoods with Permits Over 1 Year After A Fire vs Total Fire-Property Permits
lingeringMap <- panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_Vacant == 0), fire == 1)%>%
  mutate(lingering = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_Vacant, lingering, name)%>%
  st_drop_geometry()%>%
  group_by(name) %>%
  summarize(lingeringRate = sum(lingering)/sum(n_Vacant>0), 
            vacancies = sum(n_Vacant))%>%
  arrange(desc(lingeringRate))%>%
  filter(vacancies > 1)%>%
  left_join(nhoods, by="name")%>%
  st_as_sf()%>%
  ggplot()+
    geom_sf(data=nhoods, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(fill=lingeringRate))+
    scale_fill_viridis_c()+
    labs(title = "Rate of Delayed Vacancies",
         subtitle = "Philadelphia County, 2009-2022") +
    mapTheme()

grid.arrange(delayedMap, lingeringMap, nrow=1)





#g(panel_pm_4YForward)


transfers <- read_csv("PhilaFireData/OpenDataPhilly-transfers.csv")
#Select relevant fields and filter to fire data range
transfers_data <- transfers%>%
  dplyr:: select(objectid, recording_date, street_address, document_type, grantors, grantees)%>%
  filter(year(recording_date) > 2008,
         !is.na(street_address))%>%
  rename(address = street_address)%>%
  mutate(quarter = as_date(floor_date(recording_date, unit="quarter")))


toMatch <- c('LLC', 'LP', 'REO', 'Investments', 'Homes', 'Trust', 'Corp', 'Inc')

investors <- unique(grep(paste(toMatch,collapse="|"), 
                        transfers_data$grantees, value=TRUE))

#Check for owners with 5+ properties
investors_opa <- opa_dat%>%
  st_drop_geometry()%>%
  dplyr::select(owner_1)%>%
  group_by(owner_1)%>%
  filter(n()>5)

investors_list_opa <- unique(investors_opa)

transfers_data <- transfers_data%>%
  mutate(investorBuy = ifelse(grantees %in% investors | grantees %in% investors_list_opa, TRUE, FALSE),
         investorSell = ifelse(grantors %in% investors | grantors %in% investors_list_opa, TRUE, FALSE))%>%
  mutate(homeownerToInvestor = ifelse(investorBuy == TRUE & investorSell== FALSE, TRUE, FALSE),
         investorToHomeowner = ifelse(investorSell == TRUE & investorBuy== FALSE, TRUE, FALSE))

#join transfers data to property by address and quarter

panel_pm_4Y_sales <- panel_pm_4Y%>%
  rename(quarter = quarter.x)%>%
  left_join(dplyr::select(transfers_data, address, quarter, homeownerToInvestor, investorToHomeowner), by=c("address", "quarter"))



panel_pm_4Y_sales%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  st_drop_geometry%>%
  ungroup()%>%
  filter(!(n_transfers == 0))%>%
  dplyr::select(ySinceFire, fire, homeownerToInvestor)%>%
  gather(Variable, value, -ySinceFire, -fire)%>%
  filter(value == TRUE)%>%
    count(ySinceFire, fire, Variable)%>%
  ggplot(aes(x=ySinceFire, y=n, fill=fire))+
    geom_col(position="dodge", just = 0.5)+
      labs(title="Property Transfers: Sales from Homeowners To Investors by Time to Fire",
       subtitle="Philadelphia County, Fires in 2013-2018, Sales in 2009-2022, By Quarter")+
      geom_smooth(se=FALSE, aes(color = fire, group = paste(fire, (ySinceFire >= 0))), size = 1.5)+
  facet_wrap(~Variable) + 
  xlab("Years Since Fire")+
  ylab("Count of Outcomes")+
  scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  scale_fill_manual(values = c("#b18e6f", "#b9cfcf"))+
  scale_color_manual(values = c("#3f3128", "#20b1ae"))+
  geom_vline(xintercept = 0, color = "#d55816", size = 1, linetype = "dashed")


panel_pm_4Y_sales%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  st_drop_geometry%>%
  ungroup()%>%
  filter(!(n_transfers == 0))%>%
  dplyr::select(ySinceFire, fire, investorToHomeowner)%>%
  gather(Variable, value, -ySinceFire, -fire)%>%
  filter(value == TRUE)%>%
    count(ySinceFire, fire, Variable)%>%
  ggplot(aes(x=ySinceFire, y=n, fill=fire))+
    geom_col(position="dodge", just = 0.5)+
      labs(title="Property Transfers: Sales from Investors To Homeowners by Time to Fire",
       subtitle="Philadelphia County, Fires in 2013-2018, Sales in 2009-2022, By Quarter")+
      geom_smooth(se=FALSE, aes(color = fire, group = paste(fire, (ySinceFire >= 0))), size = 1.5)+
  facet_wrap(~Variable) + 
  xlab("Years Since Fire")+
  ylab("Count of Outcomes")+
  scale_x_continuous(breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4))+
    theme(panel.background = element_rect(fill = "#f3efe0"))+
  scale_fill_manual(values = c("#b18e6f", "#b9cfcf"))+
  scale_color_manual(values = c("#3f3128", "#20b1ae"))+
  geom_vline(xintercept = 0, color = "#d55816", size = 1, linetype = "dashed")

#this might need to go 
panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_permits == 0), fire == 1)%>%
  mutate(delayed = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_permits, delayed, name)%>%
  st_drop_geometry()%>%
  group_by(name) %>%
  summarize(delayRate = sum(delayed)/sum(n_permits>0), 
            permits = sum(n_permits))%>%
  arrange(desc(delayRate))%>%
  filter(permits > 1)%>%
  left_join(nhoods, by="name")%>%
  st_as_sf()%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(fill=delayRate))+
    scale_fill_viridis_c()+
    labs(title = "Percentage of properties with permits 1 year after a fire") +
    mapTheme()

#g(panel_pm_4YForward)


#Divide vacancies by before 1 and after 1 year

panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_Vacant == 0))%>%
  mutate(lingering = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_Vacant, fire, lingering)%>%
  gather(Variable, value, -lingering, -fire, -geometry)%>%
  filter(value > 0)%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(color=fire), size=0.75)+
    scale_color_manual(values= c("#3f3128", "#20b1ae"))+
    facet_wrap(~lingering, nrow=1)+
    labs(title = "Vacancies in Properties With Fires, by Years After The Fire") +
    mapTheme()


panel_pm_4YForward%>%
  filter(year(quarter.y) < 2019 & year(quarter.y) > 2012)%>%
  filter(!(n_Vacant == 0), fire == 1)%>%
  mutate(lingering = ifelse(ySinceFire <= 1, 0, 1))%>%
  dplyr::select(n_Vacant, lingering, name)%>%
  st_drop_geometry()%>%
  group_by(name) %>%
  summarize(lingeringRate = sum(lingering)/sum(n_Vacant>0), 
            vacancies = sum(n_Vacant))%>%
  arrange(desc(lingeringRate))%>%
  filter(vacancies > 1)%>%
  left_join(nhoods, by="name")%>%
  st_as_sf()%>%
  ggplot()+
    geom_sf(data=acsTractsPHL.2020, fill='#f3efe0', color='dark gray')+
    geom_sf(aes(fill=lingeringRate))+
    scale_fill_viridis_c()+
    labs(title = "Rate of Lingering Vacancies by Neighborhood",
         subtitle = "Neighborhoods with Vacancies 1 Year After A Fire vs Total Vacancies") +
    mapTheme()

#g(panel_pm_4YForward)

# Create a table with three columns
img1 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /1.jpg"
img2 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /2.jpg"
img3 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /3.jpg"
img4 <- "/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /4.jpg"
tbl <- tibble::tibble(
  Photo1 = paste0("![One month before fire, 2017](", img1, ")"),
  Photo2 = paste0("![One year later, 2018](", img2, ")"),
  Photo3 = paste0("![Vacant, 2019](", img3, ")"),
  Photo4 = paste0("![Sold, 2022](", img4, ")")
)

# Display the table without borders
kable(tbl, format = "html", escape = FALSE) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  column_spec(1:4, width = "25%")
load.fun <- function(x) { 
  x <- as.character(x) 
  if(isTRUE(x %in% .packages(all.available=TRUE))) { 
    eval(parse(text=paste("require(", x, ")", sep=""))) 
    print(paste(c(x, " : already installed; requiring"), collapse=''))
  } else { 
    #update.packages()
    print(paste(c(x, " : not installed; installing"), collapse=''))
    eval(parse(text=paste("install.packages('", x, "')", sep=""))) 
    print(paste(c(x, " : installed and requiring"), collapse=''))
    eval(parse(text=paste("require(", x, ")", sep=""))) 
  } 
} 

packages = c("bayesplot", "lme4","RcppEigen", "tidyr", "broom", "caret", "dials", "doParallel", "e1071", "earth",
             "ggrepel", "glmnet", "ipred", "klaR", "kknn", "pROC", "rpart", "randomForest",
             "sessioninfo", "tidymodels","ranger", "recipes", "workflows", "themis","xgboost",
             "sf", "nngeo", "mapview")

for(i in seq_along(packages)){
  packge <- as.character(packages[i])
  load.fun(packge)
}
session_info()
library(gridExtra)

#fire_panel_nogeo <- st_read("~/Desktop/Coding/PhilaFireData/fire_panel_nogeo.csv")
#fire_panel_nogeo <- fire_panel_nogeo %>% dplyr::select(-field_1)
# OPA - Creating an OPA dataset to get just the variables we want for feature engineering and doing slight feature engineering
# Numeric
opa_dat_small_sf_num <- opa_dat_small_sf %>% 
  dplyr::select(address, total_livable_area, market_value, sale_price, number_of_bedrooms, number_of_bathrooms, number_stories, 
  interior_condition) %>% st_drop_geometry()
opa_dat_small_sf_num[is.na(opa_dat_small_sf_num)] <- 0 # Assigning 0 to NA for everything

# Categorical
opa_dat_small_sf_cat <- opa_dat_small_sf %>% 
  dplyr::select(address, quality_grade, year_built, central_air, exterior_condition, fireplaces, fuel, taxable_building, 
  topography, type_heater)  %>% st_drop_geometry()

opa_dat_small_sf_fe <- left_join(opa_dat_small_sf_num,opa_dat_small_sf_cat, by = "address") # Joining

# OPA - Quality Grade
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(grade = case_when(
    quality_grade == "A+" |quality_grade == "A" | quality_grade == "A-" | quality_grade == "B+" | quality_grade == "B" | 
      quality_grade == "B-"| quality_grade == "C+"| quality_grade == "C" ~ "Average or Better",
    TRUE ~ "Below Average")) 

# OPA - Fuel
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(fuel_type = case_when(
    fuel == "A" | fuel == "B" ~ "Fossil",
    fuel == "D" | fuel == "F" ~ "Solid", 
    fuel == "C" | fuel == "E" ~ "Alternative", 
    TRUE ~ "Other"))

# OPA - Topography
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(topo = case_when(
    topography == "A" ~ "Above Street Level",
    topography == "B" ~ "Below Street Level", 
    topography == "C" ~ "Flood Plain",
    topography == "D" ~ "Rocky",
    topography == "F" ~ "Street Level",
    TRUE ~ "Other"))

# OPA - Heater
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(electric_heater = case_when(
    type_heater == "C" ~ "Yes",
    TRUE ~ "No"))

# OPA - Central Air
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(air_central = case_when(
    central_air == "Y" ~ "Yes",
    TRUE ~ "No"))

# OPA - Fireplace
opa_dat_small_sf_fe <-
  opa_dat_small_sf_fe %>%
  mutate(fireplace = case_when(
    fireplaces > 0 ~ "Yes",
    TRUE ~ "No"))

# OPA - Year Built
opa_dat_small_sf_fe$year_built <- as.numeric(as.character(opa_dat_small_sf_fe$year_built))

# OPA - Cleaning
opa_dat_small_sf_fe <- opa_dat_small_sf_fe %>% dplyr::select(-quality_grade, -central_air, -fuel, -topography, -type_heater)
opa_dat_small_sf_fe[is.na(opa_dat_small_sf_fe)] <- 0 # Assigning 0 to NA for everything

# Fire Data - Creating an fire dataset to get just the variables we want for feature engineering
dat_fe <- dat %>% dplyr::select(address, incident_type, number_of_exposures, minor_damage, significant_damage, heavy_damage, extreme_damage)
panel_Results2Y_sf <- left_join(panel_Results2Y, opa_dat_small_sf_fe, by="address") # Joining with OPA FE variables
#panel_Results2Y_sf[is.na(panel_Results2Y_sf)] <- 0 # Assigning 0 to NA for everything

# Joining panel_Results2Y with Time outcome FE variables
panel_FirePositivesDiff_fe <- panel_FirePositivesDiff %>% 
  dplyr::select(address, mSinceFire, ySinceFire, cat_code, owner_occ) %>% st_drop_geometry() # Condensing dataframe
fire_panel <- merge(panel_Results2Y_sf,panel_FirePositivesDiff_fe ) # Join
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates
# Turning outcomes into binary
fire_panel <-fire_panel %>%
  mutate(vacant = case_when(
    outcome_vacant > 0 ~ 1,
    TRUE ~ 0
  ))

fire_panel <-fire_panel %>%
  mutate(permit = case_when(
    outcome_permit > 0 ~ 1,
    TRUE ~ 0
  ))

fire_panel <-fire_panel %>%
  mutate(transfer = case_when(
    outcome_transfer > 0 ~ 1,
    TRUE ~ 0
  ))

# Market Value
fire_panel <-
  fire_panel %>%
  mutate(mkt_value = case_when(
    market_value < 25000 ~ "Low", # < $250,000
    market_value > 24999 & market_value < 500000 ~ "Medium", # $250,000-$500,000
    TRUE ~ "High")) # $500,000+

# Number of Bedrooms
fire_panel <-
  fire_panel %>%
  mutate(bedrooms = case_when(
    number_of_bedrooms < 4 ~ "Low", # 1-3 Bedrooms
    number_of_bedrooms > 3 & number_of_bedrooms < 8 ~ "Medium", # 4-7 Bedrooms
    TRUE ~ "High")) # 8+

# Number of Bathrooms
fire_panel <-
  fire_panel %>%
  mutate(bathrooms = case_when(
    number_of_bathrooms < 3 ~ "Low", # 1-2 Bathroom
    number_of_bathrooms > 2 & number_of_bathrooms < 6 ~ "Medium", # 3-5 Bathrooms
    TRUE ~ "High")) # $6+

# Livable Area 
fire_panel <-
  fire_panel %>%
  mutate(area = case_when(
    total_livable_area < 5001 ~ "Low", # < 0-5,000 sqft
    total_livable_area > 4999 & total_livable_area < 10000 ~ "Medium", # 5,001-10,000 sqft
    TRUE ~ "High")) # 10,001+ sqft

# Year Built
fire_panel <-
  fire_panel %>%
  mutate(built_year = case_when(
    year_built < 1951 ~ "Up to 1951", 
    year_built > 1959 & year_built < 2000 ~ "1950 to 1999",
    TRUE ~ "2000 to Present")) 

# Interior Condition when fire happened - Fire in 2018 and condition is how it is today. remove?
fire_panel <-
  fire_panel %>%
  mutate(condition = case_when( 
    interior_condition > 0 & interior_condition < 5 ~ "Average", 
    interior_condition == 5 ~ "Below Average",
    interior_condition == 6 ~ "Vacant",
    interior_condition == 7 ~ "Sealed",
    interior_condition == 0 ~ "Unknown")) 

# Sale vs Market
fire_panel <-
  fire_panel %>%
  mutate(sale_value = case_when( 
    sale_price > market_value ~ "Above Market", 
    TRUE ~ "Below Market")) 

# Sale vs Market
fire_panel <-
  fire_panel %>%
  mutate(resident_type = case_when( 
    owner_occ == "TRUE" ~ "Owner", 
    owner_occ == "FALSE" ~ "Renter", 
    TRUE ~ "Other")) 

dat_geo <- dat %>% dplyr::select(address, geometry) # Geometry
fire_panel <- left_join(fire_panel, dat_geo, by = "address") # Joining to get geometry
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates
# Load Data
neighborhoods <- st_read("~/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson") %>% st_transform(crs = 3652) %>% dplyr::select(mapname, geometry) # Remove extra columns

fire_panel_sf <- st_as_sf(fire_panel) %>% st_transform(st_crs(neighborhoods)) # Make sf

fire_panel <- st_join(fire_panel_sf, neighborhoods) # Join
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates

fire_panel <- fire_panel %>% rename(neighborhood = mapname) # Renaming
#https://www.opendataphilly.org/dataset/redevelopment-certified-areas

rca <- st_read("~/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson") %>% 
  st_transform(crs = 3652) %>% dplyr::select(name)

fire_panel <- st_join(fire_panel, rca) # Join
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates

# 
fire_panel <-fire_panel %>%
  mutate(redevelopment_area = case_when(
    is.na(fire_panel$name) ~ 0, # Is not in a redevelopment certified area
    TRUE ~ 1 # Is in a redevelopment certified area
  ))

fire_panel <- fire_panel %>% dplyr::select(-name)
census_api_key("05b9c101eb2ee7dc7abb88140da527ce637ac07f", overwrite = TRUE)

## Income, Age, White, Black, AIAN, Asian, NHPI, Other, Two or More, Hispanic
acs_fe <- c("B19013_001", "B11005_001", "B02001_002", "B02001_003", "B02001_004", "B02001_005", "B02001_006", "B02001_007", "B02001_008","B03002_001")

acs_data <- load_variables(year = 2019, dataset = "acs5", cache = TRUE)
acs_features <- get_acs(geography = "tract",
                             year = 2020, 
                             variables = acs_fe, # Income, Age, Race
                             geometry = TRUE, 
                             state = "PA", 
                             county = "Philadelphia", 
                             output = "wide") %>% st_transform(st_crs(neighborhoods))
acs_features <- acs_features %>% rename(income = B19013_001E, 
                                        age = B11005_001E, 
                                        white = B02001_002E,
                                        black = B02001_003E,
                                        AIAN = B02001_004E,
                                        asian = B02001_005E,
                                        NHPI = B02001_006E,
                                        other = B02001_007E,
                                        multi = B02001_008E,
                                        hispanic = B03002_001E) %>% 
                      dplyr::select(NAME, income, age, white, black, AIAN, asian, NHPI, other, multi, hispanic)

## Percent Change in White Population
vars <- c("B03002_003") #Variable for white alone
philly_data_2010 <- get_acs(geography = "tract",
                        variables = vars,
                        year = 2010,
                        state = "PA",
                        county = "Philadelphia County",
                        key = census_api_key) %>% dplyr::select(NAME, estimate) %>% rename(estimate_2010 = estimate)
philly_data_2019 <- get_acs(geography = "tract",
                        variables = vars,
                        year = 2019,
                        state = "PA",
                        county = "Philadelphia County",
                        key = census_api_key) %>% dplyr::select(NAME, estimate) %>% rename(estimate_2019 = estimate)
philly_data <- left_join(philly_data_2010, philly_data_2019, by="NAME" ) # Joining frames together
philly_data <- philly_data %>% mutate(white_change = (estimate_2019 - estimate_2010)/estimate_2010) # Calculation
philly_data$white_change[!is.finite(philly_data$white_change)] <- 0 # Getting rid of Inf
philly_data$white_change <- round(philly_data$white_change, digits = 2) # Rounding

# Joins
acs_features_full <- left_join(acs_features, philly_data, by="NAME") %>% dplyr::select(-estimate_2010, -estimate_2019)
fire_panel <- st_join(fire_panel, acs_features_full) 
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates

## Poverty
fire_panel <-
  fire_panel %>%
  mutate(poverty = case_when( 
    income < 25000 ~ "Yes", 
    TRUE ~ "No")) 

## Income Level - 40k is best so far
fire_panel <-
  fire_panel %>%
  mutate(income_level = case_when(
    income < 40000 ~ "Low", 
    TRUE ~ "High")) 

## Price per sqft
fire_panel <-
  fire_panel %>%
  mutate(value_sqft = market_value/total_livable_area) 

fire_panel$value_sqft[!is.finite(fire_panel$value_sqft)] <- 0
fire_panel_nogeo <- fire_panel %>% st_drop_geometry()
fire_panel <- left_join(fire_panel_nogeo, dat_geo, by="address")
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates
fire_panel_nogeo <- fire_panel %>% st_drop_geometry()

schools <- st_read("~/Desktop/Spring 2023/MUSA_Practicum/PhilaFireData/Schools.geojson") %>% st_transform(crs = 3652)

# Setting up nearest neighbors
st_c <- st_coordinates
nn_function <- function(measureFrom,measureTo,k) {
  measureFrom_Matrix <-
    as.matrix(measureFrom)
  measureTo_Matrix <-
    as.matrix(measureTo)
  nn <-   
    get.knnx(measureTo, measureFrom, k)$nn.dist
  output <-
    as.data.frame(nn) %>%
    rownames_to_column(var = "thisPoint") %>%
    gather(points, point_distance, V1:ncol(.)) %>%
    arrange(as.numeric(thisPoint)) %>%
    group_by(thisPoint) %>%
    summarize(pointDistance = mean(point_distance)) %>%
    arrange(as.numeric(thisPoint)) %>% 
    dplyr::select(-thisPoint) %>%
    pull()
  
  return(output)  
}

fire_panel <- fire_panel %>% st_as_sf() %>% st_transform(crs = 3652)

fire_panel <-
  fire_panel %>% 
    mutate(
      schools_nn1 = nn_function(st_c(fire_panel), st_c(schools), 1), 
      schools_nn2 = nn_function(st_c(fire_panel), st_c(schools), 2), 
      schools_nn3 = nn_function(st_c(fire_panel), st_c(schools), 3), 
      schools_nn4 = nn_function(st_c(fire_panel), st_c(schools), 4), 
      schools_nn5 = nn_function(st_c(fire_panel), st_c(schools), 5)) 

fire_panel_nogeo <- fire_panel %>% dplyr::select(-geometry)

#write.csv(fire_panel_nogeo, "~/Desktop/Coding/fire_panel_nogeo.csv")
# Random Forest
fire_panel_rf <- fire_panel

fire_panel_rf[is.na(fire_panel_rf)] <- 0

fire_panel_rf_nogeo <- fire_panel_rf %>% 
  dplyr::select(-address, -incident_number, -quarter, -cat_code, -NAME, -topo, -geometry, -count, -outcome_vacant, -outcome_permit, -outcome_transfer) %>% 
  st_drop_geometry()
## Preparing Prediction Data
all_properties <- left_join(opa_dat_small_sf_fe, panel_Results2Y, by="address") # Adds a few thousand more rows
panel_FirePositives1 <- panel_OPAFireOpenData %>% # Recreating this code to get owner occupancy for all
  dplyr::select(address)%>%
  distinct(address, .keep_all = TRUE)%>%
  left_join(panel_Positives, by="address")%>%
  left_join(dplyr::select(opa_dat_small_sf, address, category_code_description, mailing_street, building_code_description), by="address")%>%
  mutate(condo = ifelse(grepl("CONDO", building_code_description) == TRUE, TRUE, FALSE),
          owner_occ = ifelse(condo == FALSE & category_code_description != "MULTI FAMILY",
                             ifelse(address == mailing_street, TRUE, FALSE),
                             NA))%>%
  dplyr::select(-mailing_street, -condo, -building_code_description)%>%
  st_drop_geometry()
panel_FirePositives1 <- panel_FirePositives1 %>% dplyr::select(address, owner_occ) # Condensing columns

all_properties1 <- left_join(all_properties, panel_FirePositives1, by="address") # Created duplicates for address with no fire
all_properties_occ <- all_properties1[is.na(all_properties1$quarter), ] # Subset of addresses with no fire
all_properties_occ <- all_properties_occ[!duplicated(all_properties_occ$address),] # Removing Duplicates
all_properties_occ <- all_properties_occ %>% dplyr::select(address, owner_occ) # Condensing columns

prediction_data <- left_join(all_properties, all_properties_occ, by="address")

# Getting Geometry
opa_dat_sf <- opa_dat_small_sf %>% dplyr::select(address, geometry)
prediction_data <- left_join(prediction_data, opa_dat_sf, by="address") #488,514

# Time Data 
time <- fire_panel %>% dplyr::select(address, mSinceFire) %>% st_drop_geometry()
prediction_data <- left_join(prediction_data, time, by="address") # 491,328

# Cleaning
prediction_data$incident_number <- as.double(prediction_data$incident_number)

prediction_data <- prediction_data %>% dplyr::select(-quarter)
prediction_data[is.na(prediction_data)] <- 0 # Assigning 0 to NA for everything
prediction_data <-prediction_data %>%
  mutate(vacant = case_when(
    outcome_vacant > 0 ~ 1,
    TRUE ~ 0
  ))

prediction_data <-prediction_data %>%
  mutate(permit = case_when(
    outcome_permit > 0 ~ 1,
    TRUE ~ 0
  ))

prediction_data <-prediction_data %>%
  mutate(transfer = case_when(
    outcome_transfer > 0 ~ 1,
    TRUE ~ 0
  ))

# Market Value
prediction_data <-
  prediction_data %>%
  mutate(mkt_value = case_when(
    market_value < 25000 ~ "Low", # < $250,000
    market_value > 24999 & market_value < 500000 ~ "Medium", # $250,000-$500,000
    TRUE ~ "High")) # $500,000+

# Number of Bedrooms
prediction_data <-
  prediction_data %>%
  mutate(bedrooms = case_when(
    number_of_bedrooms < 4 ~ "Low", # 1-3 Bedrooms
    number_of_bedrooms > 3 & number_of_bedrooms < 8 ~ "Medium", # 4-7 Bedrooms
    TRUE ~ "High")) # 8+

# Number of Bathrooms
prediction_data <-
  prediction_data %>%
  mutate(bathrooms = case_when(
    number_of_bathrooms < 3 ~ "Low", # 1-2 Bathroom
    number_of_bathrooms > 2 & number_of_bathrooms < 6 ~ "Medium", # 3-5 Bathrooms
    TRUE ~ "High")) # $6+

# Livable Area 
prediction_data <-
  prediction_data %>%
  mutate(area = case_when(
    total_livable_area < 5001 ~ "Low", # < 0-5,000 sqft
    total_livable_area > 4999 & total_livable_area < 10000 ~ "Medium", # 5,001-10,000 sqft
    TRUE ~ "High")) # 10,001+ sqft

# Year Built
prediction_data <-
  prediction_data %>%
  mutate(built_year = case_when(
    year_built < 1951 ~ "Up to 1951", 
    year_built > 1959 & year_built < 2000 ~ "1950 to 1999",
    TRUE ~ "2000 to Present")) 

# Interior Condition when fire happened - Fire in 2018 and condition is how it is today. remove?
prediction_data <-
  prediction_data %>%
  mutate(condition = case_when( 
    interior_condition > 0 & interior_condition < 5 ~ "Average", 
    interior_condition == 5 ~ "Below Average",
    interior_condition == 6 ~ "Vacant",
    interior_condition == 7 ~ "Sealed",
    interior_condition == 0 ~ "Unknown")) 

# Sale vs Market
prediction_data <-
  prediction_data %>%
  mutate(sale_value = case_when( 
    sale_price > market_value ~ "Above Market", 
    TRUE ~ "Below Market")) 

# Resident Type
prediction_data <-
  prediction_data %>%
  mutate(resident_type = case_when( 
    owner_occ == "TRUE" ~ "Owner", 
    owner_occ == "FALSE" ~ "Renter", 
    TRUE ~ "Other")) 
pred_geo <- prediction_data
pred_geo <- pred_geo %>% st_as_sf()
pred_geo$X <- st_coordinates(pred_geo)[, 1]
pred_geo$Y <- st_coordinates(pred_geo)[, 2]
pred_geo <- pred_geo %>% st_drop_geometry()
pred_geo <- st_as_sf(pred_geo, coords = c("Y", "X"), crs = "EPSG:4326")
pred_geo <- pred_geo %>% dplyr::select(address, geometry)

p <- prediction_data # BACK UP

prediction_data <- prediction_data %>% st_as_sf()
prediction_data$X <- st_coordinates(prediction_data)[, 1]
prediction_data$Y <- st_coordinates(prediction_data)[, 2]
prediction_data <- prediction_data %>% st_drop_geometry()
prediction_data <- st_as_sf(prediction_data, coords = c("Y", "X"), crs = "EPSG:4326")
# Neighborhood
prediction_data <- prediction_data %>% st_transform(st_crs(neighborhoods))
prediction_data <- st_join(prediction_data, neighborhoods) # Join
prediction_data <- prediction_data %>% rename(neighborhood = mapname) # Renaming

# RCA - 497,728
#prediction_data <- st_join(prediction_data, rca) # Join
#prediction_data <-prediction_data %>%
#  mutate(redevelopment_area = case_when(
#    is.na(prediction_data$NAME) ~ 0, # Is not in a redevelopment certified area
#    TRUE ~ 1 # Is in a redevelopment certified area
#  ))
#prediction_data <- prediction_data %>% dplyr::select(-NAME)

# Demographics
prediction_data <- st_join(prediction_data, acs_features_full) #491328

prediction_data <- # Poverty
  prediction_data %>%
  mutate(poverty = case_when( 
    income < 25000 ~ "Yes", 
    TRUE ~ "No")) 

prediction_data <- # Income
  prediction_data %>%
  mutate(income_level = case_when(
    income < 40000 ~ "Low", 
    TRUE ~ "High")) 

# Price per square foot
prediction_data <-
  prediction_data %>%
  mutate(value_sqft = market_value/total_livable_area) 

prediction_data$value_sqft[!is.finite(prediction_data$value_sqft)] <- 0

# Schools NN
prediction_data <- prediction_data %>% st_transform(crs = 3652)

prediction_data <-
  prediction_data %>% 
    mutate(
      schools_nn1 = nn_function(st_c(prediction_data), st_c(schools), 1), 
      schools_nn2 = nn_function(st_c(prediction_data), st_c(schools), 2), 
      schools_nn3 = nn_function(st_c(prediction_data), st_c(schools), 3), 
      schools_nn4 = nn_function(st_c(prediction_data), st_c(schools), 4), 
      schools_nn5 = nn_function(st_c(prediction_data), st_c(schools), 5))
# Neighborhood
prediction_data$neighborhood <- ifelse(is.na(prediction_data$neighborhood ), "Not Applicable", prediction_data$neighborhood )

prediction_data <- prediction_data %>% dplyr::select(-NAME)

# Numeric
prediction_data[is.na(prediction_data)] <- 0 # Assigning 0 to NA for everything

# Set a seed for reproducibility
set.seed(123)

# Split for training/test test
rf_split_permit <- rsample::initial_split(fire_panel_rf_nogeo, strata = "permit", prop = 0.70)
rf_train_permit <- rsample::training(rf_split_permit)
rf_test_permit  <- rsample::testing(rf_split_permit)

# Creating cross validation
cv_splits_geo_permit <- rsample::group_vfold_cv(rf_train_permit, group = "neighborhood")

# Feature Creation 
model_rec_permit <- recipe(permit ~ severity_index + vacant + transfer + resident_type + mSinceFire + value_sqft  + air_central + 
                             built_year + poverty + condition + white + black + AIAN + asian + NHPI + other + multi + income_level 
                           + age + neighborhood, data = rf_train_permit) %>%
  update_role(neighborhood, new_role = "neighborhood") %>%
  step_dummy(all_nominal(), -vacant, -transfer, -resident_type, -air_central, -income_level, -neighborhood) %>% 
  step_zv(all_predictors(), -vacant, -transfer, -resident_type, -air_central, -income_level, -neighborhood) %>%
  step_center(all_predictors(),  -vacant, -transfer, -resident_type, -air_central, -income_level, -neighborhood) %>% 
  step_scale(all_predictors(),  -vacant, -transfer, -resident_type, -air_central, -income_level, -neighborhood) 


# See the data after all transformations
glimpse(model_rec_permit %>% prep() %>% juice())

# Model Specifications
rf_plan_permit <- parsnip::rand_forest() %>%
  parsnip::set_args(mtry  = tune()) %>%
  parsnip::set_args(min_n = tune()) %>%
  parsnip::set_args(trees = 100) %>% 
  parsnip::set_engine("ranger", importance = "impurity") %>% 
  parsnip::set_mode("regression")

# Hyperparameter 
rf_grid_permit <- expand.grid(mtry = c(2,5), 
                       min_n = c(1,5))
# Workflow Creation
rf_wf_permit <-
  workflow() %>% 
  add_recipe(model_rec_permit) %>% 
  add_model(rf_plan_permit)

# Fit Model To Workflow and Calculate Metrics
control_permit <- control_resamples(save_pred = TRUE, verbose = TRUE)
metrics_permit <- metric_set(rmse, mape, smape)

rf_tuned_permit <- rf_wf_permit %>%
  tune::tune_grid(.,
                  resamples = cv_splits_geo_permit,
                  grid      = rf_grid_permit,
                  control   = control_permit,
                  metrics   = metric_set(rmse))

show_best(rf_tuned_permit, metric = "rmse", n = 15, maximize = FALSE)
rf_best_params_permit     <- select_best(rf_tuned_permit, metric = "rmse", maximize = FALSE)

# Final Workflow
rf_best_wf_permit     <- finalize_workflow(rf_wf_permit, rf_best_params_permit)

rf_val_fit_geo_permit <- rf_best_wf_permit %>% 
  last_fit(split     = rf_split_permit,
           control   = control_permit,
           metrics   = metric_set(rmse)) 

rf_best_OOF_preds_permit <- collect_predictions(rf_tuned_permit) %>% 
  filter(mtry  == rf_best_params_permit$mtry[1] & min_n == rf_best_params_permit$min_n[1])

rf_val_pred_geo_permit     <- collect_predictions(rf_val_fit_geo_permit)

rf_metrics_permit<- collect_metrics(rf_val_fit_geo_permit)

# Aggregate OOF predictions 
OOF_preds_permit <- data.frame(dplyr::select(rf_best_OOF_preds_permit, .pred, permit), model = "rf") %>% 
  mutate(RMSE = yardstick::rmse_vec(permit, .pred),
         MAE  = yardstick::mae_vec(permit, .pred),
         MAPE = yardstick::mape_vec(permit, .pred)) %>% 
  mutate(model = factor(model, levels=c("rf"))) 

## Fit and Extract final model
## Final fit on all data, then extract model
full_fit_rf_permit     <- rf_best_wf_permit %>% fit(fire_panel_rf_nogeo)

predict(full_fit_rf_permit, new_data = fire_panel_rf_nogeo[3,]) %>% 
  mutate(.pred_original = exp(.pred))

# extract final model object
rf_full_mod_permit     <- full_fit_rf_permit  $fit$fit$fit
# Set a seed for reproducibility
set.seed(123)

# Split for training/test test
rf_split_transfer <- rsample::initial_split(fire_panel_rf_nogeo, strata = "transfer", prop = 0.70)
rf_train_transfer <- rsample::training(rf_split_transfer)
rf_test_transfer  <- rsample::testing(rf_split_transfer)

# Creating cross validation
cv_splits_geo_transfer <- rsample::group_vfold_cv(rf_train_transfer, group = "neighborhood")

# Feature Creation
model_rec_transfer <- recipe(transfer ~ severity_index + taxable_building + resident_type + mSinceFire + income + air_central + 
                               sale_value + area + neighborhood, data = rf_train_transfer) %>%
  update_role(neighborhood, new_role = "neighborhood") %>%
  step_dummy(all_nominal(), -transfer, -neighborhood, -resident_type, -air_central) %>% 
  step_zv(all_predictors(), -transfer, -neighborhood, -resident_type, -air_central) %>%
  step_center(all_predictors(), -transfer, -neighborhood, -resident_type, -air_central) %>% 
  step_scale(all_predictors(), -transfer, -neighborhood, -resident_type, -air_central) 


# See the data after all transformations
glimpse(model_rec_transfer %>% prep() %>% juice()) 

# Model Specifications
rf_plan_transfer <- parsnip::rand_forest() %>%
  parsnip::set_args(mtry  = tune()) %>%
  parsnip::set_args(min_n = tune()) %>%
  parsnip::set_args(trees = 100) %>% 
  parsnip::set_engine("ranger", importance = "impurity") %>% 
  parsnip::set_mode("regression")

# Hyperparameter
rf_grid_transfer <- expand.grid(mtry = c(2,5), 
                       min_n = c(1,5))
# Workflow Creation
rf_wf_transfer <-
  workflow() %>% 
  add_recipe(model_rec_transfer) %>% 
  add_model(rf_plan_transfer)

# Fit Model To Workflow and Calculate Metrics
control_transfer <- control_resamples(save_pred = TRUE, verbose = TRUE)
metrics_transfer <- metric_set(rmse, mape, smape)

rf_tuned_transfer <- rf_wf_transfer %>% 
  tune::tune_grid(.,
                  resamples = cv_splits_geo_transfer,
                  grid      = rf_grid_transfer,
                  control   = control_transfer,
                  metrics   = metric_set(rmse)) 

show_best(rf_tuned_transfer, metric = "rmse", n = 15, maximize = FALSE)
rf_best_params_transfer     <- select_best(rf_tuned_transfer, metric = "rmse", maximize = FALSE)

# Final Workflow
rf_best_wf_transfer     <- finalize_workflow(rf_wf_transfer, rf_best_params_transfer)

rf_val_fit_geo_transfer <- rf_best_wf_transfer %>% 
  last_fit(split     = rf_split_transfer,
           control   = control_transfer,
           metrics   = metric_set(rmse)) 

rf_best_OOF_preds_transfer <- collect_predictions(rf_tuned_transfer) %>% 
  filter(mtry  == rf_best_params_transfer$mtry[1] & min_n == rf_best_params_transfer$min_n[1])

rf_val_pred_geo_transfer    <- collect_predictions(rf_val_fit_geo_transfer)

rf_metrics_transfer <- collect_metrics(rf_val_fit_geo_transfer)

# Aggregate OOF predictions
OOF_preds_transfer <- data.frame(dplyr::select(rf_best_OOF_preds_transfer, .pred, transfer), model = "rf") %>% 
  mutate(RMSE = yardstick::rmse_vec(transfer, .pred),
         MAE  = yardstick::mae_vec(transfer, .pred),
         MAPE = yardstick::mape_vec(transfer, .pred)) %>% 
  mutate(model = factor(model, levels=c("rf")))

## Fit and Extract final model
## Final fit on all data, then extract model
full_fit_rf_transfer    <- rf_best_wf_transfer %>% fit(fire_panel_rf_nogeo)

predict(full_fit_rf_transfer, new_data = fire_panel_rf_nogeo[3,]) %>% 
  mutate(.pred_original = exp(.pred))

# extract final model object
rf_full_mod_transfer     <- full_fit_rf_transfer  $fit$fit$fit
rf_results_model <- c("Permit", "Transfer")
rf_results_accuracy <- c(0.2740, 0.3231)
rf_results_mae <- c(.3059, 0.365959)

rf_results <- data.frame(Model = rf_results_model, Accuracy = rf_results_accuracy, MAE = rf_results_mae)

rf_results %>% 
  kable(caption = "Random Forest Modeling Results") %>%
  kable_styling("striped",full_width = F)
glm_split_vacant <- initial_split(st_drop_geometry(fire_panel_nogeo), strata = "vacant", prop = 0.70) 
glm_train_vacant <- training(glm_split_vacant)
glm_test_vacant <- testing(glm_split_vacant)

# Model 1 - Kitchen Sink Model
#model_vacant_1 <- glm(vacant ~ severity_index + quarter + total_livable_area + market_value + sale_price + number_of_bedrooms + #number_of_bathrooms + number_stories + interior_condition  + year_built  + exterior_condition + fireplaces + taxable_building +  #cat_code + owner_occ + mSinceFire + ySinceFire + income + age + white + black + AIAN + asian + NHPI + other + multi + hispanic + #redevelopment_area,
#                  family="binomial" (link="logit"), data = glm_train_vacant)

# Model 2 - Feature Engineered
#model_vacant_2 <- glm(vacant ~ grade + fuel_type + topo + electric_heater + air_central + fireplace + mkt_value + bedrooms + #bathrooms + area + built_year + condition + sale_value + resident_type + redevelopment_area + poverty + income_level + value_sqft #+ schools_nn1 + schools_nn2 + schools_nn3 + schools_nn4 + schools_nn5,
#                  family="binomial" (link="logit"), data = glm_train_vacant)

# Model 3
#model_vacant_3 <- glm(vacant ~ severity_index + area + mkt_value + resident_type + income + asian  
#                      + other + multi + mSinceFire,
#                  family="binomial" (link="logit"), data = glm_train_vacant)


# Model 4
#model_vacant_4 <- glm(vacant ~ severity_index + area + mkt_value + resident_type + income + asian  
#                      + other + multi + mSinceFire + schools_nn1 + schools_nn2 + schools_nn3 + schools_nn4 + schools_nn5,
#                  family="binomial" (link="logit"), data = glm_train_vacant)

# Model 5 - Best Model
model_vacant_5 <- glm(vacant ~ severity_index + area + transfer + permit + mkt_value + resident_type + income + asian  
                      + other + multi + mSinceFire + schools_nn1 + schools_nn2 + schools_nn3 + schools_nn4 + schools_nn5,
                  family="binomial" (link="logit"), data = glm_train_vacant)
summary(model_vacant_5)

# Determining Probabilities
classProbs_vacant <- predict(model_vacant_5, glm_train_vacant, type="response")
#hist(classProbs_vacant)
testProbs_vacant <- data.frame(Outcome = as.factor(glm_test_vacant$vacant),
                        Probs = predict(model_vacant_5, glm_test_vacant, type= "response"))
glm_split_permit <- initial_split(fire_panel_nogeo, strata = "permit", prop = 0.70) 
glm_train_permit <- training(glm_split_permit)
glm_test_permit <- testing(glm_split_permit)

# Model 1 - Kitchen Sink Model
#model_permit_1 <- glm(permit ~ severity_index + quarter + total_livable_area + market_value + sale_price + number_of_bedrooms + 
#                        number_of_bathrooms + number_stories + interior_condition  + year_built  + exterior_condition + fireplaces
#                      + taxable_building +  cat_code + owner_occ + mSinceFire + ySinceFire + income + age + white + black + AIAN +
#                        asian + NHPI + other + multi + hispanic + redevelopment_area,
#                        family="binomial" (link="logit"), data = glm_train_permit)

# Model 2 - Feature Engineered
#model_permit_2 <- glm(permit ~ grade + fuel_type  + electric_heater + air_central + fireplace + mkt_value + bedrooms + bathrooms +
#                        area + built_year + condition + sale_value + resident_type + redevelopment_area + poverty + income_level +
#                        value_sqft + schools_nn1 + schools_nn2 + schools_nn3 + schools_nn4 + schools_nn5,
#                         family="binomial" (link="logit"), data = glm_train_permit)

# Model 3 - Best Model
model_permit_3 <- glm(permit ~ severity_index + vacant + transfer + resident_type + mSinceFire + value_sqft  + air_central + built_year + poverty + 
                        condition + white + black + AIAN + asian + NHPI + other + multi + income_level + age,
                          family="binomial" (link="logit"), data = glm_train_permit) 

# Model 4
#model_permit_4 <- glm(permit ~ severity_index + resident_type + mSinceFire + value_sqft  + air_central + built_year + poverty + 
#                        other + income_level + age,
#                      family="binomial" (link="logit"), data = glm_train_permit) 

# Model 5
#model_permit_5 <- glm(permit ~ severity_index + transfer + resident_type + mSinceFire + value_sqft  + air_central + built_year + poverty + 
#                        condition + white + asian + other + income_level,
#                          family="binomial" (link="logit"), data = glm_train_permit) 
summary(model_permit_3)

# Determining Probabilities
classProbs_permit <- predict(model_permit_3, glm_train_permit, type="response")
#hist(classProbs_permit)
testProbs_permit <- data.frame(Outcome = as.factor(glm_test_permit$permit),
                        Probs = predict(model_permit_3, glm_test_permit, type= "response"))
glm_split_transfer <- initial_split(st_drop_geometry(fire_panel_nogeo), strata = "transfer", prop = 0.70) 
glm_train_transfer <- training(glm_split_transfer)
glm_test_transfer <- testing(glm_split_transfer)

# Model 1 - Kitchen Sink
#model_transfer_1 <- glm(transfer ~ severity_index + total_livable_area + market_value + sale_price + number_of_bedrooms 
#                        + number_of_bathrooms + number_stories + interior_condition  + year_built  + exterior_condition + 
#                          fireplaces + taxable_building +  cat_code + owner_occ + mSinceFire + ySinceFire + income + age + white +
#                          black + AIAN +  asian + NHPI + other + multi + hispanic + redevelopment_area,
#                  family="binomial" (link="logit"), data = glm_train_transfer) 

# Model 2 - Feature Engineered
#model_transfer_2 <- glm(transfer ~ grade + fuel_type  + electric_heater + air_central + fireplace + mkt_value + bedrooms + 
#                          bathrooms + area + built_year + condition + sale_value + resident_type + redevelopment_area + poverty +
#                          income_level + value_sqft + schools_nn1 + schools_nn2 + schools_nn3 + schools_nn4 + schools_nn5,
#                  family="binomial" (link="logit"), data = glm_train_transfer) 

# Model 3 - Best Model
model_transfer_3 <- glm(transfer ~ severity_index + taxable_building + resident_type + mSinceFire + income + air_central + sale_value + area,
                  family="binomial" (link="logit"), data = glm_train_transfer) 
summary(model_transfer_3)

# Determining Probabilities
classProbs_transfer <- predict(model_transfer_3, glm_train_transfer, type="response")
#hist(classProbs_transfer)
testProbs_transfer <- data.frame(Outcome = as.factor(glm_test_transfer$transfer),
                        Probs = predict(model_transfer_3, glm_test_transfer, type= "response"))
proc_vacant <- pROC::auc(testProbs_vacant$Outcome, testProbs_vacant$Probs) %>% as.data.frame() # Vacant
proc_permit <- pROC::auc(testProbs_permit$Outcome, testProbs_permit$Probs) %>% as.data.frame() # Permit
proc_transfer <- pROC::auc(testProbs_transfer$Outcome, testProbs_transfer$Probs) %>% as.data.frame() # Transfer

proc <- rbind(proc_vacant, proc_permit, proc_transfer) # Creating a dataframe
proc$outcome <- c("Vacant", "Permit", "Transfer")
colnames(proc)[colnames(proc) == "."] <- "auc"

proc %>% 
  group_by(outcome) %>% 
  kable(caption = "AUC by Outcome") %>%
  kable_styling("striped",full_width = F)
grid.arrange(ncol = 1,
ggplot(testProbs_vacant, aes(d = as.numeric(testProbs_vacant$Outcome), m = Probs)) +
  geom_roc(n.cuts = 50, labels = FALSE, colour = "#b9cfcf") +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') +
  labs(title = "ROC Curve - Vacant"),

ggplot(testProbs_permit, aes(d = as.numeric(testProbs_permit$Outcome), m = Probs)) +
  geom_roc(n.cuts = 50, labels = FALSE, colour = "#b9cfcf") +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') +
  labs(title = "ROC Curve - Permit"),

ggplot(testProbs_transfer, aes(d = as.numeric(testProbs_transfer$Outcome), m = Probs)) +
  geom_roc(n.cuts = 50, labels = FALSE, colour = "#b9cfcf") +
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') +
  labs(title = "ROC Curve - Transfer")
)
# ROC Score
roc_obj_vacant <- roc(testProbs_vacant$Outcome, testProbs_vacant$Probs) # Vacant
roc_obj_permit <- roc(testProbs_permit$Outcome, testProbs_permit$Probs)  # Permit
roc_obj_transfer <- roc(testProbs_transfer$Outcome, testProbs_transfer$Probs) # Transfer

# Finding Optimal Threshold
roc_coords_vacant <- coords(roc_obj_vacant, "best", "threshold") %>% as.data.frame() # Vacant
roc_coords_permit <- coords(roc_obj_permit, "best", "threshold") %>% as.data.frame() # Permit
roc_coords_transfer <- coords(roc_obj_transfer, "best", "threshold") %>% as.data.frame() # Transfer

roc_coords <- rbind(roc_coords_vacant, roc_coords_permit, roc_coords_transfer) # Creating a dataframe
roc_coords$outcome <- c("Vacant", "Permit", "Transfer")
roc_coords <- cbind(roc_coords$outcome, roc_coords[, c("threshold", "sensitivity", "specificity")])
names(roc_coords)[names(roc_coords) == "roc_coords$outcome"] <- "outcome"

roc_coords %>% 
  group_by(outcome) %>% 
  kable(caption = "Optimal Threshold by Outcome") %>%
  kable_styling("striped",full_width = F)
grid.arrange(ncol = 1,
ggplot(testProbs_vacant, aes(Probs)) +
  geom_density(aes(fill=Outcome), alpha=0.5) +
  scale_fill_manual(values = palette2,
                    labels=c("No Vacancy","Vacant")) +
  labs(title = "Distribution of test set predicted probabilities: Vacant",
       x="Vacant",y="Density of probabilities"),

ggplot(testProbs_permit, aes(Probs)) +
  geom_density(aes(fill=Outcome), alpha=0.5) +
  scale_fill_manual(values = palette2,
                    labels=c("No Permit","Permit")) +
  labs(title = "Distribution of test set predicted probabilities: Permit",
       x="Permit",y="Density of probabilities"),

ggplot(testProbs_transfer, aes(Probs)) +
  geom_density(aes(fill=Outcome), alpha=0.5) +
  scale_fill_manual(values = palette2,
                    labels=c("No Transfer","Transferred")) +
  labs(title = "Distribution of test set predicted probabilities: Transfer",
       x="Transfer",y="Density of probabilities") 
            
)
# Refer to the outcomes of Optimal Thresholds section before running this and adjust accordingly

testProbs_vacant$Probs  = ifelse(testProbs_vacant$Probs > .096 ,1,0)
caret::confusionMatrix(reference = as.factor(testProbs_vacant$Outcome), 
                       data = as.factor(testProbs_vacant$Probs), 
                       positive = "1") 

testProbs_permit$Probs  = ifelse(testProbs_permit$Probs > .25,1,0) 
caret::confusionMatrix(reference = as.factor(testProbs_permit$Outcome), 
                       data = as.factor(testProbs_permit$Probs), 
                       positive = "1") 

testProbs_transfer$Probs_label  = factor(ifelse(testProbs_transfer$Probs > .26 ,1,0))
caret::confusionMatrix(reference = testProbs_transfer$Outcome, 
                       data = testProbs_transfer$Probs_label, 
                       positive = "1") 
ctrl <- trainControl(method = "cv", 
                     number = 100, 
                     savePredictions = TRUE)
fire_panel_nogeo_cv <- na.omit(fire_panel_nogeo)

cvFit_vacant <- train(as.factor(vacant) ~ severity_index + area + transfer + permit + mkt_value + resident_type + income + asian
                      + other + multi + mSinceFire + schools_nn1 + schools_nn2 + schools_nn3 + schools_nn4 + schools_nn5,  
               data = fire_panel_nogeo_cv, 
               method="glm", family="binomial",
               trControl = ctrl)

cvFit_permit <- train(as.factor(permit) ~ severity_index + vacant + transfer + resident_type + mSinceFire + value_sqft  + 
                        air_central + built_year + poverty + condition + white + black + AIAN + asian + NHPI + other + multi + 
                        income_level + age,  
               data = fire_panel_nogeo_cv, 
               method="glm", family="binomial",
               trControl = ctrl)

cvFit_transfer <- train(as.factor(transfer) ~ severity_index + taxable_building + resident_type + mSinceFire + income + 
                          redevelopment_area + air_central + sale_value + area,  
               data = fire_panel_nogeo_cv, 
               method="glm", family="binomial",
               trControl = ctrl)

ggplot(as.data.frame(cvFit_vacant$resample), aes(Accuracy)) + 
  geom_histogram() +
  scale_x_continuous(limits = c(0, 1)) +
  labs(x="Accuracy",
       y="Count")+
  plotTheme()

ggplot(as.data.frame(cvFit_permit$resample), aes(Accuracy)) + 
  geom_histogram() +
  scale_x_continuous(limits = c(0, 1)) +
  labs(x="Accuracy",
       y="Count")+
  plotTheme()

ggplot(as.data.frame(cvFit_transfer$resample), aes(Accuracy)) + 
  geom_histogram() +
  scale_x_continuous(limits = c(0, 1)) +
  labs(x="Accuracy",
       y="Count")+
  plotTheme()
# Vacant
allPredictions_vacant <- 
  predict(cvFit_vacant, fire_panel_nogeo_cv, type="prob")[,2] #Getting predictions for every property

fire_panel_nogeo_vacant <- 
  cbind(fire_panel_nogeo_cv,allPredictions_vacant) %>%
  mutate(allPredictions = round(allPredictions_vacant * 100))

fire_panel_nogeo_vacant <- fire_panel_nogeo_vacant %>% filter(allPredictions > 2)

fire_panel_nogeo_vacant <- fire_panel_nogeo_vacant %>%
  mutate(PredClass = ifelse(allPredictions > 9, 1, 0))

fire_panel_nogeo_vacant <- fire_panel_nogeo_vacant %>%
  mutate(Correct = ifelse(PredClass == vacant, "1", "0"),
         Incorrect = ifelse(PredClass != vacant, "1", "0"))

# Data Cleaning
fire_panel_nogeo_vacant <- fire_panel_nogeo_vacant %>% st_drop_geometry() # Dropping geometry because it's incorrect here
phila_vacant <- left_join(fire_panel_nogeo_vacant, dat_geo, by="address") # Joining back to the original data to get geometry
phila_vacant <- phila_vacant[!duplicated(phila_vacant$incident_number),] # Removing Duplicates
phila_vacant <- phila_vacant %>% mutate(Correct = as.numeric(Correct), # Turning correct/incorrect into numeric from character
                                  Incorrect = as.numeric(Incorrect))

# Probability for every property
phila_vacant_predictions <- phila_vacant %>% dplyr::select(address, vacant, allPredictions_vacant, allPredictions, PredClass, Correct, Incorrect, geometry) %>% st_as_sf()

# For Spatial Cross-Validation 
phila_vacant <- phila_vacant %>% dplyr::select(address, vacant, neighborhood, allPredictions, PredClass, Correct, Incorrect) %>%
  group_by(neighborhood) %>% 
  summarise(meanPrediction = mean(allPredictions),
            accuracy = sum(Correct) / ((sum(Correct) + sum(Incorrect)))) 

phila_vacant <- left_join(phila_vacant, neighborhoods, by=c("neighborhood" = "mapname")) # Get neighborhood geometry

phila_vacant <- phila_vacant %>% filter(accuracy < .99) # Removing anything that is 100% accurate
# Permit
allPredictions_permit <- 
  predict(cvFit_permit, fire_panel_nogeo_cv, type="prob")[,2] # Getting predictions for every property

fire_panel_nogeo_permit <- 
  cbind(fire_panel_nogeo_cv,allPredictions_permit) %>%
  mutate(allPredictions = round(allPredictions_permit * 100))

fire_panel_nogeo_permit <- fire_panel_nogeo_permit %>% filter(allPredictions > 2)

fire_panel_nogeo_permit <- fire_panel_nogeo_permit %>%
  mutate(PredClass = ifelse(allPredictions > 25, 1, 0))

fire_panel_nogeo_permit <- fire_panel_nogeo_permit %>%
  mutate(Correct = ifelse(PredClass == permit, "1", "0"),
         Incorrect = ifelse(PredClass != permit, "1", "0"))

# Data Cleaning
fire_panel_nogeo_permit <- fire_panel_nogeo_permit %>% st_drop_geometry() # Dropping geometry because it's incorrect here
phila_permit <- left_join(fire_panel_nogeo_permit, dat_geo, by="address") # Joining back to the original data to get geometry
phila_permit <- phila_permit[!duplicated(phila_permit$incident_number),] # Removing Duplicates
phila_permit <- phila_permit %>% mutate(Correct = as.numeric(Correct), # Turning correct/incorrect into numeric from character
                                  Incorrect = as.numeric(Incorrect))

# Probability for every property
phila_permit_predictions <- phila_permit %>% dplyr::select(address, permit, allPredictions_permit, allPredictions, PredClass, Correct, Incorrect, geometry) %>% st_as_sf()

# For Spatial Cross-Validation 
phila_permit <- phila_permit %>% dplyr::select(address, permit, neighborhood, allPredictions, PredClass, Correct, Incorrect) %>%
  group_by(neighborhood) %>% 
  summarise(meanPrediction = mean(allPredictions),
            accuracy = sum(Correct) / ((sum(Correct) + sum(Incorrect)))) 

phila_permit <- left_join(phila_permit, neighborhoods, by=c("neighborhood" = "mapname")) # Getting neighborhood geometry

phila_permit <- phila_permit %>% filter(accuracy < .99) # Removing anything that is 100% accurate

# Transfer
allPredictions_transfer <- 
  predict(cvFit_transfer, fire_panel_nogeo_cv, type="prob")[,2] # Getting predictions for every property

fire_panel_nogeo_transfer <- 
  cbind(fire_panel_nogeo_cv,allPredictions_transfer) %>%
  mutate(allPredictions = round(allPredictions_transfer * 100))

fire_panel_nogeo_transfer <- fire_panel_nogeo_transfer %>% filter(allPredictions > 2)

fire_panel_nogeo_transfer <- fire_panel_nogeo_transfer %>%
  mutate(PredClass = ifelse(allPredictions > 26, 1, 0)) # change 10?

fire_panel_nogeo_transfer <- fire_panel_nogeo_transfer %>%
  mutate(Correct = ifelse(PredClass == transfer, "1", "0"),
         Incorrect = ifelse(PredClass != transfer, "1", "0"))

# Data Cleaning
fire_panel_nogeo_transfer <- fire_panel_nogeo_transfer %>% st_drop_geometry() # Dropping geometry because it's incorrect here
phila_transfer <- left_join(fire_panel_nogeo_transfer, dat_geo, by="address") # Joining back to the original data to get geometry
phila_transfer <- phila_transfer[!duplicated(phila_transfer$incident_number),] # Removing Duplicates
phila_transfer <- phila_transfer %>% mutate(Correct = as.numeric(Correct), # Turning correct/incorrect into numeric
                                  Incorrect = as.numeric(Incorrect))

# Probability for every property
phila_transfer_predictions <- phila_transfer %>% dplyr::select(address, transfer, allPredictions_transfer, allPredictions, PredClass, Correct, Incorrect, geometry) %>% st_as_sf()

# For Spatial Cross-Validation 
phila_transfer <- phila_transfer %>% dplyr::select(address, transfer, neighborhood, allPredictions, PredClass, Correct, Incorrect) %>%
  group_by(neighborhood) %>% 
  summarise(meanPrediction = mean(allPredictions),
            accuracy = sum(Correct) / ((sum(Correct) + sum(Incorrect)))) 

phila_transfer <- left_join(phila_transfer, neighborhoods, by=c("neighborhood" = "mapname")) # Get neighborhood geometry

phila_transfer <- phila_transfer %>% filter(accuracy < .99) # Removing anything that is 100% accurate
grid.arrange(ncol = 3,
ggplot() +
  geom_sf(data = neighborhoods)+
  geom_sf(data = phila_vacant,
          aes(fill=accuracy, geometry = geometry), 
          colour=NA) +
  scale_fill_viridis(option="rocket", trans = "reverse") +
  labs(title = "Accuracy of Post-Fire Outcome",
       subtitle = "Vacant",
       caption = "(Average by Neighborhood)") +
  mapTheme(),

ggplot() +
  geom_sf(data = neighborhoods)+
  geom_sf(data = phila_permit,
          aes(fill=accuracy, geometry = geometry), 
          colour=NA) +
  scale_fill_viridis(option="rocket", trans = "reverse") +
  labs(subtitle = "Permit") +
  mapTheme(),

ggplot() +
  geom_sf(data = neighborhoods)+
  geom_sf(data = phila_transfer,
          aes(fill=accuracy, geometry = geometry), 
          colour=NA) +
  scale_fill_viridis(option="rocket", trans = "reverse") +
  labs(subtitle = "Transfer") +
  mapTheme() 
)
# Refer to the outcomes of Optimal Thresholds section before running this and adjust accordingly

# Vacant
phila_vacant_predictions <- phila_vacant_predictions %>%
  mutate(confResult=case_when(allPredictions < 9 & vacant==0 ~ "True_Negative",
                              allPredictions >= 9 & vacant==1 ~ "True_Positive",
                              allPredictions < 9 & vacant==1 ~ "False_Negative",
                              allPredictions >= 9 & vacant==0 ~ "False_Positive"))
# Permit
phila_permit_predictions <- phila_permit_predictions %>%
  mutate(confResult=case_when(allPredictions < 25 & permit==0 ~ "True_Negative",
                              allPredictions >= 25 & permit==1 ~ "True_Positive",
                              allPredictions < 25 & permit==1 ~ "False_Negative",
                              allPredictions >= 25 & permit==0 ~ "False_Positive")) 

# Transfer
phila_transfer_predictions <- phila_transfer_predictions %>%
  mutate(confResult=case_when(allPredictions < 26 & transfer==0 ~ "True_Negative",
                              allPredictions >= 26 & transfer==1 ~ "True_Positive",
                              allPredictions < 26 & transfer==1 ~ "False_Negative",
                              allPredictions >= 26 & transfer==0 ~ "False_Positive"))
CM_Vacant<-ggplot()+
  geom_sf(data = neighborhoods)+
  geom_sf(data = phila_vacant_predictions, aes(color = confResult), size = 0.2)+
  scale_color_manual(values = c("#f1c82b","#e19825", "#d55816", "#7b230b"),
                    name = "Outcome")+
  labs(title="Confusion Metrics: Vacant") +
  mapTheme()
CM_Vacant <- ggplotly(CM_Vacant)

CM_Vacant

CM_Permit <-ggplot()+
  geom_sf(data = neighborhoods)+
  geom_sf(data = phila_permit_predictions, aes(color = confResult), size = 0.2)+
  scale_color_manual(values = c("#f1c82b","#e19825", "#d55816", "#7b230b"),
                    name = "Outcome")+
  labs(title="Confusion Metrics: Permit") +
  mapTheme()
CM_Permit <- ggplotly(CM_Permit)
CM_Permit

CM_Transfers<- ggplot()+
  geom_sf(data = neighborhoods)+
  geom_sf(data = phila_transfer_predictions, aes(color = confResult), size = 0.2)+
  scale_color_manual(values = c("#f1c82b","#e19825", "#d55816", "#7b230b"),
                    name = "Outcome")+
  labs(title="Confusion Metrics: Transfer") +
  mapTheme()

CM_Transfers <- ggplotly(CM_Transfers)
CM_Transfers
# Neighborhood Geometry
phila_vacant_predictions <- phila_vacant_predictions %>% st_transform(crs = 3652) # Vacant
phila_vacant_predictions <- st_join(neighborhoods, phila_vacant_predictions) 

phila_permit_predictions <- phila_permit_predictions %>% st_transform(crs = 3652) # Permit
phila_permit_predictions <- st_join(neighborhoods, phila_permit_predictions) 

phila_transfer_predictions <- phila_transfer_predictions %>% st_transform(crs = 3652) # Transfer
phila_transfer_predictions <- st_join(neighborhoods, phila_transfer_predictions) 

# Vacant
phila_vacant_predictions %>% 
  na.omit() %>%
  st_drop_geometry() %>%
  rename(Neighborhood = mapname) %>%
  group_by(Neighborhood) %>% 
  summarize(
    True_Positive = sum(confResult == "True_Positive"),
    True_Negative = sum(confResult == "True_Negative"),
    False_Negative = sum(confResult == "False_Negative"),
    False_Positive = sum(confResult == "False_Positive")) %>%
  arrange(desc(True_Positive)) %>%
  kable(caption = "Count of Confusion Metrics by Neighborhood: Vacant") %>%
  kable_styling("striped",full_width = F) %>%
  row_spec(1:5, background = '#7b230b') %>%
  scroll_box(width = "800px", height = "500px")

# Permit
phila_permit_predictions %>% 
  na.omit() %>%
  st_drop_geometry() %>%
  rename(Neighborhood = mapname) %>%
  group_by(Neighborhood) %>% 
  summarize(
    True_Positive = sum(confResult == "True_Positive"),
    True_Negative = sum(confResult == "True_Negative"),
    False_Negative = sum(confResult == "False_Negative"),
    False_Positive = sum(confResult == "False_Positive")) %>%
  arrange(desc(True_Positive)) %>%
  kable(caption = "Count of Confusion Metrics by Neighborhood: Permit") %>%
  kable_styling("striped",full_width = F) %>%
  row_spec(1:5, background = '#7b230b') %>%
  scroll_box(width = "800px", height = "500px")

# Transfer
phila_transfer_predictions %>% 
  na.omit() %>%
  st_drop_geometry() %>%
  rename(Neighborhood = mapname) %>%
  group_by(Neighborhood) %>% 
  summarize(
    True_Positive = sum(confResult == "True_Positive"),
    True_Negative = sum(confResult == "True_Negative"),
    False_Negative = sum(confResult == "False_Negative"),
    False_Positive = sum(confResult == "False_Positive")) %>%
  arrange(desc(True_Positive)) %>%
  kable(caption = "Count of Confusion Metrics by Neighborhood: Transfer") %>%
  kable_styling("striped",full_width = F) %>%
  row_spec(1:5, background = '#7b230b') %>%
  scroll_box(width = "800px", height = "500px")
# Setting arbitrary values in a new column for severity index
prediction_data_lvl_1 <- prediction_data
prediction_data_lvl_1$severity_index <- 1

prediction_data_lvl_2 <- prediction_data
prediction_data_lvl_2$severity_index <- 2

prediction_data_lvl_3 <- prediction_data
prediction_data_lvl_3$severity_index <- 3

prediction_data_lvl_4 <- prediction_data
prediction_data_lvl_4$severity_index <- 4

prediction_data_lvl_5 <- prediction_data
prediction_data_lvl_5$severity_index <- 5

# Just neighborhoods
prediction_data_small <- prediction_data %>% dplyr::select(address, neighborhood) %>% st_drop_geometry 
vacant_lvl1 <- predict(model_vacant_5, prediction_data_lvl_1, type="response") %>% as.data.frame() # Predictions
vacant_lvl2 <- predict(model_vacant_5, prediction_data_lvl_2, type="response") %>% as.data.frame() # Predictions
vacant_lvl3 <- predict(model_vacant_5, prediction_data_lvl_3, type="response") %>% as.data.frame() # Predictions
vacant_lvl4 <- predict(model_vacant_5, prediction_data_lvl_4, type="response") %>% as.data.frame() # Predictions
vacant_lvl5 <- predict(model_vacant_5, prediction_data_lvl_5, type="response") %>% as.data.frame() # Predictions

vacant_predictions <- cbind(prediction_data$address, vacant_lvl1, vacant_lvl2, vacant_lvl3, vacant_lvl4, vacant_lvl5) # Joining
colnames(vacant_predictions) <- c("address", "level_one", "level_two", "level_three", "level_four", "level_five") # Renaming 
vacant_predictions <- left_join(vacant_predictions, prediction_data_small, by = "address") # Adding neighborhoods
vacant_predictions <- left_join(vacant_predictions, pred_geo, by = "address") # Adding geometry
vacant_predictions <- vacant_predictions[!duplicated(vacant_predictions$address),] # Drop duplicates
permit_lvl1 <- predict(model_permit_3, prediction_data_lvl_1, type="response") %>% as.data.frame() # Predictions
permit_lvl2 <- predict(model_permit_3, prediction_data_lvl_2, type="response") %>% as.data.frame() # Predictions
permit_lvl3 <- predict(model_permit_3, prediction_data_lvl_3, type="response") %>% as.data.frame() # Predictions
permit_lvl4 <- predict(model_permit_3, prediction_data_lvl_4, type="response") %>% as.data.frame() # Predictions
permit_lvl5 <- predict(model_permit_3, prediction_data_lvl_5, type="response") %>% as.data.frame() # Predictions

permit_predictions <- cbind(prediction_data$address, permit_lvl1, permit_lvl2, permit_lvl3, permit_lvl4, permit_lvl5) # Joining
colnames(permit_predictions) <- c("address", "level_one", "level_two", "level_three", "level_four", "level_five") # Renaming 
permit_predictions <- left_join(permit_predictions, prediction_data_small, by = "address") # Adding neighborhoods
permit_predictions <- left_join(permit_predictions, pred_geo, by = "address") # Adding geometry
permit_predictions <- permit_predictions[!duplicated(permit_predictions$address),] # Drop duplicates
transfer_lvl1 <- predict(model_transfer_3, prediction_data_lvl_1, type="response") %>% as.data.frame() # Predictions
transfer_lvl2 <- predict(model_transfer_3, prediction_data_lvl_2, type="response") %>% as.data.frame() # Predictions
transfer_lvl3 <- predict(model_transfer_3, prediction_data_lvl_3, type="response") %>% as.data.frame() # Predictions
transfer_lvl4 <- predict(model_transfer_3, prediction_data_lvl_4, type="response") %>% as.data.frame() # Predictions
transfer_lvl5 <- predict(model_transfer_3, prediction_data_lvl_5, type="response") %>% as.data.frame() # Predictions

transfer_predictions <- cbind(prediction_data$address, transfer_lvl1, transfer_lvl2, transfer_lvl3, transfer_lvl4, transfer_lvl5) # 
#Joining
colnames(transfer_predictions) <- c("address", "level_one", "level_two", "level_three", "level_four", "level_five") # Renaming 
transfer_predictions <- left_join(transfer_predictions, prediction_data_small, by = "address") # Adding neighborhoods
transfer_predictions <- left_join(transfer_predictions, pred_geo, by = "address") # Adding geometry
transfer_predictions <- transfer_predictions[!duplicated(transfer_predictions$address),] # Drop duplicates
#vacant_predictions <- vacant_predictions %>% st_as_sf()
#permit_predictions <- permit_predictions %>% st_as_sf()
#transfer_predictions <- transfer_predictions %>% st_as_sf()

#st_write(vacant_predictions, "~/Desktop/Coding/PhilaFireData/vacant_predictions.geojson")
#st_write(permit_predictions, "~/Desktop/Coding/PhilaFireData/permit_predictions.geojson")
#st_write(transfer_predictions, "~/Desktop/Coding/PhilaFireData/transfer_predictions.geojson")
#knitr::purl("test2_final.Rmd", output = "test2_final.html")