1. Introduction

1.1 About This Project

This project is part of the Spring 2021 Master of Urban Spatial Analytics Practicum course at the University of Pennsylvania, taught by Ken Steif, Michael Fichman, and Matthew Harris. We are incredibly thankful for their time, knowledge, and generosity during this challenging and fully online semester. We would also like to thank Daniel Lodise and Dave Maynard from Philadelphia City Councilmember Isaiah Thomas’ office for sharing their perspectives on policy decision-making for the nighttime economy. Additionally, we are incredibly grateful to Kae Anderson from the Fishtown Business Improvement District for fantastic insights about how economic development professionals manage corridors and make decisions. We also want to acknowledge Andrew Renninger and Eugene Chong for their invaluable support in accessing the SafeGraph dataset that was integral to this project and for sharing general knowledge about the dataset.

Finally, we want to acknowledge the support and feedback from our classmates in the MUSA and city planning programs at the University of Pennsylvania. We thank you for your help!

1.1 Abstract

This project studies nightlife mobility patterns in Philadelphia, Pennsylvania, and develops a web application for visualizing nightlife activity and predicting future nightlife traffic to commercial corridors. This document provides an overview of our process and key takeaways from our analysis.

While nighttime activity is an important component of Philadelphia’s economy, culture, and identity, there is limited empirical data available to help decision makers govern the nighttime economy. As a result, many logistical and policy decisions are based largely on emotional perceptions of the negative externalities of the nighttime economy, such as noise pollution and safety concerns. This project uses the novel SafeGraph dataset to analyze nighttime mobility patterns in Philadelphia and develop a model to predict nighttime trips to commercial corridors. Our model uses a variety of factors, including retail mix, built environment features, and demographic data. Our findings indicate that changing the corridor retail mix impacts traffic during evening hours. Additionally, our research provides new quantitative insights into the behavior of visitors during nighttime hours in Philadelphia, which has important implications for economic development in the city, particularly in the post-COVID-19 climate.

All source code can be viewed by pressing the code buttons throughout the report. Additional code we used for setup, data loading, and app development can be found in the appendix at the end of the document.

1.2 Motivation

Broadly defined as economic activity that takes place during the evening hours and is centered around food, drink, and entertainment, the nighttime economy is an important component of any vibrant city. Philadelphia, the focus of this study, has a dynamic nightlife economy with a thriving arts and culture scene. In 2017, The Philadelphia Cultural Alliance estimated that arts and culture in the city has a total economic impact of over $4 billion in the region. Philadelphia is also home to a renowned selection of restaurants that have an important economic impact on the city. A 2019 report by the Economy League of Greater Philadelphia and the Philadelphia Department of Public Health estimated that food jobs account for 12% of all jobs in the city and food-related businesses account for 18% of all firms. These sectors are clearly an important piece of the wider economy, with much of their activity and business taking place during the evening hours.

Despite Philadelphia’s nighttime economy being both culturally and economically important to the city, there is relatively limited knowledge of the nighttime economy’s impact at the neighborhood or corridor scale. As many policy and logistical decisions are drawn from perceptions of the nighttime economy’s negative externalities, such as noise pollution and safety concerns, there is a lack of data explaining the positive effects of the nighttime economy at this hyper-local level. Therefore, a key motivation of this project is to better understand how the nighttime economy contributes to commercial corridor traffic, which we understand as a proxy for local economic activity. We hope that this research will help policy makers better weigh the costs and benefits when making decisions governing the nighttime economy.

A separate motivation for this project comes from the unprecedented economic shock of the COVID-19 pandemic, which began in March 2020 and continued through the lifespan of this project in Spring 2021. As shown in Figure 1.2.1 below, traffic to restaurants, bars, and arts establishments has decreased by about 50% compared to 2018 measurements. While the pandemic appears to be waning in Philadelphia, there is still much uncertainty about how these establishments will rebound from this devastating event. Because the business types integral to the nighttime economy largely depend on gathering large groups of people indoors, it is important for economic development professionals in Philadelphia to have a deep understanding of how investing in the nighttime economy can benefit the city’s broader economy and help the city rebound from the pandemic.

dat_2018 <- 
  dat %>% 
  mutate(popularity_by_hour = str_remove_all(popularity_by_hour, pattern = "\\[|\\]")) %>% #remove brackets
  unnest(popularity_by_hour) %>% #unnest values
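  # NOTE: separate() assigns these names by position, so this step assumes
  # popularity_by_hour already holds only the six values for hours 18-23
  separate(.,
           popularity_by_hour,
           c("18", "19", "20", "21", "22", "23"),
           sep = ",") %>%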
  mutate(.,NightVisits2018 = as.numeric(`18`) + as.numeric(`19`) + as.numeric(`20`) + as.numeric(`21`) + as.numeric(`22`) + as.numeric(`23`))

dat_2018 <- dat_2018 %>% 
  dplyr::select(safegraph_place_id, 
                location_name,
                date_range_start, 
                date_range_end, 
                raw_visit_counts,
                raw_visitor_counts, 
                popularity_by_day,
                NightVisits2018) %>%
  rename(raw_visit_counts2018 = raw_visit_counts, 
         raw_visitor_counts2018 = raw_visitor_counts, 
         popularity_by_day2018 = popularity_by_day) %>%
  mutate(month = substring(date_range_start,6,7))

dat_2020 <- 
  data2020 %>% 
  mutate(popularity_by_hour = str_remove_all(popularity_by_hour, pattern = "\\[|\\]")) %>% #remove brackets
  unnest(popularity_by_hour) %>% #unnest values
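  # NOTE: separate() assigns these names by position, so this step assumes
  # popularity_by_hour already holds only the six values for hours 18-23
  separate(.,
           popularity_by_hour,
           c("18", "19", "20", "21", "22", "23"),
           sep = ",") %>%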
  mutate(.,NightVisits2020 = as.numeric(`18`) + as.numeric(`19`) + as.numeric(`20`) + as.numeric(`21`) + as.numeric(`22`) + as.numeric(`23`))

dat_join <- dat_2020 %>% 
  dplyr::select(safegraph_place_id, 
                location_name,
                date_range_start, 
                date_range_end, 
                raw_visit_counts,
                raw_visitor_counts, 
                popularity_by_day,
                NightVisits2020) %>%
  rename(raw_visit_counts2020 = raw_visit_counts, 
         raw_visitor_counts2020 = raw_visitor_counts, 
         popularity_by_day2020 = popularity_by_day) %>%
  mutate(month = substring(date_range_start,6,7)) %>%
  inner_join(., dat_2018, by = c("safegraph_place_id", "month", "location_name")) %>%
  # Map the two-digit month code to its full month name
  mutate(Month = case_when(month == "01" ~ "January",
                    month == "02" ~ "February",
                    month == "03" ~ "March",
                    month == "04" ~ "April",
                    month == "05" ~ "May", 
                    month == "06" ~ "June",
                    month == "07" ~ "July",
                    month == "08" ~ "August",
                    month == "09" ~ "September",
                    month == "10" ~ "October",
                    month == "11" ~ "November",
                    month == "12" ~ "December"))

dat_join <- dat_join %>% 
  dplyr::select(safegraph_place_id, 
                location_name,
                raw_visit_counts2018,
                raw_visit_counts2020,
                NightVisits2018,
                NightVisits2020,
                popularity_by_day2018,
                popularity_by_day2020,
                month) %>%
  left_join(., phila, by = "safegraph_place_id") %>% 
  st_as_sf() %>%
  st_transform('ESRI:102728') 

dat_citywide <- dat_join %>%
  filter(top_category == "Drinking Places (Alcoholic Beverages)" |
           top_category == "Restaurants and Other Eating Places" |
           top_category == "Promoters of Performing Arts, Sports, and Similar Events" |
           top_category == "Performing Arts Companies") %>%
  group_by(month) %>%
  summarize(Total_Visits2018 = sum(NightVisits2018),
            Total_Visits2020 = sum(NightVisits2020)) %>%
  mutate(Percent_Change = (Total_Visits2020 - Total_Visits2018)/Total_Visits2018*100)

#Separate by commercial use  
dat_citywide2 <- dat_join %>%
    filter(top_category == "Drinking Places (Alcoholic Beverages)" |
             top_category == "Restaurants and Other Eating Places" |
             top_category == "Promoters of Performing Arts, Sports, and Similar Events" |
             top_category == "Performing Arts Companies") %>%
  mutate(category = case_when(top_category == "Drinking Places (Alcoholic Beverages)" ~ "Bars",
           top_category == "Restaurants and Other Eating Places" ~ "Restaurants",
           top_category == "Promoters of Performing Arts, Sports, and Similar Events" ~ "Arts",
           top_category == "Performing Arts Companies" ~ "Arts")) %>%
    group_by(category, month) %>%
    summarize(Total_Visits2018 = sum(NightVisits2018),
              Total_Visits2020 = sum(NightVisits2020)) %>%
    mutate(Percent_Change = (Total_Visits2020 - Total_Visits2018)/Total_Visits2018*100) 

#Generate lineplot
rbind(
  (dat_join %>% 
  group_by(month) %>%
  summarize(Total_Visits2018 = sum(NightVisits2018),
            Total_Visits2020 = sum(NightVisits2020)) %>%
  mutate(Percent_Change = (Total_Visits2020 - Total_Visits2018)/Total_Visits2018*100,
         category = "All")), 
  dat_citywide2) %>%
  ggplot(., aes(x = month, y = Percent_Change, group = category, color = category)) + 
  geom_line(lwd = 1.5) +
  geom_hline(yintercept=0, lwd = 1.5, linetype="dotted")+
  scale_color_manual(values = c("#d3d3d3", "#440154", "#21908C", "#FDE725")) +
  scale_x_discrete(name ="Month")+  
  scale_y_continuous(name ="Percent Change") +
  labs(title = "Philadelphia Commercial Trip % Change, 2018 & 2020",
       subtitle = "Figure 1.2.1") +
  plotTheme()
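
The nighttime totals in the code above depend on pulling the evening hours out of SafeGraph's popularity_by_hour field. As a minimal sketch of one way to do this by position, the base R helper below assumes the field is a bracketed, comma-separated string of 24 hourly counts whose first entry covers midnight to 1 AM. The function name night_visits and the toy strings are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): sum visits for a set
# of hours, assuming popularity_by_hour is a string such as "[0,3,...]" with
# 24 entries, where entry 1 is the hour starting at midnight.
night_visits <- function(popularity_by_hour, hours = 18:23) {
  cleaned <- gsub("\\[|\\]", "", popularity_by_hour)   # drop the brackets
  vapply(strsplit(cleaned, ",", fixed = TRUE), function(parts) {
    counts <- as.numeric(trimws(parts))
    stopifnot(length(counts) == 24)                   # fail loudly on malformed rows
    sum(counts[hours + 1])                            # hour h sits at position h + 1
  }, numeric(1))
}

# Made-up example strings, for illustration only
toy <- c("[0,0,0,0,0,0,1,2,3,4,5,6,7,8,7,6,5,4,10,11,12,13,14,15]",
         "[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]")
toy_totals <- night_visits(toy)
print(toy_totals)
```

Indexing by hour position rather than by column name keeps the evening window explicit, and the length check stops the calculation if a row does not contain exactly 24 values.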

1.3 Use Case

Our goal is to help business improvement districts (BIDs) and economic development professionals understand nightlife patterns across Philadelphia’s commercial corridors. We believe this insight will not only deepen the understanding of evening traffic patterns but also help further recovery efforts following the COVID-19 pandemic. While we believe that there are many potential applications for this work, we envisioned the following specific uses for our research, analysis, and web application.

  • Understanding relative popularity of commercial corridors to inform decisions to grant licenses or small business assistance.
  • Understanding origins and demographics of visitors to more effectively target marketing resources.
  • Understanding the effects of changing the nightlife retail mix in a given commercial corridor.

1.4 Summary of Methodology

In this study, we use a combination of public and private datasets to study nighttime trends across commercial corridors in Philadelphia. Our model predicts trips between the hours of 7PM and 12AM at the corridor level. After an iterative process of feature engineering, we built a model that combines features describing retail mix, visitor behaviors, the built environment, and demographic data. Ultimately, we embedded this model into a proof-of-concept web application that allows users to understand the impact of a corridor’s retail mix on evening trip traffic. While the model has limited accuracy and generalizability, indicating that it requires continual calibration, this proof-of-concept web application is an important step toward making informed, data-driven policy decisions concerning the nighttime economy.

This document walks through our exploratory data analysis and model-building process in detail and includes relevant code to replicate our analysis. As noted above, source code can be viewed by pressing the code buttons throughout the report, and additional code for setup, data loading, and app development can be found in the appendix at the end of the document.

2. The Data

2.1 The SafeGraph Dataset

SafeGraph is a novel data source that uses anonymized cell phone GPS data to record trips to commercial points of interest. For this project, we used two datasets from SafeGraph: the Monthly Patterns dataset and the Places dataset. For a given point of interest, the Monthly Patterns dataset tells us detailed information on the monthly traffic patterns, such as the number of trips and visitors, the median distance traveled, the median time spent at the establishment, and the origin census block group. Complementing the Patterns dataset, the Places dataset provides detailed information about the given point of interest, including the business category, the hours open, and other descriptive tags to help categorize the place.

The SafeGraph data are brand new and have numerous unexplored applications and insights. The dataset was essential to this project as it allowed us to see trip patterns with unprecedented detail and far higher accuracy than other mobility datasets. While SafeGraph is undoubtedly the cutting edge of mobility data, we encountered some limitations. Primarily, as SafeGraph is based on GPS data, it is inherently noisy with trips incorrectly attributed to certain points of interest. For example, if someone is waiting for a bus outside of an establishment, this may be incorrectly recorded in the data as a trip to the establishment. In dense urban areas where points of interest may be located on multiple floors of the same building or in close proximity to one another, this may result in inaccurate counts across the dataset.

Separately, one component of our app visualizes how many trips come from a given census block group. Due to privacy concerns, SafeGraph only records census block groups with at least 2 devices, and any census block group with fewer than 5 devices is reported as 4. This limits the resolution with which we can see low-traffic census block groups.
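
To make the effect of this reporting rule concrete, the short base R sketch below treats every reported value of 4 as a censored count that could stand for 2, 3, or 4 devices, and carries a lower and upper bound through a simple total. The data frame, its column names, and the counts are made up for illustration and do not come from the SafeGraph data.

```r
# Made-up visitor counts by home census block group for one corridor
visitor_cbgs <- data.frame(
  cbg      = c("A", "B", "C", "D"),   # placeholder IDs, not real GEOIDs
  reported = c(4, 4, 12, 37)
)

# Under the reporting rule described in the text, a reported 4 could
# stand for a true count of 2, 3, or 4 devices
visitor_cbgs$censored <- visitor_cbgs$reported == 4
visitor_cbgs$lower <- ifelse(visitor_cbgs$censored, 2, visitor_cbgs$reported)
visitor_cbgs$upper <- visitor_cbgs$reported

# Bounds on the total number of devices across all block groups
total_range <- c(lower = sum(visitor_cbgs$lower),
                 upper = sum(visitor_cbgs$upper))
print(total_range)
```

Keeping both bounds, rather than taking the reported 4 at face value, shows how much of a total could be driven by low-traffic block groups.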

2.2 Other Sources

We also used data from various sources on Open Data Philly to capture features in the built environment such as transit, park space, and building size. The datasets we used are summarized in the table in Section 4.1. While these features are relatively standard across cities, replicating this analysis and model for other cities would require that they have similar sources available.

Finally, we used 2018 American Community Survey 5-Year Estimates to describe demographic data for each corridor.

2.3 Philadelphia Commercial Corridors

As our model predicts nighttime trips at the commercial corridor level, we relied heavily on a shapefile from the City of Philadelphia’s Planning Department which demarcates individual corridors and districts throughout the city. According to the available metadata, “locations range from large, regional and specialty destinations to corridors that reflect the evolving economy, culture, and aesthetic traditions of surrounding neighborhoods.” We also use the administrative corridor categories defined in the dataset. A description of these categories is included in the table below.

Corridor Type | Gross Leasable Area | Store Types
Neighborhood Subcenter | 10,000 - 35,000 sq.ft. | Convenience store grocery, pharmacy, dry cleaner, deli, etc.
Neighborhood Center | 30,000 - 120,000 sq.ft. | Supermarket, variety store, bank, pharmacy, post office, etc.
Community Center | 100,000 - 500,000 sq.ft. | Discount dept store, home improvement, “big boxes” or equiv., “power center”
Regional Center | 300,000 - 900,000+ sq.ft. | One or two full-line department stores or equivalent
Superregional Center | 500,000 - 2,000,000+ sq.ft. | Three or more full-line department stores or equivalent
Specialty Center | N/A | Specialty goods or services, dining, bars, amusements, arts, etc.

Figure 2.1.1 below shows the spatial distribution of the different types of commercial corridors across the city.

3. Exploratory Data Analysis

Before explaining the model building process, it is important to understand the overall trends in the data. In addition to informing which variables to incorporate into the model, this section highlights the descriptive capabilities of the SafeGraph dataset. While our model uses a broad range of datasets to describe Philadelphia’s socio-economic and built environment characteristics, this section primarily focuses on our analysis of the SafeGraph variables.

Key findings from the analysis include:

  • While we primarily examine restaurants, bars, and arts venues as the key business types that contribute to the nighttime economy, all three business types in Philadelphia generate traffic during all hours of the day.
  • COVID-19 has caused a drastic decrease in business traffic to the businesses we consider nightlife establishments. While the data indicate that arts establishments across Philadelphia were able to recover some of the loss towards the end of 2020, bars and restaurants maintained over a 50% decrease in traffic throughout the end of the year. This could be related to the relatively high risk associated with frequenting these establishment types.
  • Of the total daily traffic frequenting commercial corridors across the city, Regional and Superregional Centers have the highest share of traffic during the workday. Smaller corridors outside the central core of Philadelphia tend to have a higher share of evening and early morning traffic. This suggests that visitors tend to leave Center City outside of working hours.
  • The volume of trips varies by corridor type. Regional and Superregional Centers tend to draw a higher volume of travelers, while the smaller neighborhood corridors draw fewer visitors. Community Centers and Specialty Centers fall somewhere in the middle.
  • People travel further distances to reach the Center City area.
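
The time-of-day shares behind the corridor findings above can be illustrated with a small base R sketch. It computes, for each corridor type, the share of daily visits that falls in each part of the day. The daypart boundaries, the daypart function, and all visit counts are hypothetical and chosen only to show the calculation. They are not the definitions or results used in this report.

```r
# Hypothetical daypart definitions (not taken from the report)
daypart <- function(hour) {
  ifelse(hour >= 9 & hour <= 17, "Workday",
         ifelse(hour >= 18 & hour <= 23, "Evening", "Overnight/Morning"))
}

# Made-up hourly visit counts for two corridor types (hours 0-23)
toy <- data.frame(
  corridor_type = rep(c("Regional Center", "Neighborhood Center"), each = 24),
  hour          = rep(0:23, times = 2),
  visits        = c(rep(10, 9), rep(60, 9), rep(20, 6),
                    rep(5, 9),  rep(10, 9), rep(15, 6))
)
toy$daypart <- daypart(toy$hour)

# Total visits per corridor type and daypart
totals <- aggregate(visits ~ corridor_type + daypart, data = toy, FUN = sum)

# Share of each corridor type's daily visits that falls in each daypart
totals$share <- totals$visits /
  ave(totals$visits, totals$corridor_type, FUN = sum)
print(totals[order(totals$corridor_type, totals$daypart), ])
```

Normalizing within each corridor type makes corridors of very different sizes comparable, since a share describes when visits happen and not how many there are.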

3.1 Philadelphia Nightlife Establishments

First, we are curious to see where Philadelphia nightlife establishments are located across the city. The following maps indicate where businesses that contribute to the city’s nightlife economy are located. The categories include:

  • Bars
  • Restaurants
  • Arts Venues

Throughout the analysis, we pay special attention to bars and restaurants, as these establishments are well distributed throughout the city and appeal to a local Philadelphia customer base.

Figure 3.1.1 below shows the spatial patterns of these business types. Bars and restaurants account for the largest number of businesses and are spread across the city. Hotels are mostly clustered in the central district of the city and near the airport in the southwest portion of the city. There are far fewer casinos and arts venues.

dat2 %>%
  filter(top_category == "Drinking Places (Alcoholic Beverages)" |
           top_category == "Restaurants and Other Eating Places" |
           top_category == "Promoters of Performing Arts, Sports, and Similar Events" |
           top_category == "Performing Arts Companies") %>%
  mutate(category = case_when(top_category == "Drinking Places (Alcoholic Beverages)" ~ "Bars",
                              top_category == "Restaurants and Other Eating Places" ~ "Restaurants",
                              top_category == "Promoters of Performing Arts, Sports, and Similar Events" ~ "Arts",
                              top_category == "Performing Arts Companies" ~ "Arts")) %>%
  ggplot() + 
  geom_sf(data = phl_cbg, fill = "grey80", color = "transparent") +
  geom_sf(color = "red", size = .1) +
  labs(title = "Location of Nightlife Establishments",
       subtitle = "Figure 3.1.1") +
  facet_wrap(~category, nrow = 1) +
  mapTheme()

To look at the spatial distribution of business types another way, we review the distribution of businesses with a fishnet grid. The fishnet allows us to visualize clusters of each business type, as businesses are aggregated to the individual cell level. This lets us see the density of these business types more effectively than in Figure 3.1.1 above. While all three business types are mostly concentrated within Center City, Figure 3.1.2 below demonstrates that restaurants are the most widely distributed throughout Philadelphia.

fishnet <- 
  st_make_grid(phl_boundary, cellsize = 1000, square = FALSE) %>%
  .[phl_boundary] %>% 
  st_sf() %>%
  mutate(uniqueID = rownames(.))

#Filter restaurants
restaurants <- dat2 %>%
  filter(top_category == "Restaurants and Other Eating Places")%>%
  mutate(Legend = "Restaurants")

#aggregate restaurant count by fishnet cell
restaurants_net <-
  dplyr::select(restaurants) %>% 
  mutate(countRestaurants = 1) %>% 
  aggregate(., fishnet, sum) %>%
  mutate(countRestaurants = replace_na(countRestaurants, 0),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), 
                       size=nrow(fishnet), replace = TRUE))

#Bars
bars <- dat2 %>%
  filter(top_category == "Drinking Places (Alcoholic Beverages)") %>% 
  mutate(Legend = "Bars")

#aggregate bars by fishnet cell
bars_net <-
  dplyr::select(bars) %>% 
  mutate(countBars = 1) %>% 
  aggregate(., fishnet, sum) %>%
  mutate(countBars = replace_na(countBars, 0),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), 
                       size=nrow(fishnet), replace = TRUE)) %>%
  mutate(Legend = "Bars")

#Performing arts
performingarts <- dat2 %>%
  filter(top_category == "Promoters of Performing Arts, Sports, and Similar Events" |
           top_category == "Performing Arts Companies") %>%
  mutate(Legend = "Performing Arts")

performingarts_net <-
  dplyr::select(performingarts) %>% 
  mutate(countPerformingarts = 1) %>% 
  aggregate(., fishnet, sum) %>%
  mutate(countPerformingarts = replace_na(countPerformingarts, 0),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), 
                       size=nrow(fishnet), replace = TRUE))

# Combining fishnets into a single dataframe
vars_net <- 
  rbind(restaurants, 
        bars, 
        performingarts) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID, Legend) %>%
  dplyr::summarize(count = n()) %>%
    full_join(fishnet) %>%
    spread(Legend,count, fill=0) %>%
    st_sf() %>%
    dplyr::select(-`<NA>`) %>%
    na.omit() %>%
    ungroup()

vars_net.long <- gather(vars_net, Variable, value, -geometry, -uniqueID)

vars <- unique(vars_net.long$Variable)
mapList <- list()

#Plotting small multiple maps
for(i in vars){
  mapList[[i]] <- 
    ggplot() +
      geom_sf(data = filter(vars_net.long, Variable == i), aes(fill=value), colour=NA) +
      scale_fill_viridis(name="") +
      labs(title=i) +
      mapTheme()}

do.call(grid.arrange, c(mapList, 
                        ncol=3, top="Count of Nightlife Businesses per Fishnet", 
                        bottom = "Figure 3.1.2"))