This project was completed for a course in the Master of Urban Spatial Analytics program at the University of Pennsylvania, taught by Professor Ken Steif.

How to use this document

This report is divided into two sections. The first explains our use case and results, and the second provides source code intended to help others replicate our findings.

1. Introduction

1.1. Abstract

Indego, Philadelphia's bike share system, currently serves mainly the center of Philadelphia, while few of its stations benefit low-income neighborhoods. To make Indego a more sustainable and equitable system, the Bicycle Coalition of Greater Philadelphia wishes to expand bike share stations into low-income communities. Our project aims to help the Coalition identify the best locations for new bike share stations.

In this project, we predict bike share demand at every location citywide using machine learning algorithms, provide an instant cost-benefit analysis based on the predictions, and design a web app that visualizes the cost-benefit results to assist the bike share planning process.

1.2. Motivation

Subsidies are usually needed to cover the first-year cost of building a new bike share station. As shown in the table below, the first-year cost (equipment, installation, and operating) is roughly 3.5 to 4 times the second-year operating cost.

| Station Size | 1st Year (equipment, install, operating) | 2nd Year (operating) | 3rd Year (operating) | 4th Year (operating) |
| --- | --- | --- | --- | --- |
| Station with 6 bikes & 11 docks | $49,458 | $11,718 | $12,304 | $12,919 |
| Station with 8 bikes & 15 docks | $60,913 | $15,624 | $16,405 | $17,225 |
| Station with 10 bikes & 19 docks | $72,367 | $19,530 | $20,507 | $21,532 |
| Station with 12 bikes & 23 docks | $83,822 | $23,436 | $24,608 | $25,838 |
| Station with 14 bikes & 27 docks | $95,277 | $27,342 | $28,709 | $30,145 |

Source: ashevillenc.gov
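
As a quick check on the table above, the following sketch computes the ratio of first-year cost to second-year operating cost for each station size. The dollar figures are copied from the table; the variable and function names are ours and purely illustrative.

```python
# Costs copied from the table above:
# (bikes, docks, first-year total cost, second-year operating cost), in dollars.
station_costs = [
    (6, 11, 49458, 11718),
    (8, 15, 60913, 15624),
    (10, 19, 72367, 19530),
    (12, 23, 83822, 23436),
    (14, 27, 95277, 27342),
]


def first_year_ratio(year1_total, year2_operating):
    """Ratio of the first-year cost to the second-year operating cost."""
    return year1_total / year2_operating


# Map each station size (number of bikes) to its rounded cost ratio.
ratios = {
    bikes: round(first_year_ratio(year1, year2), 2)
    for bikes, _docks, year1, year2 in station_costs
}

for bikes, ratio in ratios.items():
    print(f"{bikes} bikes: first year costs {ratio}x the second-year operating cost")
```

The ratio falls as stations get larger, which suggests that the up-front burden is relatively heaviest for the smallest stations.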

A successful bike share system balances the desire to profit with the need to benefit people, so estimating demand and cost is essential for siting future stations. In the past, policy-makers usually predicted or evaluated siting suitability by simply overlaying different criteria or by relying on personal experience. To make predictions more accurate, we build predictive models using machine learning algorithms and train them on actual data. We then use the model to predict the trip count at every location in Philadelphia and design a user-friendly interactive app that gives the client an immediate sense of predicted bike share demand and the subsidies needed.

Methods in Brief

Our project follows 4 steps:

Exploratory Analysis
First, we conduct exploratory analysis to answer several questions relevant to our scenario. After presenting examples of factors related to bike share demand, we visualize how they relate to trip counts and group the important factors into 5 categories, which are processed in the next step.

Data
Next, we prepare our data and finalize the dataset. To capture the relationship between our predictors and the dependent variable (trip count), we encode our variables in multiple ways. We then select the most influential predictors using machine learning methods, which gives us the final dataset for modeling.

Modeling
Then, we build predictive models using 6 different machine learning algorithms. We divide our dataset into a training set, used to train the model, and a testing set, used to test its accuracy. To improve performance, we try various methods across the different models and choose the model with the lowest error. Finally, we test that model on different training and testing sets to check its generalizability.

Cost-Benefit Analysis
Finally, we analyze the costs and benefits based on the prediction for every location in Philadelphia. We also present the interface design that visualizes our cost-benefit analysis results.
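
The hold-out evaluation described in the Modeling step can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the data are synthetic, the single predictor (distance to Center City) is an assumption, and the two candidate models (a mean baseline and a one-variable linear fit) merely stand in for the 6 algorithms mentioned above.

```python
import random
from statistics import mean


def make_synthetic_stations(n=80, seed=42):
    """Made-up (distance_km, trip_count) pairs; NOT the project's data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        distance_km = rng.uniform(0, 10)
        trips = 1000 - 80 * distance_km + rng.gauss(0, 20)
        rows.append((distance_km, trips))
    return rows


def train_test_split(rows, test_share=0.25, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_share))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline model: always predict the average training trip count."""
    average = mean(y for _, y in train)
    return lambda x: average


def fit_line(train):
    """One-variable least-squares line, a stand-in for a real algorithm."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    slope = sxy / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def mean_absolute_error(model, rows):
    return mean(abs(model(x) - y) for x, y in rows)


CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_line}


def select_best(rows, seed=0):
    """Train every candidate on the training set; pick the lowest test error."""
    train, test = train_test_split(rows, seed=seed)
    errors = {
        name: mean_absolute_error(fit(train), test)
        for name, fit in CANDIDATES.items()
    }
    return min(errors, key=errors.get), errors


rows = make_synthetic_stations()
best_name, errors = select_best(rows)

# Repeat the selection on different random splits as a rough generalizability check.
best_by_split = [select_best(rows, seed=s)[0] for s in range(5)]
print(best_name, errors, best_by_split)
```

If the same model wins across several random splits, that is some evidence its advantage does not depend on one particular division of the data.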

2. Exploratory Analysis

In this section, we explore several questions related to bike share demand.

Will new bike share stations affect demand at existing stations nearby?

Before building a new bike share station, the first question we need to ask is: will the new station take demand away from existing stations? Although we do not have the data to investigate this question rigorously, we plot trip counts at the stations surrounding a newly opened station immediately before and after it opens (code).


After some new stations are built, trip counts at nearby stations drop, which suggests that part of the demand at existing stations may shift to the new station once it offers users one more choice. For most new stations, however, nearby stations' trip counts stay the same or even increase slightly, possibly because a new station enlarges the bike share network and makes the surrounding existing stations more convenient to use. We therefore tend to believe that building a new station will not significantly influence the trip counts of nearby stations. In other words, the cost-benefit balance will not be affected.
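
A before-and-after comparison of this kind could be computed along the following lines. This is only a sketch under stated assumptions: the trip records, station IDs, opening date, and the 30-day window are all made up for illustration, and the report does not specify how "nearby" or "immediately before and after" were defined.

```python
from datetime import date, timedelta


def count_trips(trips, station_ids, start, end):
    """Count trips that start at one of station_ids with start <= day < end."""
    return sum(
        1
        for station_id, day in trips
        if station_id in station_ids and start <= day < end
    )


def before_after_change(trips, nearby_ids, opening_day, window_days=30):
    """Compare nearby stations' trip counts in equal windows around an opening.

    window_days is an illustrative assumption, not a value from the report.
    """
    window = timedelta(days=window_days)
    before = count_trips(trips, nearby_ids, opening_day - window, opening_day)
    after = count_trips(trips, nearby_ids, opening_day, opening_day + window)
    change = (after - before) / before if before else None
    return before, after, change


# Hypothetical trip records: (start station ID, trip date).
trips = [
    ("A", date(2016, 3, 1)),   # outside the 30-day window
    ("A", date(2016, 5, 10)),
    ("A", date(2016, 5, 20)),
    ("B", date(2016, 5, 25)),
    ("B", date(2016, 5, 30)),
    ("C", date(2016, 6, 2)),   # station C is not in the nearby set
    ("A", date(2016, 6, 5)),
    ("B", date(2016, 6, 10)),
    ("B", date(2016, 6, 15)),
]

result = before_after_change(trips, {"A", "B"}, opening_day=date(2016, 6, 1))
print(result)
```

A raw comparison like this does not control for seasonality or weather, which is one reason the plots cannot settle the question on their own.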

What factors are associated with bike share demand?

The next step is to figure out which factors might increase or reduce bike share demand. If we plot the total trip count of each station in 2016 (as shown in the figure below), we find that stations with higher trip counts are clearly clustered in Center City. We therefore want to explore relevant factors that capture this spatial structure.

Demographic
The potential users of bike share matter for demand prediction, so we investigate the relationship between trip count and demographic variables, one of which is commute pattern. We expect bike share stations to be used more frequently where more people commute to work by bike. The maps (code) show the percentage of residents cycling to work in each census tract and, for comparison, the percentage commuting by car. Residents near Center City, where the majority of bike share stations are located, are more likely to commute by bike.



Attractions and Repellents
Where do these potential users go? We should consider the attractive destinations they may cycle to, since trip counts are expected to be higher around these attractions.

For example, the central business district is regarded as one of the most important attractions. Bike share stations with higher trip counts are mainly found closer to Center City, the central business district of Philadelphia, as demonstrated by the figure below (code).
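
One way to turn proximity to Center City into a predictor is to compute each station's great-circle distance to a reference point. The sketch below uses the haversine formula; the reference point (approximate coordinates of City Hall) and the station coordinates are our assumptions for illustration, and the report does not state how distance was actually measured.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Assumption: approximate coordinates of Philadelphia City Hall, used here
# as a stand-in for the center of the central business district.
CITY_HALL = (39.9526, -75.1652)


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Hypothetical station locations, not real Indego stations.
stations = {
    "station_a": (39.9526, -75.1652),
    "station_b": (39.9626, -75.1652),
    "station_c": (39.9926, -75.1252),
}

distance_to_cbd = {
    name: haversine_km(location, CITY_HALL)
    for name, location in stations.items()
}

for name, distance in sorted(distance_to_cbd.items(), key=lambda item: item[1]):
    print(f"{name}: {distance:.2f} km from the reference point")
```

Straight-line distance ignores the street network and river crossings, so a network-based distance may describe cycling access more faithfully.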