This project was completed for a course in the Master of Urban Spatial Analytics (MUSA) program at the University of Pennsylvania, taught by Professor Ken Steif.
How to use this document
This report is divided into two sections. The first explains our use case and results, and the second provides source code intended to help others replicate our findings.
Indego, Philadelphia's bike share system, currently serves mainly central Philadelphia, while few of its stations benefit low-income neighborhoods. To make Indego a more sustainable and equitable bike share system, the Bicycle Coalition of Greater Philadelphia wishes to expand bike share stations into low-income communities. Our project aims to help the Coalition identify the best locations for new bike share stations.
In this project, we predict bike share demand at every location citywide using machine learning algorithms, provide instant cost-benefit analysis based on the predictions, and design a web app that visualizes the cost-benefit results to support the bike share planning process.
Subsidies are usually needed for the first-year construction of a new bike share station. As shown in the table below, the first-year cost (equipment, installation, and operating) is roughly 3.5 to 4 times the second-year operating cost.
| Station Size | 1st Year (equipment, installation, operating) | 2nd Year (operating) | 3rd Year (operating) | 4th Year (operating) |
|---|---|---|---|---|
| Station with 6 bikes & 11 docks | $49,458 | $11,718 | $12,304 | $12,919 |
| Station with 8 bikes & 15 docks | $60,913 | $15,624 | $16,405 | $17,225 |
| Station with 10 bikes & 19 docks | $72,367 | $19,530 | $20,507 | $21,532 |
| Station with 12 bikes & 23 docks | $83,822 | $23,436 | $24,608 | $25,838 |
| Station with 14 bikes & 27 docks | $95,277 | $27,342 | $28,709 | $30,145 |
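The cost arithmetic in the table can be checked with a few lines of Python. This sketch transcribes the table values (USD) and computes, for each station size, the ratio of year-1 cost to year-2 operating cost, the cumulative four-year cost, and the implied year-over-year growth in operating cost. The names `STATION_COSTS`, `first_year_ratio`, `cumulative_cost`, and `operating_growth` are illustrative and not part of the project code.

```python
# Station costs in USD, transcribed from the table above.
# Key: number of bikes; value: (year 1, year 2, year 3, year 4).
STATION_COSTS = {
    6: (49_458, 11_718, 12_304, 12_919),
    8: (60_913, 15_624, 16_405, 17_225),
    10: (72_367, 19_530, 20_507, 21_532),
    12: (83_822, 23_436, 24_608, 25_838),
    14: (95_277, 27_342, 28_709, 30_145),
}


def first_year_ratio(bikes: int) -> float:
    """Year-1 cost (equipment, installation, operating) over year-2 operating cost."""
    year1, year2, _, _ = STATION_COSTS[bikes]
    return year1 / year2


def cumulative_cost(bikes: int, years: int = 4) -> int:
    """Total cost over the first `years` years (1 to 4)."""
    return sum(STATION_COSTS[bikes][:years])


def operating_growth(bikes: int) -> list[float]:
    """Year-over-year growth of operating cost (year 2 -> 3, year 3 -> 4)."""
    operating = STATION_COSTS[bikes][1:]
    return [later / earlier - 1 for earlier, later in zip(operating, operating[1:])]


if __name__ == "__main__":
    for bikes in STATION_COSTS:
        growth = ", ".join(f"{g:.1%}" for g in operating_growth(bikes))
        print(
            f"{bikes:>2} bikes: year-1/year-2 = {first_year_ratio(bikes):.2f}, "
            f"4-year total = ${cumulative_cost(bikes):,}, operating growth = {growth}"
        )
```

Running it shows the ratio falling from about 4.22 for the 6-bike station to about 3.48 for the 14-bike station, and operating costs rising by about 5% per year in every row.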
A successful bike share system balances the desire to profit with the need to benefit people, so estimating bike share demand and cost is essential when siting future stations. In the past, policymakers usually predicted or evaluated siting suitability by simply overlaying different criteria or by relying on personal experience. To make the prediction more accurate, we build predictive models using machine learning algorithms and train them on actual data. We then use the model to predict the trip count at every location in Philadelphia and design a user-friendly interactive app that gives the client an immediate sense of predicted bike share demand and the subsidies needed.
Our project follows 4 steps:
Exploratory Analysis
First, we conduct exploratory analysis to answer several questions relevant to our scenario. After presenting examples of factors related to bike share demand, we visualize how each factor relates to trip count and group the important factors into 5 categories, which we process in the next step.
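Alongside plots, a quick numeric summary of how strongly each candidate factor tracks trip count is the Pearson correlation. This is a minimal sketch with made-up toy values (not the project's data); the factor names `dist_to_center_km`, `pop_density`, and `bike_lanes_km` are hypothetical placeholders for the kinds of factors examined here.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy values for five hypothetical stations (illustration only, not real data).
trip_count = [1200.0, 950.0, 400.0, 300.0, 150.0]
factors = {
    "dist_to_center_km": [0.5, 1.0, 3.0, 4.5, 6.0],
    "pop_density": [18.0, 15.0, 9.0, 10.0, 6.0],
    "bike_lanes_km": [2.0, 2.5, 1.0, 1.5, 0.5],
}

# Rank factors by the strength (absolute value) of their correlation with trips.
ranked = sorted(factors, key=lambda f: abs(pearson(factors[f], trip_count)), reverse=True)
for name in ranked:
    print(f"{name:>18}: r = {pearson(factors[name], trip_count):+.2f}")
```

Correlation only captures linear association, so these numbers complement scatterplots rather than replace them.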
Data
Next, we prepare the final dataset. To capture the relationship between our predictors and the dependent variable (trip count), we encode our variables in multiple ways. We then select the most influential predictors using machine learning methods, which leaves the final dataset ready for modeling.
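The report does not list the exact encodings used. As one illustration, assuming planar coordinates in meters, a point-based amenity (for example, transit stops) can be encoded for each candidate location in three ways: the count within a buffer, the distance to the nearest amenity, and the mean distance to the k nearest amenities. The function name `encode_exposure` and the defaults of a 400 m radius and k = 3 are hypothetical choices for this sketch.

```python
from math import dist


def encode_exposure(site, amenities, radius=400.0, k=3):
    """Encode one site's exposure to point amenities three ways.

    Assumes planar coordinates in meters (e.g., a projected CRS).
    Returns (count within radius, nearest distance, mean of k nearest distances).
    """
    if not amenities:
        raise ValueError("need at least one amenity")
    distances = sorted(dist(site, a) for a in amenities)
    count_in_buffer = sum(1 for d in distances if d <= radius)
    nearest = distances[0]
    k_nearest = distances[:k]  # fewer than k if there are fewer amenities
    mean_k_nearest = sum(k_nearest) / len(k_nearest)
    return count_in_buffer, nearest, mean_k_nearest


# Hypothetical stop locations around a site at the origin.
stops = [(100.0, 0.0), (0.0, 300.0), (500.0, 500.0), (1200.0, 0.0)]
print(encode_exposure((0.0, 0.0), stops))
```

Keeping all three versions of a variable lets the subsequent feature-selection step decide which encoding carries the most predictive signal.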
Modeling
Then, we build predictive models using 6 different machine learning algorithms. We split the dataset into a training set, used to fit each model, and a testing set, used to measure its accuracy. To improve performance, we try various methods and algorithms across the models and choose the model with the lowest error. Finally, we test that model on different training and testing splits to check that it generalizes.
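Testing on several training and testing splits can be sketched as k-fold cross-validation: the data are divided into k folds, each fold serves once as the test set while the others train the model, and the error is averaged across folds. This sketch compares two deliberately simple stand-in models, a mean baseline and a one-variable least-squares line, by mean absolute error (MAE). They are not the six algorithms used in the project, and the toy data are made up.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares with one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mae(fit, xs, ys, k=5):
    """Average test-set MAE over k folds (fold i holds every k-th row starting at i)."""
    fold_errors = []
    for i in range(k):
        test_idx = set(range(i, len(xs), k))
        train_x = [x for j, x in enumerate(xs) if j not in test_idx]
        train_y = [y for j, y in enumerate(ys) if j not in test_idx]
        model = fit(train_x, train_y)
        errors = [abs(ys[j] - model(xs[j])) for j in sorted(test_idx)]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Toy data: one predictor and trip counts with small alternating noise (illustration only).
xs = [float(i) for i in range(20)]
ys = [50.0 + 10.0 * x + (3.0 if i % 2 else -3.0) for i, x in enumerate(xs)]

scores = {name: cv_mae(fit, xs, ys) for name, fit in [("mean", fit_mean), ("line", fit_line)]}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

With real data the rows would normally be shuffled before folding; for spatial data, grouping folds by area is a common alternative so that each test fold contains places the model has not seen.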
Cost-Benefit Analysis
Finally, we analyze costs and benefits based on the predictions for every location in Philadelphia, and we present an interface design that visualizes the cost-benefit results.
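The cost-benefit step can be sketched as follows: for a location's predicted annual trips, pick a station size, look up its first-year cost from the station cost table, estimate revenue from trips, and report the shortfall as the subsidy needed. The revenue per trip, the trips-per-bike sizing rule, and the names `choose_size` and `first_year_subsidy` are hypothetical assumptions; the report does not specify how the project converts trips into revenue or station size.

```python
# First-year cost (USD) by station size in bikes, from the station cost table.
FIRST_YEAR_COST = {6: 49_458, 8: 60_913, 10: 72_367, 12: 83_822, 14: 95_277}

# ASSUMPTIONS (hypothetical, not from the project): average revenue per trip
# and the number of annual trips one bike is expected to serve.
REVENUE_PER_TRIP = 2.0
TRIPS_PER_BIKE = 1_000


def choose_size(predicted_trips: float) -> int:
    """Smallest station size whose bikes cover predicted demand (capped at 14 bikes)."""
    for bikes in sorted(FIRST_YEAR_COST):
        if bikes * TRIPS_PER_BIKE >= predicted_trips:
            return bikes
    return max(FIRST_YEAR_COST)


def first_year_subsidy(predicted_trips: float) -> tuple[int, float]:
    """Return (station size, subsidy), where subsidy = cost - revenue, floored at 0."""
    bikes = choose_size(predicted_trips)
    revenue = predicted_trips * REVENUE_PER_TRIP
    return bikes, max(0.0, FIRST_YEAR_COST[bikes] - revenue)


for trips in (3_000, 9_500, 20_000):
    bikes, subsidy = first_year_subsidy(trips)
    print(f"{trips:>6} predicted trips -> {bikes} bikes, subsidy ${subsidy:,.0f}")
```

Because the prediction for every location is computed in advance, a lookup like this is cheap enough to run instantly when a user selects a location in the app.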
In this section, we explore several questions related to bike share demand.