1. Introduction

1.1 About this Project

The following document presents an analysis of shared, dockless electric scooter systems in several American cities and a web tool for predicting scooter demand in cities that do not currently have shared scooters. We focus on the equity implications of these systems: who currently has access to scooters, and who will have access if we keep following the business-as-usual approach? This document presents an overview of our data and use case, a summary and key takeaways from our analysis, and an appendix with all of the R code used in the project.

This project was produced for the MUSA/Smart Cities Practicum course (MUSA 801) taught by Ken Steif, Michael Fichman, and Matt Harris in the Master of Urban Spatial Analytics and Master of City Planning Programs at the University of Pennsylvania. We are deeply grateful to our instructors for their guidance, feedback, and attention throughout the semester, despite the challenges brought on by the ongoing pandemic. We also thank Michael Schnuerle from the City of Louisville Metro Government and Sharada Strasmore from the DC Department of Transportation for providing data that made our rebalancing analysis possible as well as sharing their insights into and knowledge of the scooter and micromobility planning process. Lastly, we would like to acknowledge our classmates in MUSA and city planning, who not only produced incredible projects of their own this semester, but also provided thoughtful feedback and support throughout our time in the programs.

1.2 Abstract

In the few short years since they first launched, shared, dockless electric scooters have become ubiquitous sights on streets and sidewalks in cities across America. What may have first been seen as novelties or purely recreational vehicles now play critical roles in many people’s daily transportation routines. Despite being relative newcomers to the urban transportation scene, dockless scooters provided over 38 million trips in 2018, more than the number of rides taken on traditional station-based bikeshare systems that year. Yet, despite these vehicles having enmeshed themselves quickly in the urban fabric, access to electric scooters is not spread equitably across cities. While residents in wealthier, predominantly white downtown neighborhoods enjoy easy access to shared scooters, residents in poorer but comparably dense parts of cities outside of downtown are underserved by the systems.

In this study, we use a combination of open and private dockless scooter usage data from six American cities to construct a model for predicting ridership in ten cities that have not had scooter share systems in the past. While our model displayed sizable errors, showing that it requires further calibrating, it also suggests that the business-as-usual approach to introducing scooters into a new market is likely to create inequitable access to the vehicles for residents. While cities such as Louisville, KY have recognized these inequities and instituted distribution requirements to address them, we show through analysis of vehicle rebalancing data that providers do not seem to be complying with these requirements, and stronger enforcement may be necessary. Lastly, we introduce a proof-of-concept web application that allows users to explore the spatial distribution of our model’s predictions for each city and compare them to demographic and socioeconomic variables of interest. We believe that this tool will allow policymakers to anticipate the geography of scooter ridership in their cities and understand - and ultimately plan for - the inequities that may be created by the business-as-usual approach to launching and administering scooter share systems.

Riders using dockless scooters in San Francisco. [Source](https://www.thegazette.com/subject/news/government/iowa-lawmakers-consider-electronic-scooter-rules-as-cedar-rapids-weighs-including-them-in-bike-share-20190206).

1.3 Motivation

Since Bird and Lime launched the first shared, dockless electric scooter services in Santa Monica, California in September 2017, scooters have rapidly spread across American cities, becoming a popular form of urban transportation. As of January 2020, there are 340 scooter share programs operating in 242 municipal areas and campuses across 40 different states (plus Washington, D.C.). In 2018 alone, users took 38.5 million trips on electric scooters, more than the number of trips taken on more familiar, traditional station-based bikeshare systems. While scooter share providers initially entered new municipalities and markets without local officials’ permission or oversight, leading to spikes in scooter-related injuries and complaints of vehicles blocking sidewalks, cities have begun collaborating through coalitions like the Open Mobility Foundation to institute some oversight over these programs. Many municipalities are now working with their scooter providers to ensure that their scooter share programs, among other goals, meet safety standards, distribute vehicles equitably, keep sidewalks clear, and protect rider privacy. Data standards like the Mobility Data Specification (MDS), created by the Los Angeles Department of Transportation, help cities share and monitor scooter ridership data and make sure that providers are complying with their policies.

While these data initiatives help cities manage more mature scooter share programs, there are no widely adopted models or processes in place that help cities without shared scooters introduce the vehicles into their markets. Further, while some cities like Chicago have issued citations to enforce their requirements for equitable distribution, not all cities have done so, meaning that in some places, these distribution requirements are without teeth. As we see in our analysis of vehicle distribution in Louisville, scooter companies do not necessarily comply with existing distribution requirements. In this project, we use data from six American cities with shared scooters to develop a model that estimates what peak-season demand will be in cities without existing programs. We use this model to build a prototype for a web application intended to help city officials anticipate the geography of scooter ridership in their cities and understand its relationship to the city’s social and economic geography. Our goal is to create a municipal scooter planning toolkit that helps cities interested in launching scooter share systems learn from other municipalities that already have these systems in place. We hope that cities like Philadelphia, Pennsylvania and Madison, Wisconsin, which are considering adopting scooter share programs, will find this toolkit helpful as they work with providers to bring the vehicles to their communities.

1.4 Summary

Using a combination of publicly available and private scooter ridership data from six American cities, we employ machine learning methods to create a model that predicts the total scooter trips that will be taken between July and September in each census tract in 10 cities that do not currently have scooter share programs. Our model uses a total of 24 features encompassing demographic, socioeconomic, and built environment characteristics for the cities to make its predictions. We emphasize that our model predictions reflect both the underlying demand for scooters that may exist in a census tract as well as the impact of the scooter companies’ fleet management and distribution choices. Our model uses existing ridership data to predict how scooter usage would look in a new city if it were to follow the business-as-usual approach.

We then propose an Equity Score, a single metric for describing the equitability of scooter access in a city. This score compares observations and predictions of scooter ridership with various socioeconomic indicators to provide a sense of who a city’s scooter system is or would be serving. Cities could customize this score to their own policy priorities by including different indicators and weighting them accordingly.

Our results show that our model produces reasonable projections for the distribution of ridership in new cities, with most rides occurring in downtown areas and near universities, but that further tuning and additional data are needed to calibrate the raw numbers of scooter rides predicted. Additionally, while the geography of predicted ridership aligns well with our observations in cities that currently have scooters, the statistical distributions of our predictions are more uniform than they are in reality. This leads to overoptimistic projected Equity Scores for the prediction cities compared to the Equity Scores we observe in the cities that already have shared scooters. As we make improvements to the ridership model in the future, we expect the predicted Equity Scores to align more closely with observed Equity Scores.

2. Data

2.1 Outcome Variable and Unit of Analysis

For our outcome variable, we use the total number of rides taken between July and September of 2019 in each census tract for each city. We chose this time period partly due to data limitations (Chicago only recently instituted scooter share and does not have a full year of data available) and partly because the late summer and early fall represent peak ridership. We chose census tracts as our spatial unit of analysis because they represent the highest level of geographic aggregation in the scooter ridership datasets. While the private Louisville and Washington, D.C. datasets provide coordinates for ride pick-ups and drop-offs, Austin’s publicly available dataset aggregates rides to the pick-up and drop-off census tract to protect rider privacy.

In addition to the level of geographic aggregation, the ridership data provided varying information:

| City | Geographic Aggregation | Time Period Available | Temporal Precision | Other Info | Fleet/Rebalancing Info |
|---|---|---|---|---|---|
| Louisville | Coordinates | Nov. 2018 - Dec. 2019 | Actual time | Trip ID, Vehicle ID, Battery Level, Operator | Rebalancing, Vehicle Maintenance/Retirement/Entry |
| Washington, DC | Coordinates | | Actual time | Trip ID, Vehicle ID, Trip Distance, Trip Duration, Operator | |
| Austin | Census Tract | April 2018 - Present | 15 minutes | Trip ID, Vehicle ID, Trip Distance, Trip Duration, Council District | No |
| Minneapolis | Street | May 2019 - Sept. 2019 | 30 minutes | Trip ID, Vehicle ID, Trip Distance, Trip Duration | No |
| Kansas City | Truncated Coordinates | June 2019 - Dec. 2019 | 15 minutes | Trip ID, Vehicle ID, Trip Distance, Trip Duration | No |
| Chicago | Census Tract | June 2019 - Sept. 2019 | Hour | Trip ID, Vehicle ID, Trip Distance, Trip Duration, Community Area Name | No |

Part of our data wrangling process was transforming the ridership data into the same level of spatial aggregation. Chicago, for instance, was already aggregated at the census tract level, so it did not require any additional aggregation.

Louisville and DC, on the other hand, provided point data. We aggregated this to the census tract level.
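The aggregation step for point data can be sketched as follows. This is a minimal Python illustration, not the project's own R code: the record layout `(start_date, x, y)`, the toy square "tracts", and the function names are all assumptions made for the example, and a real workflow would more likely rely on a spatial join from a GIS library than on a hand-written point-in-polygon test.

```python
from collections import Counter
from datetime import date

# Study window from the text: July through September 2019.
WINDOW_START = date(2019, 7, 1)
WINDOW_END = date(2019, 9, 30)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside `polygon`.

    `polygon` is a list of (x, y) vertices. Points exactly on an edge may
    land on either side, which is acceptable for a sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pickups_per_tract(trips, tracts):
    """Count pick-ups inside the study window, keyed by tract ID.

    `trips` holds (start_date, x, y) records and `tracts` maps a tract ID
    to its polygon; both layouts are hypothetical.
    """
    counts = Counter()
    for start_date, x, y in trips:
        if not (WINDOW_START <= start_date <= WINDOW_END):
            continue  # outside July-September 2019
        for tract_id, polygon in tracts.items():
            if point_in_polygon(x, y, polygon):
                counts[tract_id] += 1
                break  # each pick-up belongs to at most one tract
    return counts


# Made-up geometry and trips, for illustration only.
toy_tracts = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
toy_trips = [
    (date(2019, 7, 4), 0.25, 0.5),    # inside "west"
    (date(2019, 8, 20), 0.75, 0.25),  # inside "west"
    (date(2019, 9, 30), 1.5, 0.5),    # inside "east"
    (date(2019, 10, 1), 1.5, 0.5),    # after the study window
    (date(2019, 7, 15), 3.0, 3.0),    # outside every tract
]
toy_counts = pickups_per_tract(toy_trips, toy_tracts)
```

For datasets that already carry a tract identifier, such as Chicago's, only the date filter and the tally would be needed.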

2.2 Explanatory Variables

For our model features, we use variables from the US Census Bureau and OpenStreetMap that we believe would reflect both the underlying demand for scooters in a census tract and the likelihood that a provider would make more vehicles available in a tract.

Demographic

  • Total Population
  • Median Age
  • Percentage White Population
  • Percentage Female Population

Socio-economic

  • Household Income
  • Home Values and Rental Prices
  • Commute Modeshare (transit vs. driving)
  • Commute Time (30+ minutes)
  • Housing Units and Occupancy Rates
  • Vehicle Ownership
  • Jobs

Built Environment

  • Retail Stores
  • Restaurants
  • Leisure Activities and Tourism Destinations
  • Transportation Infrastructure
  • Offices

Our final model uses 24 features built from these variables as predictors for scooter ridership in a census tract. A sample of rows from our data panel is shown below, with the outcome variable (ORIGINS_CNT) in the first column:

| ORIGINS_CNT | TOTPOP | TOTHSEUNI | MDHHINC | MDAGE | MEDVALUE | MEDRENT | PWHITE | PTRANS | PDRIVE | PFEMALE | PCOM30PLUS | POCCUPIED | PVEHAVAI | RATIO_RETAIL | RATIO_OFFICE | RATIO_RESTAURANT | RATIO_PUBLIC_TRANSPORT | RATIO_LEISURE | RATIO_TOURISM | RATIO_COLLEGE | RATIO_CYCLEWAY | RATIO_STREET | JOBS_IN_TRACT | WORKERS_IN_TRACT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 349 | 2802 | 80 | 28490 | 35.8 | 47300 | 703 | 0.8183440 | 0.0423620 | 0.8844673 | 0.4857245 | 0.0483450 | 0.7001634 | 0.8459564 | 0.0102041 | 0.0102041 | 0.0102041 | 0.1326531 | 0.8469388 | 0.0102041 | 0.0102041 | 0.0183206 | 0.0150866 | 991 | 1059 |
| 138 | 2399 | 80 | 25673 | 40.6 | 48000 | 710 | 0.5043768 | 0.0842956 | 0.7297921 | 0.5518966 | 0.0535211 | 0.8185841 | 0.8937644 | 0.1836735 | 0.0102041 | 0.0102041 | 0.1836735 | 0.7346939 | 0.0102041 | 0.0102041 | 0.0136005 | 0.0103508 | 456 | 1052 |
| 76 | 4612 | 150 | 29733 | 39.5 | 75600 | 804 | 0.1986123 | 0.0582278 | 0.8449367 | 0.5615785 | 0.0552426 | 0.8496132 | 0.9468354 | 0.0102041 | 0.0102041 | 0.0102041 | 0.1224490 | 2.6530612 | 0.0102041 | 0.0102041 | 0.0276389 | 0.0171585 | 75 | 1913 |
| 164 | 1790 | 100 | 25435 | 34.6 | 50200 | 708 | 0.0212291 | 0.1343931 | 0.6734104 | 0.5122905 | 0.0557905 | 0.7820372 | 0.8294798 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0306122 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0216312 | 0.0115308 | 579 | 720 |
| 56 | 2724 | 80 | 19746 | 35.6 | 49800 | 778 | 0.1119677 | 0.1368984 | 0.7358289 | 0.5499266 | 0.0512122 | 0.8283358 | 0.8278075 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0204082 | 0.3673469 | 0.0102041 | 0.0102041 | 0.0113234 | 0.0065086 | 57 | 1124 |
| 56 | 2152 | 100 | 35625 | 35.7 | 74500 | 664 | 0.0185874 | 0.0992366 | 0.8636859 | 0.5836431 | 0.0480070 | 0.7626208 | 0.8822246 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0918367 | 1.0102041 | 0.0102041 | 0.0102041 | 0.0139273 | 0.0069871 | 49 | 951 |
| 26 | 2022 | 100 | 20500 | 38.4 | 57600 | 567 | 0.0558853 | 0.1664296 | 0.8065434 | 0.5158259 | 0.0518519 | 0.8058252 | 0.8449502 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0057834 | 0.0041512 | 31 | 800 |
| 49 | 2729 | 80 | 23533 | 30.3 | 57000 | 726 | 0.0974716 | 0.1342952 | 0.6082131 | 0.5844632 | 0.0584891 | 0.7244656 | 0.7913430 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0816327 | 0.0102041 | 0.0102041 | 0.0190098 | 0.0083222 | 1290 | 1030 |
| 34 | 3075 | 100 | 38145 | 40.9 | 75800 | 695 | 0.0344715 | 0.1090742 | 0.8157654 | 0.5479675 | 0.0518346 | 0.8011118 | 0.8900092 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0199580 | 0.0096401 | 64 | 1482 |
| 15 | 3202 | 90 | 31000 | 34.0 | 69800 | 853 | 0.0062461 | 0.1282985 | 0.6451319 | 0.5908807 | 0.0432909 | 0.8966038 | 0.8917197 | 0.0102041 | 0.0102041 | 0.0102041 | 0.0102041 | 0.5816327 | 0.0102041 | 0.0102041 | 0.0145422 | 0.0136271 | 1077 | 1325 |

3. Exploratory Analysis and Feature Engineering

3.1 Scooter Ridership Data

Maps of the six cities show that most rides begin and end in a small number of census tracts.

Figure: Scooter Trips by City. Panels: Louisville; Washington, DC; Austin; Minneapolis; Kansas City; Chicago. Inflow data is not available for Minneapolis.

A persistent problem in micromobility programs is unbalanced vehicle flow, when riders take more vehicles away from a place than other riders bring in. Which tracts are “gaining” and “losing” vehicles from user activity alone? While, of course, many rides begin and end within the same census tract, we see below that regular user activity leads to unbalanced flows. Without active rebalancing from providers, vehicles would become concentrated in just a few tracts, and user demand in other tracts would go unsatisfied.

The plots on the left show the net inflow/outflow of vehicles for each census tract during the study period. The maps on the right show each tract's net inflow relative to its total inflow; a tract that gained a net of 10 vehicles while seeing a total inflow of 20 vehicles would have an inflow rate of 0.5.
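The two quantities described above can be computed directly from origin-destination pairs. The sketch below is a Python illustration under assumed inputs: the `(origin, destination)` record layout and the function name `tract_flows` are hypothetical, and the toy trips are constructed to reproduce the worked example of a net gain of 10 on a total inflow of 20.

```python
def tract_flows(trips):
    """Return {tract: (net_inflow, inflow_rate)} from (origin, destination) pairs.

    net_inflow  = trips ending in the tract minus trips starting in it
    inflow_rate = net_inflow / total inflow (None when nothing flows in)
    """
    inflow, outflow = {}, {}
    for origin, destination in trips:
        outflow[origin] = outflow.get(origin, 0) + 1
        inflow[destination] = inflow.get(destination, 0) + 1

    flows = {}
    for tract in set(inflow) | set(outflow):
        total_in = inflow.get(tract, 0)
        net = total_in - outflow.get(tract, 0)
        rate = net / total_in if total_in else None
        flows[tract] = (net, rate)
    return flows


# Made-up trips: 20 rides from B into A and 10 rides from A into B.
toy_trips = [("B", "A")] * 20 + [("A", "B")] * 10
flows = tract_flows(toy_trips)
# Tract A: inflow 20, outflow 10, so net +10 and rate 10 / 20 = 0.5.
```

A ride that starts and ends in the same tract adds one to both the inflow and the outflow, so it leaves the net flow unchanged while still raising the denominator of the rate.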

Figure: Net Scooter Flows by City. Panels: Louisville; Washington, DC; Austin; Minneapolis; Kansas City; Chicago. Net flow data is not available for Minneapolis.

Some of the data sets include information on ride durations and distances. We don’t investigate those data here, as they were not pertinent to our prediction model, but we do explore them later on when we discuss compliance with distribution requirements.

3.2 Feature Variables

The six cities we’ve chosen for the analysis vary greatly in size and demographic and socioeconomic characteristics. This makes producing a model that predicts raw trip counts a difficult challenge, but it also protects against the possibility of our model overfitting to a certain type of city.

Figure: Distributions by City. Panels: Demographic; Socio-economic.

During the feature engineering process, we experimented with variations of the built environment variables. Using restaurants as an example, we tried the variations below:

  • Density: The number of restaurants per square mile in the tract
  • Count: The total number of restaurants in the tract
  • KNN: The distance from the tract centroid to the nearest k restaurants (where we experimented with a range of k values)
  • Ratio: The percentage of the city’s restaurants located within that tract
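To make the four variations concrete, the sketch below computes each one for a single tract. It is a Python illustration rather than the project's R code, and several details are assumptions: coordinates are treated as planar and measured in miles, the KNN variant is summarized as the mean distance to the k nearest points, and the function and argument names are hypothetical.

```python
import math


def feature_variants(tract_points, tract_area_sq_mi, tract_centroid, city_points, k):
    """Compute the four candidate versions of one built environment feature.

    tract_points: (x, y) locations of the amenity inside the tract
    city_points:  (x, y) locations of the amenity across the whole city
    """
    count = len(tract_points)
    density = count / tract_area_sq_mi
    ratio = count / len(city_points)

    # Distances from the tract centroid to every amenity in the city.
    cx, cy = tract_centroid
    distances = sorted(math.hypot(x - cx, y - cy) for x, y in city_points)
    nearest = distances[:k]
    knn = sum(nearest) / len(nearest)  # assumed summary: mean of the k nearest

    return {"density": density, "count": count, "knn": knn, "ratio": ratio}


# Made-up restaurant locations, for illustration only.
city_restaurants = [(0, 0), (3, 4), (6, 8), (10, 0)]
tract_restaurants = [(0, 0), (3, 4)]
variants = feature_variants(
    tract_points=tract_restaurants,
    tract_area_sq_mi=2.0,
    tract_centroid=(0, 0),
    city_points=city_restaurants,
    k=2,
)
```

In this toy case the tract holds 2 of the city's 4 restaurants in 2 square miles, so the count is 2, the density is 1.0 per square mile, and the ratio is 0.5; the two nearest restaurants sit 0 and 5 miles from the centroid, giving a KNN value of 2.5.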

Ultimately, we selected the Ratio versions, because those displayed the greatest correlation with user pickups in each tract. Below, we see the correlation plots for every feature variable in our analysis with the number of pickups in each tract.
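The selection rule described here, keeping whichever version correlates most strongly with pickups, can be sketched as below. This is a Python illustration with made-up numbers; using the Pearson coefficient and ranking by its absolute value are assumptions, since the text does not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_variant(variants, pickups):
    """Return the variant name with the largest |r| against pickups, plus all scores."""
    scores = {name: pearson(values, pickups) for name, values in variants.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical per-tract values for two versions of the same feature.
toy_pickups = [10, 20, 30, 40]
toy_variants = {
    "ratio": [0.1, 0.2, 0.3, 0.4],
    "count": [5, 1, 4, 2],
}
best_name, all_scores = strongest_variant(toy_variants, toy_pickups)
```

With these toy values the ratio version tracks pickups almost perfectly while the count version is weakly and negatively related, so the ratio version is selected.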

Figure: Correlation Plots. Panels: Demographic; Socio-economic; Built Environment.