This project is conducted to fulfill the course requirements for the 8010 Master of Urban Spatial Analytics (MUSA)/Smart Cities Practicum, Spring 2025. It is a collaboration between the University of Pennsylvania and the DC Office of the Deputy Mayor for Education (DME).
We sincerely appreciate Professor Michael Fichman and Professor Matthew Harris, as well as Rebecca Lee and Rory Lawless at DME, for their invaluable guidance, support, and dedication throughout the project.
[Project GitHub Page] - to be added after the embargo period
Four team members collaboratively contributed to this project. To reach out, please find their contact details below:
Jingmiao Fei: MUSA ’25 - LinkedIn
Jia Yue Ong: MUSA ’25 - LinkedIn
Amy Solano: MUSA ’25 - LinkedIn
Neve Zhang: MCP ’25 - LinkedIn
The project is commissioned by the DC Office of the Deputy Mayor for Education (DME), which is responsible for developing and implementing the Mayor’s vision for academic excellence and creating a high-quality education continuum from birth to age 24 (from early childhood to K-12 to post-secondary and the workforce). The three major functions of the DME are:
Overseeing a District-wide education strategy
Managing interagency and cross-sector coordination
Providing oversight and/or support for education-related agencies, including DC Public Schools (DCPS) and the Public Charter School Board (PCSB).
This student yield prediction project aims to enhance the DME’s ability to accurately forecast student enrollment across DCPS schools and boundaries, enabling more efficient resource allocation. This will support DME’s efforts including updates to the Master Facilities Plan, a key document that helps education stakeholders understand the current landscape and emerging needs, guiding decision-making and investments for a comprehensive approach to facility planning over the next 5 to 10 years.
Students’ decisions to enroll in public schools have traditionally been influenced by four key factors: parental demand, neighborhood characteristics, school availability, and policy interventions. As these factors have grown more complex, Washington, DC, has experienced steady growth in school-aged children between 2010 and 2020, an increase in highly educated families driving parental and housing demand, rising real estate prices, and migration patterns shaped by the pursuit of better education in specific wards.
Because existing year-on-year prediction methods do not adequately account for exogenous factors such as population and housing trends, our project provides a more comprehensive, data-informed tool to support the client in forecasting matriculation numbers. Per the client’s request, we focus specifically on elementary school (K–5) enrollments, with particular attention to kindergarten (grade K), as its enrollment cannot be predicted from a previous grade. Our deliverables include this markdown document and an app-format dashboard, which support the following use case:
The project supports our client (DME) in making informed decisions about education-related resource allocation, including personnel and capital investments, at each DCPS elementary school. The client can input housing pipeline and demographic data into a system to generate boundary school-specific matriculation predictions for incoming elementary students in a near-future year.
Given the scope of our project and the most recent data available, our app produces forecasts for each school year from SY25-26 through SY29-30.
DC’s public education system consists of two categories of schools: DC Public Schools (DCPS) and public charter schools. DCPS includes all of the traditional schools in the District, which follow centralized school policies subject to regulation and oversight from the District government. Charter schools, by comparison, operate on innovative and non-traditional school models and policies in exchange for greater accountability for student performance. (Source: DC Pave)
schools_table %>%
kbl() %>%
column_spec(1, bold = T, border_right = T) %>%
kable_styling(bootstrap_options = "striped", font_size = 12)
| School Type | Facilities | Funding | Funding Formula | Regulation | Leadership | Curriculum | Staffing |
|---|---|---|---|---|---|---|---|
| DCPS | Provided by the District | Funded by the District on a per-student basis | By law, DCPS is required to give schools at least 95 percent of their previous year’s budget. Components include a student-based budgeting (SBB) formula, limited staff-based allocations, program grants, and stability funding. | Must follow all District laws and regulations | Centralized decision-making process led by the Chancellor | Chosen by DC Central Office with some input from principals | Teacher positions are allocated using the Comprehensive Staffing Model Formula |
| Charter | Funded with the Charter Facilities Allotment | Funded by the District on a per-student basis | Funded based on their audited, or actual, enrollment. Unlike DCPS, public charter LEAs do not receive one large lump sum of UPSFF funds. | Freedom to create their own policies | Individual Local Education Agency Executive Leaders have autonomy to make decisions | Each individual Local Education Agency chooses its own curriculum | LEA leaders decide how they want to structure staff positions at their schools |
In terms of school choice, public school students in the District have multiple options. They can:
enroll in their in-boundary DCPS school, determined by their home address;
enroll at a DCPS school outside of their geographic school boundary;
enroll in a DCPS citywide or selective school; or
enroll at a public charter school.
In SY2022-23, 28% of students were enrolled at their in-boundary schools, with Ward 3 having the highest percentage of in-boundary attendance (84%), and Wards 5, 7, and 8 having the lowest (16-19%). Notably, 53-58% of students living in Wards 5, 7, and 8 were enrolled in public charter schools.
Overall, 21% of students chose to attend an out-of-boundary school, while only 6% of students from Ward 3 made that choice.
The current business-as-usual approach that Washington, DC uses to forecast enrollment is the Grade Progression Ratio (https://edscape.dc.gov/node/1607141), which rests on the rationale that enrollment at a higher grade level can be extrapolated from past enrollment at the corresponding lower grade (e.g., kindergarten students at a school facility will become the incoming 5th grade cohort in five years).
To calculate the Grade Progression Ratio (GPR) from Kindergarten to Grade 1 for the 2014-15 school year, one divides the number of students in Grade 1 during the 2014-15 audit by the number of students who were in Kindergarten during the 2013-14 audit. A ratio greater than 1 (or greater than 100%) suggests that additional students joined who were not enrolled in the previous grade. A ratio less than 1 (or less than 100%) suggests that students may be transferring away from the school, or out of the public school system. It is important to note that GPRs are different from re-enrollment rates because they do not necessarily reflect the same students.
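The calculation described above can be sketched in a few lines of R. The enrollment counts below are hypothetical numbers chosen for illustration, not values from the audited data, and the final step (applying the ratio forward to the next cohort) is our reading of how such a ratio would be used for a one-year forecast rather than a documented procedure.

```r
# Hypothetical audited enrollment counts for one school (illustrative only)
enrollment <- data.frame(
  school_year = c("SY13-14", "SY14-15"),
  K  = c(50, 48),   # Kindergarten counts
  G1 = c(47, 52)    # Grade 1 counts
)

# GPR for K -> Grade 1 in SY14-15:
# Grade 1 count in SY14-15 divided by Kindergarten count in SY13-14
gpr_k_to_1 <- enrollment$G1[enrollment$school_year == "SY14-15"] /
  enrollment$K[enrollment$school_year == "SY13-14"]
gpr_k_to_1  # 52 / 50 = 1.04, a ratio above 1 (net gain of students)

# Assumed usage: project SY15-16 Grade 1 from the SY14-15 Kindergarten cohort
projected_g1 <- gpr_k_to_1 * enrollment$K[enrollment$school_year == "SY14-15"]
projected_g1  # 1.04 * 48 = 49.92
```

Note that no equivalent ratio exists for Kindergarten itself, because there is no earlier audited grade to divide by.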
However, this approach has two main limitations. First, while it is effective for one-year forecasts, particularly at higher grade levels, it cannot produce long-term forecasts for lower grades. For instance, while 5th grade enrollment five years out can be deduced from current kindergarten students, kindergarten enrollment cannot be forecasted because there is no lower grade to draw on, which significantly reduces the number of feasible predictions. Second, the method does not sufficiently account for external factors that dynamically shape enrollment preferences, such as facility expansion or closure, housing market changes, and socioeconomic trends. Instead, it treats enrollment as a static choice once a cohort is enrolled, reinforcing a “business-as-usual” rationale.
Our approach addresses both of these shortcomings. By combining historic and current enrollment records with comprehensive data sources such as the American Community Survey 5-Year Estimates and DC’s permits and residential mass appraisals, we deploy a regression-based method that enables significantly more predictions across all grades and years. Section 2 “Understanding Key Influences through Exploratory Data Analysis” further unpacks our thinking behind the new mechanism.
elementary <- list('K','1','2','3','4','5')
# school_id == 247 has the only duplicated row; keep the record with the largest facility capacity
schools_cleaned <- schools %>%
filter(grade_band %in% c("Elementary", "K-8")) %>% # Neve added
group_by(school_year, lea_id, school_id, mar_latitude) %>%
slice_max(order_by = facility_capacity, n = 1, with_ties = FALSE) %>%
ungroup()
elementary_by_year <- students_by_tract_sy1314_to_sy2425 %>%
filter(grade_level %in% elementary) %>%
left_join(schools_cleaned %>%
dplyr::select(-school_name, -school_sector),
by=c("lea_code"="lea_id","school_code"="school_id","school_year"="school_year", "school_main_facility_lat" = "mar_latitude", "school_main_facility_long" = "mar_longitude")) %>%
dplyr::select(# original variables from the confidential client data
school_year, residence_census_tract, lea_code, school_code, school_name,
school_sector, in_boundary_indicator, school_main_facility_lat, school_main_facility_long,
grade_level, total_students,
# variables added through the merge
dcps_facility_name, facility_sector, grade_band,`ward_(2022)`,
dcps_boundary, facility_capacity, enrollment, utilization, unfilled_seats) %>%
rename(ward_22 = `ward_(2022)`) # tidied up workflow
# Create unique facility ID for each school facility
elementary_by_year <- elementary_by_year %>%
group_by(school_code, grade_band, school_main_facility_lat, school_main_facility_long) %>% # this ensures distinct facilities are selected
mutate(unique_id = paste0("F", sprintf("%06d", cur_group_id()))) %>%
ungroup()
# Create new column variable for matching census data
elementary_by_year <- elementary_by_year %>%
mutate(census_year = case_when(
school_year == 'SY13-14' ~ 2013,
school_year == 'SY14-15' ~ 2014,
school_year == 'SY15-16' ~ 2015,
school_year == 'SY16-17' ~ 2016,
school_year == 'SY17-18' ~ 2017,
school_year == 'SY18-19' ~ 2018,
school_year == 'SY19-20' ~ 2019,
school_year == 'SY20-21' ~ 2020,
school_year == 'SY21-22' ~ 2021,
school_year == 'SY22-23' ~ 2022,
school_year == 'SY23-24' ~ 2023,
school_year == 'SY24-25' ~ 2024,
TRUE ~ NA_real_
))
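As an alternative to listing every school year by hand, the same mapping can be derived from the label itself. The sketch below assumes that every label follows the `SYyy-yy` pattern seen above and refers to a year in the 2000s; `school_year_to_census_year()` is a hypothetical helper name, not a function from the original workflow.

```r
# Derive the starting calendar year from labels of the form "SYyy-yy".
# Labels that do not match the pattern (or are NA) return NA, mirroring
# the TRUE ~ NA_real_ fallback in the case_when() version.
school_year_to_census_year <- function(school_year) {
  ok <- grepl("^SY[0-9]{2}-[0-9]{2}$", school_year)
  year <- rep(NA_real_, length(school_year))
  year[ok] <- 2000 + as.numeric(substr(school_year[ok], 3, 4))
  year
}

school_year_to_census_year(c("SY13-14", "SY24-25", "unknown"))
```

This keeps the mapping valid when later school years are appended to the data, at the cost of relying on a consistent label format.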
Plot of the Errors of the Business-As-Usual Approach
ggplot(df5, aes(x = `sum(total_students).x`, y = `sum(total_students).y`)) +
geom_point(alpha = 0.3, color = "#440154") +
geom_abline(linetype = "dashed", color = "#fc8961") +
geom_smooth(method = "lm", color = "black") +
coord_equal() +
scale_x_continuous(
expand = c(0, 0),
limits = c(0, NA),
breaks = seq(0, max(df5$`sum(total_students).x`, na.rm = TRUE), by = 40)
) +
scale_y_continuous(
expand = c(0, 0),
limits = c(0, NA),
breaks = seq(0, max(df5$`sum(total_students).y`, na.rm = TRUE), by = 50)
) +
labs(
title = "Plot of Predicted Against Observed with Best Fit Line",
x = "Observed",
y = "Predicted"
) +
theme_bw()
Summary Table of Errors of the Business-As-Usual Approach
# MAE
df5$error = abs(df5$`sum(total_students).y`-df5$`sum(total_students).x`)
mean(df5$error, na.rm = TRUE)
## [1] 12.2847
# MAPE
df5$APE = abs(df5$`sum(total_students).y`-df5$`sum(total_students).x`)/(df5$`sum(total_students).x`)
mean(df5$APE, na.rm = TRUE)
## [1] 0.4295409
# RMSE
df5$sqerror = (df5$`sum(total_students).y` - df5$`sum(total_students).x`)^2
rmse_5_year <- sqrt(mean(df5$sqerror, na.rm = TRUE)) # RMSE uses the squared errors, not the absolute errors
# Create summary table
progression_error_summary <- data.frame(
Metric = c("Mean Absolute Error", "Mean Absolute Percentage Error", "Root Mean Square Error"),
Value = c(
mean(df5$error, na.rm = TRUE),
mean(df5$APE, na.rm = TRUE),
sqrt(mean(df5$sqerror, na.rm = TRUE))
)
)
print(progression_error_summary)
## Metric Value
## 1 Mean Absolute Error 12.2846975
## 2 Mean Absolute Percentage Error 0.4295409
## 3 Root Mean Square Error 15.4071666
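For reuse across forecast horizons, the three metrics can be wrapped in one function. This is a minimal sketch: `error_metrics()` is a hypothetical helper, the toy vectors are made up for illustration, and dropping zero-valued observations from the MAPE is our assumption to avoid division by zero (the code above does not do this).

```r
# Compute MAE, MAPE and RMSE for paired observed / predicted counts.
error_metrics <- function(observed, predicted) {
  keep <- !is.na(observed) & !is.na(predicted)   # use complete pairs only
  obs  <- observed[keep]
  pred <- predicted[keep]
  err  <- pred - obs
  nonzero <- obs != 0                            # assumption: skip zero denominators in MAPE
  data.frame(
    Metric = c("Mean Absolute Error",
               "Mean Absolute Percentage Error",
               "Root Mean Square Error"),
    Value = c(mean(abs(err)),
              mean(abs(err[nonzero]) / obs[nonzero]),
              sqrt(mean(err^2)))
  )
}

# Toy example with made-up counts
toy_metrics <- error_metrics(observed = c(20, 40, 50), predicted = c(25, 30, 50))
print(toy_metrics)
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and is more sensitive to a few large misses.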
The team believes that grade-level enrollment changes are not isolated to the facility itself, but are associated with a range of exogenous factors. These include demographic variables (e.g., total population), socioeconomic variables (e.g., income), building permits, and new residential development. The next section uses exploratory charts and visualizations to demonstrate the following insights:
School facility capacity and utilization patterns vary across years and geographies
Socioeconomic characteristics have been a key underlying concern to school district planning
Housing developments have shaped and will continue to shape elementary school enrollments in DC
We have consulted the following data sources to build up the analysis process:
Open Data DC
Others
Student Data by Census Tract SY13-14 to SY 23-24 (provided by client, internal use only)
School Classification by Year (provided by client, internal use only)
We then combine these variables with the school enrollment records to create a panel dataset for our analysis. Details of the panel data creation can be found in Section 3.
This exploratory analysis covers the three primary spatial units present in our data: census tracts, elementary school boundaries, and wards. Elementary school boundaries change over time (explained further in the respective section). DC is divided into 8 wards, each with its own demographic and social characteristics, as well as representation in the District.
ggarrange(nrow=1,
ggplot(boundary)+
geom_sf(fill="whitesmoke", color="#440154", lwd=0.1)+
labs(title="Elementary School Boundary")+
theme_void(),
ggplot(wards, aes(label=NAME))+
geom_sf(fill="whitesmoke", color="#440154", lwd=0.1)+
geom_sf_label(fill = "white",
fun.geometry = sf::st_centroid,
size=2.7)+
labs(title="Wards")+
theme_void()+
theme(plot.title = element_text(hjust = 0.5)),
ggplot(ct20)+
geom_sf(fill="whitesmoke", color="#440154", lwd=0.1)+
labs(title="Census Tracts (2020)")+
theme_void()
)
Over the last decade (SY13-14 to SY23-24), most elementary schools in DC have operated close to full capacity, with utilization rates clustering around 1.0. Nonetheless, a noticeable portion of schools fall below this threshold, indicating under-utilization, while a smaller yet significant number exceed their capacity, reflecting overcrowding. By comparison, total school capacity and enrollment numbers exhibit multiple peaks, indicating variation in school sizes and enrollment volumes.
School_info$Utilization <- as.numeric(gsub("%", "", School_info$Utilization)) / 100
## Histogram
display_histogram <- function(arg, binwidth){
ggplot(School_info, aes(x = .data[[arg]])) +
geom_histogram(binwidth = binwidth, fill = "#440154", color = "black", alpha = 0.7) +
labs(title = paste("Distribution of", arg),
x = arg,
y = "Count") +
theme_minimal()
}
display_histogram("Facility capacity", 10)
display_histogram("Enrollment", 10)
display_histogram("Utilization", 0.02)
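To make the reading of the utilization histogram concrete, the sketch below labels schools by utilization band. It assumes utilization is enrollment divided by facility capacity; the toy schools and the 0.8 cut-off for "under-utilized" are illustrative assumptions, not thresholds used by DME or in this analysis.

```r
# Toy schools with made-up capacity and enrollment figures
toy_schools <- data.frame(
  school     = c("A", "B", "C"),
  capacity   = c(400, 300, 500),
  enrollment = c(380, 330, 300)
)

# Assumed definition: utilization = enrollment / facility capacity
toy_schools$utilization <- toy_schools$enrollment / toy_schools$capacity

# Band the rates: (0, 0.8] under-utilized, (0.8, 1] near capacity, above 1 over capacity
toy_schools$status <- cut(
  toy_schools$utilization,
  breaks = c(0, 0.8, 1, Inf),
  labels = c("under-utilized", "near capacity", "over capacity")
)

print(toy_schools)
```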
Elementary schools in both the DCPS and Public Charter sectors have experienced relatively steady growth in facility capacity over time. Although enrollment increased in both sectors before SY19-20, DCPS schools have faced a more pronounced decline since then, suggesting a greater negative impact from COVID-19 compared to Charter schools.
## time series plot
# Convert 'school_year' to an ordered factor to ensure correct ordering
School_info$`School year` <- factor(School_info$`School year`,
levels = c("SY13-14", "SY14-15", "SY15-16", "SY16-17", "SY17-18", "SY18-19", "SY19-20", "SY20-21", "SY21-22", "SY22-23", "SY23-24", "SY24-25"),
ordered = TRUE)
# Drop SY24-25, which has no data
School_info_filtered <- School_info %>%
filter(`School year` != "SY24-25")
# Sum the chosen variable (arg) by sector for each year
time_series_plot <- function(arg){
arg_summary <- School_info_filtered %>%
group_by(`School year`, `School sector`) %>%
summarise(total_capacity = sum(.data[[arg]], na.rm = TRUE))
# plot time series chart
ggplot(arg_summary, aes(x = `School year`, y = total_capacity, color = `School sector`, group = `School sector`)) +
geom_line() +
geom_point() +
scale_color_manual(values=c("#440154","#fc8961"))+
labs(title = paste("Total", arg, "by Sector Over Time"),
x = "School Year",
y = paste("Total",arg),
color = "Sector") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
time_series_plot("Facility capacity")
time_series_plot("Enrollment")
However, the upward trend in school capacity is not reflected in utilization rates. In particular, DCPS schools have experienced a steady decline in utilization over the past decade, in stark contrast to the increasing rates seen in charter schools. This divergence may be driven by demographic shifts, evolving education policies, and a growing preference for public charter schools. Importantly, it also highlights the risk of relying solely on lower-grade enrollment trends to forecast future demand at higher grade levels.
time_series_plot("Utilization")
We further examined the spatial patterns of elementary schools’ average utilization rate (by school boundary and ward) in Washington, DC over two five-year periods (SY13-18 and SY18-23). Note: We also analyzed average capacity and enrollment, but the spatial clustering of those two factors changed less than that of utilization.
Overall, we observed a decrease in the average utilization rate of DCPS elementary schools at both the school boundary and ward levels. The decreases are most severe in school boundaries in Ward 7 and Ward 8, but the lowest average utilization rate continues to be observed in Ward 5.
By comparison, changes in the average utilization rate of Public Charter elementary schools appear more varied. For instance, while certain school boundaries in Wards 7 and 8 have seen improved utilization of Public Charter elementary schools over the years, the gains are offset by decreases in adjacent school boundaries, leading to an overall drop in ward-level utilization. Nonetheless, this suggests a shift in the spatial distribution of student enrollment.
# SY13-14 ~ SY17-18 and SY18-19 ~ SY22-23, mean capacity, enrollment and utilization by school boundary
school_info_sf <-st_as_sf(as.data.frame(School_info), coords = c("MAR longitude", "MAR latitude"), crs = 4326)
school_boundary_sf <- st_join(school_info_sf, boundary, join = st_within)
## mutate a new variable: year_group, including: SY13-18, SY18-23
school_boundary_sf <- school_boundary_sf %>%
mutate(year_group = case_when(
`School year` %in% c("SY13-14", "SY14-15", "SY15-16", "SY16-17", "SY17-18") ~ "SY13-18",
`School year` %in% c("SY18-19", "SY19-20", "SY20-21", "SY21-22", "SY22-23") ~ "SY18-23",
TRUE ~ NA_character_ # school years outside both periods
))
# Calculate mean capacity, enrollment and utilization for each boundary, grouped by time period and School sector
mean_by_boundary <- school_boundary_sf %>%
group_by(NAME, year_group, `School sector`) %>%
summarise(
mean_capacity = mean(`Facility capacity`, na.rm = TRUE),
mean_enrollment = mean(Enrollment, na.rm = TRUE),
mean_utilization = mean(Utilization, na.rm = TRUE)
) %>%
na.omit() %>%
st_drop_geometry()
# merge the capacity, enrollment and utilization data to the Boundary data
Boundary <- boundary %>%
left_join(mean_by_boundary, by = "NAME")
# filter by School sector
dcps_data <- Boundary %>% filter(`School sector` == "DCPS")
charter_data <- Boundary %>% filter(`School sector` == "Public charter")
# function for visualization
base_map <- function(geodata, argbysector, total_arg){
ggplot() +
geom_sf(data = geodata, fill = "transparent") +
geom_sf(data = argbysector, aes(fill = .data[[total_arg]])) +
#scale_fill_viridis_c(option = "magma", trans = "log", direction = -1) +
scale_fill_gradientn(colours = c("#440154", "white", "#fc8961"), trans="log")+
facet_wrap(~ year_group) +
theme_void() +
labs(fill = total_arg) +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5))
}
## visualize by ward
wards <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/53/query?outFields=*&where=1%3D1&f=geojson", quiet = TRUE)
# Group school_boundary_sf by Ward (2022), year_group and School sector, and calculate the means
ward_summary <- school_boundary_sf %>%
group_by(`Ward (2022)`, year_group, `School sector`) %>%
summarise(
mean_capacity = mean(`Facility capacity`, na.rm = TRUE),
mean_enrollment = mean(Enrollment, na.rm = TRUE),
mean_utilization = mean(Utilization, na.rm = TRUE)
) %>%
na.omit()%>%
st_drop_geometry()
# attribute join of the ward summary onto the ward polygons (by ward name)
wards <- wards %>%
left_join(ward_summary, by = c("NAME" = "Ward (2022)"))
# filter by School sector
dcps_data_ward <- wards %>% filter(`School sector` == "DCPS")
charter_data_ward <- wards %>% filter(`School sector` == "Public charter")
dcps_plot_uti <- base_map(Boundary, dcps_data, "mean_utilization") + ggtitle("Average Utilization Rate by School Boundary (DCPS)")
charter_plot_uti <- base_map(Boundary, charter_data, "mean_utilization") + ggtitle("Average Utilization Rate by School Boundary (Public Charter)")
print(dcps_plot_uti)
print(charter_plot_uti)
A closer look at enrollment patterns reveals shifting in-boundary and out-of-boundary behaviors within DCPS schools. The following interactive maps illustrate kindergarten enrollment flows to high-utilization DCPS schools (those operating at 90% capacity or higher) in 2014, 2018, and 2022. In these maps, each line connects the estimated centroid of a student’s home census tract to their school’s location. Yellow-to-purple line pairs represent in-boundary enrollment flows, while purple-to-purple lines indicate out-of-boundary enrollment. Line thickness corresponds to the number of students, with thicker lines indicating greater volume. Note: Only home tracts that sent three or more students to a school in a given year are shown to highlight stronger school-tract relationships while protecting individual privacy. Tooltip information excludes names of census tracts and schools.
As the maps reveal, most high-utilization schools primarily serve in-boundary students and are located in neighborhoods with varying income and homeownership levels. However, over time, there has been a growing concentration of these schools in Wards 3, 4, and 6. Concurrently, in-boundary enrollments have strengthened, while out-of-boundary enrollments have declined.
These shifts indicate that high enrollment is increasingly driven by neighborhood-based preferences rather than static choice. As a result, relying on past lower-grade enrollment figures to forecast future demand at higher grade levels may introduce risk, particularly if those higher grades serve broader or different catchment areas, or if the observed in-boundary demand does not persist across the years, which would violate the progression assumption.
## identify high-utilization schools
near_capacity_2014 <- schools.sf %>%
st_drop_geometry() %>%
filter(utilization > 90 & school_year == "SY14-15" & school_sector == "DCPS") %>%
dplyr::pull(school_id)
near_capacity_2018 <- schools.sf %>%
st_drop_geometry() %>%
filter(utilization > 90 & school_year == "SY18-19" & school_sector == "DCPS") %>%
dplyr::pull(school_id)
near_capacity_2022 <- schools.sf %>%
st_drop_geometry() %>%
filter(utilization > 90 & school_year == "SY22-23" & school_sector == "DCPS") %>%
dplyr::pull(school_id)
elem_centroid <- st_centroid(tracts12_23)
both_centroids_14 <- elem_centroid %>%
rename(residence_census_tract = GEOID,
census_year = year) %>%
left_join(elementary_students, by = c("residence_census_tract", "census_year")) %>%
drop_na(total_students)
both_centroids_14$geom2 = both_centroids_14 %>% st_drop_geometry() %>% st_as_sf(coords=c("school_main_facility_long","school_main_facility_lat")) %>% st_geometry()
both_centroids_14 %>%
filter(
census_year == 2014 & school_code %in% near_capacity_2014 & grade_level == "K" & total_students >= 3
) %>%
group_by(residence_census_tract) %>%
mutate(weight = total_students,
pct_own = round(pct_own * 100, 2),
pct_vacant = round(pct_vacant * 100, 2),
pct_poverty = round(pct_poverty * 100, 2),
boundary_word = if_else(in_boundary_indicator == 0, "No", "Yes"),
tooltip = paste0(total_students, " students",
"<br>", "In Boundary School? ", boundary_word,
"<br><br>", "Total Population: ", total_population,
"<br>", "Housing Units: ", total_housing_units,
"<br>", "Vacancy rate: ", pct_vacant,"%",
"<br>", "Tract homeownership rate: ", pct_own, "%",
"<br>", "Tract median gross rent: ", median_gross_rent,
"<br>", "Median HH Income: ", median_household_income,
"<br>", "Poverty rate: ", pct_poverty, "%")) %>%
mapdeck(token = token, style = mapdeck_style("light")) %>%
add_arc(origin = "geometry",
destination = "geom2",
stroke_width = "weight",
stroke_from = "boundary_word",
stroke_to = "census_year",
tooltip = "tooltip",
auto_highlight = TRUE,
legend= list( stroke_from = TRUE, stroke_to = FALSE ),
legend_options = list(
stroke_from = list( title = "In-boundary School"))) %>%
add_polygon(
data = boundary,
fill_opacity = 0,
stroke_width = 15,
stroke_colour = "#708090",
layer = "polygon_layer") %>%
mapdeck_view(
location = c(-77.0395304337645, 38.892799521270774),
# set the zoom level
zoom = 10,
# set the pitch angle
pitch = 30,
)
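The three near-capacity filters above differ only in the school year, so they could be folded into one helper. The sketch below is a base-R illustration on a made-up data frame: `near_capacity_ids()` is a hypothetical function, the toy rows are not real schools, and it assumes a plain data frame with utilization on a 0 to 100 scale (the original code drops the geometry from `schools.sf` first).

```r
# Return the IDs of schools above a utilization threshold for one year and sector.
near_capacity_ids <- function(schools, year, threshold = 90, sector = "DCPS") {
  keep <- !is.na(schools$utilization) &
    schools$utilization > threshold &
    schools$school_year == year &
    schools$school_sector == sector
  schools$school_id[keep]
}

# Toy data standing in for the school-level table (illustrative values only)
toy <- data.frame(
  school_id     = c(1, 2, 3, 4),
  school_year   = c("SY14-15", "SY14-15", "SY18-19", "SY14-15"),
  school_sector = c("DCPS", "DCPS", "DCPS", "Public charter"),
  utilization   = c(95, 70, 101, 99)
)

# One named list entry per school year of interest
ids_by_year <- sapply(c("SY14-15", "SY18-19", "SY22-23"),
                      function(y) near_capacity_ids(toy, y),
                      simplify = FALSE)
ids_by_year
```

The same idea would apply to the map code, where the 2014 and 2018 versions differ only in the census year and the vector of school IDs.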
both_centroids <- elem_centroid %>%
rename(residence_census_tract = GEOID,
census_year = year) %>%
left_join(elementary_students, by = c("residence_census_tract", "census_year")) %>%
drop_na(total_students)
both_centroids$geom2 = both_centroids %>% st_drop_geometry() %>% st_as_sf(coords=c("school_main_facility_long","school_main_facility_lat")) %>% st_geometry()
both_centroids %>%
filter(
census_year == 2018 & school_code %in% near_capacity_2018 & grade_level == "K" & total_students >= 3
) %>%
group_by(residence_census_tract) %>%
mutate(weight = total_students,
pct_own = round(pct_own * 100, 2),
pct_vacant = round(pct_vacant * 100, 2),
pct_poverty = round(pct_poverty * 100, 2),
boundary_word = if_else(in_boundary_indicator == 0, "No", "Yes"),
tooltip = paste0(total_students, " students",
"<br>", "In Boundary School? ", boundary_word,
"<br><br>", "Total Population: ", total_population,
"<br>", "Housing Units: ", total_housing_units,
"<br>", "Vacancy rate: ", pct_vacant,"%",
"<br>", "Tract homeownership rate: ", pct_own, "%",
"<br>", "Tract median gross rent: ", median_gross_rent,
"<br>", "Median HH Income: ", median_household_income,
"<br>", "Poverty rate: ", pct_poverty, "%")) %>%
mapdeck(token = token, style = mapdeck_style("light")) %>%
add_arc(origin = "geometry",
destination = "geom2",
stroke_width = "weight",
stroke_from = "boundary_word",
stroke_to = "census_year",
tooltip = "tooltip",
auto_highlight = TRUE,
legend= list( stroke_from = TRUE, stroke_to = FALSE ),
legend_options = list(
stroke_from = list( title = "In-boundary School"))) %>%
add_polygon(
data = boundary,
fill_opacity = 0,
stroke_width = 15,
stroke_colour = "#708090",
layer = "polygon_layer") %>%
mapdeck_view(
location = c(-77.0395304337645, 38.892799521270774),
# set the zoom level
zoom = 10,
# set the pitch angle
pitch = 30,
)