Introduction

This report provides a detailed workflow of the project on Homestead Tax Exemption entitlement assistance outreach, for the City of Philadelphia Office of Philly Stat 360 and Office of Innovation and Technology. The aim of the project is to design an algorithm-driven outreach campaign that can cost-effectively identify homeowners who are likely to be eligible for the Homestead Tax Exemption but are not participating in the program. The project also aims to help our clients understand where these properties are located, which outreach strategies are available, and the associated costs and benefits.

Properties identified as most likely eligible for the Homestead Exemption but not enrolled in the program are also thought to be more likely to be subject to “tangled titles” or family-rental arrangements that require an affidavit to waive the need for a rental license.

Background on Property Tax in Philadelphia

Property tax in Philadelphia is 1.3998% of the property value, as assessed by the Office of Property Assessment, for the 2025 tax year. This is made up of 0.6159% (City of Philadelphia) and 0.7839% (School District). Taxes are due on March 31 each year.

Background on Homestead Exemption

The Homestead Exemption reduces the taxable portion of a homeowner’s property assessment by up to $100,000, saving up to $1,399 on real estate taxes annually. The legislation was intended to lessen the financial burden of new property assessments on Philadelphia homeowners, whose property values increased by an average of 31% after the city delayed the annual calculations for three years due to the pandemic.
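The savings figure follows directly from the tax rates quoted above. The sketch below works through the arithmetic in plain Python; the function names and the example assessed values ($250,000 and $80,000) are illustrative assumptions, not figures from the project.

```python
# Rates for the 2025 tax year as quoted in the text (share of assessed value).
CITY_RATE = 0.6159 / 100
SCHOOL_RATE = 0.7839 / 100
MAX_EXEMPTION = 100_000  # maximum reduction in taxable assessment, in dollars


def annual_tax(assessed_value, exemption=0):
    """Tax owed after removing the exemption from the assessment (never below zero)."""
    taxable = max(assessed_value - exemption, 0)
    return taxable * (CITY_RATE + SCHOOL_RATE)


def homestead_savings(assessed_value, exemption=MAX_EXEMPTION):
    """Annual tax saved by applying the exemption to a given assessed value."""
    return annual_tax(assessed_value) - annual_tax(assessed_value, exemption)


# Hypothetical assessed values, chosen only to illustrate the two cases.
max_savings = homestead_savings(250_000)        # home assessed above the exemption
small_home_savings = homestead_savings(80_000)  # savings limited by the assessment itself

print(f"{max_savings:.2f}")
print(f"{small_home_savings:.2f}")
```

Any home assessed at $100,000 or more receives the full saving of roughly $1,399; a home assessed below that amount saves only the tax on its own assessed value.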

Eligibility for the Homestead Exemption is as follows:

A homeowner is ineligible if already enrolled in either of these alternative real estate tax relief or abatement programs:

  • Longtime Owner Occupants Program (LOOP), an income-based program for homeowners who experience a substantial increase in their property assessment
  • 10-year residential tax abatement program, although a homeowner can apply for the Homestead Exemption once the abatement is over

Programs that can be used in conjunction with the Homestead Exemption include:

  • Owner-Occupied Real Estate Tax Payment Agreement (OOPA)
  • Senior Citizen Real Estate Tax Freeze
  • Low-Income Real Estate Tax Freeze
  • Real Estate Tax Installment Plan
  • Tax Credits for Active-Duty Reserve and National Guard Members

Tangled Titles

An issue of concern that may prevent a long-term resident from claiming the Homestead Exemption is tangled titles, which occur when a long-term resident effectively functions as a homeowner but lacks legal ownership of the property. This often happens when a family member who owned the property passes away and the legal processes needed to formalize the ownership transfer were never completed, leaving the resident ineligible for the exemption. However, Philadelphia offers a conditional Homestead Exemption of three years for such cases while the legal transfer of ownership is resolved.

Significance of Outreach

Currently, no focused or strategic efforts are being carried out to identify and reach homeowners who are not enrolled in the Homestead Exemption. Accurate identification of eligible homeowners makes cost-effective, targeted outreach possible, so that these homeowners can be made aware of the program and receive support in keeping their homes.

Exploratory Data Analysis

Dependent Variable - Homestead Exemption

The primary dataset used is the Property and Assessment History publicly available for download on OpenDataPhilly. Six relevant datasets are merged with this primary dataset on common identifying keys, such as the parcel number, in order to include useful predictor variables in the model that predicts which homeowners are most likely eligible but not currently enrolled in the Homestead Exemption.

Each observation in the Property and Assessment History dataset is one property in Philadelphia, for a total of 584,049 properties and 79 features. As this dataset is updated daily, the version used for this project is current as of 31 January 2025.

The homestead_exemption column in this dataset indicates the amount removed from the taxable portion of the property’s assessment. Note that 14 properties had a homestead exemption larger than $100,000, the maximum possible amount; this is suspected to be a clerical error and has been flagged to the PhillyStat360 team. The dependent variable for the model is derived from this feature as a binary variable indicating whether the property is currently enrolled in the Homestead Exemption program, that is, whether the value is non-zero. There are 246,853 properties with a homestead exemption.
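As a minimal sketch of this derivation, the snippet below builds the binary target and flags values above the cap. Only homestead_exemption is a column named in the report; the records, the parcel_number values, the output field names, and the choice to treat a missing value as zero are all assumptions made for illustration.

```python
MAX_EXEMPTION = 100_000  # maximum exemption amount stated in the text

# Hypothetical records standing in for rows of the property dataset.
properties = [
    {"parcel_number": "A1", "homestead_exemption": 100_000},
    {"parcel_number": "A2", "homestead_exemption": 0},
    {"parcel_number": "A3", "homestead_exemption": 80_000},
    {"parcel_number": "A4", "homestead_exemption": 120_000},  # above the cap
    {"parcel_number": "A5", "homestead_exemption": None},     # missing value
]


def label_property(record):
    """Add the binary target and a flag for suspicious over-cap amounts."""
    amount = record["homestead_exemption"] or 0  # assumption: missing means not enrolled
    return {
        **record,
        "has_exemption": int(amount > 0),
        "over_cap": amount > MAX_EXEMPTION,
    }


labeled = [label_property(p) for p in properties]
enrolled_count = sum(p["has_exemption"] for p in labeled)
flagged = [p["parcel_number"] for p in labeled if p["over_cap"]]

print(enrolled_count, flagged)
```

Keeping the over-cap flag separate from the target means suspect records can be reviewed without changing how enrollment is counted.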

The dataset was filtered to include only properties that are under residential zoning codes. Properties owned by organizations and companies, identified by owner names containing terms such as ‘non profit’, ‘office’, or ‘church’, were also removed. This aligns the data with the eligibility criteria of the Homestead Exemption program, which is intended for individual homeowners who live in their own property.
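A rough sketch of this filtering step is shown below. The three keywords are the examples quoted in the text; the field names, the sample records, and the rule that residential zoning codes begin with "R" are assumptions for illustration, and the project's actual keyword list and zoning rule may differ.

```python
ORG_KEYWORDS = ("non profit", "office", "church")  # examples quoted in the text


def is_residential(zoning_code):
    """Assumption: residential zoning codes start with 'R' (for example 'RSA-5')."""
    return bool(zoning_code) and zoning_code.upper().startswith("R")


def looks_like_organization(owner_name):
    """Case-insensitive substring match against the organization keywords."""
    name = (owner_name or "").lower()
    return any(keyword in name for keyword in ORG_KEYWORDS)


def keep_property(record):
    return is_residential(record["zoning"]) and not looks_like_organization(record["owner"])


# Hypothetical records for illustration only.
records = [
    {"zoning": "RSA-5", "owner": "SMITH JANE"},
    {"zoning": "CMX-2", "owner": "DOE JOHN"},
    {"zoning": "RM-1", "owner": "FIRST BAPTIST CHURCH"},
    {"zoning": None, "owner": "LEE SAM"},
]
kept_owners = [r["owner"] for r in records if keep_property(r)]

print(kept_owners)
```

Substring matching is simple but can over-match (for example, a surname containing "office"), so a manual review of removed records is worth considering.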

Homestead Rate by Census Tract

First, we take a look at the distribution of homestead exemption rates by census tract.

Quantitative Distribution

The distribution of homestead exemption rates across census tracts is approximately normal, with most tracts clustered between 30% and 50%. A notable decline occurs below the 30% threshold, and very few tracts fall below 20%. Tracts with exemption rates under 30% may indicate low participation. If these areas are primarily residential, such figures may warrant targeted outreach or further investigation into potential barriers to enrollment.
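The tract-level rate behind this distribution is the share of properties in each tract that hold an exemption. The sketch below computes it and lists tracts under the 30% mark discussed above; the field names census_tract and has_exemption and the sample records are hypothetical.

```python
from collections import defaultdict

LOW_PARTICIPATION = 0.30  # threshold discussed in the text


def tract_exemption_rates(records):
    """Return {tract: share of properties in the tract with an exemption}."""
    totals = defaultdict(int)
    enrolled = defaultdict(int)
    for r in records:
        totals[r["census_tract"]] += 1
        enrolled[r["census_tract"]] += r["has_exemption"]
    return {tract: enrolled[tract] / totals[tract] for tract in totals}


# Hypothetical data: tract T1 has 4 of 10 enrolled, tract T2 has 2 of 10.
records = (
    [{"census_tract": "T1", "has_exemption": 1}] * 4
    + [{"census_tract": "T1", "has_exemption": 0}] * 6
    + [{"census_tract": "T2", "has_exemption": 1}] * 2
    + [{"census_tract": "T2", "has_exemption": 0}] * 8
)
rates = tract_exemption_rates(records)
low_tracts = sorted(t for t, rate in rates.items() if rate < LOW_PARTICIPATION)

print(rates, low_tracts)
```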

Spatial Distribution

Spatially, homestead exemption rates tend to be lower in the central areas of the city compared to the outskirts. This pattern likely reflects a higher proportion of rental or non-residential properties in the urban core, which are less likely to be eligible for exemptions. In contrast, peripheral neighborhoods, which have more owner-occupied housing, generally exhibit higher rates of participation.

Predictors

Current Property Characteristics

We examine several property-level characteristics that may influence eligibility or participation in the Homestead Exemption program, including ownership status (e.g., corporate ownership) and whether the owner’s mailing address matches the property address.

One of the key eligibility criteria for the Homestead Exemption is that the property must serve as the homeowner’s primary residence. To approximate this, we compare the mailing street address (mailing_street) with the property location address (location) and use the result as a predictor. A matching address suggests that the owner resides at the property.
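One way to build such a match indicator is to normalize both address strings before comparing them. The sketch below is an assumption about how this could be done, not the project's actual rule: the abbreviation map is hypothetical and deliberately short, and a blank mailing address is treated as a non-match.

```python
import re

# Hypothetical abbreviation map; real address data would need a longer list.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "NORTH": "N", "SOUTH": "S"}


def normalize_address(raw):
    """Uppercase, strip punctuation, collapse whitespace, and abbreviate common words."""
    if not raw:
        return ""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def same_address(mailing_street, location):
    """Return 1 when both addresses are present and match after normalization."""
    a = normalize_address(mailing_street)
    b = normalize_address(location)
    return int(bool(a) and a == b)


print(same_address("123 North Broad Street", "123 N BROAD ST"))
print(same_address("PO Box 12", "45 Main St"))
```

Normalizing first avoids counting purely cosmetic differences, such as case or "Street" versus "ST", as evidence that the owner lives elsewhere.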

Another important factor is owner participation in other tax relief programs. Properties already receiving benefits from programs such as LOOP (Longtime Owner Occupants Program) or the Residential Tax Abatement are generally ineligible for the Homestead Exemption. To capture this, we use the exempt-building variable. If this value is non-zero and the property is not enrolled in the Homestead Exemption, it may indicate participation in one of these alternative programs.

The Homestead Exemption also requires that a property is not used exclusively for business or rental purposes; partial use is allowed. We operationalize this rule with two binary variables: rental potential, defined as the presence of an active rental license, and business potential, defined as the presence of a business license (excluding rental licenses).

Same Address

Properties enrolled in the Homestead Exemption are more likely to have the owner’s mailing address match the property address, indicating primary residence.

Owner Count Address

Enrolled properties are more likely to be owned by individuals or entities with fewer than 10 properties, suggesting smaller-scale ownership.

Rental Licenses

The Business Licenses & Commercial Activity Licenses dataset from OpenDataPhilly was joined to the property data by matching parcel ID. Properties receiving the exemption tend not to hold active rental licenses.

Commercial Licenses

Properties receiving the exemption tend not to have active commercial (non-rental) business licenses.

Historical Property Characteristics

We also examine the historical characteristics of properties, focusing on transfer history and assessment trends over time. The Real Estate Transfers dataset from OpenDataPhilly was combined using matching OPA Account Numbers, while the Assessment History dataset from OpenDataPhilly was combined using matching parcel IDs.

Using residential property assessment data from 2015 to 2025, we compare properties with and without Homestead Exemptions in terms of their average market value and the variability (standard deviation) of that value. This allows us to assess whether exempt properties tend to be more stable or differ systematically in value.
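The two per-property summaries described here can be sketched as follows. The row layout and sample values are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the report does not say which was used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical assessment rows: (parcel_id, year, market_value).
rows = [
    ("P1", 2015, 100_000),
    ("P1", 2020, 110_000),
    ("P1", 2025, 120_000),
    ("P2", 2025, 200_000),
]


def summarize_market_value(rows):
    """Return the average and standard deviation of market value per parcel."""
    values_by_parcel = defaultdict(list)
    for parcel_id, _year, market_value in rows:
        values_by_parcel[parcel_id].append(market_value)
    return {
        parcel_id: {
            "avg_market_value": mean(values),
            # stdev needs at least two observations; fall back to 0.0 otherwise.
            "sd_market_value": stdev(values) if len(values) > 1 else 0.0,
        }
        for parcel_id, values in values_by_parcel.items()
    }


summary = summarize_market_value(rows)
print(summary["P1"], summary["P2"])
```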

We also analyze year-over-year changes in taxable building values to identify typical patterns of property tax growth, as well as deviations potentially linked to major economic events.

Additionally, we explore property transfer records to better understand the ownership dynamics of exempt properties. We compare exemption rates across different deed types to investigate whether certain forms of property transfer are more commonly associated with exemption enrollment.

Market Value

Properties without exemptions have significantly higher average market values ($304,731 compared to $182,169) and higher variability ($78,513 versus $42,479). This suggests that non-exempt properties tend to have higher market values and experience greater market fluctuations over time. In contrast, exempt properties have lower but much more stable market values, suggesting that homeowners with exemptions are less exposed to sudden market shocks.

Property Transfer Types

Property transfer deed types were also examined in relation to each property. Properties associated with “Deed - Deceased” and “Satisfaction of Mortgage” exhibited notably higher exemption rates. These deed types typically reflect intra-family transfers, inheritance, or mortgage settlements, aligning with common pathways to exemption eligibility. Logistic regression analysis confirmed deed type as a significant predictor of exemption status. Selected deed types can be encoded as binary features for use in predictive modeling.

Recent Transfers

On average, 20.07% of properties without a homestead exemption had a transfer in the past two years, compared to 16.17% of those with an exemption. This indicates that properties without exemptions tend to have a slightly higher likelihood of recent transactions.

Neighborhood Demographic Characteristics

To contextualize individual property patterns within broader neighborhood trends, we also incorporate demographic and financial indicators at the census tract level. Although not formally part of the Homestead Exemption eligibility criteria, these characteristics may influence awareness, application behavior, and outreach effectiveness.

We draw on American Community Survey (ACS) 5-year estimates to include median home value and owner-occupancy rate. Median home value captures the relative wealth of a neighborhood and may correlate with a homeowner’s perceived need for or interest in the exemption. Owner occupancy provides a proxy for residential stability and investment in place—areas with higher rates may be more likely to have eligible and engaged homeowners.

Zoning composition is used to approximate the land use context of each tract. A higher proportion of residential zoning suggests more eligible properties, while mixed-use or commercially dominated areas may contain fewer owner-occupied homes.

We also include tax balance data to capture financial stress at the neighborhood level. This uses the Real Estate Tax Balances dataset from OpenDataPhilly, joined to the property dataset by zip code. Specifically, we calculate the percentage of properties within each tract that have outstanding tax balances, including principal, penalties, and interest. While this is not a stated criterion for the Homestead Exemption, high levels of tax delinquency may reflect broader socioeconomic challenges (such as lower income, limited English proficiency, or low institutional trust) that influence exemption participation.

Median Home Value

Properties with the Homestead Exemption are typically located in census tracts where the median home value ranges between $200,000 and $400,000, which is generally higher than in tracts without exemptions. However, census tracts with median home values exceeding $400,000 tend to have fewer properties with exemptions, suggesting that wealthier neighborhoods may have lower participation in the program.

Owner Occupancy

Notably, higher rates of owner occupancy don’t automatically translate to higher program participation. This suggests that other factors beyond home ownership - such as awareness of the program, ease of enrollment, or demographic characteristics - may play more significant roles in determining participation rates. These insights can help guide targeted outreach efforts to increase program enrollment among eligible homeowners who are currently missing out on this tax benefit.

Zoning Analysis

Census tracts with single-family zoning, both detached and attached, have the highest percentage of properties with the Homestead Exemption. In contrast, multi-family or mixed-use residential zones tend to have fewer properties enrolled in the exemption program.

Tax Balance Rate

Properties with the Homestead Exemption are more likely to be located in census tracts where fewer properties have outstanding tax balances, particularly in tracts where the tax balance rate is less than 10%.

Modeling

We use the identified property and neighborhood characteristics as predictor variables to model the likelihood that a property receives the Homestead Exemption. The goal is to better understand the key factors associated with exemption status and to identify potentially eligible properties that are not currently enrolled.

We test several classification models, including K-Nearest Neighbors (KNN) and Random Forest, and iteratively refine our feature set. Spatial indicators such as longitude, latitude, and spatial lag terms are incorporated to capture geographic patterns in exemption participation. Our best-performing model is based on XGBoost, which provides the highest predictive accuracy.

Correlation Matrix

We begin by selecting and loading the relevant data, then generate a correlation matrix to gain an initial understanding of the relationships between variables.

Split Dataset

Next, we split the dataset into two parts: 75% for training and 25% for testing. The training set is used to build models using various methods, while the testing set is used to evaluate model performance and identify the most accurate one.
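A 75/25 split can be sketched in a few lines. The fixed seed and the plain shuffle are assumptions; the report does not state whether the split was stratified by exemption status.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for property rows
train, test = train_test_split(records)

print(len(train), len(test))
```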

Add Spatial Lag For Training

To prevent data leakage, spatial lag variables need to be calculated separately for the training and test datasets. If spatial information from the test set is used during the training phase, the model would “see” part of the test data’s structure in advance, leading to overfitting and failing to reflect the true generalization ability of the model.

Therefore, in this analysis, we first split the data and then independently calculate the spatial adjacency relationships and spatial lag values for each subset. This approach not only better reflects real-world scenarios (where the model only has access to the training data for prediction) but also improves the accuracy and reliability of model evaluation.
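The idea of computing the lag within each subset can be sketched as below. The report does not specify how neighbors were defined, so the k-nearest-neighbor rule, the value of k, and the sample coordinates are all assumptions; the brute-force distance search is only suitable for small illustrations.

```python
import math


def spatial_lag(points, k=5):
    """For each (x, y, has_exemption) point, average has_exemption over its
    k nearest neighbors drawn from the same list, excluding the point itself."""
    lags = []
    for i, (xi, yi, _) in enumerate(points):
        distances = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj, _) in enumerate(points)
            if j != i
        )
        neighbor_values = [points[j][2] for _, j in distances[:k]]
        lags.append(sum(neighbor_values) / len(neighbor_values) if neighbor_values else 0.0)
    return lags


# Hypothetical subsets; each lag only ever looks at points in its own subset.
train_points = [(0, 0, 1), (1, 0, 1), (0, 1, 0), (10, 10, 0)]
test_points = [(5, 5, 1)]

train_lag = spatial_lag(train_points, k=2)
test_lag = spatial_lag(test_points, k=2)

print(train_lag, test_lag)
```

Because the function only receives one subset at a time, no value from the test set can enter a training-set lag.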

Cross Validation

To ensure the robustness of our model and account for geographic variation, we implement grouped cross-validation using group_vfold_cv(). In this case, we use LOGOCV (Leave-One-Group-Out Cross Validation) based on the geographic_ward variable, which allows us to evaluate model performance across different neighborhood groups.
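The report names group_vfold_cv(), which suggests an R workflow; the plain-Python sketch below only illustrates the leave-one-group-out idea and is not the project's code. The sample records and ward labels are hypothetical, while geographic_ward is the grouping variable named in the text.

```python
def leave_one_group_out(records, group_key="geographic_ward"):
    """Yield (held_out_group, analysis_set, assessment_set), one fold per group."""
    groups = sorted({r[group_key] for r in records})
    for held_out in groups:
        analysis = [r for r in records if r[group_key] != held_out]
        assessment = [r for r in records if r[group_key] == held_out]
        yield held_out, analysis, assessment


# Hypothetical training records.
records = [
    {"id": 1, "geographic_ward": "01"},
    {"id": 2, "geographic_ward": "01"},
    {"id": 3, "geographic_ward": "02"},
    {"id": 4, "geographic_ward": "03"},
    {"id": 5, "geographic_ward": "03"},
]
folds = list(leave_one_group_out(records))

for held_out, analysis, assessment in folds:
    print(held_out, len(analysis), len(assessment))
```

Each fold tests the model on a ward it never saw during fitting, which is what makes the evaluation sensitive to geographic variation.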

Before modeling, we also create a preprocessing recipe that includes one-hot encoding for categorical variables, removal of zero-variance predictors, and normalization (centering and scaling) for numeric predictors.

XGBoost Model

We train an XGBoost classification model to predict exemption eligibility, using a grid search to tune key hyperparameters (mtry and min_n). The model is built within a workflow that includes preprocessing steps such as dummy encoding and feature scaling. We apply group-based cross-validation by geographic ward to evaluate model performance and select the best combination of parameters based on accuracy. The finalized model is then fitted on the entire training set and evaluated on the holdout test set. Predictions from both cross-validation and the final test set are collected for further analysis.

Based on the feature importance plot from the XGBoost model, the variable same_address, indicating whether the owner’s mailing address matches the property address, is by far the most influential predictor of exemption status. This likely reflects that owner-occupied properties are more likely to apply for and receive exemptions. Other top predictors include rental_license, suggesting that properties with rental licenses are less likely to receive exemptions, and measures of property value such as sd_market_value (standard deviation of assessed value), avg_market_value, and median_home_value at the tract level, which reflect both property characteristics and neighborhood affordability. Spatial indicators like lag_exemption, lat_4326, and lon_4326 also contribute significantly, showing that spatial clustering and geographic location influence exemption patterns. Overall, a mix of ownership status, rental activity, assessed value, and spatial and demographic context most strongly shapes predicted exemption eligibility.

Predict

We use the final model to predict eligibility for exemptions across all properties. We set a 50% probability threshold: properties with predicted probabilities below 50% are classified as ineligible, while those above 50% are classified as eligible. This allows us to compare our predictions with the current exemption status.
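The thresholding and comparison step can be sketched as follows. The field names and sample probabilities are hypothetical, and treating a probability of exactly 50% as eligible is an assumption, since the text only describes values below and above the threshold.

```python
from collections import Counter

THRESHOLD = 0.5


def classify(prob_eligible, threshold=THRESHOLD):
    """Return 1 (predicted eligible) at or above the threshold, else 0."""
    return int(prob_eligible >= threshold)


def cross_tab(records, threshold=THRESHOLD):
    """Count (has_exemption, predicted_eligible) combinations."""
    return Counter(
        (r["has_exemption"], classify(r["prob_eligible"], threshold)) for r in records
    )


# Hypothetical predictions.
records = [
    {"has_exemption": 0, "prob_eligible": 0.9},
    {"has_exemption": 0, "prob_eligible": 0.1},
    {"has_exemption": 1, "prob_eligible": 0.8},
    {"has_exemption": 1, "prob_eligible": 0.3},
    {"has_exemption": 0, "prob_eligible": 0.5},
]
table = cross_tab(records)
outreach_targets = table[(0, 1)]  # no exemption today, but predicted eligible

print(dict(table), outreach_targets)
```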

As shown, properties without exemptions (marked in red) follow a clear bimodal distribution. The right peak represents the 153,982 properties correctly predicted as ineligible. The left peak highlights the 75,708 properties that are likely eligible for exemption but are currently missing out. These are the properties we will target for outreach.

For properties that already have exemptions (marked in green), there is a single peak, which is expected. This peak corresponds to the 205,780 properties predicted as eligible. The right tail represents 34,006 properties that are incorrectly predicted as ineligible despite having exemptions. We aim to minimize this misclassification. The model keeps these errors low, correctly classifying 85.8% of properties that already hold an exemption (205,780 of 239,786).
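The rates quoted in this section can be recomputed from the four counts reported above. The sketch below does only that arithmetic; the variable names follow the report's convention that a property with an exemption counts as a positive.

```python
# Counts reported in the text at the 0.5 threshold.
true_positives = 205_780   # has exemption, predicted eligible
false_negatives = 34_006   # has exemption, predicted ineligible
false_positives = 75_708   # no exemption, predicted eligible (outreach targets)
true_negatives = 153_982   # no exemption, predicted ineligible

sensitivity = true_positives / (true_positives + false_negatives)
false_negative_rate = false_negatives / (true_positives + false_negatives)
false_positive_rate = false_positives / (false_positives + true_negatives)

print(f"sensitivity: {sensitivity:.1%}")
print(f"false negative rate: {false_negative_rate:.1%}")
print(f"false positive rate: {false_positive_rate:.1%}")
```

Here the false positive rate is not a pure error measure: it is the share of currently unenrolled properties that the model proposes for outreach.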

The model’s sensitivity—its ability to correctly identify current exemption holders—is high and consistent across most demographic groups, with rates around 86% and false negative rates (misses) around 14%. Performance is similar for both Majority White and Majority Non-White tracts, as well as across income, foreign-born, and limited English groups.

However, the model is less effective in certain areas:

  • Low Education tracts: sensitivity drops to 78.6%, with a higher false negative rate of 21.4%.
  • Low Senior Population tracts: sensitivity is much lower at 52.5%, with nearly half of exemption holders missed (FN rate 47.5%).

These results suggest that while the model performs equitably for most groups, additional attention may be needed to ensure residents in tracts with lower education levels or fewer seniors are not overlooked in outreach efforts.

Missed Exemption Opportunities

We focus on false positive properties—those predicted as eligible but currently not receiving an exemption. These properties are key targets for outreach, as they may be missing out on exemption opportunities. At a 0.5 threshold, this amounts to 75,708 houses.

Spatial Density of Missed Exemptions

A density map reveals clustering patterns of missed exemptions across the city. Southwest and North Philadelphia emerge as notable hotspots, suggesting spatial barriers or inequities in exemption uptake. These areas may benefit from additional support or communication efforts.

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) was also used to explore the spatial distribution of missed exemption opportunities. It is a clustering algorithm that groups data points that are close to each other based on a density criterion. DBSCAN hotspots are regions where at least a specified minimum number of points are located within a defined radius (epsilon) of each other. In this case, the minimum number of points was set at 20 and epsilon at 300 meters. This algorithm is stricter and tends to ignore areas that are high-density but more dispersed. Only dense clusters of closely packed points are identified.
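To make the density criterion concrete, the following is a small from-scratch DBSCAN in plain Python using the parameters stated above (20 points, 300 meters). It is an illustrative sketch rather than the project's implementation: it assumes coordinates are already in a projected system measured in meters, uses a brute-force neighbor search, and runs on made-up points.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points) if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps=300.0, min_pts=20):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


# Hypothetical data: a tight 5 x 5 grid (50 m spacing) plus three isolated points.
grid = [(x * 50.0, y * 50.0) for x in range(5) for y in range(5)]
isolated = [(5000.0, 5000.0), (9000.0, 0.0), (0.0, 9000.0)]
labels = dbscan(grid + isolated, eps=300.0, min_pts=20)

print(labels)
```

The 25 grid points form a single cluster, while the three scattered points are labeled as noise, which mirrors how dispersed properties drop out of the DBSCAN hotspots.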

Moran’s I

The local Moran’s I shows two high-high clusters in South and North Philadelphia, indicating areas where properties with missed exemption opportunities are concentrated. In contrast, there are low-low clusters in the city center and Northwest Philadelphia, where fewer such properties are located. In other areas, p-values are above 0.1, so the results aren’t statistically significant.

Demographic

In terms of demographics across census tracts, the model’s false positive rates, which are used to guide outreach, show mixed patterns of equity across demographic groups. While the rates are relatively consistent across racial groups (33.0% for Majority White vs 31.9% for Majority Non-White) and language proficiency levels (33.0% for High Limited English vs 33.3% for Low Limited English), there are notable disparities in other areas.

Tracts with lower education levels show significantly lower FP rates (20.0% vs 33.1% in high education areas), and areas with fewer seniors have dramatically lower rates (7.3% vs 33.1% in high senior areas).

Additionally, there’s a slight bias toward wealthier areas (34.8% in Above Average Income vs 31.1% in Below Average Income). These patterns suggest that while the model maintains equity across some demographic dimensions, it may be under-predicting potential exemptions in areas with lower education levels and fewer seniors.

This could reflect true differences in eligibility, but it’s important to monitor these groups to ensure they are not inadvertently underserved. The model’s performance in areas with high foreign-born populations (32.9% FP rate) is relatively equitable compared to low foreign-born areas (35.7%), though the small number of properties in low foreign-born tracts (2,635) makes this comparison less reliable.

Outreach Strategies

Three main potential outreach strategies are proposed. The costs for each of these strategies were then estimated based on existing cost structure references.

The first potential strategy is a direct mailing campaign, which involves sending flyers out to all properties identified as false positives by the model. The flyer would contain information on what the homestead exemption program is, its eligibility criteria, instructions on how to apply, and relevant contact information, translated across different languages.

The next potential strategy is a door-knocking campaign. This involves canvassers going door-to-door to identified properties and sharing information about the homestead exemption program with homeowners who respond. This face-to-face interaction can be highly effective in raising awareness about the program, explaining how to apply, and directly addressing any queries or doubts that the homeowner may have. From July 2023 to June 2024, the City of Philadelphia conducted a door-knocking campaign under the Overdose Awareness Canvassing and Trusted Community Messenger Program. Door-knocking is proposed for high-potential areas with clusters of missed homestead exemption opportunities, which makes the best use of limited resources by reducing travel time and costs between properties. This campaign would therefore only target the 12,302 homes identified in the two clusters, Northeast and South Philadelphia, which have 6,669 and 6,362 target outreach homes respectively.

Another potential strategy is to reach out to community organizations to educate organization leaders and staff about the exemption program, who can then spread the word to their members and the immediate local community about it. This may include community centres or religious organizations, such as the City of Life church in South Philadelphia.

The estimated costs of the outreach strategies were calculated based on the cost structures from the Opioid Response Unit Door-Knocking Campaign and USPS Direct Mail Advertising cost calculator.

Door Knocking:

  • Fixed Costs: Personnel + Supplies
  • Variable Costs: labor $34.80 /hour × (Properties ÷ 10 homes/hr)
  • Indirect Costs: 10% of total direct costs
  • Weekly Capacity: 1,600 homes (8 staff × 20 hrs × 10 homes/hr)

Direct Mailing:

  • Fixed Costs: Personnel + Design + Supplies
  • Variable Costs: Printing $0.09 per piece | Mailing $0.29 per piece
  • Indirect Costs: 10% of total direct costs
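The cost structures above can be turned into a small calculator, sketched below. The unit rates and capacity come from the lists above; the fixed-cost amounts are not itemized in this section, so fixed_costs is a placeholder argument set to zero in the examples, and the reading of "total direct costs" as fixed plus variable costs is an assumption.

```python
import math

HOURLY_LABOR = 34.80              # dollars per canvasser hour
HOMES_PER_HOUR = 10
WEEKLY_CAPACITY = 8 * 20 * HOMES_PER_HOUR  # 8 staff x 20 hrs x 10 homes/hr = 1,600
PRINT_PER_PIECE = 0.09
MAIL_PER_PIECE = 0.29
INDIRECT_RATE = 0.10              # share of direct costs


def door_knocking_cost(n_properties, fixed_costs):
    """Variable labor cost, total cost with indirect costs, and weeks required."""
    variable = HOURLY_LABOR * n_properties / HOMES_PER_HOUR
    direct = fixed_costs + variable  # assumption: direct = fixed + variable
    return {
        "variable": variable,
        "total": direct * (1 + INDIRECT_RATE),
        "weeks": math.ceil(n_properties / WEEKLY_CAPACITY),
    }


def direct_mail_cost(n_pieces, fixed_costs):
    """Variable printing and mailing cost, and total cost with indirect costs."""
    variable = (PRINT_PER_PIECE + MAIL_PER_PIECE) * n_pieces
    direct = fixed_costs + variable  # assumption: direct = fixed + variable
    return {"variable": variable, "total": direct * (1 + INDIRECT_RATE)}


# fixed_costs=0.0 is a placeholder, so these show the variable portion only.
door = door_knocking_cost(12_302, fixed_costs=0.0)
mail = direct_mail_cost(75_708, fixed_costs=0.0)

print(f"door-knocking labor: ${door['variable']:,.2f} over {door['weeks']} weeks")
print(f"mailing print and postage: ${mail['variable']:,.2f}")
```

Separating variable from fixed costs makes it easy to see how the totals scale when the target list grows or shrinks with the model threshold.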

The potential benefits of such outreach campaigns are substantial. According to the Data and Marketing Association, the average response rate to a direct mailing campaign ranges between 4% and 9%. Assuming that 5% of all homeowners reached go on to enroll successfully in the Homestead Exemption, the total direct benefit in homeowner savings would be Number of Properties × 0.05 × (up to) $1,399. Applied to the 75,708 target properties, this formula gives up to $5,295,774.60 in savings across homeowners. The majority of existing homestead exemptions have been observed to receive the maximum eligible exemption. Compared with the total outreach cost of $35,861 for a direct mailing campaign, the outreach can deliver a large public benefit for a relatively small outlay. Furthermore, this estimate does not include the unquantified benefits of increased housing stability for homeowners and their families. Greater stability reduces transience and encourages owner occupancy and long-term residency, which supports consistent neighborhood networks and fosters more engaged and invested local communities. Owner-occupied homes are also more likely to be well maintained, as there is greater incentive to keep up the property, which benefits neighborhood aesthetics and property values.
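The benefit formula can be checked with a few lines of arithmetic. The sketch below uses the figures stated in the text (75,708 properties, a 5% enrollment assumption, up to $1,399 per home, and a $35,861 mailing cost); the 4% and 9% runs are only an illustrative sensitivity check that treats the quoted response-rate range as if it were an enrollment rate.

```python
MAX_ANNUAL_SAVINGS = 1_399      # dollars per enrolled home, upper bound from the text
TARGET_PROPERTIES = 75_708      # outreach targets at the 0.5 threshold
DIRECT_MAIL_COST = 35_861       # total direct mailing cost stated in the text


def estimated_savings(n_properties, enrollment_rate=0.05, savings_per_home=MAX_ANNUAL_SAVINGS):
    """Upper-bound annual savings if the given share of reached homes enrolls."""
    return n_properties * enrollment_rate * savings_per_home


upper_bound = estimated_savings(TARGET_PROPERTIES)
benefit_cost_ratio = upper_bound / DIRECT_MAIL_COST

# Illustrative sensitivity check only (assumed rates, not project results).
low_case = estimated_savings(TARGET_PROPERTIES, enrollment_rate=0.04)
high_case = estimated_savings(TARGET_PROPERTIES, enrollment_rate=0.09)

print(f"${upper_bound:,.2f}")
print(f"benefit per dollar of mailing cost: {benefit_cost_ratio:.1f}")
```

Because the per-home figure is a maximum, the result should be read as an upper bound on first-year savings rather than a forecast.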

We then compare the use of the model with an alternative approach based on a filtered list of properties. The properties are filtered to residential-only, non-corporate-owned properties whose mailing address matches the property address. This results in 333,623 properties, for which the direct mailing campaign alone would cost $143,669. Using a machine learning model to identify properties for outreach substantially narrows the target list, allowing for large cost savings and an effective use of limited resources. The spatial analysis of false positives also allows for more targeted door-knocking.

Dashboard

Our dashboard, the Homestead Exemption Outreach Explorer, lets Philly Stat 360 and potential outreach campaign managers explore where properties predicted to be likely eligible for the Homestead Exemption program are located, as well as the neighborhood profiles of those areas. The threshold slider changes the eligibility threshold of the model: a lower value is more lenient and includes more properties in the census tract, while a higher value is stricter. The exported spreadsheet contains all addresses and owner names of the properties in the selected census tract, based on the chosen threshold level.

Additional features include the address search bar and the community sites toggle button. Community sites include places of worship, libraries, marketplaces and community centers, extracted from OpenStreetMap. These are potential locations that canvassers can reach out to and partner with, for example by requesting support in disseminating information about the Homestead Exemption program or by putting up informational material such as posters.

One of the census tract characteristics shown in the dashboard is the most dominant non-English language spoken in the census tract, along with the percentage of residents who speak it. This can be particularly useful for campaign managers when organizing outreach strategies, such as ensuring that canvassers who speak that language conduct outreach in the area, so that awareness of the Homestead Exemption program is raised more effectively among potential homeowners.

Conclusion

This project aimed to identify homeowners in Philadelphia who are likely to be eligible for, but not yet enrolled in, the Homestead Exemption program. Through the creation of a machine learning model and a web application dashboard, the hope is for this project to support a data-driven, targeted outreach approach that is cost-efficient and helps homeowners receive tax savings. It should be noted that the current model was unable to factor in a property’s existing enrollment in the 10-year residential tax abatement or LOOP program, which would make the property ineligible for the Homestead Exemption program. If access to this data can be obtained from the Office of Property Assessment in future, the model may be improved further. Nevertheless, the project has provided a model with strong potential and has shown the substantial benefits that currently unenrolled homeowners stand to gain at a relatively low cost of outreach. Successfully identifying these homeowners can also allow the City of Philadelphia to provide further benefits, such as home upgrading subsidies. Ultimately, it is hoped that ensuring more eligible homeowners are made aware of and can take up the Homestead Exemption program will offer greater stability, preserve community ties, and reduce the financial burden for these homeowners.

