1. Introduction

Watauga Location

1.1 Problem Statement / Use Case

Watauga County, North Carolina, is experiencing a housing shortage that is expected to intensify in the near future. The shortage is exacerbated by vacation rentals and student housing, which have significantly reduced the available residential housing options. The Watauga Housing Solutions Committee is partnering with other community organizations and local government agencies to develop affordable and workforce housing and mitigate the severe decline in housing affordability. The Housing Council is currently working with developers to prioritize the most feasible sites for purchase and development. Because the county's physical conditions (slope, soils, etc.) are extreme, identifying suitable parcels is difficult, and there is currently no streamlined system for doing so. In response, we were contracted to develop an application-based data framework that identifies land parcels based on their likelihood of being developed for affordable workforce housing. The framework functions as an interactive application that lets users input criteria related to land supply and combines them with likelihood estimates to rank every parcel in the county by development suitability.

1.2 Project Client

The Watauga Housing Council

The main project client is the Watauga Housing Council and its Housing Solutions Committee. This organization is tasked with responding to the economic, environmental, and social conditions of the county to provide adequate housing options for all residents. Our main points of contact within the Housing Council are Dr. Kellie Reed Ashcraft (Facilitator/Organizer, Watauga Housing Council), Dr. Chris Quattro (Assistant Professor, Appalachian State University/Member, Watauga Housing Council), and Laura Beach (Member, Watauga Housing Council).

1.3 Dependent Variable

The key indicator of development used in this project is the administrative record of septic system permits. A septic permit is granted only when a property satisfies the requirements for both land supply and developer demand at that specific site. Because the project area is rural, sewer coverage is limited. Consequently, a septic system is critical for new development, and a site cannot be developed without one. Our approach identifies the site suitability characteristics (e.g., slope, soil type) most closely related to the probability of a parcel receiving a septic permit. Using this information, we forecast the likelihood that a given parcel will receive a new septic permit, which is our dependent variable.

To conduct this analysis, we used septic permit data for 2017 and 2022, gathered online. Observing how permitting changed over those five years will help us predict which parcels are most likely to be suitable for development in our future target year of 2025. Watauga County approved 300 new septic permits in 2017, and the number increased to 472 in 2022: a 57.33% increase over the five-year period. This increase suggests significant new development and land use change in Watauga County over the past five years.
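The five-year change quoted above is simple arithmetic. The Python sketch below reproduces it from the two permit counts. The annualized (compound) growth rate is an extra illustration added here. It is not a figure reported in this analysis.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from an earlier value to a later one."""
    return (new - old) / old * 100


def annualized_rate(old: float, new: float, years: int) -> float:
    """Compound annual growth rate (in percent) that turns `old` into `new`."""
    return ((new / old) ** (1 / years) - 1) * 100


# Septic permit counts for Watauga County, from the text above
permits_2017, permits_2022 = 300, 472

print(f"{percent_change(permits_2017, permits_2022):.2f}%")      # 57.33%
print(f"{annualized_rate(permits_2017, permits_2022, 5):.2f}%")  # 9.49% per year
```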

2. Exploratory Data Analysis

2.1 Demographics

Watauga County has undergone notable demographic shifts over the last five years. The county's population grew from 53,421 in 2017 to 54,540 in 2022, a steady increase in residents. This growth underscores the need for the Housing Council to proactively address future housing demand as more people make Watauga their home. Alongside population growth, the median age in Watauga also increased, from 30.6 in 2017 to 32.1 in 2022. Despite this rise, the county remains young compared to surrounding areas, where median ages are typically in the mid to upper 40s. This is likely because of Watauga's substantial student population.

Additionally, the average household size decreased slightly, from 2.35 in 2017 to 2.30 in 2022. This mirrors trends in neighboring counties, where the average household size ranges between 2 and 3 people, suggesting a predominance of family units over single individuals or couples. In 2022, the median household income was $50,034. This is close to nearby Ashe, Avery, Caldwell, and Wilkes counties, whose median incomes are within about $3,300 of Watauga's. However, incomes in the county remain well below the national median of $74,580. This gap emphasizes the need for targeted economic and housing policies to support the community's growing and diversifying population.

Population Size

Median Age

Average Household Size

In 2017, the average household size in Watauga County was 2.35. It decreased slightly to 2.30 in 2022, largely mirroring surrounding counties, where households average approximately 2-3 people. This suggests that many homes in the area are occupied by families rather than by individuals or couples alone.

Median Income

In 2022, Watauga County had a median household income of $50,034. This is on par with the surrounding counties of Ashe, Avery, Caldwell, and Wilkes, whose median incomes ranged from $49,176 to $53,313. Incomes in the county are relatively low compared to the national median of $74,580.

2.2 Current State of Housing

In 2022, Watauga County's housing landscape showed a complex mix of ownership and rental challenges, shaped in part by the local university population. Approximately 61% of homes in the county are owner-occupied. Given the large student population from Appalachian State University, this suggests that most adults not associated with the university are homeowners. This figure points to a relatively stable ownership rate, but it also masks underlying issues in the rental market.

Despite a regional downward trend, rent burden in Watauga County has increased since 2017. More than two-thirds of renters now spend over 30% of their income on rent, signaling growing instability in the local housing market. This rise in rent burden suggests that, despite the high rate of homeownership, affordability remains a pressing concern, particularly for renters.

On the ownership side, the median house value in Watauga stands at $283,000, aligning with the national average property value. There has been some improvement in housing affordability among owners over the past five years, with the percentage of unaffordable owner-occupied houses decreasing from 15.6% in 2017 to 14.4% in 2022. This rate of unaffordable housing is slightly better than the national average of 16.8%, indicating a positive trend towards greater affordability for homeowners.

The county's demographic profile is significantly shaped by Appalachian State University, which enrolls over 18,000 students. College students make up approximately 30% of the county's population, affecting both the local economy and the housing market. As a result, the Housing Council must plan carefully to accommodate the current and future housing needs of both students and the broader community as students transition into the county's workforce.

Lastly, vehicle ownership is notably high in Watauga, with nearly every household owning at least one car, and about 14,000 households owning two or more. This prevalence of vehicle ownership indicates that mobility within the county is largely car-dependent, influencing daily travel patterns and potentially impacting housing location preferences related to accessibility and commute distances.

Housing by Occupancy Type (Own vs. Rent)

Rent Burden

Number of Households by Vehicles Owned

2.4 Physical Features

Land Cover Changes Over 10 Years

Watauga County has experienced significant land cover changes from 2011 to 2021, reflecting a notable shift towards development at the expense of natural landscapes. The data indicates a 13.89% increase in developed areas encompassing low, medium, and high-intensity developments, with medium intensity seeing the most significant growth at 17.22% by 2021. Concurrently, there has been a dramatic reduction in natural land covers, with a 56.5% decrease in Grassland/Herbaceous areas and a 50% reduction in Shrub/Scrub land. These changes are likely due to the conversion of open fields and grasslands into developed land or agricultural use, though the majority of the land remains natural or semi-natural with a small rise in developed open spaces.

The 2021 land cover data further underscores Watauga County's rich natural resources relative to its moderate urban development. The area is predominantly forested: 74.56% of the land is covered by forest, including 54.16% deciduous, 19.2% mixed, and 1.2% evergreen forest. Developed areas constitute only 13.89% of the total land cover, made up of 11.23% developed open space, 1.48% low intensity, 0.9% medium intensity, and 0.28% high intensity. Most development is therefore classified as open space, such as parks, golf courses, and large-lot residential areas, which helps maintain the county's rural character. The smaller shares of more intensive development reflect urbanization concentrated around cities and transportation corridors. This pattern suggests that land use over the past decade has balanced preservation of the natural landscape with necessary residential and commercial growth.
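As a quick consistency check, the 2021 class shares reported above add up to the stated totals. The sketch below uses plain Python and only the percentages from the text. It also shows that developed open space accounts for roughly 81% of all developed land.

```python
# 2021 land cover shares (% of county area), as reported in the text above
forest = {"Deciduous": 54.16, "Mixed": 19.2, "Evergreen": 1.2}
developed = {
    "Open Space": 11.23,
    "Low Intensity": 1.48,
    "Medium Intensity": 0.9,
    "High Intensity": 0.28,
}

forest_total = round(sum(forest.values()), 2)        # 74.56
developed_total = round(sum(developed.values()), 2)  # 13.89

# Share of developed land that is classified as open space
open_space_share = developed["Open Space"] / developed_total * 100

print(forest_total, developed_total, f"{open_space_share:.1f}%")
```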

Land cover maps, 2011-2021 (panels: 2011, 2013, 2016, 2019, 2021)

Slope & Soil

In Watauga County, where the landscape is predominantly mountainous, development depends heavily on topography. Slope is a critical factor in assessing the feasibility of construction, because steeper slopes present significant challenges. Using slope data, we classify areas where the gradient exceeds 25° as unsuitable for development. This cutoff is well below the maximum development-friendly slope of 65% (approximately 33.02°). The reclassified slope map shows slopes suitable for development in green and unsuitable zones in white.
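The sketch below shows the percent-to-degree conversion behind the 65% (approximately 33.02°) figure and the reclassification into suitable and unsuitable cells. The 25° cutoff comes from the text. Two details are assumptions: slopes of exactly 25° are treated as suitable, and the toy grid is made up. In practice this step would run on the slope raster in a GIS rather than on nested lists.

```python
import math


def percent_to_degrees(slope_percent: float) -> float:
    """Convert slope from percent rise (rise / run * 100) to degrees."""
    return math.degrees(math.atan(slope_percent / 100))


def degrees_to_percent(slope_degrees: float) -> float:
    """Convert slope from degrees to percent rise."""
    return math.tan(math.radians(slope_degrees)) * 100


def reclassify(slope_grid, threshold_degrees=25.0):
    """Mark each cell True (suitable) if its slope is at or below the threshold."""
    return [[cell <= threshold_degrees for cell in row] for row in slope_grid]


print(f"{percent_to_degrees(65):.2f} degrees")  # 33.02 degrees
print(f"{degrees_to_percent(25):.1f}%")         # 46.6%

# Toy 2 x 3 slope raster in degrees (made-up values)
print(reclassify([[5.0, 12.5, 27.0], [31.0, 18.0, 24.9]]))
```

Expressed in percent, the 25° cutoff is about 46.6%. That is comfortably under the 65% maximum.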

Additionally, the county's soils are diverse, with over 80 soil types. Soil type is crucial for determining the appropriate design and installation of septic systems, because it affects the water absorption rate and the necessary depth of the system. Based on the USDA soil database, soil map units, identified by their Map Unit Symbol (MUSYM), are categorized into three main groups by drainage capability and permeability: A (moderately rapid infiltration rate), B (moderate infiltration rate), and C (low infiltration rate). Type C soil, characterized by low infiltration rates and good drainage, is identified as optimal for septic system installation, in sharp contrast to the less desirable, poorly drained soils.
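Parcels usually intersect several soil map units. One way to turn the soil groups into parcel-level features (such as the 'Soil_A' variable used later in modeling) is to compute the share of each parcel's area in each group. The sketch below assumes the soil polygons have already been clipped to the parcel. The MUSYM codes and their group assignments are hypothetical placeholders, not entries from the USDA database. The exact way 'Soil_A' was computed is not stated in this report.

```python
from collections import defaultdict

# Hypothetical MUSYM -> infiltration group lookup (placeholder codes, not real
# Watauga map units); the real table would come from the USDA soil survey.
MUSYM_TO_GROUP = {"AaB": "A", "BbC": "B", "CcD": "C"}


def soil_group_shares(pieces):
    """pieces: (musym, area) pairs for soil polygons clipped to one parcel.

    Returns the fraction of the parcel's area in each infiltration group;
    symbols missing from the lookup are reported as 'unknown'.
    """
    area_by_group = defaultdict(float)
    for musym, area in pieces:
        area_by_group[MUSYM_TO_GROUP.get(musym, "unknown")] += area
    total = sum(area_by_group.values())
    return {group: area / total for group, area in area_by_group.items()}


print(soil_group_shares([("AaB", 2.0), ("CcD", 6.0)]))  # {'A': 0.25, 'C': 0.75}
```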

Slope

Soil

Distance to Calculations

In Watauga County, landslide risk is a major determinant of parcel safety, which makes proximity to landslide-prone areas a critical factor. The northeastern region, with its gentler slopes, is identified as the safest area.

Water management is also vital to the county's planning strategy, especially the placement of septic systems relative to watersheds. The central region of Watauga County lies close to essential water supply zones. As a result, septic system installation there requires stringent measures to prevent pollution, safeguard the water supply, and minimize environmental impacts.

Furthermore, access to transportation infrastructure is crucial to a parcel's development potential. The analysis shows that while some parcels are well connected, areas in the southwest and northeast of Watauga lack sufficient road access, which may limit their development. Watauga's extensive network of water bodies also influences septic system design through its effect on the water table. We measure the distance from each parcel's center of mass to the nearest river and find minimal variation in this distance across the county. Together, these geographic and environmental measures inform decisions about land development and resource management.
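The centroid-to-nearest-river measurement can be sketched in dependency-free Python. First, compute each parcel's area-weighted centroid with the shoelace formula. Then take the minimum distance to the river network. This sketch assumes projected coordinates (e.g., feet) and approximates rivers by densified vertex points. In practice, a GIS near-distance tool or a spatial index would handle the county's many parcels.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid (center of mass) of a simple polygon.

    vertices: (x, y) tuples in a projected CRS, without repeating the first point.
    """
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def distance_to_nearest(point, targets):
    """Straight-line distance from point to the closest target point."""
    return min(math.dist(point, t) for t in targets)


parcel = [(0, 0), (4, 0), (4, 4), (0, 4)]  # toy square parcel
river_points = [(10, 2), (2, 7)]           # toy densified river vertices
center = polygon_centroid(parcel)
print(center, distance_to_nearest(center, river_points))  # (2.0, 2.0) 5.0
```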

Note: The white lines in the plots represent gaps between the parcels. They often correlate with streams or roads.

Distance to Landslide

Distance to Watershed

Distance to the nearest Road

Roads are among the most important infrastructure for housing development. We filtered the road network by traffic volume, keeping only main roads with traffic of 2,000 or more.
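The sketch below illustrates the road step under three assumptions: each road is a polyline with a single traffic count, roads with traffic of 2,000 or more are kept (our reading of the filter), and the field names ('traffic', 'coords') are hypothetical. Distance is measured to the nearest point on any road segment rather than to road vertices.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the segment, clamping the parameter t to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_main_road(point, roads, min_traffic=2000):
    """Distance to the nearest segment of any road with traffic >= min_traffic.

    Returns math.inf if no road passes the traffic filter.
    """
    return min(
        (
            point_segment_distance(point, a, b)
            for road in roads
            if road["traffic"] >= min_traffic
            for a, b in zip(road["coords"], road["coords"][1:])
        ),
        default=math.inf,
    )


roads = [
    {"traffic": 500, "coords": [(0, 1), (10, 1)]},   # minor road, filtered out
    {"traffic": 2500, "coords": [(0, 5), (10, 5)]},  # main road
]
print(distance_to_main_road((3, 0), roads))  # 5.0
```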

Distance to Nearest Water Body

Natural Areas, Spatial Lag, Economic Features

Watauga County hosts a range of sites recognized as natural heritage areas or significant natural areas at various levels, from national to county. These protected areas, whether under federal, state, or private ownership, are vital for maintaining biodiversity, preserving cultural heritage sites, and conserving natural resources. Precisely identifying the locations of these protected sites is crucial for pinpointing development-suitable parcels while upholding conservation efforts.

Property value plays a pivotal role in assessing the development potential of a parcel. It reflects market demand and potential return on investment, and it is primarily determined by the property's sale price. Higher property values indicate greater development potential, which guides developers and planners toward decisions that maximize economic benefits while considering environmental and cultural preservation.

Additionally, an analysis of the spatial lag of previous building permits issued between 2017 and 2022 provides insights into development trends and patterns in Watauga County. This analysis helps to identify areas with significant concentrations of recent development activities as well as regions that have seen fewer investments. Understanding these patterns is essential for forecasting future development hotspots and potential areas of growth. This spatial data not only reflects the historical demand for development but also assists planners and developers in making informed decisions about where to direct development efforts next, ensuring balanced growth across the county.
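The 'permit_17.nn_3' to 'permit_17.nn_5' variables used in the modeling section appear to be nearest-neighbor spatial lag features of this kind. The sketch below assumes each one is the average distance from a parcel to its k nearest 2017 permits. That definition is an assumption, because the exact one is not stated here. A brute-force scan is fine for illustration, but a spatial index such as a k-d tree would be faster county-wide.

```python
import heapq
import math


def mean_knn_distance(point, permit_points, k):
    """Average distance from `point` to its k nearest permit locations."""
    nearest = heapq.nsmallest(k, (math.dist(point, p) for p in permit_points))
    return sum(nearest) / len(nearest)


# Toy parcel centroid and 2017 permit locations (made-up coordinates)
parcel = (0, 0)
permits_2017 = [(3, 4), (6, 8), (0, 1), (100, 0)]

features = {f"permit_17.nn_{k}": mean_knn_distance(parcel, permits_2017, k) for k in (3, 4, 5)}
print(features)
```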

Natural Areas

Spatial Lag

Economic Features: Property Value

Property value is a critical factor in determining the development potential of a parcel. It is a direct reflection of the market demand and the potential return on investment. The property value is calculated based on the sale price of the property. The higher the property value, the higher the development potential.

3. Modeling

For our modeling process, we tried three different models: logistic regression (to estimate permit probabilities), random forest, and Poisson regression (to model the number of permits). We encountered various issues, including overfitting and misleadingly high accuracy rates. We attribute these to our imbalanced dataset, in which only 0.05 percent of parcels have permits. To address this, we tried removing parcels with slopes too steep for development and oversampling the parcels with permits (randomly duplicating the 1's), yet we still encountered similar challenges. Below are the results of these models to help decide which model type is best to proceed with.

3.1 Loading Data

Here we load the dataset used for model training, which excludes parcels that are already developed or cannot be developed (steep slope). To address the class imbalance in our dataset (1% ‘1’ and 99% ‘0’), we randomly duplicate the ‘1’ records until the dataset reaches a ratio of 10% ‘1’ to 90% ‘0’.
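The random duplication of ‘1’ records can be sketched as follows. This assumes rows are dictionaries keyed by column name and uses the 10% target from the text. The function first computes how many positives are needed so that they make up the target share. It then samples existing positives with replacement. The function name and seed are illustrative.

```python
import random


def oversample_positives(rows, label="permit_22", target_share=0.10, seed=42):
    """Duplicate random positive rows until they make up about target_share of the data.

    Assumes the label column holds only 0 and 1.
    """
    positives = [r for r in rows if r[label] == 1]
    n_negative = len(rows) - len(positives)
    # Solve p / (p + n_negative) = target_share for the total positive count p
    needed = round(target_share * n_negative / (1 - target_share))
    n_extra = max(0, needed - len(positives))
    rng = random.Random(seed)
    extra = [dict(rng.choice(positives)) for _ in range(n_extra)] if positives else []
    return rows + extra


# Toy data: 10 permitted parcels out of 1,000 (1% positives)
data = [{"permit_22": 1} for _ in range(10)] + [{"permit_22": 0} for _ in range(990)]
balanced = oversample_positives(data)
share = sum(r["permit_22"] for r in balanced) / len(balanced)
print(len(balanced), round(share, 3))  # 1100 0.1
```

One design caution applies. If duplication happens before the train/test split, copies of the same parcel can land in both sets, which can inflate test accuracy. Oversampling only the training split avoids this.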

3.2 Random Forest Model

First, we prepare the training dataset by dropping the ‘GlobalID’ and ‘n_permit_22’ columns and setting the ‘permit_22’ column as our target variable. We then standardize all the independent variables in the dataset.
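The preparation step can be sketched in plain Python. The report's own pipeline is not shown, and this sketch is not it. It drops the ID and count columns, sets ‘permit_22’ aside as the target, and z-scores every remaining column. Skipping constant columns is our addition, since they cannot be standardized. Ideally, the mean and standard deviation would be estimated on the training data and reused for the test data.

```python
import statistics

DROP_COLUMNS = {"GlobalID", "n_permit_22"}
TARGET = "permit_22"


def prepare(columns):
    """Split a column-oriented table into standardized features X and target y.

    columns: dict mapping column name -> list of values.
    """
    y = columns[TARGET]
    X = {}
    for name, values in columns.items():
        if name in DROP_COLUMNS or name == TARGET:
            continue
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation, as in R's sd()
        if sd == 0:
            continue  # a constant column carries no information; skip it
        X[name] = [(v - mean) / sd for v in values]
    return X, y


table = {
    "GlobalID": ["a1", "b2", "c3"],
    "n_permit_22": [0, 2, 1],
    "permit_22": [0, 1, 1],
    "Slope_Ave": [1.0, 2.0, 3.0],
}
X, y = prepare(table)
print(X, y)  # {'Slope_Ave': [-1.0, 0.0, 1.0]} [0, 1, 1]
```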

To determine which variables are most significant and could be used for model training, we plot a correlation matrix and a bar chart of the p-values of the independent variables. We then choose the 13 most significant variables to include in training.
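The correlation side of this feature screen can be sketched by ranking each candidate by its absolute Pearson correlation with the 0/1 target (the point-biserial correlation). P-values would also require the t distribution from a statistics library, so they are omitted here. The toy columns are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_k_features(X, y, k):
    """Names of the k features with the largest |correlation| with target y."""
    scores = {name: abs(pearson(col, y)) for name, col in X.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


y = [0, 0, 1, 1]
X = {"a": [0, 0, 1, 1], "b": [1, 0, 1, 0], "c": [0, 1, 1, 1]}
print(top_k_features(X, y, 2))  # ['a', 'c']
```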

Correlation Matrix

The correlation matrix shows that ‘Slope_Ave’, ‘Slope_Max’, ‘permit_17.nn_3’, ‘permit_17.nn_4’, ‘permit_17.nn_5’, ‘Pasture_Hay’, ‘Developed_Medium_Intensity’, ‘permit_17’, ‘YRBUILT’, and ‘Dist_Flowline’ are the variables most strongly correlated with the target ‘permit_22’.

P-value Bar Chart

Consistent with the correlation matrix, the p-value bar chart shows that ‘permit_17’, ‘Slope_Ave’, ‘Slope_Max’, ‘permit_17.nn_3’, ‘permit_17.nn_4’, ‘permit_17.nn_5’, ‘Pasture_Hay’, ‘Developed_Medium_Intensity’, ‘YRBUILT’, ‘Dist_Flowline’, ‘Dist_Road’, ‘Developed_High_Intensity’, ‘Developed_Low_Intensity’, ‘Developed_Open_Space’, ‘Soil_A’, and ‘infaltion_17’ are the most significant variables (lowest p-values).

Choosing Features

We therefore include the target variable ‘permit_22’ and the following features in the model: ‘permit_17’, ‘Slope_Ave’, ‘Slope_Max’, ‘permit_17.nn_3’, ‘permit_17.nn_4’, ‘permit_17.nn_5’, ‘Pasture_Hay’, ‘Developed_Medium_Intensity’, ‘Dist_Flowline’, ‘Dist_Road’, ‘Developed_High_Intensity’, ‘Developed_Low_Intensity’, ‘Developed_Open_Space’, and ‘infaltion_17’.

Data split

We split our dataset into training and testing sets using the initial_split() function from the rsample package. We use 80% of the data for training and 20% for testing. The set.seed() function is used to ensure reproducibility.
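rsample's initial_split() also accepts a strata argument, which keeps the share of ‘1’s similar in the training and testing sets. With so few positives, this matters. The text does not say whether stratification was used. The Python sketch below illustrates the idea only and is not the rsample implementation. The seed is arbitrary.

```python
import random


def stratified_split(rows, label="permit_22", prop=0.8, seed=2024):
    """Split rows into train/test, sampling `prop` of each class for training."""
    rng = random.Random(seed)  # fixed seed for reproducibility, like set.seed()
    train, test = [], []
    for value in sorted({r[label] for r in rows}):
        group = [r for r in rows if r[label] == value]
        rng.shuffle(group)
        cut = round(len(group) * prop)
        train += group[:cut]
        test += group[cut:]
    return train, test


rows = [{"id": i, "permit_22": int(i < 10)} for i in range(100)]
train, test = stratified_split(rows)
print(len(train), len(test), sum(r["permit_22"] for r in test))  # 80 20 2
```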

Hyperparameter Tuning & Cross Validation

Next, we set up the hyperparameter grid for tuning.

Hyperparameter tuning and cross-validation are performed with the tune_grid() function from the tune package. We specify the metrics to evaluate, including ROC AUC, accuracy, sensitivity, precision, and F1 score.
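Writing the metrics listed above out from the confusion matrix shows why accuracy alone was misleading on this imbalanced data. The sketch below is plain Python and is not the yardstick implementation. ROC AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting half. This is fine for small examples but quadratic in the number of parcels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), precision, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A model that never predicts a permit still scores 99% accuracy on 1% positives
y_true = [1] + [0] * 99
print(classification_metrics(y_true, [0] * 100))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```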

Finally, we finalize the model with the best parameters and evaluate it on the test set.

3.3 Spatial Cross-Validation

Once the model is created, it is important to test how accurately it performs across different areas of the County. In the map below, the random forest model's accuracy is mapped by zip code across Watauga. The model is accurate across the County and performs especially well in the central region. This matters because the central region is also likely to be an early target for development, due to its proximity to Boone and other amenity hubs.
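Mapping accuracy by zip code amounts to grouping test predictions by zip code. A stricter form of spatial cross-validation holds out one zip code at a time, trains on the rest, and tests on the held-out area. Both steps are sketched below with hypothetical zip labels. The model fitting itself is left out.

```python
from collections import defaultdict


def accuracy_by_zip(records):
    """records: (zip_code, y_true, y_pred) triples -> {zip_code: accuracy}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for zip_code, y_true, y_pred in records:
        totals[zip_code] += 1
        hits[zip_code] += int(y_true == y_pred)
    return {z: hits[z] / totals[z] for z in totals}


def leave_one_zip_out(zip_codes):
    """Yield (held_out_zip, train_indices, test_indices), one fold per zip code."""
    for held_out in sorted(set(zip_codes)):
        train = [i for i, z in enumerate(zip_codes) if z != held_out]
        test = [i for i, z in enumerate(zip_codes) if z == held_out]
        yield held_out, train, test


print(accuracy_by_zip([("Z1", 1, 1), ("Z1", 0, 1), ("Z2", 0, 0)]))  # {'Z1': 0.5, 'Z2': 1.0}
for fold in leave_one_zip_out(["Z1", "Z2", "Z1"]):
    print(fold)
```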

4. Predicting for 2027

With the best model selected, we can make our predictions: using the 2022 data, we predict septic permits for 2027.
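One way to read "using the 2022 data" is as a time shift. The model learned how 2017 conditions relate to 2022 permits. For 2027, the 2022 values are therefore fed into the inputs the model expects for 2017. The sketch below shows that renaming step under this assumption. Column names other than ‘permit_22’ and ‘permit_17’ are hypothetical, and the sketch assumes a 2022 version of every year-specific feature exists.

```python
def shift_to_training_names(row, new_year="22", trained_year="17"):
    """Rename year-tagged columns (e.g. 'permit_22') to the names used in training.

    Every occurrence of '_<new_year>' in a column name is replaced, so names that
    contain that pattern for another reason would also change; other columns
    are left as-is.
    """
    return {key.replace(f"_{new_year}", f"_{trained_year}"): value for key, value in row.items()}


row_2022 = {"permit_22": 1, "permit_22.nn_3": 410.0, "Slope_Ave": 12.5}
print(shift_to_training_names(row_2022))
# {'permit_17': 1, 'permit_17.nn_3': 410.0, 'Slope_Ave': 12.5}
```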

Below are the prediction results. Predicted development is denser in Boone, the largest city in Watauga County, which aligns with its expected urban expansion.

Overlapping with the 2017 and 2022 Permit Data

Overlaying the predictions on the 2017 and 2022 permit data suggests that future urban expansion will be more concentrated in the center of Watauga County.

5. Application Development

App Link: https://seuha.github.io/watauga-app/

5.1 Users’ Workflow

5.2 App Manual

Left Panel

The left panel contains a search bar where users can search for a specific parcel by entering its parcel ID. The map then automatically zooms to the selected parcel.

Three filters below the search bar let users filter parcels by the following criteria:

  • Size of Parcel: Filter parcels by size, which affects the housing types that could be built on the parcel.

  • Slope Threshold: Filter parcels based on the maximum slope degree.

  • Value of Parcel: Filter parcels based on the property value of the parcel.

When a value is entered in a filter, the map automatically updates to show the parcels that meet the criteria.

The list at the bottom left shows the 10 parcels with the highest development probability. Clicking a parcel in the list zooms the map to that parcel.
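The filter-then-rank behavior of the left panel can be sketched as follows. This is not the app's code, which is not shown here. The field names, and the use of a minimum size, a maximum slope, and a maximum value as the filter directions, are assumptions for illustration.

```python
def filter_parcels(parcels, min_acres=0.0, max_slope=25.0, max_value=float("inf")):
    """Keep parcels that are large enough, flat enough, and within budget."""
    return [
        p for p in parcels
        if p["acres"] >= min_acres and p["slope_max"] <= max_slope and p["value"] <= max_value
    ]


def top_parcels(parcels, n=10):
    """The n parcels with the highest predicted development probability."""
    return sorted(parcels, key=lambda p: p["probability"], reverse=True)[:n]


parcels = [
    {"id": "P1", "acres": 2.0, "slope_max": 12.0, "value": 90_000, "probability": 0.62},
    {"id": "P2", "acres": 0.3, "slope_max": 8.0, "value": 40_000, "probability": 0.81},
    {"id": "P3", "acres": 5.0, "slope_max": 31.0, "value": 150_000, "probability": 0.74},
    {"id": "P4", "acres": 1.1, "slope_max": 20.0, "value": 60_000, "probability": 0.55},
]
shortlist = top_parcels(filter_parcels(parcels, min_acres=1.0, max_slope=25.0))
print([p["id"] for p in shortlist])  # ['P1', 'P4']
```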

Layers Panel

The layers panel provides three options:

  • Parcel View: Shows all parcels on the map.

  • Development Probability: Shows development probability as a color gradient, where darker colors indicate a higher probability of development.

  • Distance to Roads: Showing the distance to the nearest main roads by color scheme.

Pop-up Info Box

Clicking a parcel opens a pop-up info box with the following information: Parcel ID, Area(Acre), Slope Average (°), Slope Maximum (°), Distance to Road (ft), Development Probability, Undevelopable (<35°) [Yes/No], Already Developed [Yes/No].

