Literature Review

We conducted a literature review of journals that analyzed the effects of high-speed internet access and property prices.

Home Is Where the Internet Is? High-speed Internet’s Impact on Rural Housing Values

Kelsey L. Conley and Brian E. Whitacre

This report uses data from the National Broadband Map, the Federal Communications Commission, and over 2,700 housing transactions from June 2011 to June 2017 to examine the impact of broadband availability on housing values in two rural Oklahoma counties via a hedonic price model. The results find no support for the existence of a broadband premium, and stress that differences across counties are crucial in assessing rural housing prices. The full report can be found at this link.

High-speed Internet access and housing values

Gabor Molnar, Scott J. Savage, and Douglas C. Sicker

This report uses a hedonic regression model that relates house values to high-speed Internet access while controlling for the potential endogeneity of Internet access. Results show that single-family homes with access to a 25 Mbps broadband connection have a price that is about $5,977, or 3%, more than similar homes in neighborhoods with 1 Mbps. The rural premium is lower at $5,099. The full report can be found at this link.

Project Area Selection

For this analysis, we used a subset of the CoreLogic data, which included data for the eight states with the highest number of properties in the data: Arkansas, California, Colorado, Florida, Georgia, North Carolina, Ohio, and Texas. To select the project area for our analysis, we calculated the counts of properties inside the project area and outside the project area by a radius of 20 miles, both before and after the broadband deployment project was implemented. This was completed by correlating the latitude/longitude of each property to the project shapefiles and finding the properties that intersected using the sf package in R. The counts are presented in the table below for a subset of the 132 BIP projects that were present in one of the eight states in the CoreLogic data.

state8_counts <- read.csv("www/state8_counts")
knitr::kable(state8_counts)

X	BIP_ID	tot_in_20m	tot_out_20m	bef_in_20m	bef_out_20m	aft_in_20m	aft_out_20m
1	AR1102-AG39	196	25762	0	45	196	25717
2	CA1107-B40	3450	231602	0	1984	3450	229618
3	CA1110-A40	0	208662	0	59	0	208662
4	CA1112-A40	0	84220	0	12	0	84220
5	CA1113-A40	112620	2390368	8	1099	112620	2390368
6	CA1114-A40	49153	6635504	4	2583	49153	6635504
7	CA1115-A40	17844	62734	1	0	17844	62734
8	CO1106-A40	1747	63464	0	42	1747	63422
9	CO1107-A40	2846	930471	3	46	2846	930471
10	CO1108-B39	35731	206466	3	7	35731	206466
11	CO1109-A40	89	73694	0	12	89	73694
12	CO1110-A40	49	4954	0	0	49	4954
13	CO1114-A40	77	52637	0	12	77	52637
14	FL1105-A40	18569	2888348	57	21243	18512	2867105
15	GA1105-A40	7199	453800	13	614	7186	453186
16	GA1107-A40	8	97332	0	7	8	97332
17	GA1109-C40	2201	12151	4	7	2201	12151
18	NC1103-A40	0	466492	0	398	0	466094
19	NC1105-A40	0	110710	0	0	0	110710
20	NC1106-A40	16579	385708	0	20	16579	385708
21	NC1107-B40	16892	486199	0	1075	16892	486199
22	NC1108-A40	8872	522083	0	39	8872	522083
23	NC1109-A40	15140	1322566	0	150	15140	1322566
24	NC1110-A40	0	65805	0	9	0	65805
25	OH1106-A40	22048	2891452	910	44797	21138	2846655
26	OH1107-A40	1147	241801	0	1	1147	241801
27	OH1108-A40	56058	2834094	0	34	56058	2834094
28	OH1110-B40	12447	243845	0	4	12447	243845
29	OH1111-A40	1095	242735	0	2	1095	242735
30	TX1113-A40	394	385529	0	6926	394	378603
31	TX1116-A40	0	50	0	0	0	50
32	TX1119-A40	36	356593	0	0	36	356593
33	TX1120-A40	1774	238414	0	3	1774	238414
34	TX1122-A40	9	12347	0	0	9	12347

Broadband Initiative Project OH1106-A40 is one of the few project areas that has sufficient data from this specific CoreLogic dataset, so our analysis will be based upon the properties and their prices and characteristics within this project area. The OH1106-A40 project shapefile and plot of properties inside (green) and outside (orange) are presented below.

Rplot Map Before

Model

Variable Selection

Dependent Variable: Sales Price vs Assessed Price

Our initial analysis was focused on assessing changes in property prices using the sale value as the dependent variable. However, the sales values data are largely incomplete with a completeness percentage of 65.90% as shown by the exploratory data analysis in the Data page. We shifted our dependent variable from sale value to the assessed values of the properties. The assessed values are calculated by county-level tax offices that conduct these assessments on a regular basis, varying by tax office. The assessed values had a completeness percentage of 99.41%, ensuring that our analysis could use most of the data.

To establish that assessed values could serve as a good proxy for sales values within Ohio, we calculated the correlation between the medians and means of the two variables at the ZIP code level by year. We removed ZIP codes that had less than 40 observations within a given year. The correlation between the median and mean values was both strong and positive, with a correlation coefficient of 0.938 and 0.961, respectively. We plotted these points and overlayed them with a LOESS (locally weighted smoothing) regression shown below. We can confidently use assessed prices as the dependent variable of our analysis.

Independent Variable Manipulation

Age (age)

CoreLogic provides the year built and effective year built of properties, where the year built represents the initial property building year, and the effective year represents year where substantial renovation was completed to the property. Thus, we created a new variable that represented the most recent year built, i.e., if the property had an effective year built, then this value was used; otherwise, the initial year built was used. We then subtracted this value from the assessed year, which would give us the age of the property when the home was assessed. For example, if the home was assessed in 2009, and the home was renovated in 1993, the age of the home would be 16 years at the time of assessment. This age variable is used in our model.

Distance (distance_miles)

The distance is a main effect variable which calculates the distance of the property to the closest border of the project area, which was calculated using the shapefiles. Properties inside of the project areas have a negative distance while properties outside have a positive distance. This will be used to assess whether the program had an effect on property prices before/after the project was implemented. This distance_miles variable is used in our model.

Living Area to Land ratio (sqft_ratio, living_square_feet, land_square_footage)

The ratio of the living area (square feet) to the land area (square feet) was taken to create the variable sqft_ratio. sqft_ratio, living_square_feet and land_square_footage are all used in our model.

Dummy Variables

Inside/Outside Project Area (dummy_out_in)

To specify whether a property is inside or outside of the project area, we created dummy variable dummy_out_in where 0 represents a property outside of the project area and 1 represents a property inside the project area. This was created using the project area shapefiles.

Before/After Project Implementation (dummy_before_after)

To specify whether a property's value was before or after the project was implemented, we created dummy variable dummy_before_after where 0 represents an assessment before project implementation and 1 represents an assessment after project implementation. The OH1106-A40 project was implemented in 2010, so assessment values leading up to and including 2010 correspond to a 0 and assessment values after 2010 correspond to a 1.

Mobile Home and Fireplace (dummy_mobile_home_ind, dummy_fire_place)

The CoreLogic data has indicators for whether a property is a mobile home or has a fireplace. Thus, we create dummy variable where 1 represents that it is a mobile home or has a fireplace, and 0 if it is not a mobile home or doesn't include a fireplace. Variables dummy_mobile_home_ind and dummy_fire_place are included in our model.

Factor Variables (factor_bedrooms, factor_total_baths_calculated, factor_number_of_units, factor_stories_number)

CoreLogic data includes values for the number of bedrooms, bathrooms, units, and stories that each property has. These are all discrete variables and have varying distributions as seen on the boxplots to the left. These are all transformed to factor variables with fewer levels. The factor variable distributions should have near normal distributions, which is true for these variables, not including the number of stories. Variables factor_bedrooms, factor_total_baths_calculated, factor_number_of_units, and factor_stories_number are all used in our model.

bedrooms factor_bedrooms total_baths_calculated factor_total_baths_calculated number_of_units factor_number_of_units stories_number factor_stories_number

Interactions (dummy_in_after_interaction, distance_after_interaction, distance_in_after_interaction.)

Finally, we create three interactions between variables. The first interaction variable is dummy_in_after_interaction, an interaction between dummy_out_in and dummy_before_after. Thus, the observations will be valued at 1 if they are both inside the project area and after the project implementation. This variable is the main effect variable, and we will use the coefficient and p-value of this variable to determine the impact of the broadband initiative project.

The next interaction variables are distance_after_interaction and distance_in_after_interaction. Interaction variable distance_in_after_interaction represents an interaction between the distance_miles and dummy_out_in and dummy_before_after. Thus, in a similar fashion to dummy_in_after_interaction, distance_in_after_interaction represents the effect on property prices for properties inside and after project implementation, magnified by the distance of the property to the project boundary. This variable is also the main effect variable, and we will use the coefficient and p-value of this variable to determine the impact of the broadband initiative project, although in a seperate model from the dummy_in_after_interaction variable.

Data Cleaning

After selecting and manipulating the variables, duplicated and incomplete property observations were deleted. Our final dataset had 137558 observations, compared to 289184 before the data was cleaned. The distribution of these properties by inside/outside the project area and before/after project implementation are shown in the contingency matrices below.

Final Model

With the variables mentioned above, we created two log-linear models with the dependent variable of assessed property values. The model to the left uses main effect variables dummy_out_in, dummy_before_after, and dummy_in_after_interaction. The main variable coefficient to consider is the dummy_in_after_interaction coefficient of -0.01409, which is 0.986 when exponentiated. This signals that homes inside the project area after the project implemention had a lower property value by 1.4%. However, this is the only variable in the model with an insignificant p-value, denoting that no significant conclusion can be made.

The model on the right uses dummy_before_after, distance_miles, distance_after_interaction, and distance_in_after_interaction. The main variable coefficient to consider is the distance_in_after_interaction coefficient of -1.17e-05, which is 0.999 when exponentiated. This denotes that home that were inside the project area and after project implementation had a lower property value by 0.1%, showing almost no effect. Unlike the first model, this main effect variable is significant to the model.

In both models, housing characteristics have strong influences on the property homes. Properties with fireplaces and higher numbers of bedrooms, bathrooms, units, stories, living area, and land area had a housing premium. On the other hand, properties that were older, classified as mobile homes, and had a higher living area to land area ratio had lower property values.

The outputs are given below.

Conclusion

This project sought to determine if the implementation of broadband in rural areas would affect the property values of homes in said areas. Our analysis focused specifically on Broadband Initiative Project OH1106-A40. From the two models above, we can infer that there was no significant difference between the property values of homes that were inside of the project area versus outside of the project area after the project was implemented. Housing characteristics such as age, size, and the number of bedrooms/bathrooms in the property were significant indicators of the property value.

Next Steps

This analysis revolved around a specific project area, which we would like to use as a model to extrapolate to other project areas in the Broaband Initiative Project and other broaband projects more generally. We would also like to implement further modeling techniques such as the spatial regression model and difference in differences model mentioned in the overview page.

Residential Property Values Analysis