The following project was created in association with the MUSA 801 Practicum at the University of Pennsylvania, taught by Ken Steif, Michael Fichman, and Matt Harris. We would like to thank Charlie Catlett of the Argonne National Laboratory for providing feedback to help us create a meaningful application. All products from this class should be considered proofs of concept and works in progress.
This document is split into two parts. The first part addresses the policy implications of our project, as well as the concepts of data reliability that underpinned our methodology. The second part presents our methodology together with the code blocks necessary for its replication. The policy implications and concepts are explicated in the sections Introduction and Defining Data Reliability. Our methods can then be replicated by following the sections Scoring Data Reliability and Scoring Data Reliability After Imputation.
“…sound and touch and taste all have a place in the tools which…will define digital planning in the near future.” - Michael Batty
In a foreword for Robert Laurini’s Information Systems for Urban Planning: A Hypermedia Co-operative Approach, Michael Batty, Professor of Spatial Analysis and Planning at University College London, identifies the role that our senses play in the future of urban planning. Just as urban citizens rely on sight, sound, smell, and touch to interpret and experience their urban environment, a planner needs a good understanding of the same sensory stimuli that shape urban experience and quality of life in order to make meaningful improvements to it.
This is why the rising ubiquity and decreasing cost of sensing tools have the potential to change the way planners plan, structure, and manage the city. Sensor devices are often designed for tasks that either emulate or extend beyond the human senses. More importantly, they collect valuable data that help us approximate the human sensory experience in an urban environment, generating large volumes of feedback at spatial and temporal scales that can be further analysed for detailed insights. As cities around the world strive to be ‘smart’ in the ways they enhance quality of life for their citizens, sensor networks are increasingly deployed to collect data that can be used to understand and manage the urban experience. Here, new technology plays a role in transforming efforts towards sustainable urban growth and smart city planning.
As an urban-scale sensing network that collects real-time environmental data in cities, the Array of Things (AoT) initiative exemplifies this trend. The initiative is led by Charlie Catlett and researchers from the Urban Center for Computation and Data, a joint initiative of the Argonne National Laboratory and the University of Chicago. The AoT was launched in 2016 and is currently implemented in Chicago; the data it collects is open and free to the public.
The AoT may be the first sensing project of this geographic scale and of this level of temporal and data type specificity. As presented in the figure below, the AoT network comprises nodes, which are sensor boxes containing up to 15 sensors measuring different sensory data types, or parameters. These parameters include temperature, humidity, pressure, PM 2.5 concentration, and concentrations of hazardous gases such as carbon monoxide (CO), nitrogen dioxide (NO2) and sulphur dioxide (SO2). As of this writing, there are 86 nodes citywide, as seen in the figure below. When fully implemented, the AoT network will consist of 500 nodes across Chicago.
This opens up a wide range of angles from which individuals, organisations, researchers, engineers and scientists can study the urban environment and urban living, which is the main objective of the AoT initiative. In particular, the data offers valuable insights for urban policy planners and researchers interested in devising urban policies that are sensitive to the unavoidable human-environment dynamics that shape urban behaviour and livability.
Extracting such valuable insights from sensor data requires the raw data to be processed and analysed. Here, the extent of data processing and quality of data analysis critically depends on the reliability of the data itself.
Therefore, our project seeks to evaluate the reliability of the data collected by the AoT network. In Section 2, we first define criteria for what it means for AoT data to be reliable. In Sections 3 and 4, based on these criteria, we then provide a method to numerically score the network's daily data reliability for different data parameters.
Based on this score metric, we hope that planners can easily identify the segments of the large AoT dataset that suit their analysis. More broadly, we hope that the transparent evaluation factors and scores behind this data reliability analysis can promote a more informed use of data in the increasingly data-driven planning process. We also see this application as useful for improving research efficiency, given the increasingly large stores of sensor data available: planners will be able to scope the spatial and temporal scale of their research according to where and when reliable segments of the data are available, instead of having to explore many different datasets to finalise a suitable scope of analysis.
Below we set the working directory, load the libraries needed for the analysis, and define a plotTheme for our figures. We also set the memory limit to a high value, given the size of the data we are processing here.
library(dplyr)
library(DBI)
library(RSQLite)
library(dbplyr)
library(lubridate)
library(leaflet)
library(tmap)
library(ggplot2)
library(sf)
library(plotly)
library(sp)
library(spatstat)
#note: rgeos and rgdal have been retired from CRAN; sf covers most of their functionality
library(rgeos)
library(rgdal)
library(tidyr)
library(gridExtra)
library(stringr)
library(tidyverse)
library(caret)
library(FNN)
library(spdep)
library(knitr)
library(kableExtra)
library(htmlwidgets)
library(htmltools)
library(openair)
setwd("~/Capstone/Exploratory")
#memory.limit() only has an effect on Windows, and is a no-op stub from R 4.2.0 onwards
memory.limit(100000000000)
## [1] 1e+11
plotTheme <- function(base_size = 12) {
  #note: base_size is not used in the body of this theme
  theme(
    text = element_text(color = "black"),
    plot.title = element_text(size = 14, colour = "black"),
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 0),
    axis.ticks.x = element_blank(),
    #`linewidth` replaces the deprecated `size` for lines in ggplot2 >= 3.4.0
    axis.ticks.y = element_line(linewidth = .1, color = "#ababab"),
    panel.grid.major.y = element_line(linewidth = .1, color = "#ababab"),
    panel.grid.major.x = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank()
  )
}
In order to use the data, planners need to know if the data is reliable. Here, we identify 4 ideal criteria for AoT data to be considered reliable and useful:
Sensor Value Reliability: All the data collected should be within the sensing range, as determined by the sensor specifications.
Spatial Reliability: Reliable data should be collected for the whole spatial extent of the study area, which is Chicago in this case.
Temporal Reliability: Reliable data should be collected at consistent time intervals throughout the day.
Imputability: Missing and unreliable data records should be easily substitutable by the next nearest record in time and space. This substitution, or imputation, should improve sensor value, spatial, and temporal reliability.
Each of these 4 criteria is conceptualised in detail in the following sub-sections, together with the method through which it is scored. These numerical scores are metrics that facilitate easy comparison between different datasets.
Sensor Value Reliability is the first and most important criterion defining data reliability in our method.
According to the AoT metadata site, each sensor has a specific detection range, so a well-functioning sensor should record values within this range. Values recorded outside this range indicate that the sensor is faulty; these values are unreliable and unusable.
The table below presents the specific detection ranges of different sensors recording different data parameters.
| Data Type | Parameter | Minimum detectable value | Maximum detectable value | Unit |
|---|---|---|---|---|
| Weather | Temperature | -55 | 125 | deg Celsius |
| Weather | Relative Humidity | 0 | 100 | Percent |
| Weather | Pressure | 300 | 1100 | hPa |
| Air Quality | PM2.5 concentration | 0 | - | µg/m³ |
| Air Quality | CO concentration | 0 | 1000 | PPM |
| Air Quality | H2S concentration | 0 | 50 | PPM |
| Air Quality | NO2 concentration | 0 | 20 | PPM |
| Air Quality | O3 concentration | 0 | 20 | PPM |
| Air Quality | SO2 concentration | 0 | 20 | PPM |
Temperature values recorded by the sensors should also fall within logical seasonal ranges. Intuitively, temperatures in Chicago will not span anything close to the full sensor range of -55 to 125 deg Celsius in a single day. We also expect temperature values collected during winter in Chicago to be around or below the freezing point of 0 deg Celsius, while summers should record higher values. Therefore, to determine sensor value reliability for temperature, we also reference daily temperature ranges in Chicago published by the National Weather Service, allowing a margin of 5 degrees below each day's recorded low and above its recorded high. In addition, the defValid function below labels in-range temperature values as unreliable if they fall outside the interquartile range of their node's readings within the same 10-minute interval.
Based on these ranges, we label each sensor value as reliable or not. If the value falls within the sensor specification range (and the daily range for temperature values), reliability = 1; if not, or if the value is missing, reliability = 0.
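As a minimal sketch of this labelling rule, the base R helper below applies the detection ranges from the table above and, for temperature, a daily range widened by 5 degrees, as defValid() does. `label_reliability()` and `spec_ranges` are hypothetical names, not part of the AoT tooling or of our workflow, and the daily low and high are made-up values. Combining the daily range with the sensor specification via max()/min() is also an assumption of this sketch.

```r
# Hypothetical helper: 1 = reliable, 0 = unreliable; missing values are labelled 0
label_reliability <- function(value, low = -Inf, high = Inf) {
  ifelse(is.na(value) | value < low | value > high, 0L, 1L)
}

# Detection ranges from the table above; Inf where no maximum is given (PM 2.5)
spec_ranges <- data.frame(
  parameter = c("temperature", "humidity", "pressure", "pm2_5",
                "co", "h2s", "no2", "o3", "so2"),
  low  = c(-55, 0, 300, 0, 0, 0, 0, 0, 0),
  high = c(125, 100, 1100, Inf, 1000, 50, 20, 20, 20)
)

# Humidity: check against the sensor specification only
hum_range <- spec_ranges[spec_ranges$parameter == "humidity", ]
label_reliability(c(77.5, 118.99, NA, 0, 100), hum_range$low, hum_range$high)
# 1 0 0 1 1

# Temperature: also check against a hypothetical daily low/high,
# widened by 5 degrees on each side
day_low <- -2
day_high <- 4
label_reliability(c(0.5, 40.2, -7.5, 9),
                  low  = max(-55, day_low - 5),
                  high = min(125, day_high + 5))
# 1 0 0 1
```

Keeping the ranges in a small lookup table means adding a new parameter only requires a new row rather than a new branch of code.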
To evaluate network sensor value reliability, we are interested in the average proportion of the sensor values measured in each node that are reliable. The more of the measured sensor values that are reliable, the more reliable the overall network is in terms of sensor value reliability.
Based on this criterion of Sensor Value Reliability, we can define active nodes and inactive nodes in the network.
Active nodes record at least one reliable value that falls within the sensor specification range during the course of a day.
Inactive nodes do not record a single reliable value within the sensor specification range during the course of a day. These include nodes whose sensors do not record any value at all.
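The sketch below shows one way to derive each node's proportion of reliable values and its active/inactive status from labelled records. It assumes a data frame with `node_id` and `val_qual` columns like the one defValid() returns; the node IDs and labels are invented. The list of all deployed nodes (`all_nodes`) is an assumed extra input, because nodes that send no data at all never appear in the records.

```r
# Hypothetical one-day records: node IDs and labels are made up
records <- data.frame(
  node_id  = c("a", "a", "a", "b", "b", "c"),
  val_qual = c(1, 0, 1, 0, 0, NA)
)
# Missing labels count as unreliable (aggregate() would otherwise drop them)
records$val_qual[is.na(records$val_qual)] <- 0

# Proportion of each node's values that are reliable
node_summary <- aggregate(val_qual ~ node_id, data = records, FUN = mean)
names(node_summary)[2] <- "prop_reliable"

# Add deployed nodes that sent no records at all (node "d"),
# giving them 0 so that they are classed as inactive
all_nodes <- data.frame(node_id = c("a", "b", "c", "d"))
node_summary <- merge(all_nodes, node_summary, by = "node_id", all.x = TRUE)
node_summary$prop_reliable[is.na(node_summary$prop_reliable)] <- 0

# Active = at least one reliable value during the day
node_summary$active <- node_summary$prop_reliable > 0
node_summary
```

Here only node "a" is active: "b" and "c" recorded values but none were reliable, and "d" recorded nothing.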
This helps us define two other relevant reliability criteria, Spatial Reliability and Temporal Reliability, that will be elaborated on in Section 2.3 and Section 2.4 respectively.
To evaluate the spatial reliability of the network, we are interested in whether active nodes are distributed across Chicago. Here, the area covered by active nodes is considered to have reliable data collected for it. A fully spatially reliable network has its nodes distributed across the whole of Chicago, such that the total spatial extent of the nodes spans the area of the city. A network that is not fully spatially reliable is one whose total spatial extent spans only part of the city. The figure below illustrates this point:
Therefore, the average proportion of Chicago’s area covered by the network extent at any one time serves as a metric for spatial reliability here.
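How the total spatial extent of the active nodes is measured is not spelled out above. As one possible reading, assumed here rather than taken from our method, the sketch below uses the convex hull of the active node locations and divides its area by the city's area. Coordinates are assumed to be projected in metres, and the node positions and city area are hypothetical. A convex hull can extend beyond the city boundary (for example over Lake Michigan), so a fuller version would intersect the hull with a Chicago boundary polygon, for instance with sf's st_convex_hull(), st_intersection() and st_area().

```r
# Hypothetical: area of the convex hull of a set of points (projected coordinates)
hull_area <- function(x, y) {
  idx <- chull(x, y)                  # indices of hull vertices, clockwise
  hx <- x[idx]
  hy <- y[idx]
  # Shoelace formula; abs() removes the dependence on vertex orientation
  0.5 * abs(sum(hx * c(hy[-1], hy[1]) - c(hx[-1], hx[1]) * hy))
}

# Share of the city covered by the hull of active nodes, capped at 1
spatial_reliability <- function(x, y, active, city_area) {
  if (sum(active) < 3) return(0)      # fewer than 3 points enclose no area
  min(1, hull_area(x[active], y[active]) / city_area)
}

# Toy example: four active nodes at the corners of a 10 km x 10 km square,
# one inactive node, in a hypothetical 400 km^2 city
x <- c(0, 10000, 10000, 0, 5000)
y <- c(0, 0, 10000, 10000, 5000)
active <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
spatial_reliability(x, y, active, city_area = 400e6)
# 0.25
```

Averaging this value over the time intervals of a day would give the "average proportion at any one time" described above.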
To evaluate the temporal reliability of the network, we are interested in the average proportion of the day during which a node within the network is active, i.e. collecting reliable data. A fully temporally reliable network has all its nodes collecting reliable data consistently across all time intervals during a day. A network that is not fully temporally reliable has at least one node not collecting reliable data at some point during the day. The figure below illustrates this type of network - while some of its nodes consistently collect reliable data throughout the day, others have periods during which no reliable data is collected at all:
Therefore, the average proportion of the day-duration during which reliable data is being collected serves as a metric for temporal reliability here.
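A minimal base R sketch of this temporal metric follows. It assumes the day is split into the 144 ten-minute intervals used for `by10`, and that a node counts as active in an interval if it recorded at least one reliable value there. `temporal_reliability()` and the toy data are hypothetical, and the function expects one day's records only.

```r
# Number of 10-minute intervals in a day
intervals_per_day <- 24 * 60 / 10   # 144

temporal_reliability <- function(node_id, interval, val_qual, all_nodes) {
  ok <- !is.na(val_qual) & val_qual == 1
  # Distinct (node, interval) pairs with at least one reliable value
  active_pairs <- unique(data.frame(node_id = node_id[ok], interval = interval[ok]))
  # Share of the day each node was active; nodes with no reliable data get 0
  share <- table(factor(active_pairs$node_id, levels = all_nodes)) / intervals_per_day
  mean(as.numeric(share))
}

# Toy day: node "a" is active in every interval, "b" in every other one,
# and "c" only ever records an unreliable value
node <- c(rep("a", 144), rep("b", 72), "c")
slot <- c(1:144, seq(1, 143, by = 2), 1)
qual <- c(rep(1, 216), 0)
temporal_reliability(node, slot, qual, all_nodes = c("a", "b", "c"))
# 0.5
```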
Imputability refers to the possibility and effectiveness of replacing missing or as-if-missing values with observed ones. Here, unreliable data is considered as-if-missing.
The figure below presents our imputation method to replace missing and unreliable data in the dataset we retrieve from the AoT database.
To evaluate imputability, we first apply the procedure above to our original dataset to obtain an imputed dataset. We then apply the score metrics for the other 3 reliability criteria to this imputed dataset and compare the scores. Ideally, the scores for the imputed dataset should be higher, which would suggest that imputing for unreliable and missing data in the retrieved dataset is effective. If the scores are not higher, imputation cannot be used to ‘salvage’ an original dataset that consists of too many unreliable data points. In this case, planners might be advised not to use the dataset for that day and data type at all.
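The imputation procedure itself is given in the figure above. The sketch below is a deliberately simplified stand-in, not our exact method: it replaces each unreliable or missing value with the reliable value from the same node that is nearest in time, and ignores neighbouring nodes in space. `impute_nearest_time()` and the toy values are hypothetical. The imputed records would then be labelled reliable and rescored with the same metrics as the original dataset.

```r
# Hypothetical simplification: nearest reliable value in time, same node only
impute_nearest_time <- function(time, value, val_qual) {
  ok <- !is.na(val_qual) & val_qual == 1
  if (!any(ok)) return(value)   # nothing reliable to borrow from
  out <- value
  for (i in which(!ok)) {
    gaps <- abs(as.numeric(time[ok]) - as.numeric(time[i]))
    out[i] <- value[ok][which.min(gaps)]   # first candidate wins a tie
  }
  out
}

t0 <- as.POSIXct("2018-12-15 00:00:00", tz = "UTC")
times <- t0 + c(0, 600, 1200, 1800)   # one record every 10 minutes
v <- c(77.5, 118.99, NA, 80.1)
q <- c(1, 0, 0, 1)
impute_nearest_time(times, v, q)
# 77.5 77.5 80.1 80.1

# Apply node by node with split()/unsplit()
toy <- data.frame(node_id = c("a", "a", "b", "b"),
                  time = times[c(1, 2, 1, 2)],
                  value = c(1, NA, 5, 200),
                  val_qual = c(1, 0, 1, 0))
imputed <- unsplit(lapply(split(toy, toy$node_id),
                          function(d) impute_nearest_time(d$time, d$value, d$val_qual)),
                   toy$node_id)
imputed
# 1 1 5 5
```

A node with no reliable values at all is returned unchanged, which is consistent with the idea that imputation cannot salvage data that is unreliable throughout.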
In this section, we will present the method of scoring Data Reliability based on the criteria of Sensor Value Reliability, Spatial Reliability and Temporal Reliability. This will be demonstrated using data collected on 2018-12-15 for the different data types listed for weather and air quality in their respective sections.
The flow chart below illustrates the common workflow we adopt for scoring data reliability for each day:
To manage the large AoT dataset, we first download the dataset for December 2018 from the AoT data site and import it into a database using SQL Server Management Studio (instructions on this can be found here). We then save this database as Chicago2018-12.db. It is this database that we connect to using the dbConnect function from the DBI R package.
dbname<-'Chicago2018-12.db'
To facilitate the workflow, we provide a function below for users to retrieve AoT data from the SQL database and then determine whether each data point is reliable.
defValid<-function(dbname, system, parameter1, sensor1=NULL, high=NULL, low=NULL, actual1=NULL){
  #1. Connect to SQL database; the connection is closed when the function exits
  con<-dbConnect(SQLite(), dbname=dbname)
  on.exit(dbDisconnect(con), add=TRUE)
  #2. Retrieve data for the requested subsystem and parameter, joined to node locations
  ##(values are passed as bound query parameters rather than pasted into the SQL string)
  weather<-
    dbGetQuery(con,
      "SELECT data.timestamp, data.node_id, data.subsystem, data.sensor, data.parameter, data.value_hrf, nodes.lat, nodes.lon
       FROM data
       JOIN nodes
       ON data.node_id = nodes.node_id
       WHERE data.subsystem = ? AND data.parameter = ?",
      params=list(system, parameter1))%>%
    mutate(timestamp2=ymd_hms(timestamp),
           date=date(timestamp2),
           value_hrf=as.numeric(value_hrf))%>%
    #assign each record to a 10-minute interval
    mutate(by10=cut(timestamp2, breaks='10 min'))%>%
    mutate(time=ymd_hms(by10))
  #3. Determine for each data point whether it is reliable (1) or not (0);
  ##sensor reliability specification differs according to different data parameters
  if(parameter1=='humidity'){
    weather%>%
      filter(sensor==sensor1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0,
                      ifelse(value_hrf>100|value_hrf<0, 0, 1)))%>%
      #if any record of a node on a given day is out of range,
      ##all of that node's records for the day are labelled unreliable
      group_by(date, node_id)%>%
      mutate(val_qual=ifelse(mean(val_qual)!=1, 0, 1))->df
  }else if(parameter1=='pressure'){
    weather%>%
      filter(sensor==sensor1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0,
                      ifelse(value_hrf>1100|value_hrf<300, 0, 1)))%>%
      #same node-day rule as for humidity
      group_by(date, node_id)%>%
      mutate(val_qual=ifelse(mean(val_qual)!=1, 0, 1))->df
  }else if(parameter1=='pm2_5'){
    weather%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0,
                      ifelse(value_hrf<0, 0, 1)))->df
  }else if(parameter1=='concentration'){
    weather%>%
      filter(sensor==sensor1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0,
                      ifelse(value_hrf<low|value_hrf>high, 0, 1)))->df
  }else if(parameter1=='temperature'){
    #daily temperature ranges from the National Weather Service (columns date, low, high)
    actual<-read.csv(actual1)
    actual$date<-ymd(actual$date)
    weather%>%
      left_join(actual, by='date')%>%
      #allow a 5-degree margin around the day's recorded range
      mutate(high_bound = high + 5,
             low_bound = low - 5) %>%
      #missing values are labelled unreliable, as for the other parameters
      mutate(val_qual = ifelse(is.na(value_hrf) | value_hrf > high_bound | value_hrf < low_bound, 0, 1))->df
    #within each node and 10-minute interval, in-range values outside the
    ##interquartile range are also labelled unreliable
    df%>%
      filter(val_qual==1)%>%
      group_by(by10, node_id)%>%
      mutate(quant75= quantile(value_hrf, probs=0.75),
             quant25= quantile(value_hrf, probs=0.25))%>%
      mutate(val_qual= ifelse(value_hrf > quant75 | value_hrf < quant25, 0, 1))->df1
    df%>%
      filter(val_qual==0)%>%
      bind_rows(df1)->df
    rm(df1)
  }else{
    stop("Unsupported parameter1: ", parameter1)
  }
  return(df)
}
The function calls for the following inputs:
dbname: Name of the database file containing a month’s worth of AoT data
system: The sensor subsystem within the node - see the AoT site for the specific system types.
parameter1: The data type - see the AoT site for the specific parameter types.
sensor1: The sensor model - required when parameter1 = humidity, pressure or concentration (for concentration, it specifies the gas). See the AoT site for the specific sensor types.
high: The highest value sensor1 can record - only required when parameter1 = concentration
low: The lowest value sensor1 can record - only required when parameter1 = concentration
actual1: Path to a CSV file of daily temperature ranges obtained from the National Weather Service, with date, low and high columns - only required when temperature data is retrieved (parameter1 = temperature)
The function implements the following procedure:
The function returns a data frame with the following columns:
timestamp: Date and time at which the data observation is recorded
node_id: Unique ID of the node at which the data is recorded
subsystem: Sensor subsystem within the node
sensor: Sensor model
parameter: Data type
value_hrf: Measurement
lat: Latitude of the node
lon: Longitude of the node
timestamp2: timestamp in datetime format
date: Date extracted from timestamp2
by10: The 10-minute interval containing timestamp2, labelled by the interval's start time
time: by10 in datetime format
val_qual: 1 if the data record is reliable, 0 if not
For temperature, the returned data frame also carries the helper columns used in labelling (the daily low and high, their bounds, and the interval quartiles). This function has to be applied before each scoring process in the following sections. The specific inputs that yield each data parameter type are given below; the data retrieved here is then scored in the following sections.
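The `by10` column comes from `cut(timestamp2, breaks = '10 min')`, which places each timestamp in the 10-minute interval that starts at or before it rather than rounding to the nearest interval. The base R example below, using made-up timestamps, illustrates this, and also shows that cut() starts the first interval at the minute of the earliest timestamp. Since defValid() cuts a whole month of records at once, the intervals are anchored at the minute of that month's earliest record; if clock-aligned intervals are wanted, lubridate's floor_date(timestamp2, "10 mins") is one alternative.

```r
# Base R illustration of how cut() builds 10-minute intervals
ts1 <- as.POSIXct(c("2018-12-15 00:00:01", "2018-12-15 00:09:59",
                    "2018-12-15 00:10:00", "2018-12-15 00:25:30"), tz = "UTC")
as.integer(cut(ts1, breaks = "10 min"))
# 1 1 2 3   (00:09:59 stays in the first interval; it is not rounded up)

# Intervals start from the earliest timestamp's minute, not from :00/:10/:20
ts2 <- as.POSIXct(c("2018-12-15 00:03:20", "2018-12-15 00:12:00",
                    "2018-12-15 00:13:00"), tz = "UTC")
as.integer(cut(ts2, breaks = "10 min"))
# 1 1 2   (intervals are 00:03 to 00:13 and 00:13 to 00:23)
```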
#retrieve temperature data and label reliability
systemTemp<-'metsense'
actualTemp<-'december_weather.csv'
parameterTemp<-'temperature'
dfTemp<-defValid(dbname, system=systemTemp, parameter1=parameterTemp, actual1=actualTemp)
#because we are only scoring for a day here, filter data for 2018-12-15
dfTemp%>%
filter(date=='2018-12-15')->dfTemp
dfTemp%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:01 | 001e0610bc12 | 41.75034 | -87.663518 | 2018-12-15 00:00:00 | 40.20 | 0 |
| 2018/12/15 00:00:01 | 001e06113f54 | 41.884607 | -87.624577 | 2018-12-15 00:00:00 | 35.60 | 0 |
| 2018/12/15 00:00:01 | 001e0611537d | 41.794167 | -87.601646 | 2018-12-15 00:00:00 | 29.70 | 0 |
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | 35.10 | 0 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 241.00 | 0 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 128.86 | 0 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | -254.00 | 0 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 214.75 | 0 |
| 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 37.20 | 0 |
| 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | 241.00 | 0 |
#retrieve humidity data and label reliability
systemHumidity<-'metsense'
parameterHumidity<-'humidity'
sensorHumidity<-'htu21d'
dfHumidity<-defValid(dbname, system=systemHumidity, parameter1=parameterHumidity, sensor1=sensorHumidity)
#because we are only scoring for a day here, filter data for 2018-12-15
dfHumidity%>%
filter(date=='2018-12-15')->dfHumidity
dfHumidity%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
## Adding missing grouping variables: `date`
| date | timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|---|
| 2018-12-15 | 2018/12/15 00:00:00 | 001e0610ee36 | 41.751295 | -87.605288 | 2018-12-15 00:00:00 | 77.50 | 1 |
| 2018-12-15 | 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | 75.80 | 1 |
| 2018-12-15 | 2018/12/15 00:00:01 | 001e0610bc12 | 41.75034 | -87.663518 | 2018-12-15 00:00:00 | 80.76 | 1 |
| 2018-12-15 | 2018/12/15 00:00:01 | 001e06113f54 | 41.884607 | -87.624577 | 2018-12-15 00:00:00 | 81.80 | 1 |
| 2018-12-15 | 2018/12/15 00:00:01 | 001e0611537d | 41.794167 | -87.601646 | 2018-12-15 00:00:00 | 118.99 | 0 |
| 2018-12-15 | 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | 118.99 | 0 |
| 2018-12-15 | 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | 86.06 | 1 |
| 2018-12-15 | 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 118.99 | 0 |
| 2018-12-15 | 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 86.75 | 1 |
| 2018-12-15 | 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | 118.99 | 0 |
#retrieve pressure data and label reliability
systemPressure<-'metsense'
parameterPressure<-'pressure'
sensorPressure<-'bmp180'
dfPressure<-defValid(dbname, system=systemPressure, parameter1=parameterPressure, sensor1=sensorPressure)
#because we are only scoring for a day here, filter data for 2018-12-15
dfPressure %>%
filter(date=='2018-12-15')->dfPressure
dfPressure%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
## Adding missing grouping variables: `date`
| date | timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|---|
| 2018-12-15 | 2018/12/15 00:00:00 | 001e0610ee36 | 41.751295 | -87.605288 | 2018-12-15 00:00:00 | 1042.56 | 1 |
| 2018-12-15 | 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | 1011.77 | 1 |
| 2018-12-15 | 2018/12/15 00:00:01 | 001e0610bc12 | 41.75034 | -87.663518 | 2018-12-15 00:00:00 | 1119.05 | 0 |
| 2018-12-15 | 2018/12/15 00:00:01 | 001e06113f54 | 41.884607 | -87.624577 | 2018-12-15 00:00:00 | 1017.66 | 1 |
| 2018-12-15 | 2018/12/15 00:00:01 | 001e0611537d | 41.794167 | -87.601646 | 2018-12-15 00:00:00 | 1016.86 | 1 |
| 2018-12-15 | 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | 1080.70 | 1 |
| 2018-12-15 | 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | 997.53 | 1 |
| 2018-12-15 | 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 2361.56 | 0 |
| 2018-12-15 | 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 1036.71 | 1 |
| 2018-12-15 | 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | 2361.56 | 0 |
#retrieve PM 2.5 concentration data and label reliability
systemPM25<-'alphasense'
parameterPM25<-'pm2_5'
dfPM25<-defValid(dbname, systemPM25, parameterPM25)
#because we are only scoring for a day here, filter data for 2018-12-15
dfPM25%>%
filter(date=='2018-12-15')->dfPM25
dfPM25%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:09 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:11 | 001e06113107 | 41.751142 | -87.71299 | 2018-12-15 00:00:00 | 12.179 | 1 |
| 2018/12/15 00:00:13 | 001e06113dbc | 41.713867 | -87.536509 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:18 | 001e0610bc10 | 41.736314 | -87.624179 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:21 | 001e0610ba15 | 41.722457 | -87.57535 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:31 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:31 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:35 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | NA | 0 |
#retrieve CO concentration data and label reliability
systemCO<-'chemsense'
parameterCO<-'concentration'
sensorCO<-'co'
highCO<-1000
lowCO<-0
dfCO<-defValid(dbname, system=systemCO, parameter1=parameterCO, sensor1=sensorCO, high=highCO, low=lowCO)
#because we are only scoring for a day here, filter data for 2018-12-15
dfCO %>%
filter(date=='2018-12-15')->dfCO
dfCO%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | -0.10126 | 0 |
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | -0.52656 | 0 |
| 2018/12/15 00:00:05 | 001e0610ef27 | 41.846579 | -87.685557 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | 0.27291 | 1 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 0.08977 | 1 |
| 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 0.12475 | 1 |
| 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | -0.07333 | 0 |
| 2018/12/15 00:00:09 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | -0.08132 | 0 |
| 2018/12/15 00:00:09 | 001e06114503 | 41.666078 | -87.539374 | 2018-12-15 00:00:00 | -0.11219 | 0 |
| 2018/12/15 00:00:11 | 001e06113107 | 41.751142 | -87.71299 | 2018-12-15 00:00:00 | 0.14908 | 1 |
#retrieve H2S concentration data and label reliability
systemH2S<-'chemsense'
parameterH2S<-'concentration'
sensorH2S<-'h2s'
highH2S<-50
lowH2S<-0
dfH2S<-defValid(dbname, system=systemH2S, parameter1=parameterH2S, sensor1=sensorH2S, high=highH2S, low=lowH2S)
#because we are only scoring for a day here, filter data for 2018-12-15
dfH2S %>%
filter(date=='2018-12-15')->dfH2S
dfH2S%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | 0.00271 | 1 |
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | -0.13310 | 0 |
| 2018/12/15 00:00:05 | 001e0610ef27 | 41.846579 | -87.685557 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | 0.46145 | 1 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 0.11730 | 1 |
| 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | -0.02195 | 0 |
| 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | -0.04071 | 0 |
| 2018/12/15 00:00:09 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | 0.02893 | 1 |
| 2018/12/15 00:00:09 | 001e06114503 | 41.666078 | -87.539374 | 2018-12-15 00:00:00 | 0.16494 | 1 |
| 2018/12/15 00:00:11 | 001e06113107 | 41.751142 | -87.71299 | 2018-12-15 00:00:00 | -0.06809 | 0 |
#retrieve NO2 concentration data and label reliability
systemNO2<-'chemsense'
parameterNO2<-'concentration'
sensorNO2<-'no2'
highNO2<-20
lowNO2<-0
dfNO2<-defValid(dbname, system=systemNO2, parameter1=parameterNO2, sensor1=sensorNO2, high=highNO2, low=lowNO2)
#because we are only scoring for a day here, filter data for 2018-12-15
dfNO2 %>%
filter(date=='2018-12-15')->dfNO2
dfNO2%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | 0.00470 | 1 |
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:05 | 001e0610ef27 | 41.846579 | -87.685557 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 0.01549 | 1 |
| 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | 0.02010 | 1 |
| 2018/12/15 00:00:09 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | 0.07814 | 1 |
| 2018/12/15 00:00:09 | 001e06114503 | 41.666078 | -87.539374 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:11 | 001e06113107 | 41.751142 | -87.71299 | 2018-12-15 00:00:00 | 0.00543 | 1 |
#retrieve O3 concentration data and label reliability
systemO3<-'chemsense'
parameterO3<-'concentration'
sensorO3<-'o3'
highO3<-20
lowO3<-0
dfO3<-defValid(dbname, system=systemO3, parameter1=parameterO3, sensor1=sensorO3, high=highO3, low=lowO3)
#because we are only scoring for a day here, filter data for 2018-12-15
dfO3 %>%
filter(date=='2018-12-15')->dfO3
dfO3%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | 0.03330 | 1 |
| 2018/12/15 00:00:05 | 001e0610ef27 | 41.846579 | -87.685557 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | 0.08645 | 1 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | 0.02858 | 1 |
| 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | 0.00000 | 1 |
| 2018/12/15 00:00:09 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | 0.08059 | 1 |
| 2018/12/15 00:00:09 | 001e06114503 | 41.666078 | -87.539374 | 2018-12-15 00:00:00 | -0.01805 | 0 |
| 2018/12/15 00:00:11 | 001e06113107 | 41.751142 | -87.71299 | 2018-12-15 00:00:00 | 0.00000 | 1 |
#retrieve SO2 concentration data and label reliability
systemSO2<-'chemsense'
parameterSO2<-'concentration'
sensorSO2<-'so2'
highSO2<-20
lowSO2<-0
dfSO2<-defValid(dbname, system=systemSO2, parameter1=parameterSO2, sensor1=sensorSO2, high=highSO2, low=lowSO2)
#because we are only scoring for a day here, filter data for 2018-12-15
dfSO2 %>%
filter(date=='2018-12-15')->dfSO2
dfSO2%>%
select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
head(10)%>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"))
| timestamp | node_id | lat | lon | by10 | value_hrf | val_qual |
|---|---|---|---|---|---|---|
| 2018/12/15 00:00:00 | 001e0610ee43 | 41.788608 | -87.598713 | 2018-12-15 00:00:00 | -0.08692 | 0 |
| 2018/12/15 00:00:04 | 001e061144c0 | 41.764122 | -87.72242 | 2018-12-15 00:00:00 | 1.06933 | 1 |
| 2018/12/15 00:00:05 | 001e0610ef27 | 41.846579 | -87.685557 | 2018-12-15 00:00:00 | NA | 0 |
| 2018/12/15 00:00:05 | 001e061130f4 | 41.896157 | -87.662391 | 2018-12-15 00:00:00 | -1.40657 | 0 |
| 2018/12/15 00:00:06 | 001e06114fd4 | 41.794477 | -87.615957 | 2018-12-15 00:00:00 | -0.60472 | 0 |
| 2018/12/15 00:00:07 | 001e06113cf1 | 41.884688 | -87.627864 | 2018-12-15 00:00:00 | 0.05132 | 1 |
| 2018/12/15 00:00:08 | 001e061146bc | 41.918733 | -87.668257 | 2018-12-15 00:00:00 | 0.11572 | 1 |
| 2018/12/15 00:00:09 | 001e0610f05c | 41.924903 | -87.687703 | 2018-12-15 00:00:00 | -0.05358 | 0 |
| 2018/12/15 00:00:09 | 001e06114503 | 41.666078 | -87.539374 | 2018-12-15 00:00:00 | -0.20640 | 0 |
| 2018/12/15 00:00:11 | 001e06113107 | 41.751142 | -87.71299 | 2018-12-15 00:00:00 | 0.39536 | 1 |
In this section, the method of scoring Sensor Value Reliability is presented for each data parameter type for 2018-12-15. Two scores are obtained for this criterion.
In summary, this section scores sensor value reliability for the entire network by analysing the amount of reliable data measured by each node's sensors (the temperature sensors in the worked example below) in each 10-minute time interval of the day. These amounts are expressed as proportions to account for the different total amounts of data measured by the sensors in different nodes and/or at different times of day. The Node Sensor Value Reliability of each node is the mean of these proportions across the day's time intervals, and the network sensor value reliability (Score 1) is the mean of all the nodes' Node Sensor Value Reliability values.
To also observe whether this reliability is consistent throughout the day for each node, standard deviation metrics are used to score each node's sensor value reliability consistency. The overall consistency in sensor value reliability for the network (Score 2) is then the mean of these nodes' consistency scores.
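The two network-level aggregations described above can be sketched in base R on toy data. The node names and X values below are invented for illustration, and the consistency score is assumed here to be the plain per-node standard deviation of X (lower means steadier); the document's own consistency metric may transform this differently.

```r
# Toy X values (percent of reliable measurements) for three hypothetical nodes
# over four intervals; a real day has 144 ten-minute intervals per node.
toy <- data.frame(
  node_id = rep(c("nodeA", "nodeB", "nodeC"), each = 4),
  X       = c(40, 44, 42, 42,
              10, 12, 14, 12,
              60, 60, 60, 60)
)

# Node Sensor Value Reliability: mean X for each node
node_mean <- tapply(toy$X, toy$node_id, mean)

# Node consistency, assumed here to be the standard deviation of X (lower = steadier)
node_sd <- tapply(toy$X, toy$node_id, sd)

# Score 1: network sensor value reliability = mean of the node means
score1 <- mean(node_mean)

# Score 2: network consistency = mean of the node consistency scores
score2 <- mean(node_sd)

print(round(c(score1 = score1, score2 = score2), 3))
```

Nodes A and B have the same spread around different means, while node C is perfectly steady, so it lowers Score 2 without affecting how Score 1 is computed.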
The flowchart below illustrates the scoring process in this section:
Click on the tabs below to view in detail how the scores were constructed for each data parameter.
We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the numbers of reliable and unreliable temperature measurements vary across the day for each node collecting temperature data. Most nodes collect more than 100 measurements in every 10-minute time interval.
dfTemp %>%
  ggplot() +
  geom_bar(aes(x = by10, fill = as.factor(val_qual))) +
  facet_wrap(~node_id, ncol = 5) +
  labs(y = 'Number collected', x = 'Time',
       title = 'Number of Reliable and Unreliable Temperature Data Collected on 2018-12-15 For Each Node\n- By Time') +
  scale_fill_manual(values = c('indianred1', 'cornflowerblue'),
                    labels = c('Unreliable', 'Reliable'),
                    name = "") +
  scale_x_discrete(breaks = c('2018-12-15 00:00:00', '2018-12-15 04:00:00', '2018-12-15 08:00:00',
                              '2018-12-15 12:00:00', '2018-12-15 16:00:00', '2018-12-15 20:00:00'),
                   labels = c('00:00', '04:00', '08:00', '12:00', '16:00', '20:00')) +
  plotTheme() +
  theme(plot.title = element_text(face = 'bold', size = 20),
        text = element_text(size = 20),
        legend.position = 'bottom',
        axis.text.x = element_text(angle = 90, hjust = 1))
Calculate Proportion of Reliable Data Measurements
As the flowchart shows, we first need to calculate X, the percentage of reliable data measurements collected by each node in each 10-minute time interval.
From X, we can then calculate the Node Sensor Value Reliability, the average percentage of reliable data collected by each node during the day.
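Written out, these quantities are as follows. The notation is ours rather than the original authors': $q_{n,t,k} \in \{0,1\}$ is the `val_qual` flag of the $k$-th measurement from node $n$ in interval $t$, $N_{n,t}$ is the number of such measurements, $T_n$ is the set of intervals in which node $n$ reported data, and $\mathcal{N}$ is the set of nodes. The division by 144 (the number of 10-minute intervals in a day) mirrors the code below, so an interval with no data contributes 0%.

$$
X_{n,t} = 100 \cdot \frac{1}{N_{n,t}} \sum_{k=1}^{N_{n,t}} q_{n,t,k}
$$

$$
\text{NodeMeanX}_n = \frac{1}{144} \sum_{t \in T_n} X_{n,t},
\qquad
\text{Score 1} = \frac{1}{|\mathcal{N}|} \sum_{n \in \mathcal{N}} \text{NodeMeanX}_n
$$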
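dfTemp %>%
  group_by(node_id, by10) %>%
  # X: percentage of measurements in this node-interval flagged reliable (val_qual == 1)
  mutate(X = 100 * sum(val_qual) / n()) %>%
  select(node_id, by10, lat, lon, X) %>%
  unique() %>%
  as.data.frame() %>%
  group_by(node_id) %>%
  # 144 = number of 10-minute intervals in a day; intervals with no data for a node
  # are absent here and therefore count as 0% toward its mean
  mutate(NodeMeanX = sum(X) / 144) -> dfTemp1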
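To make the arithmetic of the pipeline above visible, the same two steps can be mirrored in base R on a tiny hypothetical data set (one node, two intervals). It also contrasts the original choice of dividing by 144 with averaging only the intervals that reported data; the alternative is shown for comparison only and is not what the analysis uses.

```r
# Hypothetical raw measurements: val_qual is 1 (reliable) or 0 (unreliable)
raw <- data.frame(
  node_id  = rep("nodeA", 6),
  by10     = c("00:00", "00:00", "00:00", "00:00", "00:10", "00:10"),
  val_qual = c(1, 0, 1, 0, 1, 1)
)

# X: percentage of reliable measurements per node and 10-minute interval
X_tab <- aggregate(val_qual ~ node_id + by10, data = raw,
                   FUN = function(q) 100 * sum(q) / length(q))
names(X_tab)[names(X_tab) == "val_qual"] <- "X"

# NodeMeanX as in the original code: sum of X divided by a full day of 144 intervals
node_mean_144 <- tapply(X_tab$X, X_tab$node_id, function(x) sum(x) / 144)

# Comparison only: average over the intervals that actually reported data
node_mean_obs <- tapply(X_tab$X, X_tab$node_id, mean)

print(X_tab)
print(c(sum_over_144 = node_mean_144[["nodeA"]], mean_observed = node_mean_obs[["nodeA"]]))
```

Because 142 of the 144 intervals are missing in this toy case, the two versions differ sharply; for a node that reports in every interval, as 001e0610ba13 does in the table below, they coincide.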
The table below shows how X varies across the 10-minute time intervals (by10) for a single node, 001e0610ba13. Node Sensor Value Reliability (NodeMeanX) is the same in every row because it is this node's daily average of X.
dfTemp1 %>%
  arrange(node_id) %>%
  head(144) %>%
  kable() %>%
  kable_styling(bootstrap_options = c('striped', 'hover')) %>%
  scroll_box(height = "300px")
| node_id | by10 | lat | lon | X | NodeMeanX |
|---|---|---|---|---|---|
| 001e0610ba13 | 2018-12-15 00:00:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 00:10:00 | 41.751238 | -87.712990 | 44.34783 | 41.35618 |
| 001e0610ba13 | 2018-12-15 00:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 00:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 00:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 00:50:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 01:00:00 | 41.751238 | -87.712990 | 44.16667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 01:10:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 01:20:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 01:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 01:40:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 01:50:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 02:00:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 02:10:00 | 41.751238 | -87.712990 | 43.47826 | 41.35618 |
| 001e0610ba13 | 2018-12-15 02:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 02:30:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 02:40:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 02:50:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 03:00:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 03:10:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 03:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 03:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 03:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 03:50:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 04:00:00 | 41.751238 | -87.712990 | 44.16667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 04:10:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 04:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 04:30:00 | 41.751238 | -87.712990 | 42.60870 | 41.35618 |
| 001e0610ba13 | 2018-12-15 04:40:00 | 41.751238 | -87.712990 | 43.33333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 04:50:00 | 41.751238 | -87.712990 | 43.33333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 05:00:00 | 41.751238 | -87.712990 | 43.33333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 05:10:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 05:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 05:30:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 05:40:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 05:50:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 06:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 06:10:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 06:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 06:30:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 06:40:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 06:50:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 07:00:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 07:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 07:20:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 07:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 07:40:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 07:50:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 08:00:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 08:10:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 08:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 08:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 08:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 08:50:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 09:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 09:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 09:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 09:30:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 09:40:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 09:50:00 | 41.751238 | -87.712990 | 41.73913 | 41.35618 |
| 001e0610ba13 | 2018-12-15 10:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 10:10:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 10:20:00 | 41.751238 | -87.712990 | 45.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 10:30:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 10:40:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 10:50:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 11:00:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 11:10:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 11:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 11:30:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 11:40:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 11:50:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 12:00:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 12:10:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 12:20:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 12:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 12:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 12:50:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 13:00:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 13:10:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 13:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 13:30:00 | 41.751238 | -87.712990 | 43.47826 | 41.35618 |
| 001e0610ba13 | 2018-12-15 13:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 13:50:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 14:00:00 | 41.751238 | -87.712990 | 44.16667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 14:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 14:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 14:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 14:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 14:50:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 15:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 15:10:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 15:20:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 15:30:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 15:40:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 15:50:00 | 41.751238 | -87.712990 | 41.73913 | 41.35618 |
| 001e0610ba13 | 2018-12-15 16:00:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 16:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 16:20:00 | 41.751238 | -87.712990 | 43.33333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 16:30:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 16:40:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 16:50:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 17:00:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 17:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 17:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 17:30:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 17:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 17:50:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 18:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 18:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 18:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 18:30:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 18:40:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 18:50:00 | 41.751238 | -87.712990 | 40.86957 | 41.35618 |
| 001e0610ba13 | 2018-12-15 19:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 19:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 19:20:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 19:30:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 19:40:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 19:50:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 20:00:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 20:10:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 20:20:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 20:30:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 20:40:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 20:50:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 21:00:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 21:10:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 21:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 21:30:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 21:40:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 21:50:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 22:00:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 22:10:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 22:20:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 22:30:00 | 41.751238 | -87.712990 | 42.50000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 22:40:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 22:50:00 | 41.751238 | -87.712990 | 40.83333 | 41.35618 |
| 001e0610ba13 | 2018-12-15 23:00:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 23:10:00 | 41.751238 | -87.712990 | 45.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 23:20:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 23:30:00 | 41.751238 | -87.712990 | 41.66667 | 41.35618 |
| 001e0610ba13 | 2018-12-15 23:40:00 | 41.751238 | -87.712990 | 40.00000 | 41.35618 |
| 001e0610ba13 | 2018-12-15 23:50:00 | 41.751238 | -87.712990 | 48.69565 | 41.35618 |
The plot below presents how X varies around each node’s Node Sensor Value Reliability.
ggplot() +
  # dashed black line: each node's daily mean (NodeMeanX)
  # (linewidth requires ggplot2 >= 3.4.0; use size = 1 on older versions)
  geom_line(data = dfTemp1, aes(x = by10, y = NodeMeanX, group = 1),
            col = 'black', linewidth = 1, linetype = 'dashed') +
  # red line: percentage of reliable data (X) in each 10-minute interval
  geom_line(data = dfTemp1, aes(x = by10, y = X, group = 1, col = "Proportion of Reliable Data"),
            linewidth = 1, alpha = 0.5) +
  scale_x_discrete(breaks = c('2018-12-15 00:00:00', '2018-12-15 04:00:00', '2018-12-15 08:00:00',
                              '2018-12-15 12:00:00', '2018-12-15 16:00:00', '2018-12-15 20:00:00'),
                   labels = c('00:00', '04:00', '08:00', '12:00', '16:00', '20:00')) +
  scale_color_manual('',
                     values = c("Proportion of Reliable Data" = 'indianred')) +
  facet_wrap(~node_id, ncol = 5) +
  labs(y = 'Proportion Reliable (%)', x = 'Time',
       title = 'Proportion of Reliable Temperature Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle = 'Mean proportion for each node denoted by dashed line.') +
  plotTheme() +
  theme(plot.title = element_text(face = 'bold', size = 20),
        text = element_text(size = 20),
        legend.position = 'bottom',
        axis.text.x = element_text(angle = 90, hjust = 1))
The table below presents the Node Sensor Value Reliability of each node in the network. Average sensor value reliability ranges from 11.3% to 65.0% across nodes.
dfTemp1 %>%
  select(-X, -by10) %>%
  unique() %>%
  as.data.frame() %>%
  arrange(desc(NodeMeanX)) %>%
  kable() %>%
  kable_styling(bootstrap_options = c('striped', 'hover')) %>%
  scroll_box(height = "300px")
| node_id | lat | lon | NodeMeanX |
|---|---|---|---|
| 001e0610ef27 | 41.846579 | -87.685557 | 64.99769 |
| 001e061135cb | 41.779369 | -87.664421 | 64.12399 |
| 001e0610ee33 | 41.965089 | -87.679076 | 52.79469 |
| 001e0610e532 | 41.857959 | -87.656427 | 52.28412 |
| 001e0610f05c | 41.924903 | -87.687703 | 52.00181 |
| 001e0610e537 | 41.961622 | -87.665948 | 51.58313 |
| 001e061130f4 | 41.896157 | -87.662391 | 51.45028 |
| 001e0610f732 | 41.895005 | -87.745817 | 51.39065 |
| 001e0610eef4 | 41.912681 | -87.681052 | 51.17562 |
| 001e0610ee43 | 41.788608 | -87.598713 | 51.14156 |
| 001e0610ba46 | 41.878377 | -87.627678 | 51.13476 |
| 001e0610ee36 | 41.751295 | -87.605288 | 51.08645 |
| 001e0610ee5d | 41.923996 | -87.761072 | 51.03694 |
| 001e0610bbf9 | 41.768319 | -87.683396 | 50.87284 |
| 001e0610bc10 | 41.736314 | -87.624179 | 50.67658 |
| 001e0610f6db | 41.791329 | -87.598677 | 48.42593 |
| 001e06113dbc | 41.713867 | -87.536509 | 42.10656 |
| 001e06113f54 | 41.884607 | -87.624577 | 41.87248 |
| 001e0610bc12 | 41.75034 | -87.663518 | 41.78039 |
| 001e06113a48 | 41.943263 | -87.688069 | 41.65358 |
| 001e0610ba13 | 41.751238 | -87.712990 | 41.35618 |
| 001e0610ba15 | 41.722457 | -87.57535 | 41.22434 |
| 001e06113cf1 | 41.884688 | -87.627864 | 40.98682 |
| 001e0611537d | 41.794167 | -87.601646 | 40.77823 |
| 001e06113107 | 41.751142 | -87.71299 | 38.42869 |
| 001e061144c0 | 41.764122 | -87.72242 | 34.21231 |
| 001e0610e538 | 41.736593 | -87.604759 | 32.02960 |
| 001e0610fb4c | 41.913583 | -87.682414 | 30.72025 |
| 001e06114503 | 41.666078 | -87.539374 | 30.06114 |
| 001e06113ace | 41.83107 | -87.617298 | 13.88310 |
| 001e0610f703 | 41.87148 | -87.67644 | 13.38366 |
| 001e06114500 | 41.714494 | -87.643099 | 12.53336 |
| 001e0611462f | 41.823527 | -87.641054 | 12.48516 |
| 001e06113d22 | 41.800846 | -87.703739 | 12.46162 |
| 001e0610f8f4 | 41.832579 | -87.646133 | 12.09063 |
| 001e0611536c | 41.88575 | -87.62969 | 12.05440 |
| 001e06114fd4 | 41.794477 | -87.615957 | 12.03402 |
| 001e061146bc | 41.918733 | -87.668257 | 11.86846 |
| 001e0610eef2 | 41.965256 | -87.66672 | 11.84833 |
| 001e061146ba | 41.96759 | -87.76257 | 11.50035 |
| 001e0610e835 | 41.968757 | -87.679174 | 11.29906 |
The spatial distribution of these node-level sensor value reliability levels is visualised in the following map. In general, nodes with similar average sensor value reliability tend to be located close to one another. The nodes with the lowest sensor value reliability tend to be located mostly around the city centre, with the rest in isolated locations at the city periphery.
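# readOGR() comes from rgdal, which was archived on CRAN in 2023;
# sf::st_read(dsn = '.', layer = 'chigBound') reads the same shapefile with sf
chig <- readOGR('.', 'chigBound')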
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
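dfTemp1$lat <- as.numeric(dfTemp1$lat)
dfTemp1$lon <- as.numeric(dfTemp1$lon)
# one row per node: dfTemp1 holds one row per node and 10-minute interval,
# which would otherwise stack 144 identical markers (and popups) on each node
dfTempNodes <- dfTemp1 %>%
  ungroup() %>%
  distinct(node_id, lat, lon, NodeMeanX)
# colorNumeric() has no n argument (n = 5 would be partially matched to na.color), so it is omitted
pal <- colorNumeric("Blues", dfTempNodes$NodeMeanX)
leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data = chig, fillOpacity = 0, weight = 2, color = 'black') %>%
  addCircleMarkers(data = dfTempNodes,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor = ~pal(NodeMeanX), fillOpacity = 0.5,
                   popup = paste("Node:", dfTempNodes$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:",
                                 round(dfTempNodes$NodeMeanX), "%", "<br>")) %>%
  addLegend(pal = pal, values = dfTempNodes$NodeMeanX, opacity = 0.7, title = NULL,
            position = "topright")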
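The visual observation that nodes with similar reliability cluster together could be checked quantitatively with Moran's I, a standard spatial autocorrelation statistic. This is a swapped-in technique, not part of the original analysis. The sketch below assumes inverse-distance weights on an equirectangular approximation of latitude/longitude and a one-sided permutation test. The helper names `morans_i` and `moran_perm_test` are hypothetical, and the demo coordinates and values are synthetic. The spdep package (`moran.test`, `moran.mc`) offers fuller implementations.

```r
# Moran's I with inverse-distance weights, base R only (hypothetical helper)
morans_i <- function(lon, lat, value) {
  n <- length(value)
  # Equirectangular approximation: shrink longitude by cos(mean latitude)
  lat0 <- mean(lat) * pi / 180
  xy <- cbind(lon * cos(lat0), lat)
  d <- as.matrix(dist(xy))
  # Inverse-distance weights, zero on the diagonal (and for coincident points)
  w <- ifelse(d > 0, 1 / d, 0)
  z <- value - mean(value)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# One-sided permutation test for positive spatial autocorrelation (hypothetical helper)
moran_perm_test <- function(lon, lat, value, n_perm = 999, seed = 1) {
  set.seed(seed)
  observed <- morans_i(lon, lat, value)
  perms <- replicate(n_perm, morans_i(lon, lat, sample(value)))
  list(I = observed,
       expected = -1 / (length(value) - 1),  # E[I] under no spatial autocorrelation
       p_value = (sum(perms >= observed) + 1) / (n_perm + 1))
}

# Synthetic check: two spatial clusters with contrasting values
lon <- c(-87.70, -87.71, -87.69, -87.60, -87.61, -87.59)
lat <- c(41.90, 41.91, 41.89, 41.75, 41.76, 41.74)
val <- c(60, 62, 61, 12, 11, 13)
res <- moran_perm_test(lon, lat, val)
print(res)

# Applied to the node-level table (not run here):
# nodes <- unique(as.data.frame(dfTemp1)[, c("node_id", "lat", "lon", "NodeMeanX")])
# moran_perm_test(as.numeric(nodes$lon), as.numeric(nodes$lat), nodes$NodeMeanX)
```

A value of I well above its null expectation of -1/(n - 1), with a small permutation p-value, would support the clustering claim. With only six synthetic points the p-value is necessarily coarse.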