
The following project was created in association with the MUSA 801 Practicum at the University of Pennsylvania, taught by Ken Steif, Michael Fichman, and Matt Harris. We would like to thank Charlie Catlett of the Argonne National Laboratory for providing feedback to help us create a meaningful application. All products from this class should be considered proofs of concept and works in progress.

This document is split into two parts. The first part addresses the policy implications of our project, as well as the concepts of data reliability that underpinned our methodology. The second part presents our methodology together with the codeblocks necessary for its replication. The policy implications and concepts are explicated in the sections Introduction and Defining Data Reliability. Our methods can then be replicated by following the sections Scoring Data Reliability and Scoring Data Reliability After Imputation.

1. Introduction

A Senseable Smart City

“…sound and touch and taste all have a place in the tools which…will define digital planning in the near future.” - Michael Batty

In a foreword for Robert Laurini’s Information Systems for Urban Planning: A Hypermedia Co-operative Approach, Michael Batty, Professor of Spatial Analysis and Planning at University College London, identifies the role that our senses play in the future of urban planning. Urban citizens rely on their senses to interpret and experience their environment through sights, sounds, smells, and touch. To make meaningful improvements to urban experience and quality of life, a planner therefore needs a good understanding of the same sensory stimuli that shape them.

This is why the rising ubiquity and decreasing cost of sensing tools have the potential to change the way planners plan, structure, and manage the city. Sensor devices are often designed for tasks that either emulate or extend beyond the human senses. More importantly, they collect valuable data that help us approximate the human sensory experience in an urban environment, generating large volumes of feedback at spatial and temporal scales that can be further analysed for detailed insights. As cities around the world strive to be ‘smart’ in the ways they enhance quality of life for their citizens, sensor networks are increasingly deployed to collect data that can be used to understand and manage the urban experience. Here, new technology plays a role in transforming efforts for sustainable urban growth and smart city planning.

The Array of Things (AoT)

As an urban-scale sensing network that collects real-time environmental data in cities, the AoT initiative exemplifies this trend. The initiative is led by Charlie Catlett and researchers from the Urban Center for Computation and Data, a joint initiative of the Argonne National Laboratory and the University of Chicago. It was launched in 2016 and is currently implemented in Chicago, and the data it collects is open and free to the public.

The AoT could be the first sensing project of this geographic scale and level of temporal and data type specificity. The AoT network comprises nodes, which are sensor boxes containing up to 15 sensors measuring different sensory data types, or parameters. These parameters include temperature, humidity, pressure, PM 2.5 concentration, and concentrations of hazardous gases such as carbon monoxide (CO), nitrogen dioxide (NO2) and sulphur dioxide (SO2). As of this writing, there are 86 nodes citywide, as seen in the figure below. When fully implemented, the AoT network will consist of 500 nodes across Chicago.

This opens up a whole array of possible angles and ways that individuals, organisations, researchers, engineers and scientists can study urban environment and living. This is the main objective of the AoT initiative. Particularly, the data presents valuable insights for urban policy planners and researchers interested in devising urban policies that are sensitive to the unavoidable human-environment dynamics that shape urban behaviour and livability.

Importance of sensor data reliability

Extracting such valuable insights from sensor data requires the raw data to be processed and analysed. Here, the extent of data processing and quality of data analysis critically depends on the reliability of the data itself.

Therefore, our project seeks to evaluate the level of reliability of the data collected by the AoT network. In Section 2, we first define criteria for what it means for AoT data to be reliable. In Section 3 and Section 4, based on these criteria, we then provide a method to numerically score the daily data reliability of the network for each data parameter.

Based on this score metric, we hope that planners can easily identify segments of the big AoT dataset for their relevant analysis. We also hope the transparent evaluation factors and scores behind this data reliability analysis can promote a more informed use of data in the increasingly data-driven planning process. We see this application as additionally useful in improving research efficiency, considering the increasingly large stores of sensor data available. Planners will be able to scope the spatial and temporal scale of their research according to where and when reliable segments of the data are available, instead of having to explore many different datasets to finalise a suitable scope of analysis.

1.2 Setup

Below we set the working directory, and load the libraries needed for the analysis as well as a plotTheme. We also set the memory limit to a high value, given the size of data we are processing here.

library(dplyr)
library(DBI)
library(RSQLite)
library(dbplyr)
library(lubridate)
library(leaflet)
library(tmap)
library(ggplot2)
library(sf)
library(plotly)
library(sp)
library(spatstat)
library(rgeos)
library(rgdal)
library(tidyr)
library(gridExtra)
library(stringr)
library(tidyverse)
library(caret)
library(FNN)
library(spdep)
library(knitr)
library(kableExtra)
library(htmlwidgets)
library(htmltools)
library(openair)

setwd("~/Capstone/Exploratory")
# memory.limit() only has an effect on Windows and is no longer supported from R 4.2.0 onwards
memory.limit(100000000000)
## [1] 1e+11
plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_line( size=.1, color="#ababab" ),
    panel.grid.major.y = element_line( size=.1, color="#ababab" ),
    panel.grid.major.x = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank()
  )
}

2. Defining Data Reliability

In order to use the data, planners need to know if the data is reliable. Here, we identify 4 ideal criteria for AoT data to be considered reliable and useful:

  1. Sensor Value Reliability: All the data collected should be within sensing range, as determined by the sensor specifications.

  2. Spatial Reliability: Reliable data should be collected for the whole spatial extent of the study area, which is Chicago in this case.

  3. Temporal Reliability: Reliable data should be collected at consistent time intervals throughout the day.

  4. Imputability: Missing and unreliable data records should be easily substitutable by the next nearest record in time and space. This substitution, or imputation, should improve sensor value, spatial, and temporal reliability.

Each criterion is defined in detail in the sub-sections below, together with the method through which it is scored. These numerical scores are metrics that facilitate easy comparison between different datasets.

2.2 Sensor Value Reliability

This is the first and most important criterion defining data reliability in our method.

According to the AoT metadata site, each sensor has a specific detection range. This means that a well-functioning sensor should record values within this range. Values recorded outside this range indicate that the sensor is faulty, and these values are unreliable and unusable.

The table below presents the specific detection ranges of different sensors recording different data parameters.

| Data Type | Parameter | Minimum senseable value | Maximum senseable value | Unit |
|---|---|---|---|---|
| Weather | Temperature | -55 | 125 | deg Celsius |
| Weather | Relative Humidity | 0 | 100 | Percent |
| Weather | Pressure | 300 | 1100 | hPa |
| Air Quality | PM2.5 concentration | 0 | - | PPM |
| Air Quality | CO concentration | 0 | 1000 | PPM |
| Air Quality | H2S concentration | 0 | 50 | PPM |
| Air Quality | NO2 concentration | 0 | 20 | PPM |
| Air Quality | O3 concentration | 0 | 20 | PPM |
| Air Quality | SO2 concentration | 0 | 20 | PPM |

It should be noted that temperature values recorded by the sensors should also fall within logical seasonal ranges. Intuitively, we know that temperature cannot fluctuate between -55 deg Celsius and 125 deg Celsius in a day. We also expect temperature values collected during winter in Chicago to be around or below the freezing point of 0 deg Celsius, while summers should record higher temperature values. Therefore, to determine sensor value reliability for temperature values, we also reference daily temperature ranges in Chicago published by the National Weather Service.

Based on these, we can label each sensor value as reliable or not. If the value falls within the sensor specification range (and daily range for temperature values), reliability == 1, and if not, reliability == 0.
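As a minimal sketch of this labelling rule, the base R snippet below checks values against a fixed sensor range and, for temperature, against a daily range. The readings and the daily high and low are invented for illustration, and label_in_range is a hypothetical helper rather than part of our workflow. The tolerance of 5 degrees mirrors the high + 5 and low - 5 bounds used by the defValid function in Section 3.2.

```r
# Hypothetical helper: 1 if the value lies within [low, high], 0 otherwise.
# Missing values are treated as unreliable.
label_in_range <- function(value, low, high) {
  as.integer(!is.na(value) & value >= low & value <= high)
}

# Relative humidity: fixed sensor specification range of 0 to 100 percent.
humidity <- c(77.5, 118.99, NA)            # invented readings
humidity_reliability <- label_in_range(humidity, low = 0, high = 100)

# Temperature: compare against the daily range for the matching date.
temps <- data.frame(
  date      = as.Date(c("2018-12-15", "2018-12-15", "2018-12-15")),
  value_hrf = c(3.5, 128.86, -254)         # invented readings
)
# Invented daily high and low; real values would come from the
# National Weather Service records.
daily_range <- data.frame(date = as.Date("2018-12-15"), high = 8, low = -1)

idx <- match(temps$date, daily_range$date)
temps$reliability <- label_in_range(
  temps$value_hrf,
  low  = daily_range$low[idx] - 5,
  high = daily_range$high[idx] + 5
)
```

Matching on date keeps the row order of the readings unchanged, which makes it easy to attach the label back to the original records.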

To evaluate the network sensor value reliability, we are interested to know on average the proportion of sensor values measured in each node that are reliable. The more sensor values measured that are reliable, the more reliable the overall network is in terms of sensor value reliability.
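The averaging described here can be sketched in a few lines of base R. The node IDs and labels below are invented, and the snippet ignores the 10-minute time intervals that the full scoring procedure in Section 3.3 takes into account.

```r
# Invented reliability labels (1 = reliable, 0 = unreliable) for two nodes.
readings <- data.frame(
  node_id     = c("a", "a", "a", "b", "b", "b"),
  reliability = c(1, 0, 0, 1, 1, 1)
)

# Proportion of reliable values recorded by each node.
node_prop <- tapply(readings$reliability, readings$node_id, mean)

# Network-level value: the mean of the node proportions.
network_score <- mean(node_prop)
```

Averaging proportions per node first, rather than pooling all readings, prevents nodes that report more frequently from dominating the network value.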

Based on this criterion of Sensor Value Reliability, we can define active nodes and inactive nodes in the network.

  • Active nodes record at least one reliable value that falls within the sensor specification range during the course of a day.

  • Inactive nodes do not record a single reliable value that falls within the sensor specification range during the course of a day. These include nodes containing sensors that do not record any value at all.
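The two definitions above can be illustrated with a short base R sketch. The labelled readings and the list of deployed nodes are invented for illustration. Node "d" stands for a node that reported nothing during the day.

```r
# Invented labelled readings for one day (1 = reliable, 0 = unreliable).
labelled <- data.frame(
  node_id     = c("a", "a", "b", "b", "c"),
  reliability = c(1, 0, 0, 0, 1)
)

# Assumed full list of deployed nodes; "d" has no records at all.
all_nodes <- c("a", "b", "c", "d")

# Count the reliable values per node that reported anything.
reliable_count <- tapply(labelled$reliability, labelled$node_id, sum)

# Active: at least one reliable value. Inactive: every other deployed node.
active_nodes   <- names(reliable_count)[reliable_count >= 1]
inactive_nodes <- setdiff(all_nodes, active_nodes)
```

Deriving the inactive set from the full node list, instead of from the readings, is what captures nodes that are silent for the whole day.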

This helps us define two other relevant reliability criteria, Spatial Reliability and Temporal Reliability, that will be elaborated on in Section 2.3 and Section 2.4 respectively.

2.3 Spatial Reliability

To evaluate the spatial reliability of the network, we are interested to know whether active nodes are distributed across Chicago. Here, the area covered by active nodes is considered to have reliable data collected for it. A network that is fully spatially reliable is one that has its nodes distributed across the whole of Chicago, such that the total spatial extent of the nodes spans the area of the city. A network that is not fully spatially reliable is one whose total spatial extent spans only part of the city. The figure below illustrates this point:

Therefore, the average proportion of Chicago’s area covered by the network extent at any one time serves as a metric for spatial reliability here.
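One way to sketch this metric is shown below in base R. It assumes that the network extent can be approximated by the convex hull of the active nodes, that coordinates are planar, and that the city is a 10 by 10 square. All three are simplifying assumptions made for illustration and not a statement of how the extent is actually derived in our workflow. polygon_area is a hypothetical helper implementing the shoelace formula.

```r
# Hypothetical helper: polygon area from ordered vertices (shoelace formula).
polygon_area <- function(x, y) {
  n   <- length(x)
  nxt <- c(2:n, 1)
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Invented planar coordinates of active nodes in two time intervals.
active <- data.frame(
  interval = c(1, 1, 1, 1, 1, 2, 2, 2),
  x        = c(2, 8, 8, 2, 5, 2, 8, 8),
  y        = c(2, 2, 8, 8, 5, 2, 2, 8)
)
city_area <- 10 * 10   # assumed city area in the same planar units

# Share of the city covered by the convex hull of active nodes, per interval.
coverage <- sapply(split(active, active$interval), function(d) {
  hull <- chull(d$x, d$y)   # indices of the hull vertices, in order
  polygon_area(d$x[hull], d$y[hull]) / city_area
})

# Average the coverage over the intervals.
spatial_score <- mean(coverage)
```

With real data, the node coordinates would first need to be projected to a planar coordinate system and the hull clipped to the city boundary, which this sketch leaves out.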

2.4 Temporal Reliability

To evaluate the temporal reliability of the network, we are interested to know the average proportion of the day during which a node within the network is active, i.e. collecting reliable data. A network that is fully temporally reliable is one that has all its nodes collecting reliable data consistently across all time intervals during a day. A network that is not fully temporally reliable is one that has at least one of its nodes not collecting reliable data at some point during the day. The figure below illustrates this type of network. While some of its nodes are consistently collecting reliable data throughout the day, others have periods during which no reliable data is collected at all:

Therefore, the average proportion of the day-duration during which reliable data is being collected serves as a metric for temporal reliability here.
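A minimal base R sketch of this metric follows. It assumes the day is divided into 144 ten-minute intervals and that a node counts as active in an interval if it recorded at least one reliable value in it. The node names and interval indices are invented, and including a fully inactive node in the average is an assumption of this sketch.

```r
n_intervals <- 24 * 6   # number of 10-minute intervals in a day

# Invented example: indices (1 to 144) of the intervals in which each node
# recorded at least one reliable value.
reliable_intervals <- list(
  a = 1:144,        # reliable all day
  b = 1:72,         # reliable for the first half of the day only
  c = integer(0)    # never reliable
)

# Proportion of the day during which each node collected reliable data.
node_prop <- sapply(reliable_intervals, function(i) {
  length(unique(i)) / n_intervals
})

# Network-level temporal reliability: the mean over nodes.
temporal_score <- mean(node_prop)
```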

2.5 Imputability

Imputability refers to the possibility and effectiveness of replacing missing or as-if-missing values with observed ones. Here, unreliable data is considered as-if-missing.

The figure below presents our imputation method to replace missing and unreliable data in the dataset we retrieve from the AoT database.

To evaluate imputability, we first apply the procedure above to our original dataset to obtain an imputed dataset. We then apply the score metrics for the other 3 reliability criteria to this imputed dataset, and compare the scores. Ideally, the scores for the imputed dataset should be higher, which would suggest that imputing for unreliable and missing data in the retrieved dataset is effective. If the scores are not higher, this indicates that imputation cannot be used to ‘salvage’ an original dataset that consists of too many unreliable data points. In this case, planners might be advised to not use the dataset for that day and data type at all.
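The before-and-after comparison can be sketched as follows in base R. The imputation rule used here is a deliberately simplified stand-in: it fills an unreliable value with the nearest reliable value in time for a single node, within an assumed maximum gap of one interval, and ignores spatial neighbours entirely. The function impute_nearest_in_time, the max_gap parameter and the readings are all hypothetical.

```r
# Hypothetical helper: replace unreliable values with the nearest reliable
# value in time, provided it is at most max_gap intervals away.
impute_nearest_in_time <- function(value, reliable, max_gap = 1) {
  out  <- ifelse(reliable == 1, value, NA)   # unreliable values become NA
  good <- which(reliable == 1)
  if (length(good) == 0) return(out)
  for (i in which(reliable == 0)) {
    j <- good[which.min(abs(good - i))]      # nearest reliable position
    if (abs(j - i) <= max_gap) out[i] <- value[j]
  }
  out
}

# Invented series of 10-minute values for one node, with reliability labels.
value    <- c(20.1, 250, 20.4, NA, NA, NA, 21.0)
reliable <- c(1, 0, 1, 0, 0, 0, 1)

imputed <- impute_nearest_in_time(value, reliable, max_gap = 1)

# Share of usable values before and after imputation.
score_before <- mean(reliable)
score_after  <- mean(!is.na(imputed))
```

In this invented series the score rises after imputation, which is the pattern that would suggest the dataset is worth salvaging. A score that does not rise would point the other way.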

3. Scoring Data Reliability

In this section, we will present the method of scoring Data Reliability based on the criteria of Sensor Value Reliability, Spatial Reliability and Temporal Reliability. This will be demonstrated using data collected on 2018-12-15 for the different data types listed for weather and air quality in their respective sections.

The flow chart below illustrates the common workflow we adopt for scoring data reliability for each day:

3.2 Pre-scoring Data Retrieval and Processing

To manage the large AoT dataset, we first download the dataset for December 2018 from the AoT data site and import it into a database using SQL Server Management Studio (instructions on this can be found here). We then save this database as Chicago2018-12.db. It is this database that we will connect to using the dbConnect function available from the DBI R package.

dbname<-'Chicago2018-12.db'

To facilitate the workflow, we provide a function below for users to retrieve AoT data from the SQL database and then determine if each data point is reliable or not.

defValid<-function(dbname, system, parameter1, sensor1=NULL, high=NULL, low=NULL, actual1=NULL){
  
  #1. Connect to SQL database
  ##the connection is closed automatically when the function exits
  
  con<-dbConnect(SQLite(), dbname=dbname)
  on.exit(dbDisconnect(con), add=TRUE)
  
  #2. Send query and retrieve data from database
  ##dbGetQuery() sends the query, fetches all rows and clears the result set
  weather<-
    dbGetQuery(con,
                paste0(
                  "SELECT data.timestamp, data.node_id, data.subsystem,data.sensor, data.parameter, data.value_hrf, nodes.lat, nodes.lon
                  FROM data
                  JOIN nodes
                  ON data.node_id = nodes.node_id
                  WHERE data.subsystem 
                  IN ('",system,"')"))%>%
    mutate(timestamp2=ymd_hms(timestamp),
           date=date(timestamp),
           value_hrf=as.numeric(value_hrf))%>%
    filter(parameter==parameter1)%>%
    ##assign each record to the 10-minute interval it falls into
    mutate(by10=cut(timestamp2, breaks='10 min'))%>%
    mutate(time=ymd_hms(by10))
  
  #3. Determine for each data point whether it is reliable or not, 
  ##sensor reliability specification differs according to different data parameters
  
  if(parameter1=='humidity'){
    weather%>%
      filter(sensor==sensor1)%>%
      filter(parameter==parameter1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf>100|value_hrf<0,0,1)))%>%
      group_by(date,node_id)%>%
      mutate(val_qual=ifelse(mean(val_qual)!=1, 0,1))->df
  }else if(parameter1=='pressure'){
    weather%>%
      filter(sensor==sensor1)%>%
      filter(parameter==parameter1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf>1100|value_hrf<300,0,1)))%>%
      group_by(date,node_id)%>%
      mutate(val_qual=ifelse(mean(val_qual)!=1, 0,1))->df
  }else if(parameter1=='pm2_5'){
    weather%>%
      filter(parameter==parameter1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf<0,0,1)))->df
  }else if(parameter1=='concentration'){
    weather%>%
      filter(parameter==parameter1)%>%
      filter(sensor==sensor1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf<low|value_hrf>high,0,1)))->df
  }else if(parameter1=='temperature'){
    actual<-read.csv(actual1)
    actual$date<-ymd(actual$date)
    
    weather%>%
      filter(parameter==parameter1)%>%
      left_join(actual, by='date')%>%
      mutate(high_bound = high + 5,
             low_bound = low - 5) %>%
      #missing values count as unreliable, as in the other branches
      mutate(val_qual = ifelse(is.na(value_hrf) | value_hrf > high_bound | value_hrf < low_bound, 0, 1))->df
    
    df%>%
      filter(val_qual==1)%>%
      group_by(by10, node_id)%>%
      mutate(quant75= quantile(value_hrf, probs=0.75),
             quant25= quantile(value_hrf, probs=0.25))%>%
      mutate(val_qual= ifelse(value_hrf > quant75 | value_hrf < quant25, 0,1))->df1
    
    df%>%
      filter(val_qual==0)%>%
      bind_rows(df1)->df
    
    rm(df1)
    
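  }else{
    #fail early instead of returning an undefined object
    stop("Unsupported parameter1: ", parameter1)
  }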
  
  return(df)
}

The function calls for the following inputs:

  • dbname: Name of the SQL database file containing a month’s worth of AoT data
  • system: The sensor system within the node - see the AoT site for the specific system types.
  • parameter1: The data type - see the AoT site for the specific parameter types.
  • sensor1: The sensor type - required when parameter1 = humidity or pressure, and to specify the gas concentration type when parameter1 = concentration. See the AoT site for the specific sensor types.
  • high: The highest value sensor1 can record - only required when parameter1 = concentration
  • low: The lowest value sensor1 can record - only required when parameter1 = concentration
  • actual1: The path to the CSV file containing daily temperature ranges obtained from the National Weather Service - only required when temperature data is retrieved (parameter1 = temperature)

The function implements the following procedure:

  1. Connect to SQL database
  2. Send query and retrieve data from database
  3. Determine for each data point whether it is reliable or not

The function returns a data frame with the following columns:

  • timestamp: Date and time at which the data observation is recorded
  • node_id: Unique ID number of the node at which data is being recorded
  • subsystem: Sensor subsystem within the node
  • sensor: Sensor model
  • parameter: Data type
  • value_hrf: Measurement
  • lat: Latitude location of node
  • lon: Longitude location of node
  • timestamp2: timestamp in datetime format
  • date: Date extracted from timestamp
  • by10: Start of the 10-minute interval into which timestamp2 falls
  • time: by10 in datetime format
  • val_qual: 1 if data record is reliable, 0 if not
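To make the by10 column concrete, the base R sketch below bins a few invented timestamps into 10-minute intervals in two ways: by flooring the epoch seconds, and with cut() using breaks = "10 min" as in defValid. Both assign each record to the interval that contains it rather than to the nearest interval boundary.

```r
# Invented timestamps for illustration.
ts <- as.POSIXct(
  c("2018-12-15 00:00:01", "2018-12-15 00:09:59",
    "2018-12-15 00:10:00", "2018-12-15 00:26:30"),
  tz = "UTC"
)

# Option 1: floor the epoch seconds to a multiple of 600 (10 minutes).
bin_start <- as.POSIXct(floor(as.numeric(ts) / 600) * 600,
                        origin = "1970-01-01", tz = "UTC")

# Option 2: cut() on date-times; the integer codes identify the interval.
bin_code <- as.integer(cut(ts, breaks = "10 min"))
```

Note that cut() anchors its first break at the earliest timestamp truncated to the minute, so the intervals only align with clock multiples of 10 minutes when the data starts on such a boundary. The flooring approach always aligns with the clock.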

This function has to be applied before each scoring process in the following sections. The sub-sections below show the specific inputs that will yield the relevant data parameter types. The data retrieved here will then be scored in the following sections.

Temperature

  1. Apply function to retrieve temperature data and label reliability
systemTemp<-'metsense'
actualTemp<-'december_weather.csv'
parameterTemp<-'temperature'
dfTemp<-defValid(dbname, system=systemTemp, parameter1=parameterTemp, actual1=actualTemp)

#because we are only scoring for a day here, filter data for 2018-12-15
dfTemp%>%
  filter(date=='2018-12-15')->dfTemp
  2. Observe first 10 rows
dfTemp%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:01 001e0610bc12 41.75034 -87.663518 2018-12-15 00:00:00 40.20 0
2018/12/15 00:00:01 001e06113f54 41.884607 -87.624577 2018-12-15 00:00:00 35.60 0
2018/12/15 00:00:01 001e0611537d 41.794167 -87.601646 2018-12-15 00:00:00 29.70 0
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 35.10 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 241.00 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 128.86 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 -254.00 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 214.75 0
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 37.20 0
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 241.00 0

Humidity

  1. Apply function to retrieve humidity data and label reliability
systemHumidity<-'metsense'
parameterHumidity<-'humidity'
sensorHumidity<-'htu21d'
dfHumidity<-defValid(dbname, system=systemHumidity, parameter1=parameterHumidity, sensor1=sensorHumidity)

#because we are only scoring for a day here, filter data for 2018-12-15
dfHumidity%>%
  filter(date=='2018-12-15')->dfHumidity
  2. Observe first 10 rows
dfHumidity%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
## Adding missing grouping variables: `date`
date timestamp node_id lat lon by10 value_hrf val_qual
2018-12-15 2018/12/15 00:00:00 001e0610ee36 41.751295 -87.605288 2018-12-15 00:00:00 77.50 1
2018-12-15 2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 75.80 1
2018-12-15 2018/12/15 00:00:01 001e0610bc12 41.75034 -87.663518 2018-12-15 00:00:00 80.76 1
2018-12-15 2018/12/15 00:00:01 001e06113f54 41.884607 -87.624577 2018-12-15 00:00:00 81.80 1
2018-12-15 2018/12/15 00:00:01 001e0611537d 41.794167 -87.601646 2018-12-15 00:00:00 118.99 0
2018-12-15 2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 118.99 0
2018-12-15 2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 86.06 1
2018-12-15 2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 118.99 0
2018-12-15 2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 86.75 1
2018-12-15 2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 118.99 0

Pressure

  1. Apply function to retrieve pressure data and label reliability
systemPressure<-'metsense'
parameterPressure<-'pressure'
sensorPressure<-'bmp180'
dfPressure<-defValid(dbname, system=systemPressure, parameter1=parameterPressure, sensor1=sensorPressure)

#because we are only scoring for a day here, filter data for 2018-12-15
dfPressure %>%
  filter(date=='2018-12-15')->dfPressure
  2. Observe first 10 rows
dfPressure%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
## Adding missing grouping variables: `date`
date timestamp node_id lat lon by10 value_hrf val_qual
2018-12-15 2018/12/15 00:00:00 001e0610ee36 41.751295 -87.605288 2018-12-15 00:00:00 1042.56 1
2018-12-15 2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 1011.77 1
2018-12-15 2018/12/15 00:00:01 001e0610bc12 41.75034 -87.663518 2018-12-15 00:00:00 1119.05 0
2018-12-15 2018/12/15 00:00:01 001e06113f54 41.884607 -87.624577 2018-12-15 00:00:00 1017.66 1
2018-12-15 2018/12/15 00:00:01 001e0611537d 41.794167 -87.601646 2018-12-15 00:00:00 1016.86 1
2018-12-15 2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 1080.70 1
2018-12-15 2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 997.53 1
2018-12-15 2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 2361.56 0
2018-12-15 2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 1036.71 1
2018-12-15 2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 2361.56 0

PM2.5 Concentration

  1. Apply function to retrieve PM 2.5 Concentration data and label reliability.
systemPM25<-'alphasense'
parameterPM25<-'pm2_5'

dfPM25<-defValid(dbname, systemPM25, parameterPM25)

#because we are only scoring for a day here, filter data for 2018-12-15
dfPM25%>%
  filter(date=='2018-12-15')->dfPM25
  2. Observe first 10 rows
dfPM25%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 12.179 1
2018/12/15 00:00:13 001e06113dbc 41.713867 -87.536509 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:18 001e0610bc10 41.736314 -87.624179 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:21 001e0610ba15 41.722457 -87.57535 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:31 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:31 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:35 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 NA 0

CO Concentration

  1. Apply function to retrieve CO concentration data and label reliability.
systemCO<-'chemsense'
parameterCO<-'concentration'

sensorCO<-'co'
highCO<-1000
lowCO<-0
dfCO<-defValid(dbname, system=systemCO, parameter1=parameterCO, sensor1=sensorCO, high=highCO, low=lowCO)

#because we are only scoring for a day here, filter data for 2018-12-15
dfCO %>% 
  filter(date=='2018-12-15')->dfCO
  2. Observe first 10 rows
dfCO%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 -0.10126 0
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 -0.52656 0
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.27291 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.08977 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.12475 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 -0.07333 0
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 -0.08132 0
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 -0.11219 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.14908 1

H2S Concentration

  1. Apply function to retrieve H2S concentration data and label reliability.
systemH2S<-'chemsense'
parameterH2S<-'concentration'

sensorH2S<-'h2s'
highH2S<-50
lowH2S<-0
dfH2S<-defValid(dbname, system=systemH2S, parameter1=parameterH2S, sensor1=sensorH2S, high=highH2S, low=lowH2S)

#because we are only scoring for a day here, filter data for 2018-12-15
dfH2S %>%
  filter(date=='2018-12-15')->dfH2S
  2. Observe first 10 rows
dfH2S%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 0.00271 1
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 -0.13310 0
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.46145 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.11730 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 -0.02195 0
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 -0.04071 0
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 0.02893 1
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 0.16494 1
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 -0.06809 0

NO2 Concentration

  1. Apply function to retrieve NO2 concentration data and label reliability.
systemNO2<-'chemsense'
parameterNO2<-'concentration'

sensorNO2<-'no2'
highNO2<-20
lowNO2<-0
dfNO2<-defValid(dbname, system=systemNO2, parameter1=parameterNO2, sensor1=sensorNO2, high=highNO2, low=lowNO2)

#because we are only scoring for a day here, filter data for 2018-12-15
dfNO2 %>%
  filter(date=='2018-12-15')->dfNO2
  2. Observe first 10 rows
dfNO2%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 0.00470 1
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.01549 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 0.02010 1
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 0.07814 1
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.00543 1

O3 Concentration

  1. Apply function to retrieve O3 concentration data and label reliability.
systemO3<-'chemsense'
parameterO3<-'concentration'

sensorO3<-'o3'
highO3<-20
lowO3<-0
dfO3<-defValid(dbname, system=systemO3, parameter1=parameterO3, sensor1=sensorO3, high=highO3, low=lowO3)

#because we are only scoring for a day here, filter data for 2018-12-15
dfO3 %>%
  filter(date=='2018-12-15')->dfO3
  2. Observe first 10 rows
dfO3%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 0.03330 1
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.08645 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.02858 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 0.08059 1
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 -0.01805 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.00000 1

SO2 Concentration

  1. Apply function to retrieve SO2 concentration data and label reliability.
systemSO2<-'chemsense'
parameterSO2<-'concentration'

sensorSO2<-'so2'
highSO2<-20
lowSO2<-0
dfSO2<-defValid(dbname, system=systemSO2, parameter1=parameterSO2, sensor1=sensorSO2, high=highSO2, low=lowSO2)

#because we are only scoring for a day here, filter data for 2018-12-15
dfSO2 %>%
  filter(date=='2018-12-15')->dfSO2
  2. Observe first 10 rows
dfSO2%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 -0.08692 0
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 1.06933 1
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 -1.40657 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 -0.60472 0
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.05132 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 0.11572 1
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 -0.05358 0
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 -0.20640 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.39536 1

3.3 Scoring Sensor Value Reliability

This section presents the method of scoring Sensor Value Reliability for each data parameter type on 2018-12-15. Two scores are obtained for this criterion. In summary, sensor value reliability is scored for the entire network by analysing the amount of reliable data measured by the relevant sensor in each node at each 10-minute time interval during the day. This amount is expressed as a proportion to account for the different total amounts of data measured by sensors in different nodes and at different times of the day. The sensor value reliability of each node is the mean of these proportions across the day's time intervals. The network sensor value reliability (Score 1) is the mean of all the nodes' average sensor value reliability. To check whether this reliability is consistent throughout the day for each node, standard deviation metrics are used to score each node's sensor value reliability consistency. The overall consistency in sensor value reliability (Score 2) for the network is then the mean of these node consistency scores.
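
The first three steps of this summary can be written compactly. The notation here is introduced for illustration only and does not appear in the original: r_{n,t} is the number of reliable measurements from node n in interval t, m_{n,t} is the total number of measurements from that node in that interval, T = 144 is the number of 10-minute intervals in a day, and N is the number of nodes.

```latex
X_{n,t} = 100 \cdot \frac{r_{n,t}}{m_{n,t}}, \qquad
\bar{X}_n = \frac{1}{T} \sum_{t} X_{n,t}, \qquad
\text{Score 1} = \frac{1}{N} \sum_{n=1}^{N} \bar{X}_n
```

The sum over t runs over the intervals in which node n reported data. Because the code in this section divides by a fixed T = 144, an interval with no data contributes nothing to the sum and so lowers the node's average. Score 2 builds on the standard deviation of X_{n,t} across intervals for each node. Its exact form is not restated here, so it is left out of the sketch.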

The flowchart below illustrates the scoring process in this section:

The subsections below show in detail how the scores were constructed for each data parameter.

Temperature

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the numbers of reliable and unreliable measurements vary over the day for each node collecting temperature data. Most nodes collect more than 100 measurements in every 10-minute time interval.

dfTemp%>%
  ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable Temperature Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

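# X: percentage of reliable readings (val_qual == 1) per node per 10-minute interval
dfTemp%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%   # keep one row per node and interval
  as.data.frame()%>%
  group_by(node_id)%>%
  # 144 = number of 10-minute intervals in a day; intervals with no rows add nothing to sum(X)
  mutate(NodeMeanX = sum(X)/144)->dfTemp1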
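
To make the two averaging steps concrete, the following is a minimal sketch on made-up data. The toy data frame, its values, and the names n_intervals, NodeMeanX_fixed and NodeMeanX_observed are all hypothetical and not part of the original analysis. It computes X per node and interval, then contrasts dividing by a fixed number of intervals (as the code above does with 144) with averaging only over the intervals that actually contain data.

```r
library(dplyr)

# Hypothetical toy data: val_qual is 1 (reliable) or 0 (unreliable).
# Node "B" reports nothing in the second interval.
toy <- data.frame(
  node_id  = c("A", "A", "A", "A", "B", "B"),
  by10     = c("00:00", "00:00", "00:10", "00:10", "00:00", "00:00"),
  val_qual = c(1, 0, 1, 1, 1, 1)
)

# The toy "day" has two intervals; the real day has 144.
n_intervals <- 2

toy_scores <- toy %>%
  group_by(node_id, by10) %>%
  # X: percentage of reliable readings in each node-interval
  summarise(X = 100 * sum(val_qual) / n(), .groups = "drop") %>%
  group_by(node_id) %>%
  summarise(
    NodeMeanX_fixed    = sum(X) / n_intervals,  # missing intervals count as 0
    NodeMeanX_observed = mean(X),               # missing intervals are ignored
    .groups = "drop"
  )

print(toy_scores)
```

Node A has data in both intervals (X of 50 and 100), so both averages are 75. Node B has data in only one interval (X of 100), so the fixed divisor gives 50 while the mean over observed intervals gives 100. Dividing by a fixed count therefore penalises nodes for intervals in which they reported nothing.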

The table below shows how X varies across the 10-minute time intervals (by10) for a single node, 001e0610ba13. NodeMeanX, the Node Sensor Value Reliability, is the same in every row because it is the node's daily average of X.

dfTemp1%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 44.34783 41.35618
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 44.16667 41.35618
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 43.47826 41.35618
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 44.16667 41.35618
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 42.60870 41.35618
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 41.73913 41.35618
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 45.00000 41.35618
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 43.47826 41.35618
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 44.16667 41.35618
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 41.73913 41.35618
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 40.86957 41.35618
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 45.00000 41.35618
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 48.69565 41.35618

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

ggplot()+
  geom_line(data=dfTemp1,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfTemp1,aes(x=by10, y=X, group=1, col="Proportion of Reliable Data"), size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable Temperature Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. The nodes' average sensor value reliability ranges from 11.3% to 65.0%.

dfTemp1%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610ef27 41.846579 -87.685557 64.99769
001e061135cb 41.779369 -87.664421 64.12399
001e0610ee33 41.965089 -87.679076 52.79469
001e0610e532 41.857959 -87.656427 52.28412
001e0610f05c 41.924903 -87.687703 52.00181
001e0610e537 41.961622 -87.665948 51.58313
001e061130f4 41.896157 -87.662391 51.45028
001e0610f732 41.895005 -87.745817 51.39065
001e0610eef4 41.912681 -87.681052 51.17562
001e0610ee43 41.788608 -87.598713 51.14156
001e0610ba46 41.878377 -87.627678 51.13476
001e0610ee36 41.751295 -87.605288 51.08645
001e0610ee5d 41.923996 -87.761072 51.03694
001e0610bbf9 41.768319 -87.683396 50.87284
001e0610bc10 41.736314 -87.624179 50.67658
001e0610f6db 41.791329 -87.598677 48.42593
001e06113dbc 41.713867 -87.536509 42.10656
001e06113f54 41.884607 -87.624577 41.87248
001e0610bc12 41.75034 -87.663518 41.78039
001e06113a48 41.943263 -87.688069 41.65358
001e0610ba13 41.751238 -87.712990 41.35618
001e0610ba15 41.722457 -87.57535 41.22434
001e06113cf1 41.884688 -87.627864 40.98682
001e0611537d 41.794167 -87.601646 40.77823
001e06113107 41.751142 -87.71299 38.42869
001e061144c0 41.764122 -87.72242 34.21231
001e0610e538 41.736593 -87.604759 32.02960
001e0610fb4c 41.913583 -87.682414 30.72025
001e06114503 41.666078 -87.539374 30.06114
001e06113ace 41.83107 -87.617298 13.88310
001e0610f703 41.87148 -87.67644 13.38366
001e06114500 41.714494 -87.643099 12.53336
001e0611462f 41.823527 -87.641054 12.48516
001e06113d22 41.800846 -87.703739 12.46162
001e0610f8f4 41.832579 -87.646133 12.09063
001e0611536c 41.88575 -87.62969 12.05440
001e06114fd4 41.794477 -87.615957 12.03402
001e061146bc 41.918733 -87.668257 11.86846
001e0610eef2 41.965256 -87.66672 11.84833
001e061146ba 41.96759 -87.76257 11.50035
001e0610e835 41.968757 -87.679174 11.29906
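
As a sketch of how the network-level Score 1 described at the start of this section follows from this table, the NodeMeanX column can be averaged directly. The vector below is copied from the table above. The variable names node_mean_x and score1 are illustrative and are not taken from the original code.

```r
# NodeMeanX values copied from the table above (one per node, 41 nodes)
node_mean_x <- c(
  64.99769, 64.12399, 52.79469, 52.28412, 52.00181, 51.58313, 51.45028,
  51.39065, 51.17562, 51.14156, 51.13476, 51.08645, 51.03694, 50.87284,
  50.67658, 48.42593, 42.10656, 41.87248, 41.78039, 41.65358, 41.35618,
  41.22434, 40.98682, 40.77823, 38.42869, 34.21231, 32.02960, 30.72025,
  30.06114, 13.88310, 13.38366, 12.53336, 12.48516, 12.46162, 12.09063,
  12.05440, 12.03402, 11.86846, 11.84833, 11.50035, 11.29906
)

# Score 1 as described in the text: the mean of the nodes' average reliability
score1 <- mean(node_mean_x)

# Lowest and highest node averages, rounded to one decimal place
node_range <- round(range(node_mean_x), 1)

print(score1)
print(node_range)
```

The rounded range reproduces the 11.3% and 65.0% bounds of the table.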

The following map visualises the spatial distribution of these sensor value reliability levels by node. In general, nodes with similar average sensor value reliability tend to be located close to one another. Most of the nodes with the lowest sensor value reliability are located around the city centre, with the rest located in isolation at the city periphery.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
# colorNumeric() maps values onto a continuous palette and has no `n` argument
pal <- colorNumeric("Blues", dfTemp1$NodeMeanX)

dfTemp1$lat<-as.numeric(dfTemp1$lat)
dfTemp1$lon<-as.numeric(dfTemp1$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfTemp1,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfTemp1$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfTemp1$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfTemp1$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")