
The following project was created in association with the MUSA 801 Practicum at the University of Pennsylvania, taught by Ken Steif, Michael Fichman, and Matt Harris. We would like to thank Charlie Catlett of the Argonne National Laboratory for providing feedback to help us create a meaningful application. All products from this class should be considered proofs of concept and works in progress.

This document is split into two parts. The first part addresses the policy implications of our project, as well as the concepts of data reliability that underpinned our methodology. The second part presents our methodology together with the codeblocks necessary for its replication. The policy implications and concepts are explicated in the sections Introduction and Defining Data Reliability. Our methods can then be replicated by following the sections Scoring Data Reliability and Scoring Data Reliability After Imputation.

1. Introduction

A Senseable Smart City

“…sound and touch and taste all have a place in the tools which…will define digital planning in the near future.” - Michael Batty

In a foreword for Robert Laurini’s Information Systems for Urban Planning: A Hypermedia Co-operative Approach, Michael Batty, Professor of Spatial Analysis and Planning at University College London, identifies the role that our senses play in the future of urban planning. As urban citizens rely on their senses to interpret and experience their urban environment through sights, sounds, smells, and touch, a planner needs a good understanding of the same sensorial stimuli that shape urban experience and quality of life in order to make meaningful improvements to it.

This is why the rising ubiquity and decreasing cost of sensing tools have the potential to change the way planners plan, structure, and manage the city. Sensor devices are often designed for tasks that either emulate or extend beyond the human senses. More importantly, they collect valuable data that help us approximate the human sensory experience in an urban environment, generating large volumes of feedback at spatial and temporal scales that can be further analysed for detailed insights. As cities around the world strive to be ‘smart’ in the ways they enhance quality of life for their citizens, sensor networks are also increasingly deployed to collect data that can be used to understand and manage the urban experience. Here, new technology plays a role in transforming efforts for sustainable urban growth and smart city planning.

The Array of Things (AoT)

As an urban-scale sensing network that collects real-time environmental data in cities, the AoT initiative exemplifies this trend. The initiative is led by Charlie Catlett and researchers from the Urban Center for Computation and Data, a joint initiative of the Argonne National Laboratory and the University of Chicago. It was launched in 2016 and is currently implemented in Chicago, and the data it collects is open and free to the public.

The AoT could be the first sensing project of this geographic scale and level of temporal and data type specificity. The AoT network comprises nodes, which are sensor boxes containing up to 15 sensors measuring different sensory data types, or parameters. These parameters include temperature, humidity, pressure, PM 2.5 concentration, and concentrations of hazardous gases such as carbon monoxide (CO), nitrogen dioxide (NO2) and sulphur dioxide (SO2). As of this writing, there are 86 nodes citywide, as seen in the figure below. When fully implemented, the AoT network will consist of 500 nodes across Chicago.

This opens up a whole array of possible angles and ways that individuals, organisations, researchers, engineers and scientists can study urban environment and living. This is the main objective of the AoT initiative. Particularly, the data presents valuable insights for urban policy planners and researchers interested in devising urban policies that are sensitive to the unavoidable human-environment dynamics that shape urban behaviour and livability.

Importance of sensor data reliability

Extracting such valuable insights from sensor data requires the raw data to be processed and analysed. Here, the extent of data processing and quality of data analysis critically depends on the reliability of the data itself.

Therefore, our project seeks to evaluate the level of reliability of the data collected by the AoT network. In Section 2, we first define criteria for what it means for AoT data to be reliable. In Section 3 and Section 4, based on these criteria, we then provide a method to numerically score the network’s daily data reliability for each data parameter.

Based on this score metric, we hope that planners can easily identify the segments of the large AoT dataset that suit their analysis. We also hope that the transparent evaluation factors and scores behind this data reliability analysis can promote a more informed use of data in the increasingly data-driven planning process. We see this application as additionally useful in improving research efficiency, considering the increasingly large stores of sensor data available - planners will be able to scope the spatial and temporal scale of their research according to where and when reliable segments of the data are available, instead of having to explore many different datasets to finalise a suitable scope of analysis.

1.2 Setup

Below we set the working directory, and load the libraries needed for the analysis as well as a plotTheme. We also set the memory limit to a high value, given the size of data we are processing here.

library(dplyr)
library(DBI)
library(RSQLite)
library(dbplyr)
library(lubridate)
library(leaflet)
library(tmap)
library(ggplot2)
library(sf)
library(plotly)
library(sp)
library(spatstat)
library(rgeos)
library(rgdal)
library(tidyr)
library(gridExtra)
library(stringr)
library(tidyverse)
library(caret)
library(FNN)
library(spdep)
library(knitr)
library(kableExtra)
library(htmlwidgets)
library(htmltools)
library(openair)

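setwd("~/Capstone/Exploratory")
# memory.limit() only has an effect on Windows with R < 4.2; elsewhere it changes nothing and only warns
memory.limit(100000000000)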
## [1] 1e+11
plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_line( size=.1, color="#ababab" ),
    panel.grid.major.y = element_line( size=.1, color="#ababab" ),
    panel.grid.major.x = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank()
  )
}

2. Defining Data Reliability

In order to use the data, planners need to know if the data is reliable. Here, we identify 4 ideal criteria for AoT data to be considered reliable and useful:

  1. Sensor Value Reliability: All the data collected should be within sensing range, as determined by the sensor specifications.

  2. Spatial Reliability: Reliable data should be collected for the whole spatial extent of the study area, which is Chicago in this case.

  3. Temporal Reliability: Reliable data should be collected at consistent time intervals throughout the day.

  4. Imputability: Missing and unreliable data records should be easily substitutable by the next nearest record in time and space. This substitution, or imputation, should improve sensor value, spatial, and temporal reliability.

Each criterion is demonstrated below, together with the method through which it is scored. Such numerical scores are metrics that facilitate easy comparison between different datasets.

Each of these 4 criteria is conceptualised in detail in the following sub-sections.

2.2 Sensor Value Reliability

This is the first and most important criterion defining data reliability in our method.

According to the AoT metadata site, each sensor has a specific detection range. This means that a well-functioning sensor should record values within this range. Values recorded outside this range indicate that the sensor is faulty, and such values are unreliable and unusable.

The table below presents the specific detection ranges of different sensors recording different data parameters.

| Data Type | Parameter | Minimum senseable value | Maximum senseable value | Unit |
|---|---|---|---|---|
| Weather | Temperature | -55 | 125 | deg Celsius |
| Weather | Relative Humidity | 0 | 100 | Percent |
| Weather | Pressure | 300 | 1100 | hPa |
| Air Quality | PM2.5 concentration | 0 | - | PPM |
| Air Quality | CO concentration | 0 | 1000 | PPM |
| Air Quality | H2S concentration | 0 | 50 | PPM |
| Air Quality | NO2 concentration | 0 | 20 | PPM |
| Air Quality | O3 concentration | 0 | 20 | PPM |
| Air Quality | SO2 concentration | 0 | 20 | PPM |

Temperature values recorded by the sensors should also fall within logical seasonal ranges. Intuitively, we know that temperature cannot fluctuate between -55 deg Celsius and 125 deg Celsius in a day. We also expect temperature values collected during winter in Chicago to be around or below the freezing point of 0 deg Celsius, while summers should record higher temperature values. Therefore, to determine sensor value reliability for temperature values, we also reference daily temperature ranges in Chicago published by the National Weather Service.

Based on these, we can label each sensor value as reliable or not. If the value falls within the sensor specification range (and daily range for temperature values), reliability == 1, and if not, reliability == 0.
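The labelling rule above can be sketched in a few lines of base R. This is a minimal illustration rather than the project's own code: the helper name label_reliability and the example values are made up, and missing readings are assumed to count as unreliable.

```r
# Hypothetical helper (not part of the original project):
# label a reading 1 if it lies inside the detection range, 0 otherwise.
label_reliability <- function(value, low, high) {
  # Assumption: a missing reading is treated as unreliable
  ifelse(is.na(value) | value < low | value > high, 0, 1)
}

# Example with relative humidity (detection range 0 to 100 percent)
humidity <- c(77.5, 118.99, NA, 0, 100)
label_reliability(humidity, low = 0, high = 100)
# returns 1 0 0 1 1
```

Values exactly on the range limits are kept as reliable here, which is a choice of this sketch.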

To evaluate the network sensor value reliability, we are interested in the average proportion of sensor values measured in each node that are reliable. The more of the measured sensor values that are reliable, the more reliable the overall network is in terms of sensor value reliability.

Based on this criterion of Sensor Value Reliability, we can define active nodes and inactive nodes in the network.

  • Active nodes record at least one reliable value that falls within the sensor specification range during the course of a day.

  • Inactive nodes do not record a single reliable value within the sensor specification range during the course of a day. These include nodes containing sensors that do not record any value at all.
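As a rough sketch of the two definitions above, the snippet below classifies nodes from a table of labelled readings for one day. The node IDs and values are invented for illustration, and NA is assumed to stand for a reading that was never recorded.

```r
# Toy labelled readings for one day (node IDs and labels are made up)
readings <- data.frame(
  node_id  = c("a", "a", "b", "b", "c"),
  val_qual = c(1, 0, 0, 0, NA)   # NA stands for "no value recorded"
)

# Count the reliable readings of each node
reliable_count <- sapply(split(readings$val_qual, readings$node_id),
                         function(q) sum(q == 1, na.rm = TRUE))

# A node is active if it has at least one reliable reading during the day
node_status <- ifelse(reliable_count >= 1, "active", "inactive")
node_status
```

In this toy data only node a is active: node b records values but none are reliable, and node c records nothing at all.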

This helps us define two other relevant reliability criteria, Spatial Reliability and Temporal Reliability, that will be elaborated on in Section 2.3 and Section 2.4 respectively.

2.3 Spatial Reliability

To evaluate the spatial reliability of the network, we are interested to know whether active nodes are distributed across Chicago. Here, the area covered by active nodes is considered to have reliable data collected for it. A network that is fully spatially reliable has its nodes distributed across the whole of Chicago, such that the total spatial extent of the nodes spans the area of the city. A network that is not fully spatially reliable is one whose total spatial extent spans only part of the city. The figure below illustrates this point:

Therefore, the average proportion of Chicago’s area covered by the network extent at any one time serves as a metric for spatial reliability here.
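One way to picture this metric is sketched below in base R. It assumes that the network extent is the convex hull of the active nodes and that coordinates are planar (projected); both are assumptions of this sketch, and the square study area and node locations are hypothetical.

```r
# Shoelace formula: area of a polygon from its vertices given in order
polygon_area <- function(x, y) {
  nxt <- c(2:length(x), 1)
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Hypothetical study area: a 10 km x 10 km square (projected coordinates, km)
city_x <- c(0, 10, 10, 0)
city_y <- c(0, 0, 10, 10)

# Hypothetical active-node locations inside the square
node_x <- c(2, 8, 8, 2, 5)
node_y <- c(2, 2, 6, 6, 4)

# chull() returns the indices of the points on the convex hull, in order
hull <- chull(node_x, node_y)

# Share of the study area covered by the network extent
coverage <- polygon_area(node_x[hull], node_y[hull]) / polygon_area(city_x, city_y)
coverage
# 0.24
```

For real node locations given as latitude and longitude, the points would first need to be projected before areas are compared.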

2.4 Temporal Reliability

To evaluate the temporal reliability of the network, we are interested in the average proportion of the day during which a node within the network is active, i.e. collecting reliable data. A network that is fully temporally reliable is one that has all its nodes collecting reliable data consistently across all time intervals during a day. A network that is not fully temporally reliable is one that has at least one of its nodes not collecting reliable data at some point during the day. The figure below illustrates this type of network - while some of its nodes are consistently collecting reliable data throughout the day, others have periods during which no reliable data is collected at all:

Therefore, the average proportion of the day-duration during which reliable data is being collected serves as a metric for temporal reliability here.
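A minimal sketch of this metric, using an invented activity matrix in which each row is a node and each column is one time interval of the day (1 means the node collected at least one reliable value in that interval). The six intervals and three nodes are assumptions made to keep the example short.

```r
# Toy activity matrix: rows are nodes, columns are time intervals
active <- rbind(
  node_a = c(1, 1, 1, 1, 1, 1),   # active all day
  node_b = c(1, 1, 0, 0, 1, 1),   # gap in the middle of the day
  node_c = c(0, 0, 0, 1, 1, 1)    # only active in the second half
)

# Share of the day during which each node is active
node_share <- rowMeans(active)

# Network temporal reliability: mean share across nodes
temporal_score <- mean(node_share)
temporal_score
```

A fully temporally reliable network would give a score of 1, since every row would consist only of ones.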

2.5 Imputability

Imputability refers to the possibility and effectiveness of replacing missing or as-if-missing values with observed ones. Here, unreliable data is considered as-if-missing.

The figure below presents our imputation method to replace missing and unreliable data in the dataset we retrieve from the AoT database.

To evaluate imputability, we first apply the procedure above to our original dataset to obtain an imputed dataset. We then apply the score metrics for the other 3 reliability criteria to this imputed dataset, and compare the scores. Ideally, the scores for the imputed dataset should be higher - this suggests that imputing for unreliable and missing data in the retrieved dataset is effective. If the scores are not higher, this indicates that imputation cannot be used to ‘salvage’ an original dataset that consists of too many unreliable data points. In this case, planners might be advised not to use the dataset for that day and data type at all.
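To make the idea of imputation concrete, here is a deliberately simplified sketch that is not the procedure from the figure: it only looks along the time axis within a single node and replaces each unreliable or missing value with the nearest-in-time reliable value. The function name impute_nearest_in_time and the example readings are hypothetical.

```r
# Simplified, hypothetical imputation: borrow the nearest-in-time reliable
# value from the same node (the spatial part of the method is not covered).
impute_nearest_in_time <- function(time, value, val_qual) {
  ok <- which(val_qual == 1 & !is.na(value))
  if (length(ok) == 0) return(value)   # nothing reliable to borrow from
  out <- value
  for (i in setdiff(seq_along(value), ok)) {
    gap <- abs(as.numeric(time[ok]) - as.numeric(time[i]))
    out[i] <- value[ok[which.min(gap)]]
  }
  out
}

# Four readings from one node, 10 minutes apart
time  <- as.POSIXct("2018-12-15 00:00:00", tz = "UTC") + c(0, 600, 1200, 1800)
value <- c(75.8, 118.99, NA, 80.1)
qual  <- c(1, 0, 0, 1)
impute_nearest_in_time(time, value, qual)
# returns 75.8 75.8 80.1 80.1
```

After a step like this, the same reliability scores can be recomputed on the imputed values and compared with the scores of the original data.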

3. Scoring Data Reliability

In this section, we will present the method of scoring Data Reliability based on the criteria of Sensor Value Reliability, Spatial Reliability and Temporal Reliability. This will be demonstrated using data collected on 2018-12-15 for the different data types listed for weather and air quality in their respective sections.

The flow chart below illustrates the common workflow we adopt for scoring data reliability for each day:

3.2 Pre-scoring Data Retrieval and Processing

To manage the large AoT dataset, we first download the dataset for December 2018 from the AoT data site and import it into a database using SQL Server Management Studio (instructions on this can be found here). We then save this database as Chicago2018-12.db. It is this database that we will connect to using the dbConnect function available from the DBI R package.

dbname<-'Chicago2018-12.db'

To facilitate the workflow, we provide a function below for users to retrieve AoT data from the SQL database and then determine if each data point is reliable or not.

defValid<-function(dbname, system, parameter1, sensor1=NULL, high=NULL, low=NULL, actual1=NULL){
  
  #1. Connect to SQL database (the connection is closed when the function exits)
  
  con<-dbConnect(SQLite(), dbname=dbname)
  on.exit(dbDisconnect(con), add=TRUE)
  
  #2. Send query and retrieve data from database
  ##dbGetQuery() sends the query, fetches all rows and clears the result set
  weather<-
    dbGetQuery(con,
                paste0(
                  "SELECT data.timestamp, data.node_id, data.subsystem,data.sensor, data.parameter, data.value_hrf, nodes.lat, nodes.lon
                  FROM data
                  JOIN nodes
                  ON data.node_id = nodes.node_id
                  WHERE data.subsystem 
                  IN ('",system,"')"))%>%
    mutate(timestamp2=ymd_hms(timestamp),
           date=date(timestamp),
           value_hrf=as.numeric(value_hrf))%>%
    filter(parameter==parameter1)%>%
    mutate(by10=cut(timestamp2, breaks='10 min'))%>%
    mutate(time=ymd_hms(by10))
  
  #3. Determine for each data point whether it is reliable or not, 
  ##sensor reliability specification differs according to different data parameters
  
  if(parameter1=='humidity'){
    weather%>%
      filter(sensor==sensor1)%>%
      filter(parameter==parameter1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf>100|value_hrf<0,0,1)))%>%
      group_by(date,node_id)%>%
      mutate(val_qual=ifelse(mean(val_qual)!=1, 0,1))->df
  }else if(parameter1=='pressure'){
    weather%>%
      filter(sensor==sensor1)%>%
      filter(parameter==parameter1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf>1100|value_hrf<300,0,1)))%>%
      group_by(date,node_id)%>%
      mutate(val_qual=ifelse(mean(val_qual)!=1, 0,1))->df
  }else if(parameter1=='pm2_5'){
    weather%>%
      filter(parameter==parameter1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf<0,0,1)))->df
  }else if(parameter1=='concentration'){
    weather%>%
      filter(parameter==parameter1)%>%
      filter(sensor==sensor1)%>%
      mutate(val_qual=ifelse(is.na(value_hrf), 0, 
                             ifelse(value_hrf<low|value_hrf>high,0,1)))->df
  }else if(parameter1=='temperature'){
    actual<-read.csv(actual1)
    actual$date<-ymd(actual$date)
    
    weather%>%
      filter(parameter==parameter1)%>%
      left_join(actual, by='date')%>%
      mutate(high_bound = high + 5,
             low_bound = low - 5) %>%
      #missing readings are labelled unreliable, as in the other branches
      mutate(val_qual = ifelse(is.na(value_hrf) | value_hrf > high_bound | value_hrf < low_bound, 0, 1))->df
    
    df%>%
      filter(val_qual==1)%>%
      group_by(by10, node_id)%>%
      mutate(quant75= quantile(value_hrf, probs=0.75),
             quant25= quantile(value_hrf, probs=0.25))%>%
      mutate(val_qual= ifelse(value_hrf > quant75 | value_hrf < quant25, 0,1))->df1
    
    df%>%
      filter(val_qual==0)%>%
      bind_rows(df1)->df
    
    rm(df1)
    
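  }else{
    #fail early instead of returning an undefined object
    stop("Unsupported parameter1: ", parameter1)
  }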
  
  return(df)
}

The function calls for the following inputs:

  • dbname: Name of the SQL database file containing a month’s worth of AoT data
  • system: The sensor system within the node - see the AoT site for the specific system types.
  • parameter1: The data type - see the AoT site for the specific parameter types.
  • sensor1: The sensor type - required when parameter1 is humidity, pressure or concentration (for concentration, it specifies the gas type). See the AoT site for the specific sensor types.
  • high: The highest value sensor1 can record - only required when parameter1 = concentration
  • low: The lowest value sensor1 can record - only required when parameter1 = concentration
  • actual1: Path to the CSV file containing daily temperature ranges obtained from the National Weather Service, with the columns date, high and low - only required when temperature data is retrieved (parameter1 = temperature)

The function implements the following procedure:

  1. Connect to SQL database
  2. Send query and retrieve data from database
  3. Determine for each data point whether it is reliable or not

The function returns a data frame with the following columns:

  • timestamp: Date and time at which the data observation is recorded
  • node_id: Unique ID of the node at which data is being recorded
  • subsystem: Sensor subsystem within the node
  • sensor: Sensor model
  • parameter: Data type
  • value_hrf: Measurement
  • lat: Latitude of the node
  • lon: Longitude of the node
  • timestamp2: timestamp in datetime format
  • date: Date extracted from timestamp
  • by10: The 10-minute interval into which timestamp2 falls, labelled by the start of the interval
  • time: by10 in datetime format
  • val_qual: 1 if data record is reliable, 0 if not
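The by10 column comes from the call cut(timestamp2, breaks='10 min') inside defValid. The short base R sketch below shows how that call groups timestamps into 10-minute intervals. The four timestamps are made up, and the interval index is shown instead of the interval label because the exact label text may differ between R versions.

```r
# Four made-up timestamps on the scoring day
ts <- as.POSIXct(c("2018-12-15 00:00:01", "2018-12-15 00:09:59",
                   "2018-12-15 00:10:00", "2018-12-15 00:27:30"), tz = "UTC")

# Same call as in defValid: a factor with one level per 10-minute interval
by10 <- cut(ts, breaks = "10 min")

# Interval index of each timestamp; intervals are closed on the left,
# so 00:10:00 already belongs to the second interval
as.integer(by10)
# returns 1 1 2 3
```

Each reading is therefore assigned to the interval that contains it, which is a floor rather than a rounding to the nearest 10 minutes.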

This function has to be applied before each scoring process in the following sections. Click on the tabs below to view the specific inputs that will yield the relevant data parameter types. The data retrieved here will then be scored in the following sections.

Temperature

  1. Apply function to retrieve temperature data and label reliability
systemTemp<-'metsense'
actualTemp<-'december_weather.csv'
parameterTemp<-'temperature'
dfTemp<-defValid(dbname, system=systemTemp, parameter1=parameterTemp, actual1=actualTemp)

#because we are only scoring for a day here, filter data for 2018-12-15
dfTemp%>%
  filter(date=='2018-12-15')->dfTemp
  2. Observe first 10 rows
dfTemp%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:01 001e0610bc12 41.75034 -87.663518 2018-12-15 00:00:00 40.20 0
2018/12/15 00:00:01 001e06113f54 41.884607 -87.624577 2018-12-15 00:00:00 35.60 0
2018/12/15 00:00:01 001e0611537d 41.794167 -87.601646 2018-12-15 00:00:00 29.70 0
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 35.10 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 241.00 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 128.86 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 -254.00 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 214.75 0
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 37.20 0
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 241.00 0

Humidity

  1. Apply function to retrieve humidity data and label reliability
systemHumidity<-'metsense'
parameterHumidity<-'humidity'
sensorHumidity<-'htu21d'
dfHumidity<-defValid(dbname, system=systemHumidity, parameter1=parameterHumidity, sensor1=sensorHumidity)

#because we are only scoring for a day here, filter data for 2018-12-15
dfHumidity%>%
  filter(date=='2018-12-15')->dfHumidity
  2. Observe first 10 rows
dfHumidity%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
## Adding missing grouping variables: `date`
date timestamp node_id lat lon by10 value_hrf val_qual
2018-12-15 2018/12/15 00:00:00 001e0610ee36 41.751295 -87.605288 2018-12-15 00:00:00 77.50 1
2018-12-15 2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 75.80 1
2018-12-15 2018/12/15 00:00:01 001e0610bc12 41.75034 -87.663518 2018-12-15 00:00:00 80.76 1
2018-12-15 2018/12/15 00:00:01 001e06113f54 41.884607 -87.624577 2018-12-15 00:00:00 81.80 1
2018-12-15 2018/12/15 00:00:01 001e0611537d 41.794167 -87.601646 2018-12-15 00:00:00 118.99 0
2018-12-15 2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 118.99 0
2018-12-15 2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 86.06 1
2018-12-15 2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 118.99 0
2018-12-15 2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 86.75 1
2018-12-15 2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 118.99 0

Pressure

  1. Apply function to retrieve pressure data and label reliability
systemPressure<-'metsense'
parameterPressure<-'pressure'
sensorPressure<-'bmp180'
dfPressure<-defValid(dbname, system=systemPressure, parameter1=parameterPressure, sensor1=sensorPressure)

#because we are only scoring for a day here, filter data for 2018-12-15
dfPressure %>%
  filter(date=='2018-12-15')->dfPressure
  2. Observe first 10 rows
dfPressure%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
## Adding missing grouping variables: `date`
date timestamp node_id lat lon by10 value_hrf val_qual
2018-12-15 2018/12/15 00:00:00 001e0610ee36 41.751295 -87.605288 2018-12-15 00:00:00 1042.56 1
2018-12-15 2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 1011.77 1
2018-12-15 2018/12/15 00:00:01 001e0610bc12 41.75034 -87.663518 2018-12-15 00:00:00 1119.05 0
2018-12-15 2018/12/15 00:00:01 001e06113f54 41.884607 -87.624577 2018-12-15 00:00:00 1017.66 1
2018-12-15 2018/12/15 00:00:01 001e0611537d 41.794167 -87.601646 2018-12-15 00:00:00 1016.86 1
2018-12-15 2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 1080.70 1
2018-12-15 2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 997.53 1
2018-12-15 2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 2361.56 0
2018-12-15 2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 1036.71 1
2018-12-15 2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 2361.56 0

PM2.5 Concentration

  1. Apply function to retrieve PM 2.5 Concentration data and label reliability.
systemPM25<-'alphasense'
parameterPM25<-'pm2_5'

dfPM25<-defValid(dbname, systemPM25, parameterPM25)

#because we are only scoring for a day here, filter data for 2018-12-15
dfPM25%>%
  filter(date=='2018-12-15')->dfPM25
  2. Observe first 10 rows
dfPM25%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 12.179 1
2018/12/15 00:00:13 001e06113dbc 41.713867 -87.536509 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:18 001e0610bc10 41.736314 -87.624179 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:21 001e0610ba15 41.722457 -87.57535 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:31 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:31 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:35 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 NA 0

CO Concentration

  1. Apply function to retrieve CO concentration data and label reliability.
systemCO<-'chemsense'
parameterCO<-'concentration'

sensorCO<-'co'
highCO<-1000
lowCO<-0
dfCO<-defValid(dbname, system=systemCO, parameter1=parameterCO, sensor1=sensorCO, high=highCO, low=lowCO)

#because we are only scoring for a day here, filter data for 2018-12-15
dfCO %>% 
  filter(date=='2018-12-15')->dfCO
  2. Observe first 10 rows
dfCO%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 -0.10126 0
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 -0.52656 0
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.27291 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.08977 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.12475 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 -0.07333 0
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 -0.08132 0
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 -0.11219 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.14908 1

H2S Concentration

  1. Apply function to retrieve H2S concentration data and label reliability.
systemH2S<-'chemsense'
parameterH2S<-'concentration'

sensorH2S<-'h2s'
highH2S<-50
lowH2S<-0
dfH2S<-defValid(dbname, system=systemH2S, parameter1=parameterH2S, sensor1=sensorH2S, high=highH2S, low=lowH2S)

#because we are only scoring for a day here, filter data for 2018-12-15
dfH2S %>%
  filter(date=='2018-12-15')->dfH2S
  2. Observe first 10 rows
dfH2S%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 0.00271 1
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 -0.13310 0
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.46145 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.11730 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 -0.02195 0
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 -0.04071 0
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 0.02893 1
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 0.16494 1
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 -0.06809 0

NO2 Concentration

  1. Apply function to retrieve NO2 concentration data and label reliability.
systemNO2<-'chemsense'
parameterNO2<-'concentration'

sensorNO2<-'no2'
highNO2<-20
lowNO2<-0
dfNO2<-defValid(dbname, system=systemNO2, parameter1=parameterNO2, sensor1=sensorNO2, high=highNO2, low=lowNO2)

#because we are only scoring for a day here, filter data for 2018-12-15
dfNO2 %>%
  filter(date=='2018-12-15')->dfNO2
  2. Observe first 10 rows
dfNO2%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 0.00470 1
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.01549 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 0.02010 1
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 0.07814 1
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.00543 1

O3 Concentration

  1. Apply function to retrieve O3 concentration data and label reliability.
systemO3<-'chemsense'
parameterO3<-'concentration'

sensorO3<-'o3'
highO3<-20
lowO3<-0
dfO3<-defValid(dbname, system=systemO3, parameter1=parameterO3, sensor1=sensorO3, high=highO3, low=lowO3)

#because we are only scoring for a day here, filter data for 2018-12-15
dfO3 %>%
  filter(date=='2018-12-15')->dfO3
  2. Observe first 10 rows
dfO3%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 0.03330 1
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 0.08645 1
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 0.02858 1
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 0.00000 1
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 0.08059 1
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 -0.01805 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.00000 1

SO2 Concentration

  1. Apply function to retrieve SO2 concentration data and label reliability.
systemSO2<-'chemsense'
parameterSO2<-'concentration'

sensorSO2<-'so2'
highSO2<-20
lowSO2<-0
dfSO2<-defValid(dbname, system=systemSO2, parameter1=parameterSO2, sensor1=sensorSO2, high=highSO2, low=lowSO2)

#because we are only scoring for a day here, filter data for 2018-12-15
dfSO2 %>%
  filter(date=='2018-12-15')->dfSO2
  2. Observe first 10 rows
dfSO2%>%
  select(timestamp, node_id, lat, lon, by10, value_hrf, val_qual)%>%
  head(10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c("striped", "hover"))
timestamp node_id lat lon by10 value_hrf val_qual
2018/12/15 00:00:00 001e0610ee43 41.788608 -87.598713 2018-12-15 00:00:00 -0.08692 0
2018/12/15 00:00:04 001e061144c0 41.764122 -87.72242 2018-12-15 00:00:00 1.06933 1
2018/12/15 00:00:05 001e0610ef27 41.846579 -87.685557 2018-12-15 00:00:00 NA 0
2018/12/15 00:00:05 001e061130f4 41.896157 -87.662391 2018-12-15 00:00:00 -1.40657 0
2018/12/15 00:00:06 001e06114fd4 41.794477 -87.615957 2018-12-15 00:00:00 -0.60472 0
2018/12/15 00:00:07 001e06113cf1 41.884688 -87.627864 2018-12-15 00:00:00 0.05132 1
2018/12/15 00:00:08 001e061146bc 41.918733 -87.668257 2018-12-15 00:00:00 0.11572 1
2018/12/15 00:00:09 001e0610f05c 41.924903 -87.687703 2018-12-15 00:00:00 -0.05358 0
2018/12/15 00:00:09 001e06114503 41.666078 -87.539374 2018-12-15 00:00:00 -0.20640 0
2018/12/15 00:00:11 001e06113107 41.751142 -87.71299 2018-12-15 00:00:00 0.39536 1
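
The labelling rule behind val_qual can be illustrated with a minimal base R sketch. The helper label_reliability below is hypothetical and is not the defValid function used in this document; it assumes that a reading is reliable when it is not missing and lies within the closed range from low to high, which appears consistent with the SO2 rows printed above.

```r
# Hypothetical stand-in for the labelling step (not the authors' defValid()).
# Assumption: a reading is reliable (1) when it is not NA and low <= value <= high.
label_reliability <- function(value, low, high) {
  as.integer(!is.na(value) & value >= low & value <= high)
}

# A few SO2 readings copied from the table above
so2 <- c(-0.08692, 1.06933, NA, -1.40657, 0.05132)

label_reliability(so2, low = 0, high = 20)
# 0 1 0 0 1
```

Negative concentrations and missing values are both labelled 0, matching the val_qual column for these rows.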

3.3 Scoring Sensor Value Reliability

In this section, the method of scoring Sensor Value Reliability is presented for each data parameter type for the day of 2018-12-15. Two scores are obtained for this criterion. In summary, this section scores sensor value reliability for the entire network by analysing the amount of reliable data measured by the relevant sensor in each node at each 10-minute time interval during the day. This amount is expressed as a proportion to account for the different total amounts of data measured by the sensors in different nodes and/or at different times of the day. The node sensor value reliability of each node is obtained by taking the mean of these proportions across the time intervals of the day. The network sensor value reliability (Score 1) is then obtained by taking the mean of all the nodes’ node sensor value reliability. To also observe whether this reliability in sensor values is consistent throughout the day for each node, standard deviation metrics are used to score node sensor value reliability consistency. The overall consistency in sensor value reliability (Score 2) for the network is then obtained as the mean of these nodes’ consistency scores.

The flowchart below illustrates the scoring process in this section:

The subsections below show in detail how the scores were constructed for each data parameter.

Temperature

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable measurements collected varies across the day for each node collecting temperature data in the network. It can be observed that most nodes collect more than 100 data measurements in every 10-minute time interval.

dfTemp%>%
  ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable Temperature Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

dfTemp%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  mutate(NodeMeanX = sum(X)/144)->dfTemp1
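
The same two steps can be sketched on a toy data set in base R. The node names, intervals and val_qual values below are made up for illustration, and the toy day has only 2 intervals instead of 144. Note that dividing by a fixed interval count, as the chunk above does with 144, means that any interval with no readings at all would contribute 0 to the node's average.

```r
# Toy readings (hypothetical): two nodes, two 10-minute intervals each
toy <- data.frame(
  node_id  = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  by10     = c("00:00", "00:00", "00:00", "00:10", "00:10",
               "00:00", "00:00", "00:10", "00:10"),
  val_qual = c(1, 1, 0, 1, 0, 0, 0, 1, 0)
)

# X: percentage of reliable readings for each node in each interval
x_tab <- aggregate(val_qual ~ node_id + by10, data = toy,
                   FUN = function(v) 100 * mean(v))
names(x_tab)[names(x_tab) == "val_qual"] <- "X"

# Node Sensor Value Reliability: sum of X over a fixed number of intervals
n_intervals <- 2  # 144 for a full day of 10-minute intervals
node_mean <- aggregate(X ~ node_id, data = x_tab,
                       FUN = function(v) sum(v) / n_intervals)
names(node_mean)[names(node_mean) == "X"] <- "NodeMeanX"

print(x_tab)
print(node_mean)
```

Node A has 3 reliable readings out of 5, or 60% when pooled, but its NodeMeanX is about 58.3 because each interval is weighted equally regardless of how many readings it contains.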

The table below shows how X varies for each 10-minute time interval (by10) for a single node 001e0610ba13. Node Sensor Value Reliability remains constant, given that it is the average of X here.

dfTemp1%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 44.34783 41.35618
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 44.16667 41.35618
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 43.47826 41.35618
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 44.16667 41.35618
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 42.60870 41.35618
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 41.73913 41.35618
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 45.00000 41.35618
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 43.47826 41.35618
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 44.16667 41.35618
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 41.73913 41.35618
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 43.33333 41.35618
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 40.86957 41.35618
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 42.50000 41.35618
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 40.83333 41.35618
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 45.00000 41.35618
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 41.66667 41.35618
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 40.00000 41.35618
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 48.69565 41.35618

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

ggplot()+
  geom_line(data=dfTemp1,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfTemp1,aes(x=by10, y=X, group=1, col="Proportion of Reliable Data"), size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable Temperature Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes ranges from 11.3% to 65.0%.

dfTemp1%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610ef27 41.846579 -87.685557 64.99769
001e061135cb 41.779369 -87.664421 64.12399
001e0610ee33 41.965089 -87.679076 52.79469
001e0610e532 41.857959 -87.656427 52.28412
001e0610f05c 41.924903 -87.687703 52.00181
001e0610e537 41.961622 -87.665948 51.58313
001e061130f4 41.896157 -87.662391 51.45028
001e0610f732 41.895005 -87.745817 51.39065
001e0610eef4 41.912681 -87.681052 51.17562
001e0610ee43 41.788608 -87.598713 51.14156
001e0610ba46 41.878377 -87.627678 51.13476
001e0610ee36 41.751295 -87.605288 51.08645
001e0610ee5d 41.923996 -87.761072 51.03694
001e0610bbf9 41.768319 -87.683396 50.87284
001e0610bc10 41.736314 -87.624179 50.67658
001e0610f6db 41.791329 -87.598677 48.42593
001e06113dbc 41.713867 -87.536509 42.10656
001e06113f54 41.884607 -87.624577 41.87248
001e0610bc12 41.75034 -87.663518 41.78039
001e06113a48 41.943263 -87.688069 41.65358
001e0610ba13 41.751238 -87.712990 41.35618
001e0610ba15 41.722457 -87.57535 41.22434
001e06113cf1 41.884688 -87.627864 40.98682
001e0611537d 41.794167 -87.601646 40.77823
001e06113107 41.751142 -87.71299 38.42869
001e061144c0 41.764122 -87.72242 34.21231
001e0610e538 41.736593 -87.604759 32.02960
001e0610fb4c 41.913583 -87.682414 30.72025
001e06114503 41.666078 -87.539374 30.06114
001e06113ace 41.83107 -87.617298 13.88310
001e0610f703 41.87148 -87.67644 13.38366
001e06114500 41.714494 -87.643099 12.53336
001e0611462f 41.823527 -87.641054 12.48516
001e06113d22 41.800846 -87.703739 12.46162
001e0610f8f4 41.832579 -87.646133 12.09063
001e0611536c 41.88575 -87.62969 12.05440
001e06114fd4 41.794477 -87.615957 12.03402
001e061146bc 41.918733 -87.668257 11.86846
001e0610eef2 41.965256 -87.66672 11.84833
001e061146ba 41.96759 -87.76257 11.50035
001e0610e835 41.968757 -87.679174 11.29906

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map. There, it can be generally observed that nodes with similar average sensor value reliability levels tend to be located close to one another. The nodes with the lowest sensor value reliability levels are mostly located around the city centre, with the rest located in isolation at the city periphery.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfTemp1$NodeMeanX, n = 5)

dfTemp1$lat<-as.numeric(dfTemp1$lat)
dfTemp1$lon<-as.numeric(dfTemp1$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfTemp1,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfTemp1$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfTemp1$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfTemp1$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for temperature data on 2018-12-15. The network sensor value reliability essentially represents the average proportion of reliable data collected by each node at each time interval of the day.

dfTemp1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfTemp1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
36.3617
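
As a quick sanity check, Score 1 can be reproduced in base R from the NodeMeanX column of the node table printed earlier in this section. The vector below is typed in from that table, so it carries the table's rounding to five decimal places and should agree with the reported score only to about that precision.

```r
# NodeMeanX values copied from the printed node table (41 nodes)
node_mean_x <- c(
  64.99769, 64.12399, 52.79469, 52.28412, 52.00181, 51.58313, 51.45028,
  51.39065, 51.17562, 51.14156, 51.13476, 51.08645, 51.03694, 50.87284,
  50.67658, 48.42593, 42.10656, 41.87248, 41.78039, 41.65358, 41.35618,
  41.22434, 40.98682, 40.77823, 38.42869, 34.21231, 32.02960, 30.72025,
  30.06114, 13.88310, 13.38366, 12.53336, 12.48516, 12.46162, 12.09063,
  12.05440, 12.03402, 11.86846, 11.84833, 11.50035, 11.29906
)

# Score 1 is the unweighted mean of the node-level scores
score1 <- mean(node_mean_x)
round(score1, 1)
# 36.4
```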

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels in the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability scored. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

dfTemp1%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 2`= mean(NodeSDScore))->dfTemp2
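
The nested ifelse() logic above can be restated as a small function, which makes the adjustment easier to follow. The helper consistency_score is a hypothetical name used for illustration only. It assumes the reading that, because the standard deviation is capped at the mean, the ratio of the two always lies between 0 and 1, so the branch for means below 50 simplifies to 100 times that ratio.

```r
# Hypothetical restatement of the node-level consistency score.
# x: the X values (percent reliable per 10-minute interval) of one node.
consistency_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(0)      # no reliable data at all: score 0
  s <- min(sd(x), m)         # cap the SD at the mean, so s / m is in [0, 1]
  ratio <- s / m             # relative variability across the day
  if (m < 50) {
    100 * ratio              # mostly unreliable: steadiness is penalised
  } else {
    100 * (1 - ratio)        # mostly reliable: steadiness is rewarded
  }
}

consistency_score(rep(90, 6))                      # steadily reliable: 100
consistency_score(rep(40, 6))                      # steadily unreliable: 0
consistency_score(c(60, 100, 60, 100, 60, 100))    # reliable but erratic
```

Under this reading, a node that is steadily unreliable scores close to 0, which matches node 001e0610ba13 in the temperature data: its X values stay near 41% all day and its consistency score is about 3.2.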

dfTemp2%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e0610bbf9 41.76832 -87.68340 97.726680 42.29874
001e0610ee36 41.75129 -87.60529 97.613689 42.29874
001e0610ee5d 41.92400 -87.76107 97.552482 42.29874
001e0610f732 41.89500 -87.74582 97.426711 42.29874
001e0610ba46 41.87838 -87.62768 97.340617 42.29874
001e0610ee43 41.78861 -87.59871 97.333544 42.29874
001e061130f4 41.89616 -87.66239 97.026105 42.29874
001e0610e537 41.96162 -87.66595 96.662634 42.29874
001e0610e532 41.85796 -87.65643 96.553289 42.29874
001e0610ee33 41.96509 -87.67908 95.940712 42.29874
001e0610f05c 41.92490 -87.68770 95.548531 42.29874
001e0610eef4 41.91268 -87.68105 93.955537 42.29874
001e0610bc10 41.73631 -87.62418 91.976894 42.29874
001e061135cb 41.77937 -87.66442 85.808949 42.29874
001e0610ef27 41.84658 -87.68556 84.491752 42.29874
001e06114503 41.66608 -87.53937 50.173085 42.29874
001e0610f6db 41.79133 -87.59868 19.087700 42.29874
001e0610fb4c 41.91358 -87.68241 16.947028 42.29874
001e06113d22 41.80085 -87.70374 16.149122 42.29874
001e0611462f 41.82353 -87.64105 15.878560 42.29874
001e0610f703 41.87148 -87.67644 14.701805 42.29874
001e06113ace 41.83107 -87.61730 14.566941 42.29874
001e06114500 41.71449 -87.64310 14.513772 42.29874
001e0610eef2 41.96526 -87.66672 14.415931 42.29874
001e061146bc 41.91873 -87.66826 13.812588 42.29874
001e061144c0 41.76412 -87.72242 13.356711 42.29874
001e0610f8f4 41.83258 -87.64613 13.153401 42.29874
001e0611536c 41.88575 -87.62969 12.352170 42.29874
001e061146ba 41.96759 -87.76257 12.231410 42.29874
001e0610e835 41.96876 -87.67917 12.074803 42.29874
001e06114fd4 41.79448 -87.61596 12.032181 42.29874
001e0610e538 41.73659 -87.60476 10.604387 42.29874
001e06113dbc 41.71387 -87.53651 5.784395 42.29874
001e06113a48 41.94326 -87.68807 4.675932 42.29874
001e06113107 41.75114 -87.71299 4.509346 42.29874
001e06113f54 41.88461 -87.62458 4.467886 42.29874
001e0610bc12 41.75034 -87.66352 4.015151 42.29874
001e0610ba15 41.72246 -87.57535 3.388502 42.29874
001e0610ba13 41.75124 -87.71299 3.166652 42.29874
001e06113cf1 41.88469 -87.62786 2.666945 42.29874
001e0611537d 41.79417 -87.60165 2.563674 42.29874

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map. There, node 001e06114503, the only node with a mid-range consistency score (50.2), can be picked out at the southernmost end of Chicago.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfTemp2$NodeSDScore, n = 5)

dfTemp2$lat<-as.numeric(dfTemp2$lat)
dfTemp2$lon<-as.numeric(dfTemp2$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfTemp2,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfTemp2$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfTemp2$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfTemp2$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average. This average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for temperature data on 2018-12-15.

dfTemp2%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT Temperature Network on 2018-12-15,

  • Score 1 = 36.4 : On average, only 36.4% of the temperature data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 42.3 : From the density distribution, it can be observed that node sensor value reliability is consistently bad more often than it is consistently good, hence the moderate score here.

Humidity

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable measurements collected varies across the day for each node collecting humidity data in the network. It can be observed that most nodes collect more than 20 data measurements in every 10-minute time interval. It can also be observed that each node collects either only reliable data or only unreliable data; there is no mix of both.

dfHumidity%>%
  ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable Humidity Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

dfHumidity%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  mutate(NodeMeanX = sum(X)/144)->dfHumidity1

The table below shows how X varies for each 10-minute time interval (by10) for a single node 001e0610ba13. Node Sensor Value Reliability remains constant, given that it is the average of X here.

dfHumidity1%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 100 100

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfHumidity1,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfHumidity1,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable Humidity Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. The average sensor value reliability of the nodes ranges from 0% to 100%.

dfHumidity1%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610ee36 41.751295 -87.605288 100.00000
001e0610ee43 41.788608 -87.598713 100.00000
001e0610bc12 41.75034 -87.663518 100.00000
001e06113f54 41.884607 -87.624577 100.00000
001e061130f4 41.896157 -87.662391 100.00000
001e06113cf1 41.884688 -87.627864 100.00000
001e0610ee5d 41.923996 -87.761072 100.00000
001e06113dbc 41.713867 -87.536509 100.00000
001e0610e532 41.857959 -87.656427 100.00000
001e0610ba46 41.878377 -87.627678 100.00000
001e0610ee33 41.965089 -87.679076 100.00000
001e0610f732 41.895005 -87.745817 100.00000
001e0610bbf9 41.768319 -87.683396 100.00000
001e06113a48 41.943263 -87.688069 100.00000
001e0610ba13 41.751238 -87.712990 100.00000
001e0610e537 41.961622 -87.665948 100.00000
001e0610f6db 41.791329 -87.598677 100.00000
001e06113107 41.751142 -87.71299 92.36111
001e0611537d 41.794167 -87.601646 0.00000
001e061144c0 41.764122 -87.72242 0.00000
001e06114fd4 41.794477 -87.615957 0.00000
001e061146bc 41.918733 -87.668257 0.00000
001e0610f05c 41.924903 -87.687703 0.00000
001e06114503 41.666078 -87.539374 0.00000
001e0611536c 41.88575 -87.62969 0.00000
001e0611462f 41.823527 -87.641054 0.00000
001e0610f8f4 41.832579 -87.646133 0.00000
001e0610f703 41.87148 -87.67644 0.00000
001e06113d22 41.800846 -87.703739 0.00000
001e0610e538 41.736593 -87.604759 0.00000
001e0610bc10 41.736314 -87.624179 0.00000
001e0610eef4 41.912681 -87.681052 0.00000
001e06113ace 41.83107 -87.617298 0.00000
001e061146ba 41.96759 -87.76257 0.00000
001e0610ba15 41.722457 -87.57535 0.00000
001e0610e835 41.968757 -87.679174 0.00000
001e0610eef2 41.965256 -87.66672 0.00000
001e06114500 41.714494 -87.643099 0.00000

The spatial distribution of these varying sensor value reliability levels by node is visualised in the following map. In general, nodes with similar average sensor value reliability tend to be located close to one another.

chig<-readOGR('.', 'chigBound')
# colorNumeric() builds a continuous palette over the domain and has no `n` argument
pal <- colorNumeric("Blues", domain = dfHumidity1$NodeMeanX)

dfHumidity1$lat<-as.numeric(dfHumidity1$lat)
dfHumidity1$lon<-as.numeric(dfHumidity1$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfHumidity1,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfHumidity1$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfHumidity1$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfHumidity1$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")
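The visual impression that similar nodes sit close together could be checked numerically. The following is only a rough sketch, not part of the original analysis: `nn_agreement` is a hypothetical helper, the 50% cutoff is an assumption, and the coordinates and values are toy data rather than AoT nodes. It reports the share of nodes whose nearest neighbour falls in the same reliability class.

```r
# Hypothetical helper: share of nodes whose nearest neighbour falls in the same
# reliability class (mean reliability at or above `cutoff`, or below it).
nn_agreement <- function(lat, lon, value, cutoff = 50) {
  # Planar distance in degrees; adequate for a rough check at city scale only
  d <- as.matrix(dist(cbind(lat, lon)))
  diag(d) <- Inf                      # a node is not its own neighbour
  nearest <- apply(d, 1, which.min)   # index of each node's nearest neighbour
  mean((value >= cutoff) == (value[nearest] >= cutoff))
}

# Toy coordinates and values (not taken from the AoT data)
toy_lat   <- c(41.70, 41.71, 41.95, 41.96)
toy_lon   <- c(-87.60, -87.60, -87.70, -87.70)
toy_value <- c(100, 100, 0, 0)

toy_agreement <- nn_agreement(toy_lat, toy_lon, toy_value)
```

To apply this to the real data, the input would need one row per node (for example the `node_id`, `lat`, `lon` and `NodeMeanX` columns after `unique()`), since `dfHumidity1` holds one row per node and time interval.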

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for humidity data on 2018-12-15. In essence, the network sensor value reliability is the average proportion of reliable data collected by each node at each time interval of the day.

dfHumidity1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfHumidity1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
47.1674
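As a quick arithmetic check, Score 1 can be reproduced from the per-node values tabulated earlier: 17 nodes at 100, one node at 92.36111 and 20 nodes at 0. The vector below is typed in by hand from that table, and the name `node_means` is ours rather than an object from the original workflow.

```r
# Node Sensor Value Reliability values copied from the humidity table above
node_means <- c(rep(100, 17), 92.36111, rep(0, 20))

n_nodes <- length(node_means)   # 38 nodes in the humidity network
score_1 <- mean(node_means)     # network average, i.e. Score 1

round(score_1, 4)
# → 47.1674
```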

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels across the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

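dfHumidity1%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  # Cap each node's standard deviation of X at its mean, so that NodeSD/mean(X) stays between 0 and 1
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  # Nodes with a mean of 0 score 0. For nodes with mean(X) < 50 the scale is inverted,
  # so that consistently unreliable nodes score low; otherwise low variation scores high.
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  # Keep one row per node, then take the network average as Score 2
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 2`= mean(NodeSDScore))->dfHumidity2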

dfHumidity2%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e0610ee36 41.75129 -87.60529 100 47.36842
001e0610ee43 41.78861 -87.59871 100 47.36842
001e0610bc12 41.75034 -87.66352 100 47.36842
001e06113f54 41.88461 -87.62458 100 47.36842
001e061130f4 41.89616 -87.66239 100 47.36842
001e06113cf1 41.88469 -87.62786 100 47.36842
001e0610ee5d 41.92400 -87.76107 100 47.36842
001e06113107 41.75114 -87.71299 100 47.36842
001e06113dbc 41.71387 -87.53651 100 47.36842
001e0610e532 41.85796 -87.65643 100 47.36842
001e0610ba46 41.87838 -87.62768 100 47.36842
001e0610ee33 41.96509 -87.67908 100 47.36842
001e0610f732 41.89500 -87.74582 100 47.36842
001e0610bbf9 41.76832 -87.68340 100 47.36842
001e06113a48 41.94326 -87.68807 100 47.36842
001e0610ba13 41.75124 -87.71299 100 47.36842
001e0610e537 41.96162 -87.66595 100 47.36842
001e0610f6db 41.79133 -87.59868 100 47.36842
001e0611537d 41.79417 -87.60165 0 47.36842
001e061144c0 41.76412 -87.72242 0 47.36842
001e06114fd4 41.79448 -87.61596 0 47.36842
001e061146bc 41.91873 -87.66826 0 47.36842
001e0610f05c 41.92490 -87.68770 0 47.36842
001e06114503 41.66608 -87.53937 0 47.36842
001e0611536c 41.88575 -87.62969 0 47.36842
001e0611462f 41.82353 -87.64105 0 47.36842
001e0610f8f4 41.83258 -87.64613 0 47.36842
001e0610f703 41.87148 -87.67644 0 47.36842
001e06113d22 41.80085 -87.70374 0 47.36842
001e0610e538 41.73659 -87.60476 0 47.36842
001e0610bc10 41.73631 -87.62418 0 47.36842
001e0610eef4 41.91268 -87.68105 0 47.36842
001e06113ace 41.83107 -87.61730 0 47.36842
001e061146ba 41.96759 -87.76257 0 47.36842
001e0610ba15 41.72246 -87.57535 0 47.36842
001e0610e835 41.96876 -87.67917 0 47.36842
001e0610eef2 41.96526 -87.66672 0 47.36842
001e06114500 41.71449 -87.64310 0 47.36842
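To make the scoring rule easier to follow, the sketch below restates the `NodeSDScore` logic from the code above as a standalone function and applies it to toy vectors. The function name `node_sd_score` and the toy inputs are ours, and the function is a simplified restatement for a single node rather than the code used to produce the table.

```r
# Hypothetical helper mirroring the NodeSDScore logic for one node's X values
node_sd_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(0)              # no reliable data at all scores 0
  s <- min(sd(x), m)                 # cap the standard deviation at the mean
  consistency <- 100 - 100 * s / m   # 100 means X is identical in every interval
  # Below a mean of 50, consistency is undesirable, so the scale is inverted
  if (m < 50) abs(100 - abs(consistency)) else abs(consistency)
}

always_reliable   <- node_sd_score(rep(100, 144))  # consistently good
always_unreliable <- node_sd_score(rep(0, 144))    # consistently bad
steadily_poor     <- node_sd_score(rep(20, 144))   # consistent, but mostly unreliable

# Score 2 from the table above: 18 nodes score 100 and 20 nodes score 0
score_2 <- mean(c(rep(100, 18), rep(0, 20)))
round(score_2, 5)
# → 47.36842
```

Here `always_reliable` is 100, while `always_unreliable` and `steadily_poor` are both 0, which illustrates why a perfectly steady but poorly performing node is not rewarded for its consistency.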

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
# colorNumeric() builds a continuous palette over the domain and has no `n` argument
pal <- colorNumeric("Purples", domain = dfHumidity2$NodeSDScore)

dfHumidity2$lat<-as.numeric(dfHumidity2$lat)
dfHumidity2$lon<-as.numeric(dfHumidity2$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfHumidity2,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfHumidity2$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfHumidity2$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfHumidity2$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average. This average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for humidity data on 2018-12-15.

dfHumidity2%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT Humidity Network on 2018-12-15,

  • Score 1 = 47.2 : On average, only 47.2% of the humidity data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 47.4 : From the density distribution, it can be observed that node sensor value reliability is consistently bad more often than it is consistently good, hence the moderate score here.

Pressure

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable measurements varies across the day for each node collecting pressure data in the network. Most nodes collect more than 20 measurements in every 10-minute time interval. It can also be observed that each node collects either reliable data or unreliable data; there is no mix of both.

dfPressure%>%
   ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable Pressure Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))
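The two observations read off the figure (readings per interval, and no mixing of reliable and unreliable readings within a node) could also be checked in a table. The sketch below is a minimal illustration on toy data, assuming `val_qual` is 1 for reliable and 0 for unreliable readings; `toy_readings` and `node_check` are hypothetical names.

```r
library(dplyr)

# Toy readings (hypothetical): one row per measurement
toy_readings <- data.frame(
  node_id  = rep(c("a", "b", "c"), each = 4),
  by10     = rep(c("00:00", "00:00", "00:10", "00:10"), times = 3),
  val_qual = c(1, 1, 1, 1,   0, 0, 0, 0,   1, 0, 1, 1)
)

node_check <- toy_readings %>%
  group_by(node_id) %>%
  summarise(
    min_per_interval = min(table(by10)),         # fewest readings in any observed interval
    mixed            = n_distinct(val_qual) > 1, # TRUE if both qualities occur for the node
    .groups = "drop"
  )
```

In this toy example only node `c` is flagged as mixed. Running the same summary on `dfPressure` would show whether any real node mixes the two qualities.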

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

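dfPressure%>%
  group_by(node_id, by10)%>%
  # X: percentage of reliable readings (val_qual of 1) per node and 10-minute interval
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  # NodeMeanX: daily mean of X over all 144 intervals; intervals with no data count as 0
  mutate(NodeMeanX = sum(X)/144)->dfPressure1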
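The two quantities can be illustrated on a toy example. This is a minimal sketch under stated assumptions: the data are invented, `n_intervals` stands in for the 144 ten-minute intervals of a full day, and `summarise()` is used instead of `mutate()` followed by `unique()` purely for brevity.

```r
library(dplyr)

# Toy readings (hypothetical): val_qual is 1 for reliable, 0 for unreliable
toy <- data.frame(
  node_id  = c("a", "a", "a", "a", "b", "b"),
  by10     = c("00:00", "00:00", "00:10", "00:10", "00:00", "00:00"),
  val_qual = c(1, 1, 1, 0, 1, 1)
)

n_intervals <- 2  # stands in for the 144 intervals of a full day

toy_scores <- toy %>%
  group_by(node_id, by10) %>%
  summarise(X = 100 * sum(val_qual) / n(), .groups = "drop") %>%  # X per node and interval
  group_by(node_id) %>%
  summarise(NodeMeanX = sum(X) / n_intervals, .groups = "drop")   # fixed divisor, as above
```

Node `a` has X values of 100 and 50, giving a NodeMeanX of 75. Node `b` reports only in the first interval, so its NodeMeanX is 50 even though every reading it sent was reliable. Because the divisor is fixed, a value such as 92.36111 in the tables would be consistent with a node that was fully reliable in 133 of 144 intervals and silent in the rest, although the tables alone cannot confirm this.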

The table below shows how X varies for each 10-minute time interval (by10) for a single node 001e0610ba13. Node Sensor Value Reliability remains constant, given that it is the average of X here.

dfPressure1%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 100 100

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfPressure1,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfPressure1,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable Pressure Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes ranges from 0% to 100%.

dfPressure1%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610ee36 41.751295 -87.605288 100.00000
001e0610ee43 41.788608 -87.598713 100.00000
001e06113f54 41.884607 -87.624577 100.00000
001e0611537d 41.794167 -87.601646 100.00000
001e061130f4 41.896157 -87.662391 100.00000
001e06113cf1 41.884688 -87.627864 100.00000
001e0610f05c 41.924903 -87.687703 100.00000
001e0610ee5d 41.923996 -87.761072 100.00000
001e06113dbc 41.713867 -87.536509 100.00000
001e0610e532 41.857959 -87.656427 100.00000
001e0610ba46 41.878377 -87.627678 100.00000
001e0610ee33 41.965089 -87.679076 100.00000
001e0610f732 41.895005 -87.745817 100.00000
001e0610bc10 41.736314 -87.624179 100.00000
001e0610eef4 41.912681 -87.681052 100.00000
001e0610bbf9 41.768319 -87.683396 100.00000
001e06113a48 41.943263 -87.688069 100.00000
001e0610ba15 41.722457 -87.57535 100.00000
001e0610ba13 41.751238 -87.712990 100.00000
001e0610e537 41.961622 -87.665948 100.00000
001e0610f6db 41.791329 -87.598677 100.00000
001e0610e538 41.736593 -87.604759 93.05556
001e061144c0 41.764122 -87.72242 92.36111
001e06113107 41.751142 -87.71299 92.36111
001e0610bc12 41.75034 -87.663518 0.00000
001e06114fd4 41.794477 -87.615957 0.00000
001e061146bc 41.918733 -87.668257 0.00000
001e06114503 41.666078 -87.539374 0.00000
001e0611536c 41.88575 -87.62969 0.00000
001e0611462f 41.823527 -87.641054 0.00000
001e0610f8f4 41.832579 -87.646133 0.00000
001e0610f703 41.87148 -87.67644 0.00000
001e06113d22 41.800846 -87.703739 0.00000
001e06113ace 41.83107 -87.617298 0.00000
001e061146ba 41.96759 -87.76257 0.00000
001e0610e835 41.968757 -87.679174 0.00000
001e0610eef2 41.965256 -87.66672 0.00000
001e06114500 41.714494 -87.643099 0.00000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map. It can generally be observed that nodes with similar average sensor value reliability levels tend to be located close to one another.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfPressure1$NodeMeanX, n = 5)

dfPressure1$lat<-as.numeric(dfPressure1$lat)
dfPressure1$lon<-as.numeric(dfPressure1$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfPressure1,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfPressure1$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfPressure1$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfPressure1$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for pressure data on 2018-12-15. The network sensor value reliability is essentially the average proportion of reliable data collected by each node at each time interval of the day.

dfPressure1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfPressure1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
62.5731
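
As a quick arithmetic check, Score 1 can be reproduced directly from the per-node table above. The sketch below is illustrative only: the values are transcribed by hand from that table, and the variable names are our own rather than part of the original pipeline.

```r
# NodeMeanX values transcribed from the table above:
# 21 nodes at 100, one at 93.05556, two at 92.36111, and 14 at 0.
node_mean_x <- c(rep(100, 21), 93.05556, rep(92.36111, 2), rep(0, 14))

n_nodes <- length(node_mean_x)  # 38 nodes in the pressure network
score1  <- mean(node_mean_x)    # unweighted mean across nodes

round(score1, 4)                # 62.5731, matching the Score above
```

Because every node carries equal weight, the 14 nodes that recorded no reliable data pull the network score down well below the level of the other 24 nodes.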

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels across the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

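dfPressure1%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  # Cap each node's standard deviation of X at its mean, so that NodeSD/mean(X) lies between 0 and 1
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  # Score 0 if the node never recorded reliable data; otherwise convert the ratio to a 0-100 score,
  # with the direction of the scale depending on whether the node's mean of X is below 50
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  # Score 2 is the unweighted mean of the node scores
  mutate(`Score 2`= mean(NodeSDScore))->dfPressure2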

dfPressure2%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e0610ee36 41.75129 -87.60529 100 63.15789
001e0610ee43 41.78861 -87.59871 100 63.15789
001e06113f54 41.88461 -87.62458 100 63.15789
001e0611537d 41.79417 -87.60165 100 63.15789
001e061144c0 41.76412 -87.72242 100 63.15789
001e061130f4 41.89616 -87.66239 100 63.15789
001e06113cf1 41.88469 -87.62786 100 63.15789
001e0610f05c 41.92490 -87.68770 100 63.15789
001e0610ee5d 41.92400 -87.76107 100 63.15789
001e06113107 41.75114 -87.71299 100 63.15789
001e06113dbc 41.71387 -87.53651 100 63.15789
001e0610e532 41.85796 -87.65643 100 63.15789
001e0610ba46 41.87838 -87.62768 100 63.15789
001e0610ee33 41.96509 -87.67908 100 63.15789
001e0610e538 41.73659 -87.60476 100 63.15789
001e0610f732 41.89500 -87.74582 100 63.15789
001e0610bc10 41.73631 -87.62418 100 63.15789
001e0610eef4 41.91268 -87.68105 100 63.15789
001e0610bbf9 41.76832 -87.68340 100 63.15789
001e06113a48 41.94326 -87.68807 100 63.15789
001e0610ba15 41.72246 -87.57535 100 63.15789
001e0610ba13 41.75124 -87.71299 100 63.15789
001e0610e537 41.96162 -87.66595 100 63.15789
001e0610f6db 41.79133 -87.59868 100 63.15789
001e0610bc12 41.75034 -87.66352 0 63.15789
001e06114fd4 41.79448 -87.61596 0 63.15789
001e061146bc 41.91873 -87.66826 0 63.15789
001e06114503 41.66608 -87.53937 0 63.15789
001e0611536c 41.88575 -87.62969 0 63.15789
001e0611462f 41.82353 -87.64105 0 63.15789
001e0610f8f4 41.83258 -87.64613 0 63.15789
001e0610f703 41.87148 -87.67644 0 63.15789
001e06113d22 41.80085 -87.70374 0 63.15789
001e06113ace 41.83107 -87.61730 0 63.15789
001e061146ba 41.96759 -87.76257 0 63.15789
001e0610e835 41.96876 -87.67917 0 63.15789
001e0610eef2 41.96526 -87.66672 0 63.15789
001e06114500 41.71449 -87.64310 0 63.15789
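
To make the scoring rule above more concrete, the following is a minimal sketch that applies the same `ifelse()` logic to three toy nodes. The helper `node_sd_score()` and the toy vectors are hypothetical and are not part of the original pipeline; they only illustrate how the rule behaves at its extremes.

```r
# Hypothetical helper mirroring the ifelse() logic of the pipeline above.
# x: one node's per-interval percentages of reliable data (0 to 100).
node_sd_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(0)          # node never recorded reliable data
  ratio <- min(sd(x), m) / m     # SD capped at the mean, so ratio is in [0, 1]
  if (m < 50) {
    abs(100 - abs(100 - 100 * ratio))
  } else {
    abs(100 - 100 * ratio)
  }
}

# Toy nodes, each with 144 ten-minute intervals (assumed for illustration).
always_reliable <- rep(100, 144)
never_reliable  <- rep(0, 144)
half_the_day    <- c(rep(100, 72), rep(0, 72))

scores <- c(
  always_reliable = node_sd_score(always_reliable),
  never_reliable  = node_sd_score(never_reliable),
  half_the_day    = node_sd_score(half_the_day)
)
scores
```

Under these assumptions, the node that is reliable all day scores 100, while both the node that is never reliable and the node that is reliable for only half the day score 0.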

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfPressure2$NodeSDScore, n = 5)

dfPressure2$lat<-as.numeric(dfPressure2$lat)
dfPressure2$lon<-as.numeric(dfPressure2$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfPressure2,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfPressure2$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfPressure2$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfPressure2$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average. This average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for pressure data on 2018-12-15.

dfPressure2%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT Pressure Network on 2018-12-15,

  • Score 1 = 62.6 : On average, only 62.6% of the pressure data measured by the nodes in the network in each 10-minute interval is reliable. This is a moderate score.
  • Score 2 = 63.2 : From the density distribution, it can be observed that the node sensor value reliability is consistently good more often than it is consistently bad, hence the above-moderate score here.

PM2.5 Concentration

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable data collected varies across the day’s duration for each node collecting PM2.5 Concentration data in the network. It can be observed that most nodes collect more than 20 data measurements for every 10-minute time interval.

dfPM25%>%
  ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable PM 2.5 Concentration Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

dfPM25%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  mutate(NodeMeanX = sum(X)/144)->dfPM251
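
The two quantities computed above can be illustrated on a toy example. The sketch below uses base R rather than the dplyr pipeline, and the node name, interval labels and readings are invented for illustration; only the two formulas (100 × reliable / total per interval, then the sum of X divided by 144) are taken from the code above.

```r
# Toy readings for one hypothetical node: val_qual is 1 (reliable) or 0 (unreliable).
readings <- data.frame(
  node_id  = "node_a",
  by10     = c("00:00", "00:00", "00:00", "00:00", "00:10", "00:10"),
  val_qual = c(1, 1, 1, 0, 1, 0)
)

# X: percentage of reliable readings for each node and 10-minute interval.
x_by_interval <- aggregate(val_qual ~ node_id + by10, data = readings,
                           FUN = function(v) 100 * sum(v) / length(v))
names(x_by_interval)[3] <- "X"

# Node Sensor Value Reliability: sum of X over a fixed 144 intervals per day.
intervals_per_day <- 144
node_mean_x <- sum(x_by_interval$X) / intervals_per_day

x_by_interval   # X is 75 for 00:00 and 50 for 00:10
node_mean_x     # 125 / 144
```

Because the denominator is fixed at 144, any interval in which a node reports no data at all contributes nothing to the sum and so effectively counts as 0% reliable. A value such as 92.36111 is, for instance, what a node would obtain if it reported fully reliable data in 133 of the 144 intervals and nothing in the remaining 11.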

The table below shows how X varies for each 10-minute time interval (by10) for a single node, 001e0610ba15. Node Sensor Value Reliability remains constant, given that it is the average of X here.

dfPM251%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba15 2018-12-15 00:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 00:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 00:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 00:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 00:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 00:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 01:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 01:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 01:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 01:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 01:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 01:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 02:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 02:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 02:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 02:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 02:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 02:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 03:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 03:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 03:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 03:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 03:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 03:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 04:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 04:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 04:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 04:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 04:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 04:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 05:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 05:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 05:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 05:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 05:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 05:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 06:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 06:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 06:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 06:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 06:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 06:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 07:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 07:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 07:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 07:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 07:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 07:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 08:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 08:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 08:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 08:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 08:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 08:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 09:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 09:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 09:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 09:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 09:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 09:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 10:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 10:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 10:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 10:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 10:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 10:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 11:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 11:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 11:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 11:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 11:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 11:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 12:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 12:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 12:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 12:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 12:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 12:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 13:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 13:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 13:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 13:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 13:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 13:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 14:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 14:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 14:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 14:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 14:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 14:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 15:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 15:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 15:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 15:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 15:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 15:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 16:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 16:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 16:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 16:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 16:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 16:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 17:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 17:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 17:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 17:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 17:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 17:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 18:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 18:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 18:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 18:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 18:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 18:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 19:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 19:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 19:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 19:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 19:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 19:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 20:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 20:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 20:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 20:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 20:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 20:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 21:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 21:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 21:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 21:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 21:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 21:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 22:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 22:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 22:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 22:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 22:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 22:50:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 23:00:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 23:10:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 23:20:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 23:30:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 23:40:00 41.722457 -87.57535 0 0
001e0610ba15 2018-12-15 23:50:00 41.722457 -87.57535 0 0

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfPM251,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfPM251,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable PM 2.5 Concentration Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes ranges from 0.0% to 92.4%.

dfPM251%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e06113107 41.751142 -87.71299 92.36111
001e0610bc10 41.736314 -87.624179 43.22665
001e061144c0 41.764122 -87.72242 0.00000
001e06114fd4 41.794477 -87.615957 0.00000
001e0610f05c 41.924903 -87.687703 0.00000
001e06113dbc 41.713867 -87.536509 0.00000
001e0610ba15 41.722457 -87.57535 0.00000
001e06114500 41.714494 -87.643099 0.00000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map. It can also be observed that the limited number of PM2.5 nodes are mostly located on the south side of the city, with only one node located in the north.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfPM251$NodeMeanX, n = 5)

dfPM251$lat<-as.numeric(dfPM251$lat)
dfPM251$lon<-as.numeric(dfPM251$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfPM251,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfPM251$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfPM251$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfPM251$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for PM2.5 concentration data on 2018-12-15. The network sensor value reliability is essentially the average proportion of reliable data collected by each node at each time interval of the day.

dfPM251%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfPM251%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
16.94847

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels across the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

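dfPM251%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  # Cap each node's standard deviation of X at its mean, so that NodeSD/mean(X) lies between 0 and 1
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  # Score 0 if the node never recorded reliable data; otherwise convert the ratio to a 0-100 score,
  # with the direction of the scale depending on whether the node's mean of X is below 50
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  # Score 2 is the unweighted mean of the node scores
  mutate(`Score 2`= mean(NodeSDScore))->dfPM252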

dfPM252%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e06113107 41.75114 -87.71299 100 25
001e0610bc10 41.73631 -87.62418 100 25
001e061144c0 41.76412 -87.72242 0 25
001e06114fd4 41.79448 -87.61596 0 25
001e0610f05c 41.92490 -87.68770 0 25
001e06113dbc 41.71387 -87.53651 0 25
001e0610ba15 41.72246 -87.57535 0 25
001e06114500 41.71449 -87.64310 0 25
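
Read as a formula, the nested `ifelse()` calls that produce NodeSDScore can be restated more compactly. The expression below is our own restatement of that code, not a formula given elsewhere in this document; the symbols m, s and r are introduced here for illustration, with m and s denoting the mean and standard deviation of X over the rows available for a node.

```latex
\[
r = \frac{\min(s,\, m)}{m},
\qquad
\text{NodeSDScore} =
\begin{cases}
0            & \text{if } m = 0,\\[2pt]
100\,r       & \text{if } 0 < m < 50,\\[2pt]
100\,(1 - r) & \text{if } m \ge 50.
\end{cases}
\]
```

Since r always lies between 0 and 1, the absolute values in the code never change the sign of the result, which is why they drop out here. Note also that m is the mean of X over the intervals present in the data for the node, whereas NodeMeanX divides the sum of X by a fixed 144 intervals, so the two can differ for nodes that did not report in every interval.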

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfPM252$NodeSDScore, n = 5)

dfPM252$lat<-as.numeric(dfPM252$lat)
dfPM252$lon<-as.numeric(dfPM252$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfPM252,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfPM252$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfPM252$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfPM252$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average. This average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for PM2.5 concentration data on 2018-12-15.

dfPM252%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT PM2.5 Network on 2018-12-15,

  • Score 1 = 16.9 : On average, only 16.9% of the PM2.5 concentration data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 25.0 : Only 2 nodes consistently collected reliable data. The other nodes consistently collected unreliable data, hence the low score here.

CO Concentration

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable data collected varies across the day’s duration for each node collecting CO concentration data in the network. It can be observed that most nodes collect around 20 data measurements for every 10-minute time interval.

dfCO%>%
  ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable CO concentration Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

dfCO%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  mutate(NodeMeanX = sum(X)/144)->dfCO1
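
As a rough illustration of the calculation above, the sketch below recomputes X and the node mean on a small made-up data set, using base R rather than dplyr. The data frame `toy`, its values, and the use of three intervals per day (instead of 144) are assumptions for illustration only. It also shows one consequence of dividing by a fixed interval count: an interval with no readings at all contributes 0 to the node mean.

```r
# Toy readings: val_qual is 1 for a reliable reading, 0 for an unreliable one.
# Node "b" has no readings at all in interval "t3".
toy <- data.frame(
  node_id  = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b"),
  by10     = c("t1", "t1", "t2", "t2", "t3", "t3", "t1", "t1", "t2", "t2"),
  val_qual = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
)
n_intervals <- 3  # stands in for the 144 ten-minute intervals of a full day

# X: percentage of reliable readings per node and interval
X <- aggregate(val_qual ~ node_id + by10, data = toy,
               FUN = function(v) 100 * sum(v) / length(v))
names(X)[names(X) == "val_qual"] <- "X"

# Node mean: sum of X over the intervals present, divided by the fixed count
node_mean <- tapply(X$X, X$node_id, sum) / n_intervals
print(round(node_mean, 2))
```

Here node "b" is fully reliable whenever it reports, yet its mean falls below 100 because the silent interval is counted in the denominator.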

The table below shows how X varies across the 10-minute time intervals (by10) for a single node, 001e0610ba13. The Node Sensor Value Reliability (NodeMeanX) is the same in every row because it is the node's daily average of X.

dfCO1%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 21.739130 39.41967
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 45.833333 39.41967
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 45.833333 39.41967
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 25.000000 39.41967
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 16.666667 39.41967
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 66.666667 39.41967
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 87.500000 39.41967
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 25.000000 39.41967
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 16.666667 39.41967
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 41.666667 39.41967
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 13.043478 39.41967
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 16.666667 39.41967
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 16.666667 39.41967
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 45.833333 39.41967
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 12.500000 39.41967
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 45.833333 39.41967
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 54.166667 39.41967
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 29.166667 39.41967
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 62.500000 39.41967
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 79.166667 39.41967
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 60.869565 39.41967
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 62.500000 39.41967
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 70.833333 39.41967
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 54.166667 39.41967
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 79.166667 39.41967
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 83.333333 39.41967
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 91.666667 39.41967
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 91.666667 39.41967
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 95.833333 39.41967
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 100.000000 39.41967
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 83.333333 39.41967
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 95.833333 39.41967
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 83.333333 39.41967
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 65.217391 39.41967
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 58.333333 39.41967
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 54.166667 39.41967
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 70.833333 39.41967
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 62.500000 39.41967
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 54.166667 39.41967
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 66.666667 39.41967
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 75.000000 39.41967
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 58.333333 39.41967
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 60.869565 39.41967
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 41.666667 39.41967
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 33.333333 39.41967
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 62.500000 39.41967
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 58.333333 39.41967
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 66.666667 39.41967
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 95.833333 39.41967
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 58.333333 39.41967
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 33.333333 39.41967
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 45.454546 39.41967
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 29.166667 39.41967
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 25.000000 39.41967
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 29.166667 39.41967
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 20.833333 39.41967
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 29.166667 39.41967
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 41.666667 39.41967
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 54.166667 39.41967
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 25.000000 39.41967
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 86.956522 39.41967
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 70.833333 39.41967
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 62.500000 39.41967
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 54.166667 39.41967
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 66.666667 39.41967
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 58.333333 39.41967
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 29.166667 39.41967
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 20.833333 39.41967
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 33.333333 39.41967
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 20.833333 39.41967
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 25.000000 39.41967
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 17.391304 39.41967
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 29.166667 39.41967
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 12.500000 39.41967
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 8.695652 39.41967
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 12.500000 39.41967
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 0.000000 39.41967
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 8.333333 39.41967
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 4.166667 39.41967
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 37.500000 39.41967
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 33.333333 39.41967
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 30.434783 39.41967
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 50.000000 39.41967
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 58.333333 39.41967
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 25.000000 39.41967
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 12.500000 39.41967
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 33.333333 39.41967
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 78.260870 39.41967

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfCO1,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfCO1,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable CO Concentration Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes ranges from 0% to 75.5%.

dfCO1%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610eef2 41.965256 -87.66672 75.455415
001e06114500 41.714494 -87.643099 63.883350
001e06113ace 41.83107 -87.617298 51.487017
001e0610f05c 41.924903 -87.687703 43.870773
001e0610bc10 41.736314 -87.624179 43.631743
001e06114fd4 41.794477 -87.615957 43.574673
001e0610ba13 41.751238 -87.712990 39.419672
001e0610e537 41.961622 -87.665948 39.168176
001e0610ee43 41.788608 -87.598713 37.835900
001e061146bc 41.918733 -87.668257 34.312669
001e06113107 41.751142 -87.71299 33.855425
001e061130f4 41.896157 -87.662391 31.257091
001e06113cf1 41.884688 -87.627864 28.629479
001e06114503 41.666078 -87.539374 11.781424
001e0610f6db 41.791329 -87.598677 9.636675
001e061144c0 41.764122 -87.72242 8.485044
001e0610ba15 41.722457 -87.57535 3.145815
001e0610ba46 41.878377 -87.627678 2.435588
001e0610ef27 41.846579 -87.685557 0.000000
001e0610ee33 41.965089 -87.679076 0.000000
001e0610e532 41.857959 -87.656427 0.000000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfCO1$NodeMeanX, n = 5)

dfCO1$lat<-as.numeric(dfCO1$lat)
dfCO1$lon<-as.numeric(dfCO1$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfCO1,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfCO1$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfCO1$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfCO1$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average - this average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for CO concentration data on 2018-12-15. The network sensor value reliability is essentially the average proportion of reliable data collected by each node in each time interval of the day.

dfCO1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfCO1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
28.66028
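
As a quick cross-check of the arithmetic, Score 1 can be recomputed directly from the NodeMeanX column of the node table shown earlier. The vector below is simply those 21 values typed in by hand; the names `node_mean_x` and `score_1` are introduced here for illustration and are not part of the original pipeline.

```r
# NodeMeanX values copied from the node table (21 nodes, CO concentration)
node_mean_x <- c(75.455415, 63.883350, 51.487017, 43.870773, 43.631743,
                 43.574673, 39.419672, 39.168176, 37.835900, 34.312669,
                 33.855425, 31.257091, 28.629479, 11.781424, 9.636675,
                 8.485044, 3.145815, 2.435588, 0, 0, 0)

# Score 1 is the unweighted mean across nodes
score_1 <- mean(node_mean_x)
print(round(score_1, 5))
```

Because this is an unweighted mean, every node counts equally regardless of how many readings it contributed, so the three nodes at 0% pull the network score down as much as any other node would.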

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels across the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability scored. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

dfCO1%>%
select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 2`= mean(NodeSDScore))->dfCO2

dfCO2%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e061144c0 41.76412 -87.72242 100.00000 63.56703
001e0610ba46 41.87838 -87.62768 100.00000 63.56703
001e0610ba15 41.72246 -87.57535 100.00000 63.56703
001e0610f6db 41.79133 -87.59868 100.00000 63.56703
001e06114503 41.66608 -87.53937 96.64364 63.56703
001e06113107 41.75114 -87.71299 79.69208 63.56703
001e0610ba13 41.75124 -87.71299 79.05702 63.56703
001e06114500 41.71449 -87.64310 72.35331 63.56703
001e0610e537 41.96162 -87.66595 68.39149 63.56703
001e061130f4 41.89616 -87.66239 64.92731 63.56703
001e0610bc10 41.73631 -87.62418 64.42289 63.56703
001e0610f05c 41.92490 -87.68770 62.70975 63.56703
001e06114fd4 41.79448 -87.61596 62.23816 63.56703
001e061146bc 41.91873 -87.66826 62.19979 63.56703
001e0610eef2 41.96526 -87.66672 61.35660 63.56703
001e06113cf1 41.88469 -87.62786 61.11279 63.56703
001e0610ee43 41.78861 -87.59871 55.78541 63.56703
001e06113ace 41.83107 -87.61730 44.01734 63.56703
001e0610ef27 41.84658 -87.68556 0.00000 63.56703
001e0610ee33 41.96509 -87.67908 0.00000 63.56703
001e0610e532 41.85796 -87.65643 0.00000 63.56703
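
To make the scoring rule easier to follow, the sketch below restates it as a standalone base R function for a single node's vector of X values. It is an algebraically simplified reading of the nested `ifelse()` expression in the pipeline above (valid because the standard deviation is capped at the mean), not a replacement for it; the function name `consistency_score` and the example vectors are assumptions made for illustration.

```r
# Consistency score for one node, given its X values (0 to 100) over the day.
consistency_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(0)        # a node with no reliable data scores 0
  node_sd <- min(sd(x), m)     # cap the standard deviation at the mean
  ratio <- node_sd / m         # relative spread, limited to [0, 1]
  if (m < 50) {
    100 * ratio                # mostly unreliable: steadiness is penalised
  } else {
    100 * (1 - ratio)          # mostly reliable: steadiness is rewarded
  }
}

steady_good  <- rep(80, 6)              # always 80% reliable
steady_bad   <- rep(10, 6)              # always 10% reliable
erratic_bad  <- c(0, 0, 0, 0, 0, 60)    # low mean, large spread
never_good   <- rep(0, 6)               # never reliable

scores <- sapply(list(steady_good = steady_good, steady_bad = steady_bad,
                      erratic_bad = erratic_bad, never_good = never_good),
                 consistency_score)
print(scores)
```

Under this reading, a node that is steadily reliable and a node that is unreliable but erratic both score 100, while a node that is steadily unreliable scores 0. This is consistent with the table, where several nodes with a very low NodeMeanX have a NodeSDScore of 100.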

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfCO2$NodeSDScore, n = 5)

dfCO2$lat<-as.numeric(dfCO2$lat)
dfCO2$lon<-as.numeric(dfCO2$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfCO2,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfCO2$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfCO2$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfCO2$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average - this average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for CO concentration data on 2018-12-15.

dfCO2%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\nOverall Consistency\nin Sensor Value Reliability', size = 4, vjust= -1.5, hjust=-1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()
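
The network average described above can be recomputed from the NodeSDScore column of the node table as a sanity check. The vector below is typed in by hand from that table; `node_sd_score` and `score_2` are names introduced here for illustration only.

```r
# NodeSDScore values copied from the node table (21 nodes, CO concentration)
node_sd_score <- c(100, 100, 100, 100, 96.64364, 79.69208, 79.05702,
                   72.35331, 68.39149, 64.92731, 64.42289, 62.70975,
                   62.23816, 62.19979, 61.35660, 61.11279, 55.78541,
                   44.01734, 0, 0, 0)

# Score 2 is the unweighted mean of the node consistency scores
score_2 <- mean(node_sd_score)
print(round(score_2, 2))
```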

In summary, for the AoT CO Concentration Network on 2018-12-15,

  • Score 1 = 28.7 : On average, only 28.7% of the CO concentration data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 63.6 : From the density distribution, it can be observed that the node sensor value reliability is consistently good more often than consistently bad, hence the above-moderate score here.

H2S Concentration

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the numbers of reliable and unreliable measurements vary across the day for each node collecting H2S concentration data in the network. It can be observed that most nodes collect around 20 data measurements in every 10-minute time interval.

dfH2S%>%
    ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable H2S Concentration Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

dfH2S%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  mutate(NodeMeanX = sum(X)/144)->dfH2S1
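
Since the node mean divides by a fixed 144, it may be worth checking how many distinct 10-minute intervals each node actually reported in before interpreting NodeMeanX. The following is a minimal base R sketch on made-up data; `toy`, `expected_intervals` and the two node labels are hypothetical, and the same idea would apply to the `node_id` and `by10` columns of the real data.

```r
# Hypothetical readings: node "a" reports in three intervals, node "b" in two.
toy <- data.frame(
  node_id = c("a", "a", "a", "b", "b", "b"),
  by10    = c("t1", "t2", "t3", "t1", "t1", "t2")
)
expected_intervals <- 3  # would be 144 for a full day of 10-minute intervals

# Number of distinct intervals with at least one reading, per node
intervals_seen <- tapply(toy$by10, toy$node_id,
                         function(b) length(unique(b)))

# Nodes that reported in fewer intervals than expected
incomplete <- names(intervals_seen)[intervals_seen < expected_intervals]
print(incomplete)
```

Any node listed as incomplete has its silent intervals counted as 0% reliable by the fixed denominator, which is a reasonable choice for a network-level score but is worth keeping in mind when comparing nodes.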

The table below shows how X varies across the 10-minute time intervals (by10) for a single node, 001e0610ba13. The Node Sensor Value Reliability (NodeMeanX) is the same in every row because it is the node's daily average of X.

dfH2S1%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 26.086956 36.72332
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 12.500000 36.72332
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 17.391304 36.72332
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 4.166667 36.72332
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 58.333333 36.72332
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 70.833333 36.72332
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 12.500000 36.72332
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 30.434783 36.72332
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 70.833333 36.72332
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 8.333333 36.72332
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 54.545454 36.72332
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 70.833333 36.72332
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 62.500000 36.72332
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 58.333333 36.72332
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 66.666667 36.72332
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 26.086956 36.72332
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 58.333333 36.72332
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 30.434783 36.72332
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 12.500000 36.72332
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 27.272727 36.72332
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 12.500000 36.72332
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 21.739130 36.72332
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 62.500000 36.72332
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 8.333333 36.72332
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 26.086956 36.72332
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 26.086956 36.72332
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 12.500000 36.72332
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 58.333333 36.72332
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 13.043478 36.72332
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 66.666667 36.72332
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 62.500000 36.72332
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 45.833333 36.72332
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 16.666667 36.72332
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 54.166667 36.72332
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 41.666667 36.72332
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 62.500000 36.72332
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 60.869565 36.72332
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 58.333333 36.72332
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 58.333333 36.72332
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 37.500000 36.72332
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 70.833333 36.72332
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 33.333333 36.72332
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 30.434783 36.72332
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 29.166667 36.72332
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 20.833333 36.72332
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 50.000000 36.72332
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 25.000000 36.72332
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 43.478261 36.72332

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfH2S1,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfH2S1,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
      scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable H2S Concentration Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes varies between 0% and 100%.

dfH2S1%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e061130f4 41.896157 -87.662391 100.00000
001e0610bc10 41.736314 -87.624179 100.00000
001e06113ace 41.83107 -87.617298 100.00000
001e06114500 41.714494 -87.643099 94.44444
001e06114503 41.666078 -87.539374 93.66823
001e0610eef2 41.965256 -87.66672 92.06799
001e06114fd4 41.794477 -87.615957 85.80014
001e0610f05c 41.924903 -87.687703 62.21945
001e0610f6db 41.791329 -87.598677 57.51434
001e0610ba46 41.878377 -87.627678 53.15382
001e0610e537 41.961622 -87.665948 47.71538
001e06113cf1 41.884688 -87.627864 43.84561
001e061146bc 41.918733 -87.668257 41.73094
001e06113107 41.751142 -87.71299 41.00556
001e0610ba13 41.751238 -87.712990 36.72332
001e0610ee43 41.788608 -87.598713 27.72871
001e061144c0 41.764122 -87.72242 18.28327
001e0610ba15 41.722457 -87.57535 11.83541
001e0610ef27 41.846579 -87.685557 0.00000
001e0610ee33 41.965089 -87.679076 0.00000
001e0610e532 41.857959 -87.656427 0.00000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfH2S1$NodeMeanX, n = 5)

dfH2S1$lat<-as.numeric(dfH2S1$lat)
dfH2S1$lon<-as.numeric(dfH2S1$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfH2S1,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfH2S1$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfH2S1$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfH2S1$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for H2S concentration data on 2018-12-15. In essence, the network sensor value reliability is the average proportion of reliable data collected by each node in each time interval of the day.

dfH2S1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfH2S1%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
52.74936
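
As a quick check, the Score 1 value above can be reproduced from the NodeMeanX column of the node table. The sketch below retypes those 21 printed values into a vector (the name node_mean_x is ours and is not part of the original pipeline) and averages them. It assumes the printed values are rounded to five decimal places, so agreement is only expected to roughly that precision.

```r
# NodeMeanX values copied from the H2S node table above (21 nodes).
node_mean_x <- c(100.00000, 100.00000, 100.00000, 94.44444, 93.66823,
                 92.06799, 85.80014, 62.21945, 57.51434, 53.15382,
                 47.71538, 43.84561, 41.73094, 41.00556, 36.72332,
                 27.72871, 18.28327, 11.83541, 0.00000, 0.00000, 0.00000)

# Score 1 is the unweighted mean of the node-level averages.
score1 <- mean(node_mean_x)
print(round(score1, 5))
```

Because the mean is unweighted, each node counts equally: the three nodes at 0% pull the score down by as much as the three nodes at 100% pull it up.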

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels across the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. The score is based on the standard deviation of X relative to its mean (with the standard deviation capped at the mean) and is first standardised to a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability: for nodes whose mean X is below 50 the score is inverted, so that consistently unreliable nodes score low, and nodes whose mean X is 0 are scored 0.

dfH2S1%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 2`= mean(NodeSDScore))->dfH2S2

dfH2S2%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e061144c0 41.76412 -87.72242 100.00000 61.80346
001e061130f4 41.89616 -87.66239 100.00000 61.80346
001e0610bc10 41.73631 -87.62418 100.00000 61.80346
001e06113ace 41.83107 -87.61730 100.00000 61.80346
001e06114500 41.71449 -87.64310 100.00000 61.80346
001e0610ba15 41.72246 -87.57535 99.84139 61.80346
001e06114503 41.66608 -87.53937 85.96484 61.80346
001e0610eef2 41.96526 -87.66672 83.67821 61.80346
001e06114fd4 41.79448 -87.61596 79.35105 61.80346
001e0610f6db 41.79133 -87.59868 74.27648 61.80346
001e0610ba46 41.87838 -87.62768 71.07506 61.80346
001e0610f05c 41.92490 -87.68770 68.82749 61.80346
001e0610ee43 41.78861 -87.59871 50.58038 61.80346
001e0610ba13 41.75124 -87.71299 41.95153 61.80346
001e06113107 41.75114 -87.71299 39.92343 61.80346
001e061146bc 41.91873 -87.66826 36.70309 61.80346
001e0610e537 41.96162 -87.66595 33.36216 61.80346
001e06113cf1 41.88469 -87.62786 32.33757 61.80346
001e0610ef27 41.84658 -87.68556 0.00000 61.80346
001e0610ee33 41.96509 -87.67908 0.00000 61.80346
001e0610e532 41.85796 -87.65643 0.00000 61.80346
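
The desirability adjustment is easier to see on toy data. The sketch below restates the NodeSD and NodeSDScore steps as a standalone base R function. The name consistency_score and the input vectors are invented for illustration; the branch expressions are our algebraic simplification of the nested abs() calls, which holds because the standard deviation is capped at the mean.

```r
# Hypothetical helper mirroring the NodeSD / NodeSDScore logic for one node.
# x: numeric vector of X values (percent reliable per 10-minute interval).
consistency_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(0)      # node never reports reliable data
  s <- min(sd(x), m)         # cap the standard deviation at the mean
  cv <- s / m                # relative variability, between 0 and 1
  # Below 50% mean reliability the score is inverted (equals 100 * cv);
  # otherwise it is the plain consistency score 100 * (1 - cv).
  if (m < 50) 100 * cv else 100 * (1 - cv)
}

# Invented examples, not AoT data.
examples <- list(
  steady_good = rep(80, 6),          # consistently reliable
  steady_bad  = rep(20, 6),          # consistently unreliable
  mixed_bad   = c(10, 30, 10, 30),   # unreliable on average, but variable
  dead        = rep(0, 6)            # no reliable data at all
)
scores <- sapply(examples, consistency_score)
print(round(scores, 2))
```

Under these assumptions a consistently reliable node scores 100 and a consistently unreliable node scores 0, while a low-reliability node that fluctuates scores somewhere in between.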

The spatial distribution of the consistency scores by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfH2S2$NodeSDScore, n = 5)

dfH2S2$lat<-as.numeric(dfH2S2$lat)
dfH2S2$lon<-as.numeric(dfH2S2$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfH2S2,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfH2S2$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfH2S2$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfH2S2$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average. This average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for H2S concentration data on 2018-12-15.

dfH2S2%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT H2S concentration network on 2018-12-15:

  • Score 1 = 52.7 : On average, only 52.7% of the H2S concentration data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 61.8 : From the density distribution, it can be observed that the node sensor value reliability is consistently good more often than it is consistently bad, hence the above-moderate score here.

NO2 Concentration

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the numbers of reliable and unreliable measurements vary across the day for each node collecting NO2 concentration data in the network. It can be observed that most nodes collect around 20 data measurements for every 10-minute time interval.

dfNO2%>%
    ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable NO2 Concentration Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

dfNO2%>%
  group_by(node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  mutate(NodeMeanX = sum(X)/144)->dfNO21
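
To make the two steps concrete, here is a minimal base R sketch on invented readings. The toy data frame, the node labels A and B, and the two-interval "day" are assumptions for illustration only; the real pipeline uses 144 ten-minute intervals. As we read the code above, dividing by a fixed number of intervals means that an interval with no readings at all contributes 0 to the node average.

```r
# Toy readings: one row per measurement; val_qual is 1 (reliable) or 0 (unreliable).
toy <- data.frame(
  node_id  = c("A", "A", "A", "A", "B", "B"),
  by10     = c("00:00", "00:00", "00:10", "00:10", "00:00", "00:00"),
  val_qual = c(1, 0, 1, 1, 1, 1)
)

# X: percentage of reliable measurements per node and 10-minute interval.
x_tab <- aggregate(val_qual ~ node_id + by10, data = toy,
                   FUN = function(v) 100 * sum(v) / length(v))
names(x_tab)[names(x_tab) == "val_qual"] <- "X"

# Node-level average over a fixed number of intervals
# (2 in this toy example, 144 for a full day).
n_intervals <- 2
node_mean_x <- tapply(x_tab$X, x_tab$node_id, sum) / n_intervals

print(x_tab)
print(node_mean_x)
```

Node B is fully reliable in the one interval it reports, yet its average is halved because it is silent in the second interval. This is a deliberate consequence of dividing by the fixed interval count instead of taking mean(X).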

The table below shows how X varies for each 10-minute time interval (by10) for a single node, 001e0610ba13. Node Sensor Value Reliability (NodeMeanX) remains constant across rows, given that it is the node's daily average of X.

dfNO21%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 30.434783 40.26611
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 45.833333 40.26611
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 12.500000 40.26611
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 41.666667 40.26611
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 26.086956 40.26611
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 50.000000 40.26611
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 41.666667 40.26611
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 45.833333 40.26611
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 0.000000 40.26611
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 30.434783 40.26611
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 25.000000 40.26611
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 4.166667 40.26611
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 9.090909 40.26611
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 4.166667 40.26611
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 4.166667 40.26611
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 0.000000 40.26611
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 17.391304 40.26611
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 25.000000 40.26611
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 25.000000 40.26611
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 25.000000 40.26611
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 12.500000 40.26611
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 26.086956 40.26611
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 25.000000 40.26611
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 45.833333 40.26611
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 4.166667 40.26611
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 8.333333 40.26611
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 50.000000 40.26611
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 27.272727 40.26611
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 25.000000 40.26611
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 16.666667 40.26611
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 8.695652 40.26611
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 33.333333 40.26611
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 62.500000 40.26611
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 58.333333 40.26611
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 41.666667 40.26611
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 20.833333 40.26611
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 45.833333 40.26611
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 70.833333 40.26611
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 69.565217 40.26611
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 50.000000 40.26611
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 47.826087 40.26611
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 37.500000 40.26611
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 79.166667 40.26611
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 75.000000 40.26611
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 70.833333 40.26611
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 70.833333 40.26611
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 75.000000 40.26611
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 58.333333 40.26611
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 75.000000 40.26611
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 70.833333 40.26611
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 50.000000 40.26611
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 39.130435 40.26611
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 79.166667 40.26611
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 95.833333 40.26611
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 75.000000 40.26611
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 87.500000 40.26611
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 91.666667 40.26611
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 91.666667 40.26611
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 66.666667 40.26611
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 70.833333 40.26611
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 75.000000 40.26611
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 100.000000 40.26611
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 70.833333 40.26611
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 60.869565 40.26611
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 50.000000 40.26611
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 58.333333 40.26611
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 87.500000 40.26611
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 62.500000 40.26611
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 41.666667 40.26611
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 79.166667 40.26611
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 83.333333 40.26611
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 86.956522 40.26611
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 50.000000 40.26611
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 29.166667 40.26611
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 54.166667 40.26611
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 45.833333 40.26611
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 43.478261 40.26611

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfNO21,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfNO21,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
       scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable NO2 Concentration Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes varies between 0% and 100%.

dfNO21%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610f05c 41.924903 -87.687703 100.000000
001e0610ba46 41.878377 -87.627678 100.000000
001e06113ace 41.83107 -87.617298 99.767261
001e0610eef2 41.965256 -87.66672 99.621326
001e06114fd4 41.794477 -87.615957 99.417522
001e061146bc 41.918733 -87.668257 99.069042
001e06113cf1 41.884688 -87.627864 98.714271
001e0610f6db 41.791329 -87.598677 98.231179
001e061130f4 41.896157 -87.662391 95.391757
001e061144c0 41.764122 -87.72242 91.421916
001e06113107 41.751142 -87.71299 85.478311
001e06114503 41.666078 -87.539374 85.264099
001e06114500 41.714494 -87.643099 81.703356
001e0610ee43 41.788608 -87.598713 77.654489
001e0610bc10 41.736314 -87.624179 68.007750
001e0610ba15 41.722457 -87.57535 53.672024
001e0610ba13 41.751238 -87.712990 40.266112
001e0610e537 41.961622 -87.665948 4.322665
001e0610ef27 41.846579 -87.685557 0.000000
001e0610ee33 41.965089 -87.679076 0.000000
001e0610e532 41.857959 -87.656427 0.000000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfNO21$NodeMeanX, n = 5)

dfNO21$lat<-as.numeric(dfNO21$lat)
dfNO21$lon<-as.numeric(dfNO21$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfNO21,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfNO21$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfNO21$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfNO21$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for NO2 concentration data on 2018-12-15. In essence, the network sensor value reliability is the average proportion of reliable data collected by each node in each time interval of the day.

dfNO21%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfNO21%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
70.3811
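
As a quick check of the arithmetic, the sketch below shows why the per-node values are de-duplicated before averaging. It uses made-up node IDs and values (not AoT data) and base R only, so the names `toy`, `naive_mean` and `per_node` are illustrative assumptions.

```r
# Toy long-format table: one row per node and interval (hypothetical values).
# nodeA reports 3 intervals, nodeB only 1.
toy <- data.frame(
  node_id   = c("nodeA", "nodeA", "nodeA", "nodeB"),
  NodeMeanX = c(100, 100, 100, 20)
)

# Averaging all rows counts nodeA three times
naive_mean <- mean(toy$NodeMeanX)                     # (3*100 + 20) / 4

# De-duplicating first gives each node equal weight, as in the pipeline above
per_node <- unique(toy[, c("node_id", "NodeMeanX")])
score1   <- mean(per_node$NodeMeanX)                  # (100 + 20) / 2

print(c(naive = naive_mean, score1 = score1))
```

Score 1 is therefore an unweighted mean across nodes: a node that reports few intervals counts as much as one that reports all day.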

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels in the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability scored. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability: for nodes whose average reliability is below 50%, the score is inverted, so that consistently unreliable nodes score low rather than high, and nodes that never record reliable data score 0.
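
The rule described above can be sketched as a standalone function. This is a minimal illustration rather than the project's own code: `consistency_score` is a hypothetical helper name, the input vectors are made up, and it assumes each node has at least two intervals (otherwise `sd()` returns `NA`).

```r
# Hypothetical helper: consistency score for one node's vector of X values
consistency_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(0)            # never reliable: score 0
  r <- min(sd(x), m) / m           # SD relative to the mean, capped to [0, 1]
  if (m < 50) 100 * r else 100 * (1 - r)
}

s_good  <- consistency_score(rep(100, 6))                # always reliable
s_never <- consistency_score(rep(0, 6))                  # never reliable
s_poor  <- consistency_score(rep(10, 6))                 # consistently poor
s_flip  <- consistency_score(c(0, 100, 0, 100, 0, 100))  # alternating
s_mild  <- consistency_score(c(90, 100, 90, 100))        # mostly reliable

print(c(good = s_good, never = s_never, poor = s_poor,
        flip = s_flip, mild = s_mild))
```

Only the node that is both reliable and steady scores 100. The consistently poor node and the alternating node both score 0, for opposite reasons.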

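dfNO21%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  # Cap each node's standard deviation of X at its mean, so that
  # NodeSD/mean(X) stays within [0, 1]
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  # Nodes that are never reliable score 0. Otherwise the score is
  # 100*(1 - NodeSD/mean(X)) when mean(X) >= 50, and 100*(NodeSD/mean(X))
  # when mean(X) < 50, so that consistently unreliable nodes score low
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  # Score 2: mean consistency score across all nodes
  mutate(`Score 2`= mean(NodeSDScore))->dfNO22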

dfNO22%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e0610f05c 41.92490 -87.68770 100.00000 74.85045
001e0610ba46 41.87838 -87.62768 100.00000 74.85045
001e0610e537 41.96162 -87.66595 100.00000 74.85045
001e06113ace 41.83107 -87.61730 98.91567 74.85045
001e0610eef2 41.96526 -87.66672 98.21697 74.85045
001e06114fd4 41.79448 -87.61596 97.79367 74.85045
001e061144c0 41.76412 -87.72242 97.33161 74.85045
001e061146bc 41.91873 -87.66826 96.95073 74.85045
001e0610f6db 41.79133 -87.59868 96.90857 74.85045
001e06113cf1 41.88469 -87.62786 96.46912 74.85045
001e061130f4 41.89616 -87.66239 91.77244 74.85045
001e06113107 41.75114 -87.71299 90.37978 74.85045
001e06114503 41.66608 -87.53937 78.54444 74.85045
001e06114500 41.71449 -87.64310 78.01626 74.85045
001e0610ee43 41.78861 -87.59871 75.60495 74.85045
001e0610ba15 41.72246 -87.57535 68.24256 74.85045
001e0610ba13 41.75124 -87.71299 59.31641 74.85045
001e0610bc10 41.73631 -87.62418 47.39623 74.85045
001e0610ef27 41.84658 -87.68556 0.00000 74.85045
001e0610ee33 41.96509 -87.67908 0.00000 74.85045
001e0610e532 41.85796 -87.65643 0.00000 74.85045

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfNO22$NodeSDScore, n = 5)

dfNO22$lat<-as.numeric(dfNO22$lat)
dfNO22$lon<-as.numeric(dfNO22$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfNO22,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfNO22$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfNO22$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfNO22$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average - this average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for NO2 concentration data on 2018-12-15.

dfNO22%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT NO2 concentration network on 2018-12-15,

  • Score 1 = 70.4: On average, only 70.4% of the NO2 concentration data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 74.9: From the density distribution, it can be observed that node sensor value reliability is consistently good more often than it is consistently bad, hence the above-moderate score here.

O3 Concentration

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable measurements varies across the day for each node collecting O3 concentration data in the network. It can be observed that most nodes collect around 20 measurements in every 10-minute time interval.
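
The per-interval counts shown in the figure depend on each reading being assigned to a 10-minute bin. The sketch below shows one way such a bin can be derived and counted in base R. The timestamps are made up (one hypothetical node reporting every 25 seconds), and the flooring step is an assumption about how a column like `by10` could be built, not a description of the upstream code.

```r
# Toy timestamps for one hypothetical node: a reading every 25 seconds for 30 minutes
ts <- seq(as.POSIXct("2018-12-15 00:00:00", tz = "UTC"), by = 25, length.out = 72)

# Floor each timestamp to the start of its 10-minute (600-second) interval
by10 <- as.POSIXct(floor(as.numeric(ts) / 600) * 600,
                   origin = "1970-01-01", tz = "UTC")

# Number of readings per interval
counts <- table(format(by10, "%H:%M"))
print(counts)
```

A node whose count drops well below its usual level in some intervals is missing readings there, which is a separate issue from whether the readings it does report are reliable.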

dfO3%>%
    ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable O3 Concentration Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.
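
To make the two quantities concrete, the sketch below computes X for a made-up node and then averages it in two ways. The data are hypothetical and the code is base R rather than the dplyr pipeline used in this document. It assumes `val_qual` is coded 1 for reliable and 0 for unreliable readings.

```r
# Toy readings (hypothetical): val_qual is 1 for reliable, 0 for unreliable
toy <- data.frame(
  node_id  = "nodeA",
  by10     = c("00:00", "00:00", "00:00", "00:00", "00:10", "00:10"),
  val_qual = c(1, 1, 1, 0, 1, 0)
)

# X: percentage of reliable readings per node and interval
X <- aggregate(val_qual ~ node_id + by10, data = toy,
               FUN = function(v) 100 * mean(v))
names(X)[3] <- "X"

# Two ways of averaging X over the day
mean_observed <- mean(X$X)        # over the intervals actually observed
mean_fixed    <- sum(X$X) / 144   # over a fixed 144 intervals per day

print(X)
print(c(observed = mean_observed, fixed = mean_fixed))
```

Dividing by a fixed 144 equals the plain mean only when a node reports in every interval of the day. Intervals with no readings effectively count as 0% reliable, which penalises nodes that go silent.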

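dfO3%>%
  group_by(node_id, by10)%>%
  # X: percentage of reliable readings per node and 10-minute interval
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  # NodeMeanX: daily mean of X over the 144 ten-minute intervals in a day
  mutate(NodeMeanX = sum(X)/144)->dfO31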

The table below shows how X varies for each 10-minute time interval (by10) for a single node, 001e0610ba13. Node Sensor Value Reliability is the same in every row because it is the node's daily average of X.

dfO31%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 100 100
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 100 100

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfO31,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfO31,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
        scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable O3 Concentration Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. From the table, it can be observed that the average sensor value reliability of the nodes ranges from 0% to 100%.

dfO31%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e06113cf1 41.884688 -87.627864 100.00000
001e061146bc 41.918733 -87.668257 100.00000
001e0610f05c 41.924903 -87.687703 100.00000
001e0610ba46 41.878377 -87.627678 100.00000
001e0610bc10 41.736314 -87.624179 100.00000
001e0610ba13 41.751238 -87.712990 100.00000
001e0610f6db 41.791329 -87.598677 100.00000
001e0610eef2 41.965256 -87.66672 99.97106
001e061130f4 41.896157 -87.662391 99.88300
001e0610e537 41.961622 -87.665948 99.50684
001e06114fd4 41.794477 -87.615957 97.98209
001e0610ee43 41.788608 -87.598713 96.63345
001e06113107 41.751142 -87.71299 92.27179
001e061144c0 41.764122 -87.72242 92.12202
001e0610ba15 41.722457 -87.57535 92.10561
001e06113ace 41.83107 -87.617298 68.69716
001e06114500 41.714494 -87.643099 68.24957
001e06114503 41.666078 -87.539374 57.50222
001e0610ef27 41.846579 -87.685557 0.00000
001e0610ee33 41.965089 -87.679076 0.00000
001e0610e532 41.857959 -87.656427 0.00000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfO31$NodeMeanX, n = 5)

dfO31$lat<-as.numeric(dfO31$lat)
dfO31$lon<-as.numeric(dfO31$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfO31,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfO31$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfO31$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfO31$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average - this average is taken as Score 1, which represents the overall network sensor value reliability of the AoT network for O3 concentration data on 2018-12-15. In essence, the network sensor value reliability is the average proportion of reliable data collected by each node at each time interval of the day.

dfO31%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfO31%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
79.28213

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels in the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability scored. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

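dfO31%>%
  select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  # Cap each node's standard deviation of X at its mean, so that
  # NodeSD/mean(X) stays within [0, 1]
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  # Nodes that are never reliable score 0. Otherwise the score is
  # 100*(1 - NodeSD/mean(X)) when mean(X) >= 50, and 100*(NodeSD/mean(X))
  # when mean(X) < 50, so that consistently unreliable nodes score low
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             0, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  # Score 2: mean consistency score across all nodes
  mutate(`Score 2`= mean(NodeSDScore))->dfO32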

dfO32%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e06113cf1 41.88469 -87.62786 100.00000 77.58294
001e061146bc 41.91873 -87.66826 100.00000 77.58294
001e0610f05c 41.92490 -87.68770 100.00000 77.58294
001e0610ba46 41.87838 -87.62768 100.00000 77.58294
001e0610bc10 41.73631 -87.62418 100.00000 77.58294
001e0610ba13 41.75124 -87.71299 100.00000 77.58294
001e0610f6db 41.79133 -87.59868 100.00000 77.58294
001e0610eef2 41.96526 -87.66672 99.65268 77.58294
001e06113107 41.75114 -87.71299 99.36023 77.58294
001e061130f4 41.89616 -87.66239 99.14727 77.58294
001e061144c0 41.76412 -87.72242 98.84547 77.58294
001e0610e537 41.96162 -87.66595 98.31807 77.58294
001e06114fd4 41.79448 -87.61596 95.93814 77.58294
001e0610ee43 41.78861 -87.59871 93.79447 77.58294
001e0610ba15 41.72246 -87.57535 88.93739 77.58294
001e06113ace 41.83107 -87.61730 59.97926 77.58294
001e06114500 41.71449 -87.64310 57.28921 77.58294
001e06114503 41.66608 -87.53937 37.97947 77.58294
001e0610ef27 41.84658 -87.68556 0.00000 77.58294
001e0610ee33 41.96509 -87.67908 0.00000 77.58294
001e0610e532 41.85796 -87.65643 0.00000 77.58294

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map. On the map, node 001e06114503, which has the lowest non-zero consistency score (38.0), can be seen at the southernmost end of Chicago.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfO32$NodeSDScore, n = 5)

dfO32$lat<-as.numeric(dfO32$lat)
dfO32$lon<-as.numeric(dfO32$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfO32,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfO32$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfO32$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfO32$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average - this average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for O3 concentration data on 2018-12-15.

dfO32%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT O3 concentration network on 2018-12-15,

  • Score 1 = 79.3: On average, only 79.3% of the O3 concentration data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 77.6: From the density distribution, it can be observed that node sensor value reliability is consistently good more often than it is consistently bad, hence the above-moderate score here.
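
Because the same steps are repeated for each pollutant, they could be wrapped in one function. The sketch below is a hypothetical base R wrapper, not the code used to produce the figures in this document: `score_network` and `n_intervals` are assumed names, the input is a made-up table, and each node needs at least two intervals for `sd()` to be defined. As in the pipeline above, the node mean uses a fixed number of intervals, while the consistency score uses the intervals actually observed.

```r
# Hypothetical wrapper: both network scores from one pollutant's readings.
# Expects columns node_id, by10 and val_qual (1 = reliable, 0 = unreliable).
score_network <- function(df, n_intervals = 144) {
  # X: percentage of reliable readings per node and interval
  X <- aggregate(val_qual ~ node_id + by10, data = df,
                 FUN = function(v) 100 * mean(v))
  per_node <- split(X$val_qual, X$node_id)

  # Node Sensor Value Reliability with a fixed denominator
  node_mean <- sapply(per_node, function(x) sum(x) / n_intervals)

  # Consistency score per node, over the observed intervals
  node_sd_score <- sapply(per_node, function(x) {
    m <- mean(x)
    if (m == 0) return(0)
    r <- min(sd(x), m) / m
    if (m < 50) 100 * r else 100 * (1 - r)
  })

  c(score1 = mean(node_mean), score2 = mean(node_sd_score))
}

# Toy input: nodeA is always reliable, nodeB never is (two intervals each)
toy <- data.frame(
  node_id  = rep(c("nodeA", "nodeB"), each = 4),
  by10     = rep(c("00:00", "00:00", "00:10", "00:10"), times = 2),
  val_qual = c(1, 1, 1, 1, 0, 0, 0, 0)
)
res <- score_network(toy, n_intervals = 2)
print(res)
```

With one perfectly reliable node and one that is never reliable, both scores come out at 50, the midpoint of the scale.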

SO2 Concentration

We begin by observing how reliable and unreliable data measurements are distributed across the day for each node in the network on 2018-12-15. The figure below shows how the number of reliable and unreliable measurements varies across the day for each node collecting SO2 concentration data in the network. It can be observed that most nodes collect around 20 measurements in every 10-minute time interval.

dfSO2%>%
  ggplot()+
  geom_bar(aes(x=by10, fill=as.factor(val_qual)))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Number collected', x='Time', 
       title='Number of Reliable and Unreliable SO2 Concentration Data Collected on 2018-12-15 For Each Node\n- By Time')+
  scale_fill_manual(values=c('indianred1', 'cornflowerblue'), 
                    labels=c('Unreliable', 
                            'Reliable'), 
                    name="")+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

Calculate Proportion of Reliable Data Measurements

As the flowchart shows, we need to calculate X, which represents the proportion of reliable data collected by each node at each 10-minute time interval.

From X, we can then calculate the Node Sensor Value Reliability, which represents the average proportion of reliable data collected by each node during the day.

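dfSO2%>%
  group_by(node_id, by10)%>%
  # X: percentage of reliable readings per node and 10-minute interval
  mutate(X = 100*sum(val_qual)/n())%>%
  select(node_id, by10, lat, lon, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  # NodeMeanX: daily mean of X over the 144 ten-minute intervals in a day
  mutate(NodeMeanX = sum(X)/144)->dfSO21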

The table below shows how X varies for each 10-minute time interval (by10) for a single node, 001e0610ba13. Node Sensor Value Reliability is the same in every row because it is the node's daily average of X.

dfSO21%>%
  arrange(node_id)%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id by10 lat lon X NodeMeanX
001e0610ba13 2018-12-15 00:00:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 00:10:00 41.751238 -87.712990 73.91304 60.72535
001e0610ba13 2018-12-15 00:20:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 00:30:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 00:40:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 00:50:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 01:00:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 01:10:00 41.751238 -87.712990 87.50000 60.72535
001e0610ba13 2018-12-15 01:20:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 01:30:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 01:40:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 01:50:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 02:00:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 02:10:00 41.751238 -87.712990 82.60870 60.72535
001e0610ba13 2018-12-15 02:20:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 02:30:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 02:40:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 02:50:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 03:00:00 41.751238 -87.712990 87.50000 60.72535
001e0610ba13 2018-12-15 03:10:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 03:20:00 41.751238 -87.712990 58.33333 60.72535
001e0610ba13 2018-12-15 03:30:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 03:40:00 41.751238 -87.712990 41.66667 60.72535
001e0610ba13 2018-12-15 03:50:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 04:00:00 41.751238 -87.712990 25.00000 60.72535
001e0610ba13 2018-12-15 04:10:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 04:20:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 04:30:00 41.751238 -87.712990 69.56522 60.72535
001e0610ba13 2018-12-15 04:40:00 41.751238 -87.712990 25.00000 60.72535
001e0610ba13 2018-12-15 04:50:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 05:00:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 05:10:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 05:20:00 41.751238 -87.712990 91.66667 60.72535
001e0610ba13 2018-12-15 05:30:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 05:40:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 05:50:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 06:00:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 06:10:00 41.751238 -87.712990 45.45455 60.72535
001e0610ba13 2018-12-15 06:20:00 41.751238 -87.712990 29.16667 60.72535
001e0610ba13 2018-12-15 06:30:00 41.751238 -87.712990 41.66667 60.72535
001e0610ba13 2018-12-15 06:40:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 06:50:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 07:00:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 07:10:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 07:20:00 41.751238 -87.712990 33.33333 60.72535
001e0610ba13 2018-12-15 07:30:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 07:40:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 07:50:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 08:00:00 41.751238 -87.712990 73.91304 60.72535
001e0610ba13 2018-12-15 08:10:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 08:20:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 08:30:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 08:40:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 08:50:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 09:00:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 09:10:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 09:20:00 41.751238 -87.712990 29.16667 60.72535
001e0610ba13 2018-12-15 09:30:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 09:40:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 09:50:00 41.751238 -87.712990 69.56522 60.72535
001e0610ba13 2018-12-15 10:00:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 10:10:00 41.751238 -87.712990 58.33333 60.72535
001e0610ba13 2018-12-15 10:20:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 10:30:00 41.751238 -87.712990 87.50000 60.72535
001e0610ba13 2018-12-15 10:40:00 41.751238 -87.712990 58.33333 60.72535
001e0610ba13 2018-12-15 10:50:00 41.751238 -87.712990 41.66667 60.72535
001e0610ba13 2018-12-15 11:00:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 11:10:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 11:20:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 11:30:00 41.751238 -87.712990 58.33333 60.72535
001e0610ba13 2018-12-15 11:40:00 41.751238 -87.712990 72.72727 60.72535
001e0610ba13 2018-12-15 11:50:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 12:00:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 12:10:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 12:20:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 12:30:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 12:40:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 12:50:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 13:00:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 13:10:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 13:20:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 13:30:00 41.751238 -87.712990 73.91304 60.72535
001e0610ba13 2018-12-15 13:40:00 41.751238 -87.712990 33.33333 60.72535
001e0610ba13 2018-12-15 13:50:00 41.751238 -87.712990 91.66667 60.72535
001e0610ba13 2018-12-15 14:00:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 14:10:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 14:20:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 14:30:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 14:40:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 14:50:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 15:00:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 15:10:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 15:20:00 41.751238 -87.712990 41.66667 60.72535
001e0610ba13 2018-12-15 15:30:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 15:40:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 15:50:00 41.751238 -87.712990 73.91304 60.72535
001e0610ba13 2018-12-15 16:00:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 16:10:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 16:20:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 16:30:00 41.751238 -87.712990 73.91304 60.72535
001e0610ba13 2018-12-15 16:40:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 16:50:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 17:00:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 17:10:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 17:20:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 17:30:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 17:40:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 17:50:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 18:00:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 18:10:00 41.751238 -87.712990 83.33333 60.72535
001e0610ba13 2018-12-15 18:20:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 18:30:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 18:40:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 18:50:00 41.751238 -87.712990 86.95652 60.72535
001e0610ba13 2018-12-15 19:00:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 19:10:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 19:20:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 19:30:00 41.751238 -87.712990 33.33333 60.72535
001e0610ba13 2018-12-15 19:40:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 19:50:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 20:00:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 20:10:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 20:20:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 20:30:00 41.751238 -87.712990 50.00000 60.72535
001e0610ba13 2018-12-15 20:40:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 20:50:00 41.751238 -87.712990 41.66667 60.72535
001e0610ba13 2018-12-15 21:00:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 21:10:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 21:20:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 21:30:00 41.751238 -87.712990 34.78261 60.72535
001e0610ba13 2018-12-15 21:40:00 41.751238 -87.712990 54.16667 60.72535
001e0610ba13 2018-12-15 21:50:00 41.751238 -87.712990 41.66667 60.72535
001e0610ba13 2018-12-15 22:00:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 22:10:00 41.751238 -87.712990 37.50000 60.72535
001e0610ba13 2018-12-15 22:20:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 22:30:00 41.751238 -87.712990 25.00000 60.72535
001e0610ba13 2018-12-15 22:40:00 41.751238 -87.712990 62.50000 60.72535
001e0610ba13 2018-12-15 22:50:00 41.751238 -87.712990 69.56522 60.72535
001e0610ba13 2018-12-15 23:00:00 41.751238 -87.712990 75.00000 60.72535
001e0610ba13 2018-12-15 23:10:00 41.751238 -87.712990 66.66667 60.72535
001e0610ba13 2018-12-15 23:20:00 41.751238 -87.712990 79.16667 60.72535
001e0610ba13 2018-12-15 23:30:00 41.751238 -87.712990 45.83333 60.72535
001e0610ba13 2018-12-15 23:40:00 41.751238 -87.712990 70.83333 60.72535
001e0610ba13 2018-12-15 23:50:00 41.751238 -87.712990 47.82609 60.72535

The plot below presents how X varies around each node’s Node Sensor Value Reliability.

  ggplot()+
  geom_line(data=dfSO21,aes(x=by10, y=NodeMeanX, group=1), col='black', size=1, linetype='dashed')+
  geom_line(data=dfSO21,aes(x=by10, y=X, group=1), col='indianred', size=1, alpha=0.5)+
        scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  scale_color_manual('',
                    values=c("Proportion of Reliable Data"='indianred'))+
  facet_wrap(~node_id, ncol=5)+
  labs(y='Proportion Reliable', x='Time', 
       title='Proportion of Reliable SO2 Data Collected on 2018-12-15 For Each Node - By Time',
       subtitle='Mean proportion for each node denoted by dashed line.')+
  plotTheme()+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below presents the Node Sensor Value Reliability of each node in the network. The table shows that the average sensor value reliability of the nodes varies between 0% and 82.9%.

dfSO21%>%
  select(-X, -by10)%>%
  unique()%>%
  as.data.frame()%>%
  arrange(desc(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeMeanX
001e0610ba15 41.722457 -87.57535 82.9051383
001e061144c0 41.764122 -87.72242 75.6648315
001e0610ba13 41.751238 -87.712990 60.7253468
001e061146bc 41.918733 -87.668257 51.6541090
001e0610ee43 41.788608 -87.598713 49.8188406
001e06113cf1 41.884688 -87.627864 48.1645028
001e0610f05c 41.924903 -87.687703 46.3013285
001e06113107 41.751142 -87.71299 45.6653835
001e0610e537 41.961622 -87.665948 42.4152073
001e0610ba46 41.878377 -87.627678 42.2813964
001e0610f6db 41.791329 -87.598677 38.1982186
001e06114503 41.666078 -87.539374 31.3967346
001e06114fd4 41.794477 -87.615957 15.5725049
001e0610eef2 41.965256 -87.66672 7.8741445
001e061130f4 41.896157 -87.662391 0.1169988
001e0610ef27 41.846579 -87.685557 0.0000000
001e0610ee33 41.965089 -87.679076 0.0000000
001e0610bc10 41.736314 -87.624179 0.0000000
001e06113ace 41.83107 -87.617298 0.0000000
001e0610e532 41.857959 -87.656427 0.0000000
001e06114500 41.714494 -87.643099 0.0000000

The spatial distribution of these varying sensor value reliability levels by node is then visualised in the following map.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Blues", dfSO21$NodeMeanX, n = 5)

dfSO21$lat<-as.numeric(dfSO21$lat)
dfSO21$lon<-as.numeric(dfSO21$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfSO21,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeMeanX),fillOpacity = 0.5, 
                   popup = paste("Node:", dfSO21$node_id, "<br>",
                                 "Mean Proportion of Reliable Data Collected:", round(dfSO21$NodeMeanX), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfSO21$NodeMeanX, opacity = 0.7, title = NULL,position = "topright")

Constructing Score 1

The density plot below shows the distribution of node sensor value reliability relative to its average. This average is taken as Score 1, which represents the overall sensor value reliability of the AoT network for SO2 Concentration data on 2018-12-15. In essence, the network sensor value reliability is the average proportion of reliable data collected by each node at each time-interval of the day.

dfSO21%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 1`= mean(NodeMeanX))%>%
  ggplot()+
  geom_density(aes(NodeMeanX), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 1`), size = 2)+
  geom_text(aes(x= `Score 1`, y=0), label='Score 1:\n Average Proportion of\nReliable Data Collected\n by Each Node', size = 4, vjust= -2, hjust=-0.1)+
  labs(x= 'Node Sensor Value Reliability',
       y= 'Density',
       title = 'Distribution of Node Sensor Value Reliability Scores')+
  xlim(0, 100)+
  plotTheme()

dfSO21%>%
  select(node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  summarise(Score = mean(NodeMeanX))%>%
  kable()%>%
  kable_styling(bootstrap_options = "striped")
Score
30.41689
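
The construction of Score 1 can be illustrated on a toy data set. The snippet below is a minimal sketch in base R rather than the pipeline used above; the node names and the values of X are hypothetical. It averages X within each node to obtain a node-level value (the analogue of NodeMeanX), then averages these node-level values across the network.

```r
# Hypothetical toy data: proportion of reliable readings (X, in %)
# for three made-up nodes over three 10-minute intervals.
toy <- data.frame(
  node_id = rep(c("nodeA", "nodeB", "nodeC"), each = 3),
  X       = c(80, 90, 70,   # nodeA: mostly reliable
              40, 50, 60,   # nodeB: moderately reliable
               0,  0,  0)   # nodeC: never reliable
)

# Node-level reliability: mean of X within each node.
node_mean <- tapply(toy$X, toy$node_id, mean)

# Score 1: mean of the node-level values across the network.
score1 <- mean(node_mean)

print(node_mean)
print(score1)
```

Because every node carries equal weight in the final mean, a node that never returns a reliable value lowers the score as much as a fully reliable node raises it.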

Constructing Score 2

Besides the average level of sensor value reliability (Score 1), we are also interested in whether this level is consistent across the day for each node, and ultimately in the average consistency of sensor value reliability levels across the network. Here, the value of consistency needs to be considered in relation to the average level of sensor value reliability scored. While a high level of consistency is desirable when the nodes are generally recording reliable data, a similarly high level of consistency is not at all desirable when the nodes are generally recording unreliable data. This is the basis for the second score.

The table below presents the node sensor value reliability consistency score for each node in the network. Here, the score is first standardised to fit a scale of 0 to 100, with a score of 100 indicating that the level of sensor value reliability is perfectly consistent across the day. In other words, the proportion of reliable values collected is identical for all the 10-minute time intervals of the day for the node. Then, depending on the average level of sensor value reliability, this consistency score is adjusted to reflect its desirability.

dfSO21%>%
 select(node_id, lat, lon, by10, X)%>%
  group_by(node_id, lat, lon)%>%
  mutate(NodeSD=ifelse(abs(sd(X))>mean(X), 
                       mean(X),
                       abs(sd(X))))%>%
  mutate(NodeSDScore= ifelse(mean(X)==0, 
                             100, 
                             ifelse(mean(X)<50, 
                                    abs(100-abs(100-100*(NodeSD/mean(X)))),
                                    abs(100-100*(NodeSD/mean(X))))))%>%
  select(node_id, lat, lon, NodeSDScore)%>%
  unique()%>%
  as.data.frame()%>%
  mutate(`Score 2`= mean(NodeSDScore))->dfSO22

dfSO22%>%
  arrange(desc(NodeSDScore))%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id lat lon NodeSDScore Score 2
001e0610ef27 41.84658 -87.68556 100.00000 71.45969
001e061130f4 41.89616 -87.66239 100.00000 71.45969
001e06114fd4 41.79448 -87.61596 100.00000 71.45969
001e0610ee33 41.96509 -87.67908 100.00000 71.45969
001e0610bc10 41.73631 -87.62418 100.00000 71.45969
001e06113ace 41.83107 -87.61730 100.00000 71.45969
001e0610eef2 41.96526 -87.66672 100.00000 71.45969
001e0610e532 41.85796 -87.65643 100.00000 71.45969
001e06114500 41.71449 -87.64310 100.00000 71.45969
001e06114503 41.66608 -87.53937 79.17938 71.45969
001e0610ba15 41.72246 -87.57535 78.84579 71.45969
001e0610ba13 41.75124 -87.71299 74.04494 71.45969
001e061146bc 41.91873 -87.66826 71.19508 71.45969
001e061144c0 41.76412 -87.72242 60.37703 71.45969
001e0610f6db 41.79133 -87.59868 36.65302 71.45969
001e0610e537 41.96162 -87.66595 36.55186 71.45969
001e0610ba46 41.87838 -87.62768 36.14718 71.45969
001e0610f05c 41.92490 -87.68770 36.13700 71.45969
001e06113107 41.75114 -87.71299 33.48121 71.45969
001e0610ee43 41.78861 -87.59871 29.46617 71.45969
001e06113cf1 41.88469 -87.62786 28.57482 71.45969
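
The node-level consistency score can be read more easily when it is written as a small function. The sketch below restates the ifelse() logic of the chunk above in base R; the function name consistency_score and the example vectors are hypothetical and are not part of the original analysis. Since the standard deviation is capped at the mean, the ratio of the two lies between 0 and 1, so the nested abs() expressions reduce to 100 * ratio when the mean is below 50 and to 100 * (1 - ratio) otherwise.

```r
# Sketch of the node-level consistency score, mirroring the ifelse() logic above.
# x: per-interval proportions of reliable data (0-100) for a single node.
consistency_score <- function(x) {
  m <- mean(x)
  if (m == 0) return(100)     # as in the chunk above: an all-zero node scores 100
  s <- min(sd(x), m)          # cap the standard deviation at the mean
  ratio <- s / m              # capped coefficient of variation, in [0, 1]
  if (m < 50) {
    100 * ratio               # low reliability: steadiness is penalised
  } else {
    100 * (1 - ratio)         # high reliability: steadiness is rewarded
  }
}

# Hypothetical example nodes.
consistency_score(c(80, 80, 80))    # steadily reliable
consistency_score(c(60, 100, 80))   # reliable but variable
consistency_score(c(20, 20, 20))    # steadily unreliable
consistency_score(c(0, 0, 60))      # unreliable and variable
```

Under these assumptions, a steadily reliable node scores 100 and a steadily unreliable node with a non-zero mean scores 0, which matches the desirability adjustment described in the text.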

The spatial distribution of this consistency in sensor value reliability by node is then visualised in the following map. On the map, node 001e06114503 can be picked out at the southernmost end of Chicago.

chig<-readOGR('.', 'chigBound')
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\leech\OneDrive\Documents\Capstone\Exploratory", layer: "chigBound"
## with 1 features
## It has 1 fields
pal <- colorNumeric("Purples", dfSO22$NodeSDScore, n = 5)

dfSO22$lat<-as.numeric(dfSO22$lat)
dfSO22$lon<-as.numeric(dfSO22$lon)

leaflet() %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data=chig, fillOpacity = 0, weight=2, color='black')%>%
  addCircleMarkers(data=dfSO22,
                   lng = ~lon, lat = ~lat, weight = 2,
                   radius = 7, opacity = 0.2,
                   fillColor= ~pal(NodeSDScore),fillOpacity = 0.5, 
                   popup = paste("Node:", dfSO22$node_id, "<br>",
                                 "Consistency Score for Level of Sensor Value Reliability:", round(dfSO22$NodeSDScore), "%", "<br>"))%>%
  addLegend(pal = pal, values = dfSO22$NodeSDScore, opacity = 0.7, title = NULL,position = "topright")

The density plot below shows the distribution of node sensor value reliability consistency scores relative to the network average. This average is taken as Score 2, which represents the overall consistency in sensor value reliability of the AoT network for SO2 Concentration data on 2018-12-15.

dfSO22%>%
  ggplot()+
  geom_density(aes(NodeSDScore), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 2`), size = 2)+
  geom_text(aes(x= `Score 2`, y=0), label='Score 2:\n Overall Consistency\nin Sensor Value Reliability', size = 4, vjust= -2, hjust=1)+
  labs(x='Node Sensor Value Reliability Consistency Score', 
       y='Density',
       title='Distribution of Node Sensor Value Reliability Consistency Scores')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT SO2 Concentration Network on 2018-12-15,

  • Score 1 = 30.4 : On average, only 30.4% of the SO2 Concentration data measured by the nodes in the network in each 10-minute interval is reliable. This is a low score.
  • Score 2 = 71.5 : From the density distribution, it could be observed that the node sensor value reliability is consistently good more often than consistently bad, hence the above-moderate score here.

3.4 Scoring Spatial Reliability

In this section, the method of scoring Spatial Reliability is presented for each data parameter type for the day of 2018-12-15. Two scores are obtained for this criterion. In summary, this section scores spatial reliability in terms of the average proportion of the network active at any time-interval (Score 3) and the average proportion of Chicago area covered (Score 4) at any time-interval during the day.

While Score 3 here is not strictly spatial, it helps us interpret the extent of spatial coverage indicated by Score 4 in relation to the number of nodes collecting reliable data. For instance, a tightly clustered pattern of many nodes in the network will have a high Score 3 but low Score 4, while a widely dispersed pattern of a few nodes in the network will have a low Score 3 but high Score 4. These two scores have to be interpreted together, and are therefore scored under the same criterion.
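
The contrast between the two patterns can be made concrete with a toy example. The sketch below is purely illustrative: the 10-node network, the 10 x 10 square "city", the coordinates, and the helper functions prop_active and prop_covered are all hypothetical stand-ins for a single time interval of Score 3 and Score 4.

```r
# Hypothetical setup: a 10-node network in a square "city" of 10 x 10 units.
network_size <- 10
city_area    <- 10 * 10

# Share of the network that is active (analogue of Score 3 for one interval).
prop_active <- function(nodes) 100 * nrow(nodes) / network_size

# Share of the city covered by the nodes' bounding box
# (analogue of Score 4 for one interval).
prop_covered <- function(nodes) {
  100 * diff(range(nodes$x)) * diff(range(nodes$y)) / city_area
}

# Many active nodes, tightly clustered in a 2 x 2 patch of the city.
clustered <- data.frame(x = c(1, 1, 2, 2, 3, 3, 1, 3),
                        y = c(1, 3, 1, 3, 1, 3, 2, 2))

# Few active nodes, placed at opposite corners of the city.
dispersed <- data.frame(x = c(0, 10), y = c(0, 10))

c(active = prop_active(clustered), covered = prop_covered(clustered))
c(active = prop_active(dispersed), covered = prop_covered(dispersed))
```

In this toy setting the clustered pattern has 80% of the network active but covers only 4% of the city, while the dispersed pattern has 20% of the network active and covers the whole city.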

The flowchart below illustrates the scoring process in this section:

Click on the tabs below to view the scores constructed for each data parameter.

Temperature

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the different time-intervals of the day. The figure below shows how the number of nodes collecting reliable temperature data in the AoT network varies over the course of the day.

dfTemp%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfTemp3

ggplot(data=dfTemp3, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfTemp3$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable temperature data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable temperature data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfTemp3%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 40 46.51163 47.27067
2018-12-15 00:10:00 40 46.51163 47.27067
2018-12-15 00:20:00 40 46.51163 47.27067
2018-12-15 00:30:00 40 46.51163 47.27067
2018-12-15 00:40:00 40 46.51163 47.27067
2018-12-15 00:50:00 40 46.51163 47.27067
2018-12-15 01:00:00 40 46.51163 47.27067
2018-12-15 01:10:00 40 46.51163 47.27067
2018-12-15 01:20:00 40 46.51163 47.27067
2018-12-15 01:30:00 40 46.51163 47.27067
2018-12-15 01:40:00 40 46.51163 47.27067
2018-12-15 01:50:00 40 46.51163 47.27067
2018-12-15 02:00:00 40 46.51163 47.27067
2018-12-15 02:10:00 40 46.51163 47.27067
2018-12-15 02:20:00 40 46.51163 47.27067
2018-12-15 02:30:00 40 46.51163 47.27067
2018-12-15 02:40:00 40 46.51163 47.27067
2018-12-15 02:50:00 40 46.51163 47.27067
2018-12-15 03:00:00 40 46.51163 47.27067
2018-12-15 03:10:00 41 47.67442 47.27067
2018-12-15 03:20:00 41 47.67442 47.27067
2018-12-15 03:30:00 41 47.67442 47.27067
2018-12-15 03:40:00 41 47.67442 47.27067
2018-12-15 03:50:00 41 47.67442 47.27067
2018-12-15 04:00:00 41 47.67442 47.27067
2018-12-15 04:10:00 41 47.67442 47.27067
2018-12-15 04:20:00 41 47.67442 47.27067
2018-12-15 04:30:00 41 47.67442 47.27067
2018-12-15 04:40:00 41 47.67442 47.27067
2018-12-15 04:50:00 41 47.67442 47.27067
2018-12-15 05:00:00 41 47.67442 47.27067
2018-12-15 05:10:00 41 47.67442 47.27067
2018-12-15 05:20:00 41 47.67442 47.27067
2018-12-15 05:30:00 41 47.67442 47.27067
2018-12-15 05:40:00 41 47.67442 47.27067
2018-12-15 05:50:00 41 47.67442 47.27067
2018-12-15 06:00:00 41 47.67442 47.27067
2018-12-15 06:10:00 41 47.67442 47.27067
2018-12-15 06:20:00 41 47.67442 47.27067
2018-12-15 06:30:00 41 47.67442 47.27067
2018-12-15 06:40:00 41 47.67442 47.27067
2018-12-15 06:50:00 41 47.67442 47.27067
2018-12-15 07:00:00 41 47.67442 47.27067
2018-12-15 07:10:00 41 47.67442 47.27067
2018-12-15 07:20:00 41 47.67442 47.27067
2018-12-15 07:30:00 41 47.67442 47.27067
2018-12-15 07:40:00 41 47.67442 47.27067
2018-12-15 07:50:00 41 47.67442 47.27067
2018-12-15 08:00:00 41 47.67442 47.27067
2018-12-15 08:10:00 41 47.67442 47.27067
2018-12-15 08:20:00 41 47.67442 47.27067
2018-12-15 08:30:00 40 46.51163 47.27067
2018-12-15 08:40:00 40 46.51163 47.27067
2018-12-15 08:50:00 40 46.51163 47.27067
2018-12-15 09:00:00 40 46.51163 47.27067
2018-12-15 09:10:00 40 46.51163 47.27067
2018-12-15 09:20:00 40 46.51163 47.27067
2018-12-15 09:30:00 40 46.51163 47.27067
2018-12-15 09:40:00 40 46.51163 47.27067
2018-12-15 09:50:00 40 46.51163 47.27067
2018-12-15 10:00:00 40 46.51163 47.27067
2018-12-15 10:10:00 41 47.67442 47.27067
2018-12-15 10:20:00 41 47.67442 47.27067
2018-12-15 10:30:00 41 47.67442 47.27067
2018-12-15 10:40:00 41 47.67442 47.27067
2018-12-15 10:50:00 41 47.67442 47.27067
2018-12-15 11:00:00 41 47.67442 47.27067
2018-12-15 11:10:00 41 47.67442 47.27067
2018-12-15 11:20:00 41 47.67442 47.27067
2018-12-15 11:30:00 41 47.67442 47.27067
2018-12-15 11:40:00 41 47.67442 47.27067
2018-12-15 11:50:00 41 47.67442 47.27067
2018-12-15 12:00:00 41 47.67442 47.27067
2018-12-15 12:10:00 41 47.67442 47.27067
2018-12-15 12:20:00 41 47.67442 47.27067
2018-12-15 12:30:00 41 47.67442 47.27067
2018-12-15 12:40:00 41 47.67442 47.27067
2018-12-15 12:50:00 41 47.67442 47.27067
2018-12-15 13:00:00 41 47.67442 47.27067
2018-12-15 13:10:00 41 47.67442 47.27067
2018-12-15 13:20:00 41 47.67442 47.27067
2018-12-15 13:30:00 41 47.67442 47.27067
2018-12-15 13:40:00 41 47.67442 47.27067
2018-12-15 13:50:00 41 47.67442 47.27067
2018-12-15 14:00:00 41 47.67442 47.27067
2018-12-15 14:10:00 41 47.67442 47.27067
2018-12-15 14:20:00 41 47.67442 47.27067
2018-12-15 14:30:00 41 47.67442 47.27067
2018-12-15 14:40:00 41 47.67442 47.27067
2018-12-15 14:50:00 41 47.67442 47.27067
2018-12-15 15:00:00 40 46.51163 47.27067
2018-12-15 15:10:00 39 45.34884 47.27067
2018-12-15 15:20:00 39 45.34884 47.27067
2018-12-15 15:30:00 39 45.34884 47.27067
2018-12-15 15:40:00 39 45.34884 47.27067
2018-12-15 15:50:00 39 45.34884 47.27067
2018-12-15 16:00:00 39 45.34884 47.27067
2018-12-15 16:10:00 39 45.34884 47.27067
2018-12-15 16:20:00 39 45.34884 47.27067
2018-12-15 16:30:00 39 45.34884 47.27067
2018-12-15 16:40:00 39 45.34884 47.27067
2018-12-15 16:50:00 41 47.67442 47.27067
2018-12-15 17:00:00 41 47.67442 47.27067
2018-12-15 17:10:00 41 47.67442 47.27067
2018-12-15 17:20:00 41 47.67442 47.27067
2018-12-15 17:30:00 41 47.67442 47.27067
2018-12-15 17:40:00 41 47.67442 47.27067
2018-12-15 17:50:00 41 47.67442 47.27067
2018-12-15 18:00:00 41 47.67442 47.27067
2018-12-15 18:10:00 41 47.67442 47.27067
2018-12-15 18:20:00 41 47.67442 47.27067
2018-12-15 18:30:00 41 47.67442 47.27067
2018-12-15 18:40:00 41 47.67442 47.27067
2018-12-15 18:50:00 41 47.67442 47.27067
2018-12-15 19:00:00 41 47.67442 47.27067
2018-12-15 19:10:00 41 47.67442 47.27067
2018-12-15 19:20:00 41 47.67442 47.27067
2018-12-15 19:30:00 41 47.67442 47.27067
2018-12-15 19:40:00 41 47.67442 47.27067
2018-12-15 19:50:00 41 47.67442 47.27067
2018-12-15 20:00:00 41 47.67442 47.27067
2018-12-15 20:10:00 41 47.67442 47.27067
2018-12-15 20:20:00 41 47.67442 47.27067
2018-12-15 20:30:00 41 47.67442 47.27067
2018-12-15 20:40:00 41 47.67442 47.27067
2018-12-15 20:50:00 41 47.67442 47.27067
2018-12-15 21:00:00 41 47.67442 47.27067
2018-12-15 21:10:00 41 47.67442 47.27067
2018-12-15 21:20:00 41 47.67442 47.27067
2018-12-15 21:30:00 41 47.67442 47.27067
2018-12-15 21:40:00 41 47.67442 47.27067
2018-12-15 21:50:00 41 47.67442 47.27067
2018-12-15 22:00:00 41 47.67442 47.27067
2018-12-15 22:10:00 41 47.67442 47.27067
2018-12-15 22:20:00 41 47.67442 47.27067
2018-12-15 22:30:00 41 47.67442 47.27067
2018-12-15 22:40:00 41 47.67442 47.27067
2018-12-15 22:50:00 41 47.67442 47.27067
2018-12-15 23:00:00 41 47.67442 47.27067
2018-12-15 23:10:00 41 47.67442 47.27067
2018-12-15 23:20:00 41 47.67442 47.27067
2018-12-15 23:30:00 41 47.67442 47.27067
2018-12-15 23:40:00 41 47.67442 47.27067
2018-12-15 23:50:00 41 47.67442 47.27067
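
The steps behind this table can be traced on a small example. The sketch below uses base R only and hypothetical data: a made-up network of 4 nodes (the report uses 86) and a handful of readings with a val_qual flag, where 1 marks a reliable value. It counts the distinct nodes with at least one reliable reading in each interval, converts the counts to proportions of the full network, and averages them.

```r
# Hypothetical network size; the analysis above uses 86 nodes.
network_size <- 4

# Hypothetical readings: several rows per node and interval are possible.
readings <- data.frame(
  node_id  = c("n1", "n1", "n2", "n3",   "n1", "n2", "n2", "n4"),
  by10     = c(rep("00:00", 4), rep("00:10", 4)),
  val_qual = c(1, 1, 1, 0,               1, 1, 0, 1)
)

# Keep reliable readings, then one row per node and interval.
reliable <- unique(readings[readings$val_qual == 1, c("node_id", "by10")])

# Number of active nodes per interval.
count <- table(reliable$by10)

# Proportion of the full network that is active in each interval (%).
prop_active <- 100 * as.numeric(count) / network_size
names(prop_active) <- names(count)

# Score 3: mean proportion across intervals.
score3 <- mean(prop_active)

print(prop_active)
print(score3)
```

One caveat applies to this sketch and, as far as we can tell, to any filter-then-count approach: an interval in which no node records a reliable value produces no row at all, so it does not enter the mean as a zero.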

The density plot below shows the distribution of propActive (Proportion of Active Nodes) recorded at each time-interval of the day relative to the daily average. This average is taken as Score 3, which represents the average proportion of the AoT network that is active for temperature data on 2018-12-15.

dfTemp3%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we first begin by extracting the latitude and longitude locations of these nodes. This is done by obtaining all the locations at which every reliable data point is recorded, and compiling the unique latitude and longitude locations from this list.

The table below shows the first 41 rows of this result: the latitude and longitude locations of the 40 nodes active at 12 midnight on 2018-12-15, followed by the first node recorded for the 00:10 interval.

dfTemp%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfTemp4

dfTemp4%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e0610ee36 41.75129 -87.60529
2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:00:00 001e0610bc12 41.75034 -87.66352
2018-12-15 00:00:00 001e06113f54 41.88461 -87.62458
2018-12-15 00:00:00 001e0611537d 41.79417 -87.60165
2018-12-15 00:00:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:00:00 001e0610ef27 41.84658 -87.68556
2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:00:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:00:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:00:00 001e06114503 41.66608 -87.53937
2018-12-15 00:00:00 001e0611536c 41.88575 -87.62969
2018-12-15 00:00:00 001e0610ee5d 41.92400 -87.76107
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:00:00 001e06113dbc 41.71387 -87.53651
2018-12-15 00:00:00 001e0610e532 41.85796 -87.65643
2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:00:00 001e0610f703 41.87148 -87.67644
2018-12-15 00:00:00 001e0610fb4c 41.91358 -87.68241
2018-12-15 00:00:00 001e06113d22 41.80085 -87.70374
2018-12-15 00:00:00 001e0610ee33 41.96509 -87.67908
2018-12-15 00:00:00 001e0610e538 41.73659 -87.60476
2018-12-15 00:00:00 001e0610f732 41.89500 -87.74582
2018-12-15 00:00:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:00:00 001e0610eef4 41.91268 -87.68105
2018-12-15 00:00:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:00:00 001e0610bbf9 41.76832 -87.68340
2018-12-15 00:00:00 001e06113a48 41.94326 -87.68807
2018-12-15 00:00:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:00:00 001e0611462f 41.82353 -87.64105
2018-12-15 00:00:00 001e0610f8f4 41.83258 -87.64613
2018-12-15 00:00:00 001e061135cb 41.77937 -87.66442
2018-12-15 00:00:00 001e0610e835 41.96876 -87.67917
2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:00:00 001e061146ba 41.96759 -87.76257
2018-12-15 00:00:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:10:00 001e0610ba15 41.72246 -87.57535

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the clipped area to the total area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval. The average proportion of Chicago area covered (Score 4) is the mean of these proportions across all time intervals, expressed as a percentage. The results are presented in the table below.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

# Accumulate one AreaProp row per 10-minute interval
dfTemp4a <- NULL
for (i in unique(dfTemp4$by10)) {

  # Nodes with reliable readings in this interval
  subset <- dfTemp4 %>%
    filter(by10 == i)

  # Build spatial points (WGS84) and project them to EPSG:3435
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon', 'lat')], subset,
                                            proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))

  # Closed ring tracing the four corners of the bounding box
  bb <- subset_nodes_trnsfrmd@bbox
  P2 <- matrix(c(bb[1, 1], bb[2, 1],
                 bb[1, 1], bb[2, 2],
                 bb[1, 2], bb[2, 2],
                 bb[1, 2], bb[2, 1],
                 bb[1, 1], bb[2, 1]),
               ncol = 2, byrow = TRUE) %>%
    Polygon()

  Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")),
                         proj4string = CRS('+init=EPSG:3435'))

  # Clip the bounding box to the Chicago boundary
  Ps2 <- gIntersection(Ps2, chig, byid = FALSE)

  # Proportion of Chicago's area covered in this interval
  Ps2AreaProp <- gArea(Ps2) / chigArea

  df1 <- data.frame(by10 = i, AreaProp = Ps2AreaProp)
  dfTemp4a <- rbind(dfTemp4a, df1)
}

dfTemp4a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfTemp4a
dfTemp4a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.7217209 72.17209
2018-12-15 00:10:00 0.7217209 72.17209
2018-12-15 00:20:00 0.7217209 72.17209
2018-12-15 00:30:00 0.7217209 72.17209
2018-12-15 00:40:00 0.7217209 72.17209
2018-12-15 00:50:00 0.7217209 72.17209
2018-12-15 01:00:00 0.7217209 72.17209
2018-12-15 01:10:00 0.7217209 72.17209
2018-12-15 01:20:00 0.7217209 72.17209
2018-12-15 01:30:00 0.7217209 72.17209
2018-12-15 01:40:00 0.7217209 72.17209
2018-12-15 01:50:00 0.7217209 72.17209
2018-12-15 02:00:00 0.7217209 72.17209
2018-12-15 02:10:00 0.7217209 72.17209
2018-12-15 02:20:00 0.7217209 72.17209
2018-12-15 02:30:00 0.7217209 72.17209
2018-12-15 02:40:00 0.7217209 72.17209
2018-12-15 02:50:00 0.7217209 72.17209
2018-12-15 03:00:00 0.7217209 72.17209
2018-12-15 03:10:00 0.7217209 72.17209
2018-12-15 03:20:00 0.7217209 72.17209
2018-12-15 03:30:00 0.7217209 72.17209
2018-12-15 03:40:00 0.7217209 72.17209
2018-12-15 03:50:00 0.7217209 72.17209
2018-12-15 04:00:00 0.7217209 72.17209
2018-12-15 04:10:00 0.7217209 72.17209
2018-12-15 04:20:00 0.7217209 72.17209
2018-12-15 04:30:00 0.7217209 72.17209
2018-12-15 04:40:00 0.7217209 72.17209
2018-12-15 04:50:00 0.7217209 72.17209
2018-12-15 05:00:00 0.7217209 72.17209
2018-12-15 05:10:00 0.7217209 72.17209
2018-12-15 05:20:00 0.7217209 72.17209
2018-12-15 05:30:00 0.7217209 72.17209
2018-12-15 05:40:00 0.7217209 72.17209
2018-12-15 05:50:00 0.7217209 72.17209
2018-12-15 06:00:00 0.7217209 72.17209
2018-12-15 06:10:00 0.7217209 72.17209
2018-12-15 06:20:00 0.7217209 72.17209
2018-12-15 06:30:00 0.7217209 72.17209
2018-12-15 06:40:00 0.7217209 72.17209
2018-12-15 06:50:00 0.7217209 72.17209
2018-12-15 07:00:00 0.7217209 72.17209
2018-12-15 07:10:00 0.7217209 72.17209
2018-12-15 07:20:00 0.7217209 72.17209
2018-12-15 07:30:00 0.7217209 72.17209
2018-12-15 07:40:00 0.7217209 72.17209
2018-12-15 07:50:00 0.7217209 72.17209
2018-12-15 08:00:00 0.7217209 72.17209
2018-12-15 08:10:00 0.7217209 72.17209
2018-12-15 08:20:00 0.7217209 72.17209
2018-12-15 08:30:00 0.7217209 72.17209
2018-12-15 08:40:00 0.7217209 72.17209
2018-12-15 08:50:00 0.7217209 72.17209
2018-12-15 09:00:00 0.7217209 72.17209
2018-12-15 09:10:00 0.7217209 72.17209
2018-12-15 09:20:00 0.7217209 72.17209
2018-12-15 09:30:00 0.7217209 72.17209
2018-12-15 09:40:00 0.7217209 72.17209
2018-12-15 09:50:00 0.7217209 72.17209
2018-12-15 10:00:00 0.7217209 72.17209
2018-12-15 10:10:00 0.7217209 72.17209
2018-12-15 10:20:00 0.7217209 72.17209
2018-12-15 10:30:00 0.7217209 72.17209
2018-12-15 10:40:00 0.7217209 72.17209
2018-12-15 10:50:00 0.7217209 72.17209
2018-12-15 11:00:00 0.7217209 72.17209
2018-12-15 11:10:00 0.7217209 72.17209
2018-12-15 11:20:00 0.7217209 72.17209
2018-12-15 11:30:00 0.7217209 72.17209
2018-12-15 11:40:00 0.7217209 72.17209
2018-12-15 11:50:00 0.7217209 72.17209
2018-12-15 12:00:00 0.7217209 72.17209
2018-12-15 12:10:00 0.7217209 72.17209
2018-12-15 12:20:00 0.7217209 72.17209
2018-12-15 12:30:00 0.7217209 72.17209
2018-12-15 12:40:00 0.7217209 72.17209
2018-12-15 12:50:00 0.7217209 72.17209
2018-12-15 13:00:00 0.7217209 72.17209
2018-12-15 13:10:00 0.7217209 72.17209
2018-12-15 13:20:00 0.7217209 72.17209
2018-12-15 13:30:00 0.7217209 72.17209
2018-12-15 13:40:00 0.7217209 72.17209
2018-12-15 13:50:00 0.7217209 72.17209
2018-12-15 14:00:00 0.7217209 72.17209
2018-12-15 14:10:00 0.7217209 72.17209
2018-12-15 14:20:00 0.7217209 72.17209
2018-12-15 14:30:00 0.7217209 72.17209
2018-12-15 14:40:00 0.7217209 72.17209
2018-12-15 14:50:00 0.7217209 72.17209
2018-12-15 15:00:00 0.7217209 72.17209
2018-12-15 15:10:00 0.7217209 72.17209
2018-12-15 15:20:00 0.7217209 72.17209
2018-12-15 15:30:00 0.7217209 72.17209
2018-12-15 15:40:00 0.7217209 72.17209
2018-12-15 15:50:00 0.7217209 72.17209
2018-12-15 16:00:00 0.7217209 72.17209
2018-12-15 16:10:00 0.7217209 72.17209
2018-12-15 16:20:00 0.7217209 72.17209
2018-12-15 16:30:00 0.7217209 72.17209
2018-12-15 16:40:00 0.7217209 72.17209
2018-12-15 16:50:00 0.7217209 72.17209
2018-12-15 17:00:00 0.7217209 72.17209
2018-12-15 17:10:00 0.7217209 72.17209
2018-12-15 17:20:00 0.7217209 72.17209
2018-12-15 17:30:00 0.7217209 72.17209
2018-12-15 17:40:00 0.7217209 72.17209
2018-12-15 17:50:00 0.7217209 72.17209
2018-12-15 18:00:00 0.7217209 72.17209
2018-12-15 18:10:00 0.7217209 72.17209
2018-12-15 18:20:00 0.7217209 72.17209
2018-12-15 18:30:00 0.7217209 72.17209
2018-12-15 18:40:00 0.7217209 72.17209
2018-12-15 18:50:00 0.7217209 72.17209
2018-12-15 19:00:00 0.7217209 72.17209
2018-12-15 19:10:00 0.7217209 72.17209
2018-12-15 19:20:00 0.7217209 72.17209
2018-12-15 19:30:00 0.7217209 72.17209
2018-12-15 19:40:00 0.7217209 72.17209
2018-12-15 19:50:00 0.7217209 72.17209
2018-12-15 20:00:00 0.7217209 72.17209
2018-12-15 20:10:00 0.7217209 72.17209
2018-12-15 20:20:00 0.7217209 72.17209
2018-12-15 20:30:00 0.7217209 72.17209
2018-12-15 20:40:00 0.7217209 72.17209
2018-12-15 20:50:00 0.7217209 72.17209
2018-12-15 21:00:00 0.7217209 72.17209
2018-12-15 21:10:00 0.7217209 72.17209
2018-12-15 21:20:00 0.7217209 72.17209
2018-12-15 21:30:00 0.7217209 72.17209
2018-12-15 21:40:00 0.7217209 72.17209
2018-12-15 21:50:00 0.7217209 72.17209
2018-12-15 22:00:00 0.7217209 72.17209
2018-12-15 22:10:00 0.7217209 72.17209
2018-12-15 22:20:00 0.7217209 72.17209
2018-12-15 22:30:00 0.7217209 72.17209
2018-12-15 22:40:00 0.7217209 72.17209
2018-12-15 22:50:00 0.7217209 72.17209
2018-12-15 23:00:00 0.7217209 72.17209
2018-12-15 23:10:00 0.7217209 72.17209
2018-12-15 23:20:00 0.7217209 72.17209
2018-12-15 23:30:00 0.7217209 72.17209
2018-12-15 23:40:00 0.7217209 72.17209
2018-12-15 23:50:00 0.7217209 72.17209
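
As a rough illustration of the coverage calculation, the sketch below repeats the same steps in base R on made-up data. It is not the code used in this project: the city is simplified to a hypothetical rectangle so that clipping needs no spatial packages, and the names `city`, `nodes`, and `clipped_bbox_area` are invented for the example.

```r
# Toy illustration of the coverage calculation (all values hypothetical).
# The "city" is an axis-aligned rectangle so the clip can be done in base R;
# the real analysis clips against the Chicago boundary polygon.
city <- c(xmin = 0, ymin = 0, xmax = 10, ymax = 20)

# Hypothetical projected coordinates of active nodes, by time interval
nodes <- data.frame(
  by10 = c("00:00", "00:00", "00:00", "00:10", "00:10"),
  x    = c(2, 8, 5, 4, 12),
  y    = c(3, 15, 18, 5, 10)
)

# Area of the node bounding box after clipping it to the city rectangle
clipped_bbox_area <- function(x, y, city) {
  xmin <- max(min(x), city[["xmin"]])
  xmax <- min(max(x), city[["xmax"]])
  ymin <- max(min(y), city[["ymin"]])
  ymax <- min(max(y), city[["ymax"]])
  max(0, xmax - xmin) * max(0, ymax - ymin)
}

city_area <- (city[["xmax"]] - city[["xmin"]]) * (city[["ymax"]] - city[["ymin"]])

# One AreaProp value per interval
area_prop <- sapply(split(nodes, nodes$by10), function(d) {
  clipped_bbox_area(d$x, d$y, city) / city_area
})

# Score 4 is the mean proportion across intervals, expressed as a percentage
score4 <- 100 * mean(area_prop)
```

With these toy values the two intervals cover 45% and 15% of the rectangle, so the score is 30. The second interval shows why the clip matters: one node lies outside the rectangle, and without clipping its bounding box would count area that is not part of the city.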

We can also plot the area covered relative to the whole of Chicago for visual inspection. The 4 plots below are for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the temperature network on this day, the area covered remained constant throughout the day, so the 4 plots are identical.

p<-list()

by10<-as.data.frame(unique(dfTemp4$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfTemp4b<-merge(dfTemp4, by10, by='by10', all.x=TRUE)

for (i in unique(dfTemp4b$by10)) {

  # Nodes with reliable readings in this interval
  subset <- dfTemp4b %>%
    filter(by10 == i)

  # Build spatial points (WGS84) and project them to EPSG:3435
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon', 'lat')], subset,
                                            proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))

  # Closed ring tracing the four corners of the bounding box
  bb <- subset_nodes_trnsfrmd@bbox
  P2 <- matrix(c(bb[1, 1], bb[2, 1],
                 bb[1, 1], bb[2, 2],
                 bb[1, 2], bb[2, 2],
                 bb[1, 2], bb[2, 1],
                 bb[1, 1], bb[2, 1]),
               ncol = 2, byrow = TRUE) %>%
    Polygon()

  Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")),
                         proj4string = CRS('+init=EPSG:3435'))

  # Clip the bounding box to the Chicago boundary
  Ps2 <- gIntersection(Ps2, chig, byid = FALSE)

  # Position of this interval in the ordered list of intervals
  j <- unique(subset$no)

  # Chicago in red with the covered area overlaid in blue
  p[[j]] <- spplot(chig, colorkey = FALSE, col.regions = 'red',
                   sp.layout = list(list(Ps2, fill = 'blue', first = FALSE)))
}

library(gridExtra)
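# Each hour spans six 10-minute intervals, so 00:00, 07:00, 13:00 and 19:00
# are list positions 1, 43, 79 and 115
grid.arrange(p[[1]], p[[43]], p[[79]], p[[115]])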
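
The position of a given clock time in the list of plots follows from the 10-minute binning. The helper below is a small sketch of that arithmetic; `interval_index` is a hypothetical name, and the mapping assumes that the day starts at 00:00 and that no interval is missing from the list.

```r
# Position of a clock time in an ordered sequence of 10-minute intervals
# that starts at 00:00 (144 intervals in a full day).
interval_index <- function(hour, minute = 0) {
  stopifnot(hour >= 0, hour <= 23, minute %in% seq(0, 50, by = 10))
  hour * 6 + minute / 10 + 1
}

# Panels for 00:00, 07:00, 13:00 and 19:00
panel_hours <- c(0, 7, 13, 19)
panel_idx <- interval_index(panel_hours)
panel_idx
```

If an interval had no active nodes it would be absent from the list and later positions would shift, so looking panels up by their `by10` label would be the more robust choice in that case.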

In summary, for the AoT temperature network on 2018-12-15,

  • Score 3 = 43.7 : Averaged over the day's 10-minute time intervals, 43.7% of the nodes in the network were collecting reliable data. This is a low score.
  • Score 4 = 72.2 : Averaged over the day's 10-minute time intervals, reliable data was collected across 72.2% of Chicago’s area. Read together with Score 3, this indicates that the relatively few active nodes are widely dispersed across Chicago.

Humidity

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the time intervals of the day. The figure below shows how the number of nodes collecting reliable humidity data in the AoT network varies over the course of the day.

dfHumidity%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfHumidity3
## Adding missing grouping variables: `date`
ggplot(data=dfHumidity3, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  # paste() separates its arguments with a space by default, which keeps the label readable
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfHumidity3$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable humidity data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable humidity data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfHumidity3%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 18 20.93023 20.84141
2018-12-15 00:10:00 18 20.93023 20.84141
2018-12-15 00:20:00 18 20.93023 20.84141
2018-12-15 00:30:00 18 20.93023 20.84141
2018-12-15 00:40:00 18 20.93023 20.84141
2018-12-15 00:50:00 18 20.93023 20.84141
2018-12-15 01:00:00 18 20.93023 20.84141
2018-12-15 01:10:00 18 20.93023 20.84141
2018-12-15 01:20:00 18 20.93023 20.84141
2018-12-15 01:30:00 18 20.93023 20.84141
2018-12-15 01:40:00 18 20.93023 20.84141
2018-12-15 01:50:00 18 20.93023 20.84141
2018-12-15 02:00:00 18 20.93023 20.84141
2018-12-15 02:10:00 18 20.93023 20.84141
2018-12-15 02:20:00 18 20.93023 20.84141
2018-12-15 02:30:00 18 20.93023 20.84141
2018-12-15 02:40:00 18 20.93023 20.84141
2018-12-15 02:50:00 18 20.93023 20.84141
2018-12-15 03:00:00 18 20.93023 20.84141
2018-12-15 03:10:00 18 20.93023 20.84141
2018-12-15 03:20:00 18 20.93023 20.84141
2018-12-15 03:30:00 18 20.93023 20.84141
2018-12-15 03:40:00 18 20.93023 20.84141
2018-12-15 03:50:00 18 20.93023 20.84141
2018-12-15 04:00:00 18 20.93023 20.84141
2018-12-15 04:10:00 18 20.93023 20.84141
2018-12-15 04:20:00 18 20.93023 20.84141
2018-12-15 04:30:00 18 20.93023 20.84141
2018-12-15 04:40:00 18 20.93023 20.84141
2018-12-15 04:50:00 18 20.93023 20.84141
2018-12-15 05:00:00 18 20.93023 20.84141
2018-12-15 05:10:00 18 20.93023 20.84141
2018-12-15 05:20:00 18 20.93023 20.84141
2018-12-15 05:30:00 18 20.93023 20.84141
2018-12-15 05:40:00 18 20.93023 20.84141
2018-12-15 05:50:00 18 20.93023 20.84141
2018-12-15 06:00:00 18 20.93023 20.84141
2018-12-15 06:10:00 18 20.93023 20.84141
2018-12-15 06:20:00 18 20.93023 20.84141
2018-12-15 06:30:00 18 20.93023 20.84141
2018-12-15 06:40:00 18 20.93023 20.84141
2018-12-15 06:50:00 18 20.93023 20.84141
2018-12-15 07:00:00 18 20.93023 20.84141
2018-12-15 07:10:00 18 20.93023 20.84141
2018-12-15 07:20:00 18 20.93023 20.84141
2018-12-15 07:30:00 18 20.93023 20.84141
2018-12-15 07:40:00 18 20.93023 20.84141
2018-12-15 07:50:00 18 20.93023 20.84141
2018-12-15 08:00:00 18 20.93023 20.84141
2018-12-15 08:10:00 18 20.93023 20.84141
2018-12-15 08:20:00 18 20.93023 20.84141
2018-12-15 08:30:00 18 20.93023 20.84141
2018-12-15 08:40:00 18 20.93023 20.84141
2018-12-15 08:50:00 18 20.93023 20.84141
2018-12-15 09:00:00 18 20.93023 20.84141
2018-12-15 09:10:00 18 20.93023 20.84141
2018-12-15 09:20:00 18 20.93023 20.84141
2018-12-15 09:30:00 18 20.93023 20.84141
2018-12-15 09:40:00 18 20.93023 20.84141
2018-12-15 09:50:00 18 20.93023 20.84141
2018-12-15 10:00:00 18 20.93023 20.84141
2018-12-15 10:10:00 18 20.93023 20.84141
2018-12-15 10:20:00 18 20.93023 20.84141
2018-12-15 10:30:00 18 20.93023 20.84141
2018-12-15 10:40:00 18 20.93023 20.84141
2018-12-15 10:50:00 18 20.93023 20.84141
2018-12-15 11:00:00 18 20.93023 20.84141
2018-12-15 11:10:00 18 20.93023 20.84141
2018-12-15 11:20:00 18 20.93023 20.84141
2018-12-15 11:30:00 18 20.93023 20.84141
2018-12-15 11:40:00 18 20.93023 20.84141
2018-12-15 11:50:00 18 20.93023 20.84141
2018-12-15 12:00:00 18 20.93023 20.84141
2018-12-15 12:10:00 18 20.93023 20.84141
2018-12-15 12:20:00 18 20.93023 20.84141
2018-12-15 12:30:00 18 20.93023 20.84141
2018-12-15 12:40:00 18 20.93023 20.84141
2018-12-15 12:50:00 18 20.93023 20.84141
2018-12-15 13:00:00 18 20.93023 20.84141
2018-12-15 13:10:00 18 20.93023 20.84141
2018-12-15 13:20:00 18 20.93023 20.84141
2018-12-15 13:30:00 18 20.93023 20.84141
2018-12-15 13:40:00 18 20.93023 20.84141
2018-12-15 13:50:00 18 20.93023 20.84141
2018-12-15 14:00:00 18 20.93023 20.84141
2018-12-15 14:10:00 18 20.93023 20.84141
2018-12-15 14:20:00 18 20.93023 20.84141
2018-12-15 14:30:00 18 20.93023 20.84141
2018-12-15 14:40:00 18 20.93023 20.84141
2018-12-15 14:50:00 18 20.93023 20.84141
2018-12-15 15:00:00 17 19.76744 20.84141
2018-12-15 15:10:00 17 19.76744 20.84141
2018-12-15 15:20:00 17 19.76744 20.84141
2018-12-15 15:30:00 17 19.76744 20.84141
2018-12-15 15:40:00 17 19.76744 20.84141
2018-12-15 15:50:00 17 19.76744 20.84141
2018-12-15 16:00:00 17 19.76744 20.84141
2018-12-15 16:10:00 17 19.76744 20.84141
2018-12-15 16:20:00 17 19.76744 20.84141
2018-12-15 16:30:00 17 19.76744 20.84141
2018-12-15 16:40:00 17 19.76744 20.84141
2018-12-15 16:50:00 18 20.93023 20.84141
2018-12-15 17:00:00 18 20.93023 20.84141
2018-12-15 17:10:00 18 20.93023 20.84141
2018-12-15 17:20:00 18 20.93023 20.84141
2018-12-15 17:30:00 18 20.93023 20.84141
2018-12-15 17:40:00 18 20.93023 20.84141
2018-12-15 17:50:00 18 20.93023 20.84141
2018-12-15 18:00:00 18 20.93023 20.84141
2018-12-15 18:10:00 18 20.93023 20.84141
2018-12-15 18:20:00 18 20.93023 20.84141
2018-12-15 18:30:00 18 20.93023 20.84141
2018-12-15 18:40:00 18 20.93023 20.84141
2018-12-15 18:50:00 18 20.93023 20.84141
2018-12-15 19:00:00 18 20.93023 20.84141
2018-12-15 19:10:00 18 20.93023 20.84141
2018-12-15 19:20:00 18 20.93023 20.84141
2018-12-15 19:30:00 18 20.93023 20.84141
2018-12-15 19:40:00 18 20.93023 20.84141
2018-12-15 19:50:00 18 20.93023 20.84141
2018-12-15 20:00:00 18 20.93023 20.84141
2018-12-15 20:10:00 18 20.93023 20.84141
2018-12-15 20:20:00 18 20.93023 20.84141
2018-12-15 20:30:00 18 20.93023 20.84141
2018-12-15 20:40:00 18 20.93023 20.84141
2018-12-15 20:50:00 18 20.93023 20.84141
2018-12-15 21:00:00 18 20.93023 20.84141
2018-12-15 21:10:00 18 20.93023 20.84141
2018-12-15 21:20:00 18 20.93023 20.84141
2018-12-15 21:30:00 18 20.93023 20.84141
2018-12-15 21:40:00 18 20.93023 20.84141
2018-12-15 21:50:00 18 20.93023 20.84141
2018-12-15 22:00:00 18 20.93023 20.84141
2018-12-15 22:10:00 18 20.93023 20.84141
2018-12-15 22:20:00 18 20.93023 20.84141
2018-12-15 22:30:00 18 20.93023 20.84141
2018-12-15 22:40:00 18 20.93023 20.84141
2018-12-15 22:50:00 18 20.93023 20.84141
2018-12-15 23:00:00 18 20.93023 20.84141
2018-12-15 23:10:00 18 20.93023 20.84141
2018-12-15 23:20:00 18 20.93023 20.84141
2018-12-15 23:30:00 18 20.93023 20.84141
2018-12-15 23:40:00 18 20.93023 20.84141
2018-12-15 23:50:00 18 20.93023 20.84141
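
The Score 3 value in the table above can be checked by hand from the counts alone. The sketch below rebuilds the `count` column from what the table shows (18 nodes in every interval except the 11 intervals from 15:00 to 16:40, which have 17) and repeats the averaging step; the variable names are illustrative.

```r
# Counts read off the table above: 18 nodes for most of the day, with
# 17 nodes in the 11 intervals from 15:00 to 16:40.
counts <- c(rep(18, 90), rep(17, 11), rep(18, 43))
stopifnot(length(counts) == 144)  # one value per 10-minute interval

network_size <- 86

# Share of the full network active in each interval, in percent
prop_active <- counts * 100 / network_size

# Score 3 is the mean of those shares across the day
score3 <- mean(prop_active)
round(score3, 5)  # 20.84141
```

Because the network size is constant, this is the same as dividing the mean count (about 17.9 nodes) by 86.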

The density plot below shows the distribution of propActive (Proportion of Active Nodes) recorded at each time interval of the day relative to the network average. This average is taken as Score 3, the average proportion of the AoT network that was active for humidity data on 2018-12-15.

dfHumidity3%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we begin by extracting the latitude and longitude locations of these nodes. This is done by obtaining the locations at which every reliable data point is recorded, and compiling the unique latitude and longitude locations from this list.

The table below shows this result. It lists the first 41 rows: the latitude and longitude of the 18 nodes active at 12 midnight on 2018-12-15, followed by rows for the next two intervals (the last of which is cut off by the 41-row limit).

dfHumidity%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfHumidity4
## Adding missing grouping variables: `date`
dfHumidity4%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
date by10 node_id lat lon
2018-12-15 2018-12-15 00:00:00 001e0610ee36 41.75129 -87.60529
2018-12-15 2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 2018-12-15 00:00:00 001e0610bc12 41.75034 -87.66352
2018-12-15 2018-12-15 00:00:00 001e06113f54 41.88461 -87.62458
2018-12-15 2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 2018-12-15 00:00:00 001e0610ee5d 41.92400 -87.76107
2018-12-15 2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 2018-12-15 00:00:00 001e06113dbc 41.71387 -87.53651
2018-12-15 2018-12-15 00:00:00 001e0610e532 41.85796 -87.65643
2018-12-15 2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 2018-12-15 00:00:00 001e0610ee33 41.96509 -87.67908
2018-12-15 2018-12-15 00:00:00 001e0610f732 41.89500 -87.74582
2018-12-15 2018-12-15 00:00:00 001e0610bbf9 41.76832 -87.68340
2018-12-15 2018-12-15 00:00:00 001e06113a48 41.94326 -87.68807
2018-12-15 2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 2018-12-15 00:10:00 001e0610ee36 41.75129 -87.60529
2018-12-15 2018-12-15 00:10:00 001e06113f54 41.88461 -87.62458
2018-12-15 2018-12-15 00:10:00 001e0610e537 41.96162 -87.66595
2018-12-15 2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 2018-12-15 00:10:00 001e061130f4 41.89616 -87.66239
2018-12-15 2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 2018-12-15 00:10:00 001e0610ee5d 41.92400 -87.76107
2018-12-15 2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 2018-12-15 00:10:00 001e0610e532 41.85796 -87.65643
2018-12-15 2018-12-15 00:10:00 001e06113dbc 41.71387 -87.53651
2018-12-15 2018-12-15 00:10:00 001e0610ba46 41.87838 -87.62768
2018-12-15 2018-12-15 00:10:00 001e0610f732 41.89500 -87.74582
2018-12-15 2018-12-15 00:10:00 001e0610bbf9 41.76832 -87.68340
2018-12-15 2018-12-15 00:10:00 001e0610ee33 41.96509 -87.67908
2018-12-15 2018-12-15 00:10:00 001e06113a48 41.94326 -87.68807
2018-12-15 2018-12-15 00:10:00 001e0610ba13 41.75124 -87.71299
2018-12-15 2018-12-15 00:10:00 001e0610bc12 41.75034 -87.66352
2018-12-15 2018-12-15 00:20:00 001e0610ba13 41.75124 -87.71299
2018-12-15 2018-12-15 00:20:00 001e0610ee36 41.75129 -87.60529
2018-12-15 2018-12-15 00:20:00 001e0610ee5d 41.92400 -87.76107
2018-12-15 2018-12-15 00:20:00 001e06113f54 41.88461 -87.62458
2018-12-15 2018-12-15 00:20:00 001e0610e537 41.96162 -87.66595
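
To make the deduplication step concrete, here is a minimal base R sketch on invented readings. It mirrors the filter, select, convert, and unique sequence described above, but the data frame `readings` and its values are hypothetical, and base R subsetting stands in for the dplyr verbs.

```r
# Hypothetical readings: a node can report several times within one interval
readings <- data.frame(
  by10     = c("00:00", "00:00", "00:00", "00:00", "00:10"),
  node_id  = c("a", "a", "b", "c", "a"),
  lat      = c("41.75", "41.75", "41.78", "41.88", "41.75"),
  lon      = c("-87.60", "-87.60", "-87.59", "-87.62", "-87.60"),
  val_qual = c(1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Keep reliable readings only, and only the columns that locate a node
reliable <- readings[readings$val_qual == 1, c("by10", "node_id", "lat", "lon")]

# Coordinates are stored as text in this toy data, so convert before use
reliable$lat <- as.numeric(reliable$lat)
reliable$lon <- as.numeric(reliable$lon)

# One row per node per interval
node_locations <- unique(reliable)
rownames(node_locations) <- NULL
node_locations
```

Node `c` drops out because its only reading is flagged unreliable, and the duplicate reading from node `a` at 00:00 collapses to a single row, leaving three rows in total.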

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the clipped area to the total area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval. The average proportion of Chicago area covered (Score 4) is the mean of these proportions across all time intervals, expressed as a percentage. The results are presented in the table below.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

# Accumulate one AreaProp row per 10-minute interval
dfHumidity4a <- NULL
for (i in unique(dfHumidity4$by10)) {

  # Nodes with reliable readings in this interval
  subset <- dfHumidity4 %>%
    filter(by10 == i)

  # Build spatial points (WGS84) and project them to EPSG:3435
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon', 'lat')], subset,
                                            proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))

  # Closed ring tracing the four corners of the bounding box
  bb <- subset_nodes_trnsfrmd@bbox
  P2 <- matrix(c(bb[1, 1], bb[2, 1],
                 bb[1, 1], bb[2, 2],
                 bb[1, 2], bb[2, 2],
                 bb[1, 2], bb[2, 1],
                 bb[1, 1], bb[2, 1]),
               ncol = 2, byrow = TRUE) %>%
    Polygon()

  Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")),
                         proj4string = CRS('+init=EPSG:3435'))

  # Clip the bounding box to the Chicago boundary
  Ps2 <- gIntersection(Ps2, chig, byid = FALSE)

  # Proportion of Chicago's area covered in this interval
  Ps2AreaProp <- gArea(Ps2) / chigArea

  df1 <- data.frame(by10 = i, AreaProp = Ps2AreaProp)
  dfHumidity4a <- rbind(dfHumidity4a, df1)
}

dfHumidity4a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfHumidity4a
dfHumidity4a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.5907152 59.07152
2018-12-15 00:10:00 0.5907152 59.07152
2018-12-15 00:20:00 0.5907152 59.07152
2018-12-15 00:30:00 0.5907152 59.07152
2018-12-15 00:40:00 0.5907152 59.07152
2018-12-15 00:50:00 0.5907152 59.07152
2018-12-15 01:00:00 0.5907152 59.07152
2018-12-15 01:10:00 0.5907152 59.07152
2018-12-15 01:20:00 0.5907152 59.07152
2018-12-15 01:30:00 0.5907152 59.07152
2018-12-15 01:40:00 0.5907152 59.07152
2018-12-15 01:50:00 0.5907152 59.07152
2018-12-15 02:00:00 0.5907152 59.07152
2018-12-15 02:10:00 0.5907152 59.07152
2018-12-15 02:20:00 0.5907152 59.07152
2018-12-15 02:30:00 0.5907152 59.07152
2018-12-15 02:40:00 0.5907152 59.07152
2018-12-15 02:50:00 0.5907152 59.07152
2018-12-15 03:00:00 0.5907152 59.07152
2018-12-15 03:10:00 0.5907152 59.07152
2018-12-15 03:20:00 0.5907152 59.07152
2018-12-15 03:30:00 0.5907152 59.07152
2018-12-15 03:40:00 0.5907152 59.07152
2018-12-15 03:50:00 0.5907152 59.07152
2018-12-15 04:00:00 0.5907152 59.07152
2018-12-15 04:10:00 0.5907152 59.07152
2018-12-15 04:20:00 0.5907152 59.07152
2018-12-15 04:30:00 0.5907152 59.07152
2018-12-15 04:40:00 0.5907152 59.07152
2018-12-15 04:50:00 0.5907152 59.07152
2018-12-15 05:00:00 0.5907152 59.07152
2018-12-15 05:10:00 0.5907152 59.07152
2018-12-15 05:20:00 0.5907152 59.07152
2018-12-15 05:30:00 0.5907152 59.07152
2018-12-15 05:40:00 0.5907152 59.07152
2018-12-15 05:50:00 0.5907152 59.07152
2018-12-15 06:00:00 0.5907152 59.07152
2018-12-15 06:10:00 0.5907152 59.07152
2018-12-15 06:20:00 0.5907152 59.07152
2018-12-15 06:30:00 0.5907152 59.07152
2018-12-15 06:40:00 0.5907152 59.07152
2018-12-15 06:50:00 0.5907152 59.07152
2018-12-15 07:00:00 0.5907152 59.07152
2018-12-15 07:10:00 0.5907152 59.07152
2018-12-15 07:20:00 0.5907152 59.07152
2018-12-15 07:30:00 0.5907152 59.07152
2018-12-15 07:40:00 0.5907152 59.07152
2018-12-15 07:50:00 0.5907152 59.07152
2018-12-15 08:00:00 0.5907152 59.07152
2018-12-15 08:10:00 0.5907152 59.07152
2018-12-15 08:20:00 0.5907152 59.07152
2018-12-15 08:30:00 0.5907152 59.07152
2018-12-15 08:40:00 0.5907152 59.07152
2018-12-15 08:50:00 0.5907152 59.07152
2018-12-15 09:00:00 0.5907152 59.07152
2018-12-15 09:10:00 0.5907152 59.07152
2018-12-15 09:20:00 0.5907152 59.07152
2018-12-15 09:30:00 0.5907152 59.07152
2018-12-15 09:40:00 0.5907152 59.07152
2018-12-15 09:50:00 0.5907152 59.07152
2018-12-15 10:00:00 0.5907152 59.07152
2018-12-15 10:10:00 0.5907152 59.07152
2018-12-15 10:20:00 0.5907152 59.07152
2018-12-15 10:30:00 0.5907152 59.07152
2018-12-15 10:40:00 0.5907152 59.07152
2018-12-15 10:50:00 0.5907152 59.07152
2018-12-15 11:00:00 0.5907152 59.07152
2018-12-15 11:10:00 0.5907152 59.07152
2018-12-15 11:20:00 0.5907152 59.07152
2018-12-15 11:30:00 0.5907152 59.07152
2018-12-15 11:40:00 0.5907152 59.07152
2018-12-15 11:50:00 0.5907152 59.07152
2018-12-15 12:00:00 0.5907152 59.07152
2018-12-15 12:10:00 0.5907152 59.07152
2018-12-15 12:20:00 0.5907152 59.07152
2018-12-15 12:30:00 0.5907152 59.07152
2018-12-15 12:40:00 0.5907152 59.07152
2018-12-15 12:50:00 0.5907152 59.07152
2018-12-15 13:00:00 0.5907152 59.07152
2018-12-15 13:10:00 0.5907152 59.07152
2018-12-15 13:20:00 0.5907152 59.07152
2018-12-15 13:30:00 0.5907152 59.07152
2018-12-15 13:40:00 0.5907152 59.07152
2018-12-15 13:50:00 0.5907152 59.07152
2018-12-15 14:00:00 0.5907152 59.07152
2018-12-15 14:10:00 0.5907152 59.07152
2018-12-15 14:20:00 0.5907152 59.07152
2018-12-15 14:30:00 0.5907152 59.07152
2018-12-15 14:40:00 0.5907152 59.07152
2018-12-15 14:50:00 0.5907152 59.07152
2018-12-15 15:00:00 0.5907152 59.07152
2018-12-15 15:10:00 0.5907152 59.07152
2018-12-15 15:20:00 0.5907152 59.07152
2018-12-15 15:30:00 0.5907152 59.07152
2018-12-15 15:40:00 0.5907152 59.07152
2018-12-15 15:50:00 0.5907152 59.07152
2018-12-15 16:00:00 0.5907152 59.07152
2018-12-15 16:10:00 0.5907152 59.07152
2018-12-15 16:20:00 0.5907152 59.07152
2018-12-15 16:30:00 0.5907152 59.07152
2018-12-15 16:40:00 0.5907152 59.07152
2018-12-15 16:50:00 0.5907152 59.07152
2018-12-15 17:00:00 0.5907152 59.07152
2018-12-15 17:10:00 0.5907152 59.07152
2018-12-15 17:20:00 0.5907152 59.07152
2018-12-15 17:30:00 0.5907152 59.07152
2018-12-15 17:40:00 0.5907152 59.07152
2018-12-15 17:50:00 0.5907152 59.07152
2018-12-15 18:00:00 0.5907152 59.07152
2018-12-15 18:10:00 0.5907152 59.07152
2018-12-15 18:20:00 0.5907152 59.07152
2018-12-15 18:30:00 0.5907152 59.07152
2018-12-15 18:40:00 0.5907152 59.07152
2018-12-15 18:50:00 0.5907152 59.07152
2018-12-15 19:00:00 0.5907152 59.07152
2018-12-15 19:10:00 0.5907152 59.07152
2018-12-15 19:20:00 0.5907152 59.07152
2018-12-15 19:30:00 0.5907152 59.07152
2018-12-15 19:40:00 0.5907152 59.07152
2018-12-15 19:50:00 0.5907152 59.07152
2018-12-15 20:00:00 0.5907152 59.07152
2018-12-15 20:10:00 0.5907152 59.07152
2018-12-15 20:20:00 0.5907152 59.07152
2018-12-15 20:30:00 0.5907152 59.07152
2018-12-15 20:40:00 0.5907152 59.07152
2018-12-15 20:50:00 0.5907152 59.07152
2018-12-15 21:00:00 0.5907152 59.07152
2018-12-15 21:10:00 0.5907152 59.07152
2018-12-15 21:20:00 0.5907152 59.07152
2018-12-15 21:30:00 0.5907152 59.07152
2018-12-15 21:40:00 0.5907152 59.07152
2018-12-15 21:50:00 0.5907152 59.07152
2018-12-15 22:00:00 0.5907152 59.07152
2018-12-15 22:10:00 0.5907152 59.07152
2018-12-15 22:20:00 0.5907152 59.07152
2018-12-15 22:30:00 0.5907152 59.07152
2018-12-15 22:40:00 0.5907152 59.07152
2018-12-15 22:50:00 0.5907152 59.07152
2018-12-15 23:00:00 0.5907152 59.07152
2018-12-15 23:10:00 0.5907152 59.07152
2018-12-15 23:20:00 0.5907152 59.07152
2018-12-15 23:30:00 0.5907152 59.07152
2018-12-15 23:40:00 0.5907152 59.07152
2018-12-15 23:50:00 0.5907152 59.07152

We can also plot the area covered relative to the whole of Chicago for visual inspection. The 4 plots below are for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the humidity network on this day, the area covered remained constant throughout the day, so the 4 plots are identical.

p<-list()

by10<-as.data.frame(unique(dfHumidity4$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfHumidity4b<-merge(dfHumidity4, by10, by='by10', all.x=TRUE)

for (i in unique(dfHumidity4b$by10)) {

  # Nodes with reliable readings in this interval
  subset <- dfHumidity4b %>%
    filter(by10 == i)

  # Build spatial points (WGS84) and project them to EPSG:3435
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon', 'lat')], subset,
                                            proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))

  # Closed ring tracing the four corners of the bounding box
  bb <- subset_nodes_trnsfrmd@bbox
  P2 <- matrix(c(bb[1, 1], bb[2, 1],
                 bb[1, 1], bb[2, 2],
                 bb[1, 2], bb[2, 2],
                 bb[1, 2], bb[2, 1],
                 bb[1, 1], bb[2, 1]),
               ncol = 2, byrow = TRUE) %>%
    Polygon()

  Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")),
                         proj4string = CRS('+init=EPSG:3435'))

  # Clip the bounding box to the Chicago boundary
  Ps2 <- gIntersection(Ps2, chig, byid = FALSE)

  # Position of this interval in the ordered list of intervals
  j <- unique(subset$no)

  # Chicago in red with the covered area overlaid in blue
  p[[j]] <- spplot(chig, colorkey = FALSE, col.regions = 'red',
                   sp.layout = list(list(Ps2, fill = 'blue', first = FALSE)))
}

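library(gridExtra)
# select plots by timestamp: with chronological numbering, positions 7, 13 and 19 are 01:00, 02:00 and 03:00
shown<-by10$no[match(paste('2018-12-15', c('00:00:00', '07:00:00', '13:00:00', '19:00:00')), by10$by10)]
grid.arrange(p[[shown[1]]], p[[shown[2]]], p[[shown[3]]], p[[shown[4]]])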

In summary, for the AoT humidity network on 2018-12-15,

  • Score 3 = 20.8 : In any given 10-minute time interval, an average of 20.8% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 59.1 : In any given 10-minute time interval, the bounding box of the nodes collecting reliable data covers an average of 59.1% of Chicago’s area. Read together with Score 3, this indicates that the few active nodes are widely dispersed across Chicago.

Pressure

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the different time intervals of the day. The figure below shows how the number of nodes collecting reliable pressure data in the AoT network varies over the course of the day.

dfPressure%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfPressure3
## Adding missing grouping variables: `date`
ggplot(data=dfPressure3, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfPressure3$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable pressure data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable pressure data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfPressure3%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 24 27.90698 27.64858
2018-12-15 00:10:00 24 27.90698 27.64858
2018-12-15 00:20:00 24 27.90698 27.64858
2018-12-15 00:30:00 24 27.90698 27.64858
2018-12-15 00:40:00 24 27.90698 27.64858
2018-12-15 00:50:00 24 27.90698 27.64858
2018-12-15 01:00:00 24 27.90698 27.64858
2018-12-15 01:10:00 24 27.90698 27.64858
2018-12-15 01:20:00 23 26.74419 27.64858
2018-12-15 01:30:00 23 26.74419 27.64858
2018-12-15 01:40:00 23 26.74419 27.64858
2018-12-15 01:50:00 23 26.74419 27.64858
2018-12-15 02:00:00 23 26.74419 27.64858
2018-12-15 02:10:00 23 26.74419 27.64858
2018-12-15 02:20:00 23 26.74419 27.64858
2018-12-15 02:30:00 23 26.74419 27.64858
2018-12-15 02:40:00 23 26.74419 27.64858
2018-12-15 02:50:00 23 26.74419 27.64858
2018-12-15 03:00:00 23 26.74419 27.64858
2018-12-15 03:10:00 24 27.90698 27.64858
2018-12-15 03:20:00 24 27.90698 27.64858
2018-12-15 03:30:00 24 27.90698 27.64858
2018-12-15 03:40:00 24 27.90698 27.64858
2018-12-15 03:50:00 24 27.90698 27.64858
2018-12-15 04:00:00 24 27.90698 27.64858
2018-12-15 04:10:00 24 27.90698 27.64858
2018-12-15 04:20:00 24 27.90698 27.64858
2018-12-15 04:30:00 24 27.90698 27.64858
2018-12-15 04:40:00 24 27.90698 27.64858
2018-12-15 04:50:00 24 27.90698 27.64858
2018-12-15 05:00:00 24 27.90698 27.64858
2018-12-15 05:10:00 24 27.90698 27.64858
2018-12-15 05:20:00 24 27.90698 27.64858
2018-12-15 05:30:00 24 27.90698 27.64858
2018-12-15 05:40:00 24 27.90698 27.64858
2018-12-15 05:50:00 24 27.90698 27.64858
2018-12-15 06:00:00 24 27.90698 27.64858
2018-12-15 06:10:00 24 27.90698 27.64858
2018-12-15 06:20:00 24 27.90698 27.64858
2018-12-15 06:30:00 24 27.90698 27.64858
2018-12-15 06:40:00 24 27.90698 27.64858
2018-12-15 06:50:00 24 27.90698 27.64858
2018-12-15 07:00:00 24 27.90698 27.64858
2018-12-15 07:10:00 24 27.90698 27.64858
2018-12-15 07:20:00 24 27.90698 27.64858
2018-12-15 07:30:00 24 27.90698 27.64858
2018-12-15 07:40:00 24 27.90698 27.64858
2018-12-15 07:50:00 24 27.90698 27.64858
2018-12-15 08:00:00 24 27.90698 27.64858
2018-12-15 08:10:00 24 27.90698 27.64858
2018-12-15 08:20:00 24 27.90698 27.64858
2018-12-15 08:30:00 24 27.90698 27.64858
2018-12-15 08:40:00 24 27.90698 27.64858
2018-12-15 08:50:00 24 27.90698 27.64858
2018-12-15 09:00:00 24 27.90698 27.64858
2018-12-15 09:10:00 24 27.90698 27.64858
2018-12-15 09:20:00 24 27.90698 27.64858
2018-12-15 09:30:00 24 27.90698 27.64858
2018-12-15 09:40:00 24 27.90698 27.64858
2018-12-15 09:50:00 24 27.90698 27.64858
2018-12-15 10:00:00 24 27.90698 27.64858
2018-12-15 10:10:00 24 27.90698 27.64858
2018-12-15 10:20:00 24 27.90698 27.64858
2018-12-15 10:30:00 24 27.90698 27.64858
2018-12-15 10:40:00 24 27.90698 27.64858
2018-12-15 10:50:00 24 27.90698 27.64858
2018-12-15 11:00:00 24 27.90698 27.64858
2018-12-15 11:10:00 24 27.90698 27.64858
2018-12-15 11:20:00 24 27.90698 27.64858
2018-12-15 11:30:00 24 27.90698 27.64858
2018-12-15 11:40:00 24 27.90698 27.64858
2018-12-15 11:50:00 24 27.90698 27.64858
2018-12-15 12:00:00 24 27.90698 27.64858
2018-12-15 12:10:00 24 27.90698 27.64858
2018-12-15 12:20:00 24 27.90698 27.64858
2018-12-15 12:30:00 24 27.90698 27.64858
2018-12-15 12:40:00 24 27.90698 27.64858
2018-12-15 12:50:00 24 27.90698 27.64858
2018-12-15 13:00:00 24 27.90698 27.64858
2018-12-15 13:10:00 24 27.90698 27.64858
2018-12-15 13:20:00 24 27.90698 27.64858
2018-12-15 13:30:00 24 27.90698 27.64858
2018-12-15 13:40:00 24 27.90698 27.64858
2018-12-15 13:50:00 24 27.90698 27.64858
2018-12-15 14:00:00 24 27.90698 27.64858
2018-12-15 14:10:00 24 27.90698 27.64858
2018-12-15 14:20:00 24 27.90698 27.64858
2018-12-15 14:30:00 24 27.90698 27.64858
2018-12-15 14:40:00 24 27.90698 27.64858
2018-12-15 14:50:00 24 27.90698 27.64858
2018-12-15 15:00:00 23 26.74419 27.64858
2018-12-15 15:10:00 22 25.58140 27.64858
2018-12-15 15:20:00 22 25.58140 27.64858
2018-12-15 15:30:00 22 25.58140 27.64858
2018-12-15 15:40:00 22 25.58140 27.64858
2018-12-15 15:50:00 22 25.58140 27.64858
2018-12-15 16:00:00 22 25.58140 27.64858
2018-12-15 16:10:00 22 25.58140 27.64858
2018-12-15 16:20:00 22 25.58140 27.64858
2018-12-15 16:30:00 22 25.58140 27.64858
2018-12-15 16:40:00 22 25.58140 27.64858
2018-12-15 16:50:00 24 27.90698 27.64858
2018-12-15 17:00:00 24 27.90698 27.64858
2018-12-15 17:10:00 24 27.90698 27.64858
2018-12-15 17:20:00 24 27.90698 27.64858
2018-12-15 17:30:00 24 27.90698 27.64858
2018-12-15 17:40:00 24 27.90698 27.64858
2018-12-15 17:50:00 24 27.90698 27.64858
2018-12-15 18:00:00 24 27.90698 27.64858
2018-12-15 18:10:00 24 27.90698 27.64858
2018-12-15 18:20:00 24 27.90698 27.64858
2018-12-15 18:30:00 24 27.90698 27.64858
2018-12-15 18:40:00 24 27.90698 27.64858
2018-12-15 18:50:00 24 27.90698 27.64858
2018-12-15 19:00:00 24 27.90698 27.64858
2018-12-15 19:10:00 24 27.90698 27.64858
2018-12-15 19:20:00 24 27.90698 27.64858
2018-12-15 19:30:00 24 27.90698 27.64858
2018-12-15 19:40:00 24 27.90698 27.64858
2018-12-15 19:50:00 24 27.90698 27.64858
2018-12-15 20:00:00 24 27.90698 27.64858
2018-12-15 20:10:00 24 27.90698 27.64858
2018-12-15 20:20:00 24 27.90698 27.64858
2018-12-15 20:30:00 24 27.90698 27.64858
2018-12-15 20:40:00 24 27.90698 27.64858
2018-12-15 20:50:00 24 27.90698 27.64858
2018-12-15 21:00:00 24 27.90698 27.64858
2018-12-15 21:10:00 24 27.90698 27.64858
2018-12-15 21:20:00 24 27.90698 27.64858
2018-12-15 21:30:00 24 27.90698 27.64858
2018-12-15 21:40:00 24 27.90698 27.64858
2018-12-15 21:50:00 24 27.90698 27.64858
2018-12-15 22:00:00 24 27.90698 27.64858
2018-12-15 22:10:00 24 27.90698 27.64858
2018-12-15 22:20:00 24 27.90698 27.64858
2018-12-15 22:30:00 24 27.90698 27.64858
2018-12-15 22:40:00 24 27.90698 27.64858
2018-12-15 22:50:00 24 27.90698 27.64858
2018-12-15 23:00:00 24 27.90698 27.64858
2018-12-15 23:10:00 24 27.90698 27.64858
2018-12-15 23:20:00 24 27.90698 27.64858
2018-12-15 23:30:00 24 27.90698 27.64858
2018-12-15 23:40:00 24 27.90698 27.64858
2018-12-15 23:50:00 24 27.90698 27.64858
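
The Score 3 construction above can be traced by hand on a toy example. The sketch below uses base R only and a hypothetical set of readings; the node ids, the shortened interval labels, and the network size of 4 are illustrative assumptions rather than AoT data. It shows why duplicate readings from the same node in the same interval are collapsed before counting.

```r
# Hypothetical reliable readings: a node may report several times per interval
readings <- data.frame(
  node_id = c("a", "a", "b", "a", "c", "c", "b"),
  by10    = c("00:00", "00:00", "00:00", "00:10", "00:10", "00:10", "00:20")
)
network_size <- 4  # hypothetical; the analysis above uses 86

# Keep one row per (node, interval) pair so each node is counted once
active <- unique(readings[, c("node_id", "by10")])

# Number of active nodes per interval
count <- tapply(active$node_id, active$by10, length)

# Share of the full network that is active in each interval, in percent
propActive <- count * 100 / network_size

# Score 3 is the mean of these per-interval shares
score3 <- mean(propActive)
print(data.frame(by10 = names(count), count = as.vector(count),
                 propActive = as.vector(propActive)))
print(score3)
```

Here the three intervals have 2, 2 and 1 active nodes, giving shares of 50%, 50% and 25% and a Score 3 of about 41.7.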

The density plot below shows the distribution of propActive (the proportion of active nodes) across the day's time intervals relative to the network average. This average is Score 3: the average proportion of the AoT network collecting reliable pressure data on 2018-12-15.

dfPressure3%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we begin by extracting the latitude and longitude locations of these nodes. This is done by obtaining the location at which every reliable data point is recorded, and compiling the unique latitude and longitude locations for each time interval.

The table below shows the first 41 rows of this result: the locations of the 24 nodes active at midnight on 2018-12-15, followed by 17 of the nodes active at 00:10.

dfPressure%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfPressure4
## Adding missing grouping variables: `date`
dfPressure4%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
date by10 node_id lat lon
2018-12-15 2018-12-15 00:00:00 001e0610ee36 41.75129 -87.60529
2018-12-15 2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 2018-12-15 00:00:00 001e06113f54 41.88461 -87.62458
2018-12-15 2018-12-15 00:00:00 001e0611537d 41.79417 -87.60165
2018-12-15 2018-12-15 00:00:00 001e061144c0 41.76412 -87.72242
2018-12-15 2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 2018-12-15 00:00:00 001e0610ee5d 41.92400 -87.76107
2018-12-15 2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 2018-12-15 00:00:00 001e06113dbc 41.71387 -87.53651
2018-12-15 2018-12-15 00:00:00 001e0610e532 41.85796 -87.65643
2018-12-15 2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 2018-12-15 00:00:00 001e0610ee33 41.96509 -87.67908
2018-12-15 2018-12-15 00:00:00 001e0610e538 41.73659 -87.60476
2018-12-15 2018-12-15 00:00:00 001e0610f732 41.89500 -87.74582
2018-12-15 2018-12-15 00:00:00 001e0610bc10 41.73631 -87.62418
2018-12-15 2018-12-15 00:00:00 001e0610eef4 41.91268 -87.68105
2018-12-15 2018-12-15 00:00:00 001e0610bbf9 41.76832 -87.68340
2018-12-15 2018-12-15 00:00:00 001e06113a48 41.94326 -87.68807
2018-12-15 2018-12-15 00:00:00 001e0610ba15 41.72246 -87.57535
2018-12-15 2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 2018-12-15 00:10:00 001e0610ba15 41.72246 -87.57535
2018-12-15 2018-12-15 00:10:00 001e0610ee36 41.75129 -87.60529
2018-12-15 2018-12-15 00:10:00 001e06113f54 41.88461 -87.62458
2018-12-15 2018-12-15 00:10:00 001e0610e537 41.96162 -87.66595
2018-12-15 2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 2018-12-15 00:10:00 001e061130f4 41.89616 -87.66239
2018-12-15 2018-12-15 00:10:00 001e0610e538 41.73659 -87.60476
2018-12-15 2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 2018-12-15 00:10:00 001e061144c0 41.76412 -87.72242
2018-12-15 2018-12-15 00:10:00 001e0610ee5d 41.92400 -87.76107
2018-12-15 2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 2018-12-15 00:10:00 001e0610e532 41.85796 -87.65643
2018-12-15 2018-12-15 00:10:00 001e06113dbc 41.71387 -87.53651
2018-12-15 2018-12-15 00:10:00 001e0610ba46 41.87838 -87.62768
2018-12-15 2018-12-15 00:10:00 001e0610eef4 41.91268 -87.68105
2018-12-15 2018-12-15 00:10:00 001e0611537d 41.79417 -87.60165

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval. Score 4, the average proportion of Chicago area covered, is the mean of these proportions expressed as a percentage. This is all presented in the table below.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

dfPressure4a<-NULL
for(i in unique(dfPressure4$by10)){

  # nodes reporting reliable data in this 10-minute interval
  subset <- 
    dfPressure4%>%
    filter(by10==i)

  # project node locations from WGS84 (EPSG:4326) to EPSG:3435, the CRS used for chig
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))

  # bounding box of the active nodes, written as a closed ring of corner coordinates
  P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                 subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                 subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                 subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                 subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
               ncol = 2, byrow = TRUE) %>%
    Polygon()

  Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

  # clip the bounding box to the Chicago boundary
  Ps2<-gIntersection(Ps2, chig, byid=FALSE)

  # share of Chicago's area covered in this interval
  Ps2AreaProp<-gArea(Ps2)/chigArea

  # append this interval's result to the running table
  df1<-data.frame(by10=i, AreaProp=Ps2AreaProp)
  dfPressure4a<-rbind(dfPressure4a, df1)

}

dfPressure4a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfPressure4a
dfPressure4a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.5907152 59.07152
2018-12-15 00:10:00 0.5907152 59.07152
2018-12-15 00:20:00 0.5907152 59.07152
2018-12-15 00:30:00 0.5907152 59.07152
2018-12-15 00:40:00 0.5907152 59.07152
2018-12-15 00:50:00 0.5907152 59.07152
2018-12-15 01:00:00 0.5907152 59.07152
2018-12-15 01:10:00 0.5907152 59.07152
2018-12-15 01:20:00 0.5907152 59.07152
2018-12-15 01:30:00 0.5907152 59.07152
2018-12-15 01:40:00 0.5907152 59.07152
2018-12-15 01:50:00 0.5907152 59.07152
2018-12-15 02:00:00 0.5907152 59.07152
2018-12-15 02:10:00 0.5907152 59.07152
2018-12-15 02:20:00 0.5907152 59.07152
2018-12-15 02:30:00 0.5907152 59.07152
2018-12-15 02:40:00 0.5907152 59.07152
2018-12-15 02:50:00 0.5907152 59.07152
2018-12-15 03:00:00 0.5907152 59.07152
2018-12-15 03:10:00 0.5907152 59.07152
2018-12-15 03:20:00 0.5907152 59.07152
2018-12-15 03:30:00 0.5907152 59.07152
2018-12-15 03:40:00 0.5907152 59.07152
2018-12-15 03:50:00 0.5907152 59.07152
2018-12-15 04:00:00 0.5907152 59.07152
2018-12-15 04:10:00 0.5907152 59.07152
2018-12-15 04:20:00 0.5907152 59.07152
2018-12-15 04:30:00 0.5907152 59.07152
2018-12-15 04:40:00 0.5907152 59.07152
2018-12-15 04:50:00 0.5907152 59.07152
2018-12-15 05:00:00 0.5907152 59.07152
2018-12-15 05:10:00 0.5907152 59.07152
2018-12-15 05:20:00 0.5907152 59.07152
2018-12-15 05:30:00 0.5907152 59.07152
2018-12-15 05:40:00 0.5907152 59.07152
2018-12-15 05:50:00 0.5907152 59.07152
2018-12-15 06:00:00 0.5907152 59.07152
2018-12-15 06:10:00 0.5907152 59.07152
2018-12-15 06:20:00 0.5907152 59.07152
2018-12-15 06:30:00 0.5907152 59.07152
2018-12-15 06:40:00 0.5907152 59.07152
2018-12-15 06:50:00 0.5907152 59.07152
2018-12-15 07:00:00 0.5907152 59.07152
2018-12-15 07:10:00 0.5907152 59.07152
2018-12-15 07:20:00 0.5907152 59.07152
2018-12-15 07:30:00 0.5907152 59.07152
2018-12-15 07:40:00 0.5907152 59.07152
2018-12-15 07:50:00 0.5907152 59.07152
2018-12-15 08:00:00 0.5907152 59.07152
2018-12-15 08:10:00 0.5907152 59.07152
2018-12-15 08:20:00 0.5907152 59.07152
2018-12-15 08:30:00 0.5907152 59.07152
2018-12-15 08:40:00 0.5907152 59.07152
2018-12-15 08:50:00 0.5907152 59.07152
2018-12-15 09:00:00 0.5907152 59.07152
2018-12-15 09:10:00 0.5907152 59.07152
2018-12-15 09:20:00 0.5907152 59.07152
2018-12-15 09:30:00 0.5907152 59.07152
2018-12-15 09:40:00 0.5907152 59.07152
2018-12-15 09:50:00 0.5907152 59.07152
2018-12-15 10:00:00 0.5907152 59.07152
2018-12-15 10:10:00 0.5907152 59.07152
2018-12-15 10:20:00 0.5907152 59.07152
2018-12-15 10:30:00 0.5907152 59.07152
2018-12-15 10:40:00 0.5907152 59.07152
2018-12-15 10:50:00 0.5907152 59.07152
2018-12-15 11:00:00 0.5907152 59.07152
2018-12-15 11:10:00 0.5907152 59.07152
2018-12-15 11:20:00 0.5907152 59.07152
2018-12-15 11:30:00 0.5907152 59.07152
2018-12-15 11:40:00 0.5907152 59.07152
2018-12-15 11:50:00 0.5907152 59.07152
2018-12-15 12:00:00 0.5907152 59.07152
2018-12-15 12:10:00 0.5907152 59.07152
2018-12-15 12:20:00 0.5907152 59.07152
2018-12-15 12:30:00 0.5907152 59.07152
2018-12-15 12:40:00 0.5907152 59.07152
2018-12-15 12:50:00 0.5907152 59.07152
2018-12-15 13:00:00 0.5907152 59.07152
2018-12-15 13:10:00 0.5907152 59.07152
2018-12-15 13:20:00 0.5907152 59.07152
2018-12-15 13:30:00 0.5907152 59.07152
2018-12-15 13:40:00 0.5907152 59.07152
2018-12-15 13:50:00 0.5907152 59.07152
2018-12-15 14:00:00 0.5907152 59.07152
2018-12-15 14:10:00 0.5907152 59.07152
2018-12-15 14:20:00 0.5907152 59.07152
2018-12-15 14:30:00 0.5907152 59.07152
2018-12-15 14:40:00 0.5907152 59.07152
2018-12-15 14:50:00 0.5907152 59.07152
2018-12-15 15:00:00 0.5907152 59.07152
2018-12-15 15:10:00 0.5907152 59.07152
2018-12-15 15:20:00 0.5907152 59.07152
2018-12-15 15:30:00 0.5907152 59.07152
2018-12-15 15:40:00 0.5907152 59.07152
2018-12-15 15:50:00 0.5907152 59.07152
2018-12-15 16:00:00 0.5907152 59.07152
2018-12-15 16:10:00 0.5907152 59.07152
2018-12-15 16:20:00 0.5907152 59.07152
2018-12-15 16:30:00 0.5907152 59.07152
2018-12-15 16:40:00 0.5907152 59.07152
2018-12-15 16:50:00 0.5907152 59.07152
2018-12-15 17:00:00 0.5907152 59.07152
2018-12-15 17:10:00 0.5907152 59.07152
2018-12-15 17:20:00 0.5907152 59.07152
2018-12-15 17:30:00 0.5907152 59.07152
2018-12-15 17:40:00 0.5907152 59.07152
2018-12-15 17:50:00 0.5907152 59.07152
2018-12-15 18:00:00 0.5907152 59.07152
2018-12-15 18:10:00 0.5907152 59.07152
2018-12-15 18:20:00 0.5907152 59.07152
2018-12-15 18:30:00 0.5907152 59.07152
2018-12-15 18:40:00 0.5907152 59.07152
2018-12-15 18:50:00 0.5907152 59.07152
2018-12-15 19:00:00 0.5907152 59.07152
2018-12-15 19:10:00 0.5907152 59.07152
2018-12-15 19:20:00 0.5907152 59.07152
2018-12-15 19:30:00 0.5907152 59.07152
2018-12-15 19:40:00 0.5907152 59.07152
2018-12-15 19:50:00 0.5907152 59.07152
2018-12-15 20:00:00 0.5907152 59.07152
2018-12-15 20:10:00 0.5907152 59.07152
2018-12-15 20:20:00 0.5907152 59.07152
2018-12-15 20:30:00 0.5907152 59.07152
2018-12-15 20:40:00 0.5907152 59.07152
2018-12-15 20:50:00 0.5907152 59.07152
2018-12-15 21:00:00 0.5907152 59.07152
2018-12-15 21:10:00 0.5907152 59.07152
2018-12-15 21:20:00 0.5907152 59.07152
2018-12-15 21:30:00 0.5907152 59.07152
2018-12-15 21:40:00 0.5907152 59.07152
2018-12-15 21:50:00 0.5907152 59.07152
2018-12-15 22:00:00 0.5907152 59.07152
2018-12-15 22:10:00 0.5907152 59.07152
2018-12-15 22:20:00 0.5907152 59.07152
2018-12-15 22:30:00 0.5907152 59.07152
2018-12-15 22:40:00 0.5907152 59.07152
2018-12-15 22:50:00 0.5907152 59.07152
2018-12-15 23:00:00 0.5907152 59.07152
2018-12-15 23:10:00 0.5907152 59.07152
2018-12-15 23:20:00 0.5907152 59.07152
2018-12-15 23:30:00 0.5907152 59.07152
2018-12-15 23:40:00 0.5907152 59.07152
2018-12-15 23:50:00 0.5907152 59.07152
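
The bounding-box calculation behind AreaProp can be checked on a toy example. The sketch below is a simplified stand-in for the sp/rgeos code above, not a replacement for it. It assumes a rectangular "city" extent in place of the Chicago boundary, uses made-up projected coordinates, and the helper name bbox_area_prop is hypothetical.

```r
# Hypothetical node coordinates in a projected CRS; not real AoT locations
nodes <- data.frame(x = c(2, 8, 5), y = c(1, 4, 9))

# Hypothetical rectangular city extent standing in for the Chicago boundary
city <- c(xmin = 0, xmax = 10, ymin = 0, ymax = 10)

bbox_area_prop <- function(nodes, city) {
  # Bounding box of the active nodes
  xr <- range(nodes$x)
  yr <- range(nodes$y)
  # Clip the box to the city extent (the analogue of the gIntersection step)
  w <- max(0, min(xr[2], city["xmax"]) - max(xr[1], city["xmin"]))
  h <- max(0, min(yr[2], city["ymax"]) - max(yr[1], city["ymin"]))
  # Ratio of clipped box area to city area
  city_area <- (city["xmax"] - city["xmin"]) * (city["ymax"] - city["ymin"])
  unname(w * h / city_area)
}

prop_three_nodes <- bbox_area_prop(nodes, city)       # box is 6 wide and 8 tall
prop_single_node <- bbox_area_prop(nodes[1, ], city)  # one point has no area
print(c(prop_three_nodes, prop_single_node))
```

With three nodes the box covers 48 of 100 area units, so the proportion is 0.48. With a single active node the box collapses to a point and the proportion is 0, which is worth keeping in mind for sensors with very few active nodes.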

We can also plot the area covered relative to the whole of Chicago for a visual check. The four plots below are for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the pressure network on this day, the area covered remained constant throughout the day, so the four plots are identical.

p<-list()

by10<-as.data.frame(unique(dfPressure4$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfPressure4b<-merge(dfPressure4, by10, by='by10', all.x=TRUE)

for(i in unique(dfPressure4b$by10)){

subset <- 
      dfPressure4b%>%
      filter(by10==i)
    
subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    
P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    
Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

#clip using chicago
Ps2<-gIntersection(Ps2, chig, byid=FALSE)

j<-unique(subset$no)

p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red', 
       sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

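library(gridExtra)
# select plots by timestamp: with chronological numbering, positions 7, 13 and 19 are 01:00, 02:00 and 03:00
shown<-by10$no[match(paste('2018-12-15', c('00:00:00', '07:00:00', '13:00:00', '19:00:00')), by10$by10)]
grid.arrange(p[[shown[1]]], p[[shown[2]]], p[[shown[3]]], p[[shown[4]]])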

In summary, for the AoT pressure network on 2018-12-15,

  • Score 3 = 27.6 : In any given 10-minute time interval, an average of 27.6% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 59.1 : In any given 10-minute time interval, the bounding box of the nodes collecting reliable data covers an average of 59.1% of Chicago’s area. Read together with Score 3, this indicates that the few active nodes are widely dispersed across Chicago.

PM2.5 Concentration

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the different time intervals of the day. The figure below shows how the number of nodes collecting reliable PM2.5 concentration data in the AoT network varies over the course of the day.

dfPM25%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfPM253

ggplot(data=dfPM253, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfPM253$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable PM 2.5 concentration data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable PM2.5 concentration data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfPM253%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 1 1.162791 1.657595
2018-12-15 00:10:00 1 1.162791 1.657595
2018-12-15 00:20:00 1 1.162791 1.657595
2018-12-15 00:30:00 1 1.162791 1.657595
2018-12-15 00:40:00 1 1.162791 1.657595
2018-12-15 00:50:00 1 1.162791 1.657595
2018-12-15 01:00:00 1 1.162791 1.657595
2018-12-15 01:10:00 1 1.162791 1.657595
2018-12-15 01:20:00 1 1.162791 1.657595
2018-12-15 01:30:00 1 1.162791 1.657595
2018-12-15 01:40:00 1 1.162791 1.657595
2018-12-15 01:50:00 1 1.162791 1.657595
2018-12-15 02:00:00 1 1.162791 1.657595
2018-12-15 02:10:00 1 1.162791 1.657595
2018-12-15 02:20:00 1 1.162791 1.657595
2018-12-15 02:30:00 1 1.162791 1.657595
2018-12-15 02:40:00 1 1.162791 1.657595
2018-12-15 02:50:00 1 1.162791 1.657595
2018-12-15 03:00:00 1 1.162791 1.657595
2018-12-15 03:10:00 1 1.162791 1.657595
2018-12-15 03:20:00 1 1.162791 1.657595
2018-12-15 03:30:00 1 1.162791 1.657595
2018-12-15 03:40:00 1 1.162791 1.657595
2018-12-15 03:50:00 1 1.162791 1.657595
2018-12-15 04:00:00 1 1.162791 1.657595
2018-12-15 04:10:00 1 1.162791 1.657595
2018-12-15 04:20:00 1 1.162791 1.657595
2018-12-15 04:30:00 1 1.162791 1.657595
2018-12-15 04:40:00 1 1.162791 1.657595
2018-12-15 04:50:00 1 1.162791 1.657595
2018-12-15 05:00:00 1 1.162791 1.657595
2018-12-15 05:10:00 1 1.162791 1.657595
2018-12-15 05:20:00 1 1.162791 1.657595
2018-12-15 05:30:00 1 1.162791 1.657595
2018-12-15 05:40:00 1 1.162791 1.657595
2018-12-15 05:50:00 1 1.162791 1.657595
2018-12-15 06:00:00 1 1.162791 1.657595
2018-12-15 06:10:00 1 1.162791 1.657595
2018-12-15 06:20:00 1 1.162791 1.657595
2018-12-15 06:30:00 2 2.325581 1.657595
2018-12-15 06:40:00 2 2.325581 1.657595
2018-12-15 06:50:00 2 2.325581 1.657595
2018-12-15 07:00:00 2 2.325581 1.657595
2018-12-15 07:10:00 2 2.325581 1.657595
2018-12-15 07:20:00 2 2.325581 1.657595
2018-12-15 07:30:00 2 2.325581 1.657595
2018-12-15 07:40:00 2 2.325581 1.657595
2018-12-15 07:50:00 2 2.325581 1.657595
2018-12-15 08:00:00 2 2.325581 1.657595
2018-12-15 08:10:00 2 2.325581 1.657595
2018-12-15 08:20:00 2 2.325581 1.657595
2018-12-15 08:30:00 2 2.325581 1.657595
2018-12-15 08:40:00 2 2.325581 1.657595
2018-12-15 08:50:00 2 2.325581 1.657595
2018-12-15 09:00:00 2 2.325581 1.657595
2018-12-15 09:10:00 2 2.325581 1.657595
2018-12-15 09:20:00 2 2.325581 1.657595
2018-12-15 09:30:00 2 2.325581 1.657595
2018-12-15 09:40:00 2 2.325581 1.657595
2018-12-15 09:50:00 2 2.325581 1.657595
2018-12-15 10:00:00 2 2.325581 1.657595
2018-12-15 10:10:00 2 2.325581 1.657595
2018-12-15 10:20:00 2 2.325581 1.657595
2018-12-15 10:30:00 2 2.325581 1.657595
2018-12-15 10:40:00 2 2.325581 1.657595
2018-12-15 10:50:00 2 2.325581 1.657595
2018-12-15 11:00:00 2 2.325581 1.657595
2018-12-15 11:10:00 2 2.325581 1.657595
2018-12-15 11:20:00 2 2.325581 1.657595
2018-12-15 11:30:00 2 2.325581 1.657595
2018-12-15 11:40:00 2 2.325581 1.657595
2018-12-15 11:50:00 2 2.325581 1.657595
2018-12-15 12:00:00 2 2.325581 1.657595
2018-12-15 12:10:00 2 2.325581 1.657595
2018-12-15 12:20:00 2 2.325581 1.657595
2018-12-15 12:30:00 2 2.325581 1.657595
2018-12-15 12:40:00 2 2.325581 1.657595
2018-12-15 12:50:00 2 2.325581 1.657595
2018-12-15 13:00:00 2 2.325581 1.657595
2018-12-15 13:10:00 2 2.325581 1.657595
2018-12-15 13:20:00 2 2.325581 1.657595
2018-12-15 13:30:00 2 2.325581 1.657595
2018-12-15 13:40:00 2 2.325581 1.657595
2018-12-15 13:50:00 2 2.325581 1.657595
2018-12-15 14:00:00 2 2.325581 1.657595
2018-12-15 14:10:00 2 2.325581 1.657595
2018-12-15 14:20:00 2 2.325581 1.657595
2018-12-15 14:30:00 2 2.325581 1.657595
2018-12-15 14:40:00 2 2.325581 1.657595
2018-12-15 14:50:00 2 2.325581 1.657595
2018-12-15 15:00:00 1 1.162791 1.657595
2018-12-15 15:10:00 1 1.162791 1.657595
2018-12-15 15:20:00 1 1.162791 1.657595
2018-12-15 15:30:00 1 1.162791 1.657595
2018-12-15 15:40:00 1 1.162791 1.657595
2018-12-15 15:50:00 1 1.162791 1.657595
2018-12-15 16:00:00 1 1.162791 1.657595
2018-12-15 16:10:00 1 1.162791 1.657595
2018-12-15 16:50:00 1 1.162791 1.657595
2018-12-15 17:00:00 1 1.162791 1.657595
2018-12-15 17:10:00 1 1.162791 1.657595
2018-12-15 17:20:00 1 1.162791 1.657595
2018-12-15 17:30:00 1 1.162791 1.657595
2018-12-15 17:40:00 1 1.162791 1.657595
2018-12-15 17:50:00 1 1.162791 1.657595
2018-12-15 18:00:00 1 1.162791 1.657595
2018-12-15 18:10:00 1 1.162791 1.657595
2018-12-15 18:20:00 1 1.162791 1.657595
2018-12-15 18:30:00 1 1.162791 1.657595
2018-12-15 18:40:00 1 1.162791 1.657595
2018-12-15 18:50:00 1 1.162791 1.657595
2018-12-15 19:00:00 1 1.162791 1.657595
2018-12-15 19:10:00 1 1.162791 1.657595
2018-12-15 19:20:00 1 1.162791 1.657595
2018-12-15 19:30:00 1 1.162791 1.657595
2018-12-15 19:40:00 1 1.162791 1.657595
2018-12-15 19:50:00 1 1.162791 1.657595
2018-12-15 20:00:00 1 1.162791 1.657595
2018-12-15 20:10:00 1 1.162791 1.657595
2018-12-15 20:20:00 1 1.162791 1.657595
2018-12-15 20:30:00 1 1.162791 1.657595
2018-12-15 20:40:00 1 1.162791 1.657595
2018-12-15 20:50:00 1 1.162791 1.657595
2018-12-15 21:00:00 1 1.162791 1.657595
2018-12-15 21:10:00 1 1.162791 1.657595
2018-12-15 21:20:00 1 1.162791 1.657595
2018-12-15 21:30:00 1 1.162791 1.657595
2018-12-15 21:40:00 1 1.162791 1.657595
2018-12-15 21:50:00 1 1.162791 1.657595
2018-12-15 22:00:00 1 1.162791 1.657595
2018-12-15 22:10:00 1 1.162791 1.657595
2018-12-15 22:20:00 2 2.325581 1.657595
2018-12-15 22:30:00 1 1.162791 1.657595
2018-12-15 22:40:00 2 2.325581 1.657595
2018-12-15 22:50:00 2 2.325581 1.657595
2018-12-15 23:00:00 2 2.325581 1.657595
2018-12-15 23:10:00 2 2.325581 1.657595
2018-12-15 23:20:00 2 2.325581 1.657595
2018-12-15 23:30:00 2 2.325581 1.657595
2018-12-15 23:40:00 2 2.325581 1.657595
2018-12-15 23:50:00 2 2.325581 1.657595
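
The table above has 141 rows rather than 144: the intervals from 16:20 to 16:40 are absent, which suggests that no node reported reliable PM2.5 data in them. Because intervals without any reliable reading drop out before the mean is taken, Score 3 is averaged over observed intervals only. The sketch below compares that figure with one that counts the missing intervals as zero active nodes. The counts are read off the table above; treating the gaps as zeros is an assumption, and the variable names are illustrative.

```r
# Counts from the table above: 81 intervals with 1 active node, 60 with 2
observed <- c(rep(1, 81), rep(2, 60))
network_size <- 86
n_intervals <- 144  # 24 hours x 6 ten-minute intervals

# Score 3 as tabulated: mean over the 141 observed intervals
score3_observed <- mean(observed * 100 / network_size)

# Alternative: pad the 3 missing intervals with zero active nodes
padded <- c(observed, rep(0, n_intervals - length(observed)))
score3_padded <- mean(padded * 100 / network_size)

print(round(c(observed = score3_observed, padded = score3_padded), 3))
```

The first value reproduces the tabulated 1.658, while padding lowers it slightly to about 1.623. The gap is small here, but it grows with the number of empty intervals. Within the dplyr pipeline, tidyr::complete() with a zero fill for count is one way to add the missing intervals before averaging.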

The density plot below shows the distribution of propActive (the proportion of active nodes) across the day's time intervals relative to the network average. This average is Score 3: the average proportion of the AoT network collecting reliable PM2.5 concentration data on 2018-12-15.

dfPM253%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()
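
As a quick check on the arithmetic behind Score 3, the following base R sketch recomputes propActive and its mean from a handful of made-up interval counts. The vector `count` is illustrative only and is not taken from the AoT data; the network size of 86 nodes is the figure used throughout this section.

```r
# Hypothetical counts of nodes reporting reliable data in five 10-minute intervals
count <- c(1, 1, 2, 1, 2)

# Full network size, as used in the text
network_size <- 86

# Percentage of the network that is active in each interval
propActive <- 100 * count / network_size

# Score 3 is the mean of these percentages across intervals
score3 <- mean(propActive)

print(round(propActive, 6))
print(score3)
```

A count of 1 corresponds to roughly 1.16% of the network and a count of 2 to roughly 2.33%, which matches the propActive values listed in the table above.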

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we first begin by extracting the latitude and longitude locations of these nodes. This is done by obtaining all the locations at which every reliable data point is recorded, and compiling the unique latitude and longitude locations from this list.

The table below shows the first 41 rows of this result, ordered by time interval. Each row gives the latitude and longitude of a node that recorded reliable data in that interval on 2018-12-15.

dfPM25%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfPM254

dfPM254%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 00:20:00 001e06113107 41.75114 -87.71299
2018-12-15 00:30:00 001e06113107 41.75114 -87.71299
2018-12-15 00:40:00 001e06113107 41.75114 -87.71299
2018-12-15 00:50:00 001e06113107 41.75114 -87.71299
2018-12-15 01:00:00 001e06113107 41.75114 -87.71299
2018-12-15 01:10:00 001e06113107 41.75114 -87.71299
2018-12-15 01:20:00 001e06113107 41.75114 -87.71299
2018-12-15 01:30:00 001e06113107 41.75114 -87.71299
2018-12-15 01:40:00 001e06113107 41.75114 -87.71299
2018-12-15 01:50:00 001e06113107 41.75114 -87.71299
2018-12-15 02:00:00 001e06113107 41.75114 -87.71299
2018-12-15 02:10:00 001e06113107 41.75114 -87.71299
2018-12-15 02:20:00 001e06113107 41.75114 -87.71299
2018-12-15 02:30:00 001e06113107 41.75114 -87.71299
2018-12-15 02:40:00 001e06113107 41.75114 -87.71299
2018-12-15 02:50:00 001e06113107 41.75114 -87.71299
2018-12-15 03:00:00 001e06113107 41.75114 -87.71299
2018-12-15 03:10:00 001e06113107 41.75114 -87.71299
2018-12-15 03:20:00 001e06113107 41.75114 -87.71299
2018-12-15 03:30:00 001e06113107 41.75114 -87.71299
2018-12-15 03:40:00 001e06113107 41.75114 -87.71299
2018-12-15 03:50:00 001e06113107 41.75114 -87.71299
2018-12-15 04:00:00 001e06113107 41.75114 -87.71299
2018-12-15 04:10:00 001e06113107 41.75114 -87.71299
2018-12-15 04:20:00 001e06113107 41.75114 -87.71299
2018-12-15 04:30:00 001e06113107 41.75114 -87.71299
2018-12-15 04:40:00 001e06113107 41.75114 -87.71299
2018-12-15 04:50:00 001e06113107 41.75114 -87.71299
2018-12-15 05:00:00 001e06113107 41.75114 -87.71299
2018-12-15 05:10:00 001e06113107 41.75114 -87.71299
2018-12-15 05:20:00 001e06113107 41.75114 -87.71299
2018-12-15 05:30:00 001e06113107 41.75114 -87.71299
2018-12-15 05:40:00 001e06113107 41.75114 -87.71299
2018-12-15 05:50:00 001e06113107 41.75114 -87.71299
2018-12-15 06:00:00 001e06113107 41.75114 -87.71299
2018-12-15 06:10:00 001e06113107 41.75114 -87.71299
2018-12-15 06:20:00 001e06113107 41.75114 -87.71299
2018-12-15 06:30:00 001e06113107 41.75114 -87.71299
2018-12-15 06:30:00 001e0610bc10 41.73631 -87.62418
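
The extraction step can be sketched in base R on a toy data frame. The rows in `readings` are invented for illustration (the node IDs and coordinates are borrowed from the table above, but the readings themselves are hypothetical); the column names follow those used in the code above.

```r
# Toy readings: two duplicate reliable rows, one unreliable row, one later interval
readings <- data.frame(
  by10     = c("2018-12-15 00:00:00", "2018-12-15 00:00:00",
               "2018-12-15 00:00:00", "2018-12-15 00:10:00"),
  node_id  = c("001e06113107", "001e06113107", "001e0610bc10", "001e06113107"),
  lat      = c("41.75114", "41.75114", "41.73631", "41.75114"),
  lon      = c("-87.71299", "-87.71299", "-87.62418", "-87.71299"),
  val_qual = c(1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Keep reliable readings only, and only the columns needed for mapping
reliable <- readings[readings$val_qual == 1, c("by10", "node_id", "lat", "lon")]

# Coordinates arrive as text, so convert them to numbers
reliable$lat <- as.numeric(reliable$lat)
reliable$lon <- as.numeric(reliable$lon)

# One row per node per interval
locations <- unique(reliable)
print(locations)
```

The duplicate reading collapses into a single row and the unreliable reading is dropped, leaving one row for each interval in which the node was active.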

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval. Intervals with fewer than two active nodes are assigned an AreaProp of 0. To find the average proportion of Chicago area covered (Score 4), the mean of the proportions calculated for each time interval is obtained and expressed as a percentage. This is all presented in the table below.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

dfPM254a<-NULL

for(i in unique(dfPM254$by10)){
  
subset <- 
      dfPM254%>%
      filter(by10==i)

if(nrow(subset)>1){
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
  P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    #clip using chicago
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    dfPM254a<-rbind(dfPM254a, df1)
}else{
  df1<-NULL
  df1$by10<-i
  df1$AreaProp<-0
  df1<-as.data.frame(df1)
  dfPM254a<-rbind(dfPM254a, df1)
}
  
}

dfPM254a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfPM254a
dfPM254a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.0000000 0.7976292
2018-12-15 00:10:00 0.0000000 0.7976292
2018-12-15 00:20:00 0.0000000 0.7976292
2018-12-15 00:30:00 0.0000000 0.7976292
2018-12-15 00:40:00 0.0000000 0.7976292
2018-12-15 00:50:00 0.0000000 0.7976292
2018-12-15 01:00:00 0.0000000 0.7976292
2018-12-15 01:10:00 0.0000000 0.7976292
2018-12-15 01:20:00 0.0000000 0.7976292
2018-12-15 01:30:00 0.0000000 0.7976292
2018-12-15 01:40:00 0.0000000 0.7976292
2018-12-15 01:50:00 0.0000000 0.7976292
2018-12-15 02:00:00 0.0000000 0.7976292
2018-12-15 02:10:00 0.0000000 0.7976292
2018-12-15 02:20:00 0.0000000 0.7976292
2018-12-15 02:30:00 0.0000000 0.7976292
2018-12-15 02:40:00 0.0000000 0.7976292
2018-12-15 02:50:00 0.0000000 0.7976292
2018-12-15 03:00:00 0.0000000 0.7976292
2018-12-15 03:10:00 0.0000000 0.7976292
2018-12-15 03:20:00 0.0000000 0.7976292
2018-12-15 03:30:00 0.0000000 0.7976292
2018-12-15 03:40:00 0.0000000 0.7976292
2018-12-15 03:50:00 0.0000000 0.7976292
2018-12-15 04:00:00 0.0000000 0.7976292
2018-12-15 04:10:00 0.0000000 0.7976292
2018-12-15 04:20:00 0.0000000 0.7976292
2018-12-15 04:30:00 0.0000000 0.7976292
2018-12-15 04:40:00 0.0000000 0.7976292
2018-12-15 04:50:00 0.0000000 0.7976292
2018-12-15 05:00:00 0.0000000 0.7976292
2018-12-15 05:10:00 0.0000000 0.7976292
2018-12-15 05:20:00 0.0000000 0.7976292
2018-12-15 05:30:00 0.0000000 0.7976292
2018-12-15 05:40:00 0.0000000 0.7976292
2018-12-15 05:50:00 0.0000000 0.7976292
2018-12-15 06:00:00 0.0000000 0.7976292
2018-12-15 06:10:00 0.0000000 0.7976292
2018-12-15 06:20:00 0.0000000 0.7976292
2018-12-15 06:30:00 0.0187443 0.7976292
2018-12-15 06:40:00 0.0187443 0.7976292
2018-12-15 06:50:00 0.0187443 0.7976292
2018-12-15 07:00:00 0.0187443 0.7976292
2018-12-15 07:10:00 0.0187443 0.7976292
2018-12-15 07:20:00 0.0187443 0.7976292
2018-12-15 07:30:00 0.0187443 0.7976292
2018-12-15 07:40:00 0.0187443 0.7976292
2018-12-15 07:50:00 0.0187443 0.7976292
2018-12-15 08:00:00 0.0187443 0.7976292
2018-12-15 08:10:00 0.0187443 0.7976292
2018-12-15 08:20:00 0.0187443 0.7976292
2018-12-15 08:30:00 0.0187443 0.7976292
2018-12-15 08:40:00 0.0187443 0.7976292
2018-12-15 08:50:00 0.0187443 0.7976292
2018-12-15 09:00:00 0.0187443 0.7976292
2018-12-15 09:10:00 0.0187443 0.7976292
2018-12-15 09:20:00 0.0187443 0.7976292
2018-12-15 09:30:00 0.0187443 0.7976292
2018-12-15 09:40:00 0.0187443 0.7976292
2018-12-15 09:50:00 0.0187443 0.7976292
2018-12-15 10:00:00 0.0187443 0.7976292
2018-12-15 10:10:00 0.0187443 0.7976292
2018-12-15 10:20:00 0.0187443 0.7976292
2018-12-15 10:30:00 0.0187443 0.7976292
2018-12-15 10:40:00 0.0187443 0.7976292
2018-12-15 10:50:00 0.0187443 0.7976292
2018-12-15 11:00:00 0.0187443 0.7976292
2018-12-15 11:10:00 0.0187443 0.7976292
2018-12-15 11:20:00 0.0187443 0.7976292
2018-12-15 11:30:00 0.0187443 0.7976292
2018-12-15 11:40:00 0.0187443 0.7976292
2018-12-15 11:50:00 0.0187443 0.7976292
2018-12-15 12:00:00 0.0187443 0.7976292
2018-12-15 12:10:00 0.0187443 0.7976292
2018-12-15 12:20:00 0.0187443 0.7976292
2018-12-15 12:30:00 0.0187443 0.7976292
2018-12-15 12:40:00 0.0187443 0.7976292
2018-12-15 12:50:00 0.0187443 0.7976292
2018-12-15 13:00:00 0.0187443 0.7976292
2018-12-15 13:10:00 0.0187443 0.7976292
2018-12-15 13:20:00 0.0187443 0.7976292
2018-12-15 13:30:00 0.0187443 0.7976292
2018-12-15 13:40:00 0.0187443 0.7976292
2018-12-15 13:50:00 0.0187443 0.7976292
2018-12-15 14:00:00 0.0187443 0.7976292
2018-12-15 14:10:00 0.0187443 0.7976292
2018-12-15 14:20:00 0.0187443 0.7976292
2018-12-15 14:30:00 0.0187443 0.7976292
2018-12-15 14:40:00 0.0187443 0.7976292
2018-12-15 14:50:00 0.0187443 0.7976292
2018-12-15 15:00:00 0.0000000 0.7976292
2018-12-15 15:10:00 0.0000000 0.7976292
2018-12-15 15:20:00 0.0000000 0.7976292
2018-12-15 15:30:00 0.0000000 0.7976292
2018-12-15 15:40:00 0.0000000 0.7976292
2018-12-15 15:50:00 0.0000000 0.7976292
2018-12-15 16:00:00 0.0000000 0.7976292
2018-12-15 16:10:00 0.0000000 0.7976292
2018-12-15 16:50:00 0.0000000 0.7976292
2018-12-15 17:00:00 0.0000000 0.7976292
2018-12-15 17:10:00 0.0000000 0.7976292
2018-12-15 17:20:00 0.0000000 0.7976292
2018-12-15 17:30:00 0.0000000 0.7976292
2018-12-15 17:40:00 0.0000000 0.7976292
2018-12-15 17:50:00 0.0000000 0.7976292
2018-12-15 18:00:00 0.0000000 0.7976292
2018-12-15 18:10:00 0.0000000 0.7976292
2018-12-15 18:20:00 0.0000000 0.7976292
2018-12-15 18:30:00 0.0000000 0.7976292
2018-12-15 18:40:00 0.0000000 0.7976292
2018-12-15 18:50:00 0.0000000 0.7976292
2018-12-15 19:00:00 0.0000000 0.7976292
2018-12-15 19:10:00 0.0000000 0.7976292
2018-12-15 19:20:00 0.0000000 0.7976292
2018-12-15 19:30:00 0.0000000 0.7976292
2018-12-15 19:40:00 0.0000000 0.7976292
2018-12-15 19:50:00 0.0000000 0.7976292
2018-12-15 20:00:00 0.0000000 0.7976292
2018-12-15 20:10:00 0.0000000 0.7976292
2018-12-15 20:20:00 0.0000000 0.7976292
2018-12-15 20:30:00 0.0000000 0.7976292
2018-12-15 20:40:00 0.0000000 0.7976292
2018-12-15 20:50:00 0.0000000 0.7976292
2018-12-15 21:00:00 0.0000000 0.7976292
2018-12-15 21:10:00 0.0000000 0.7976292
2018-12-15 21:20:00 0.0000000 0.7976292
2018-12-15 21:30:00 0.0000000 0.7976292
2018-12-15 21:40:00 0.0000000 0.7976292
2018-12-15 21:50:00 0.0000000 0.7976292
2018-12-15 22:00:00 0.0000000 0.7976292
2018-12-15 22:10:00 0.0000000 0.7976292
2018-12-15 22:20:00 0.0187443 0.7976292
2018-12-15 22:30:00 0.0000000 0.7976292
2018-12-15 22:40:00 0.0187443 0.7976292
2018-12-15 22:50:00 0.0187443 0.7976292
2018-12-15 23:00:00 0.0187443 0.7976292
2018-12-15 23:10:00 0.0187443 0.7976292
2018-12-15 23:20:00 0.0187443 0.7976292
2018-12-15 23:30:00 0.0187443 0.7976292
2018-12-15 23:40:00 0.0187443 0.7976292
2018-12-15 23:50:00 0.0187443 0.7976292
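
The bounding-box idea can be illustrated without any spatial packages. The sketch below is a simplification and not the method used above: it assumes the node coordinates are already projected to a planar system, it does not clip the box to the city boundary, and the function `bbox_area_prop`, the coordinates, and the total area are all hypothetical.

```r
# Proportion of a total area covered by the bounding box of a set of points.
# x, y: planar coordinates of the active nodes (assumed already projected)
# total_area: area of the study region in the same squared units
bbox_area_prop <- function(x, y, total_area) {
  # Fewer than two points enclose no area
  if (length(x) < 2) {
    return(0)
  }
  width  <- diff(range(x))
  height <- diff(range(y))
  (width * height) / total_area
}

# Three hypothetical nodes in a region of 60,000,000 square units
three_nodes <- bbox_area_prop(x = c(0, 3000, 1000),
                              y = c(0, 1000, 2000),
                              total_area = 60e6)

# A single node yields zero coverage
one_node <- bbox_area_prop(x = 500, y = 500, total_area = 60e6)

print(three_nodes)
print(one_node)
```

This also shows why intervals with a single active node contribute an AreaProp of 0 in the table above: one point has no width or height.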

We can also plot the area covered relative to the whole of Chicago for visual inspection. The 4 plots below are obtained for 00:00 (top left), 07:00 (top right), 12:00 (bottom left), and 23:00 (bottom right). In the case of the PM2.5 concentration network on this day, there is no node collecting any reliable data for 00:00 and 07:00, so no map can be displayed. For 12:00 and 23:00, however, the coverage area of one single node in the south can be observed.

p<-list()

by10<-as.data.frame(unique(dfPM254$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfPM254b<-merge(dfPM254, by10, by='by10', all.x=TRUE)

for(i in unique(dfPM254b$by10)){

subset <- 
      dfPM254b%>%
      filter(by10==i)
    
subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    
P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    
Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

#clip using chicago
Ps2<-gIntersection(Ps2, chig, byid=FALSE)

j<-unique(subset$no)

p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red', 
       sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

library(gridExtra)
grid.arrange(p[[1]], p[[7]], p[[71]], p[[141]])
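
The panel positions passed to grid.arrange above are hard-coded. If the intention is to show specific clock times, looking the positions up by timestamp may be safer, because an interval with no reliable data is absent from the list and shifts every later position. The sketch below is an assumption about how that lookup could be done in base R; `intervals` and `wanted` are hypothetical names, and the three dropped intervals mimic the gap between 16:10 and 16:50 in the table above.

```r
# All 144 ten-minute intervals of the day, formatted like the by10 column
intervals <- format(
  seq(as.POSIXct("2018-12-15 00:00:00", tz = "UTC"),
      by = "10 min", length.out = 144),
  "%Y-%m-%d %H:%M:%S"
)

# Remove three intervals to mimic a gap in the reliable data
intervals <- setdiff(intervals, c("2018-12-15 16:20:00",
                                  "2018-12-15 16:30:00",
                                  "2018-12-15 16:40:00"))

# Clock times to display
wanted <- c("2018-12-15 00:00:00", "2018-12-15 07:00:00",
            "2018-12-15 12:00:00", "2018-12-15 23:00:00")

# Position of each requested time in the list of available intervals
idx <- match(wanted, intervals)
print(idx)
```

With the gap present, 23:00 sits at position 136 rather than 139, so a lookup of this kind keeps the panels tied to the intended times. `match` returns `NA` for a time that is missing altogether, which can be used to skip that panel.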

In summary, for the AoT PM2.5 concentration network on 2018-12-15,

  • Score 3 = 1.7 : At any given 10-minute time interval, an average of 1.7% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 0.80 : At any given 10-minute time interval, reliable data is collected for an average of 0.8% of Chicago’s area.

CO Concentration

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the different time intervals of the day. The figure below shows how the number of nodes collecting reliable CO concentration data in the AoT network varies over the course of the day.

dfCO%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfCO3

ggplot(data=dfCO3, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfCO3$count)), 'nodes', sep=' '), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable CO concentration data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable CO concentration data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfCO3%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 16 18.604651 17.11886
2018-12-15 00:10:00 14 16.279070 17.11886
2018-12-15 00:20:00 16 18.604651 17.11886
2018-12-15 00:30:00 16 18.604651 17.11886
2018-12-15 00:40:00 15 17.441861 17.11886
2018-12-15 00:50:00 14 16.279070 17.11886
2018-12-15 01:00:00 17 19.767442 17.11886
2018-12-15 01:10:00 16 18.604651 17.11886
2018-12-15 01:20:00 17 19.767442 17.11886
2018-12-15 01:30:00 17 19.767442 17.11886
2018-12-15 01:40:00 16 18.604651 17.11886
2018-12-15 01:50:00 15 17.441861 17.11886
2018-12-15 02:00:00 15 17.441861 17.11886
2018-12-15 02:10:00 16 18.604651 17.11886
2018-12-15 02:20:00 15 17.441861 17.11886
2018-12-15 02:30:00 14 16.279070 17.11886
2018-12-15 02:40:00 16 18.604651 17.11886
2018-12-15 02:50:00 16 18.604651 17.11886
2018-12-15 03:00:00 16 18.604651 17.11886
2018-12-15 03:10:00 17 19.767442 17.11886
2018-12-15 03:20:00 15 17.441861 17.11886
2018-12-15 03:30:00 15 17.441861 17.11886
2018-12-15 03:40:00 17 19.767442 17.11886
2018-12-15 03:50:00 14 16.279070 17.11886
2018-12-15 04:00:00 16 18.604651 17.11886
2018-12-15 04:10:00 16 18.604651 17.11886
2018-12-15 04:20:00 16 18.604651 17.11886
2018-12-15 04:30:00 16 18.604651 17.11886
2018-12-15 04:40:00 18 20.930233 17.11886
2018-12-15 04:50:00 17 19.767442 17.11886
2018-12-15 05:00:00 17 19.767442 17.11886
2018-12-15 05:10:00 17 19.767442 17.11886
2018-12-15 05:20:00 18 20.930233 17.11886
2018-12-15 05:30:00 18 20.930233 17.11886
2018-12-15 05:40:00 16 18.604651 17.11886
2018-12-15 05:50:00 17 19.767442 17.11886
2018-12-15 06:00:00 17 19.767442 17.11886
2018-12-15 06:10:00 17 19.767442 17.11886
2018-12-15 06:20:00 17 19.767442 17.11886
2018-12-15 06:30:00 17 19.767442 17.11886
2018-12-15 06:40:00 17 19.767442 17.11886
2018-12-15 06:50:00 16 18.604651 17.11886
2018-12-15 07:00:00 18 20.930233 17.11886
2018-12-15 07:10:00 17 19.767442 17.11886
2018-12-15 07:20:00 18 20.930233 17.11886
2018-12-15 07:30:00 17 19.767442 17.11886
2018-12-15 07:40:00 16 18.604651 17.11886
2018-12-15 07:50:00 18 20.930233 17.11886
2018-12-15 08:00:00 17 19.767442 17.11886
2018-12-15 08:10:00 15 17.441861 17.11886
2018-12-15 08:20:00 18 20.930233 17.11886
2018-12-15 08:30:00 17 19.767442 17.11886
2018-12-15 08:40:00 17 19.767442 17.11886
2018-12-15 08:50:00 17 19.767442 17.11886
2018-12-15 09:00:00 17 19.767442 17.11886
2018-12-15 09:10:00 18 20.930233 17.11886
2018-12-15 09:20:00 16 18.604651 17.11886
2018-12-15 09:30:00 15 17.441861 17.11886
2018-12-15 09:40:00 14 16.279070 17.11886
2018-12-15 09:50:00 15 17.441861 17.11886
2018-12-15 10:00:00 16 18.604651 17.11886
2018-12-15 10:10:00 16 18.604651 17.11886
2018-12-15 10:20:00 16 18.604651 17.11886
2018-12-15 10:30:00 16 18.604651 17.11886
2018-12-15 10:40:00 16 18.604651 17.11886
2018-12-15 10:50:00 16 18.604651 17.11886
2018-12-15 11:00:00 16 18.604651 17.11886
2018-12-15 11:10:00 16 18.604651 17.11886
2018-12-15 11:20:00 15 17.441861 17.11886
2018-12-15 11:30:00 14 16.279070 17.11886
2018-12-15 11:40:00 16 18.604651 17.11886
2018-12-15 11:50:00 14 16.279070 17.11886
2018-12-15 12:00:00 15 17.441861 17.11886
2018-12-15 12:10:00 16 18.604651 17.11886
2018-12-15 12:20:00 15 17.441861 17.11886
2018-12-15 12:30:00 14 16.279070 17.11886
2018-12-15 12:40:00 14 16.279070 17.11886
2018-12-15 12:50:00 15 17.441861 17.11886
2018-12-15 13:00:00 16 18.604651 17.11886
2018-12-15 13:10:00 15 17.441861 17.11886
2018-12-15 13:20:00 17 19.767442 17.11886
2018-12-15 13:30:00 15 17.441861 17.11886
2018-12-15 13:40:00 17 19.767442 17.11886
2018-12-15 13:50:00 16 18.604651 17.11886
2018-12-15 14:00:00 14 16.279070 17.11886
2018-12-15 14:10:00 16 18.604651 17.11886
2018-12-15 14:20:00 14 16.279070 17.11886
2018-12-15 14:30:00 16 18.604651 17.11886
2018-12-15 14:40:00 14 16.279070 17.11886
2018-12-15 14:50:00 14 16.279070 17.11886
2018-12-15 15:00:00 14 16.279070 17.11886
2018-12-15 15:10:00 12 13.953488 17.11886
2018-12-15 15:20:00 13 15.116279 17.11886
2018-12-15 15:30:00 14 16.279070 17.11886
2018-12-15 15:40:00 14 16.279070 17.11886
2018-12-15 15:50:00 12 13.953488 17.11886
2018-12-15 16:00:00 13 15.116279 17.11886
2018-12-15 16:10:00 12 13.953488 17.11886
2018-12-15 16:20:00 13 15.116279 17.11886
2018-12-15 16:30:00 13 15.116279 17.11886
2018-12-15 16:40:00 13 15.116279 17.11886
2018-12-15 16:50:00 13 15.116279 17.11886
2018-12-15 17:00:00 13 15.116279 17.11886
2018-12-15 17:10:00 11 12.790698 17.11886
2018-12-15 17:20:00 13 15.116279 17.11886
2018-12-15 17:30:00 12 13.953488 17.11886
2018-12-15 17:40:00 13 15.116279 17.11886
2018-12-15 17:50:00 11 12.790698 17.11886
2018-12-15 18:00:00 13 15.116279 17.11886
2018-12-15 18:10:00 12 13.953488 17.11886
2018-12-15 18:20:00 10 11.627907 17.11886
2018-12-15 18:30:00 11 12.790698 17.11886
2018-12-15 18:40:00 11 12.790698 17.11886
2018-12-15 18:50:00 10 11.627907 17.11886
2018-12-15 19:00:00 10 11.627907 17.11886
2018-12-15 19:10:00 10 11.627907 17.11886
2018-12-15 19:20:00 11 12.790698 17.11886
2018-12-15 19:30:00 7 8.139535 17.11886
2018-12-15 19:40:00 10 11.627907 17.11886
2018-12-15 19:50:00 11 12.790698 17.11886
2018-12-15 20:00:00 10 11.627907 17.11886
2018-12-15 20:10:00 13 15.116279 17.11886
2018-12-15 20:20:00 9 10.465116 17.11886
2018-12-15 20:30:00 12 13.953488 17.11886
2018-12-15 20:40:00 12 13.953488 17.11886
2018-12-15 20:50:00 10 11.627907 17.11886
2018-12-15 21:00:00 13 15.116279 17.11886
2018-12-15 21:10:00 13 15.116279 17.11886
2018-12-15 21:20:00 12 13.953488 17.11886
2018-12-15 21:30:00 13 15.116279 17.11886
2018-12-15 21:40:00 13 15.116279 17.11886
2018-12-15 21:50:00 13 15.116279 17.11886
2018-12-15 22:00:00 13 15.116279 17.11886
2018-12-15 22:10:00 14 16.279070 17.11886
2018-12-15 22:20:00 15 17.441861 17.11886
2018-12-15 22:30:00 16 18.604651 17.11886
2018-12-15 22:40:00 16 18.604651 17.11886
2018-12-15 22:50:00 16 18.604651 17.11886
2018-12-15 23:00:00 16 18.604651 17.11886
2018-12-15 23:10:00 14 16.279070 17.11886
2018-12-15 23:20:00 16 18.604651 17.11886
2018-12-15 23:30:00 16 18.604651 17.11886
2018-12-15 23:40:00 16 18.604651 17.11886
2018-12-15 23:50:00 16 18.604651 17.11886
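
One point to keep in mind when averaging: grouping the filtered data by interval only returns intervals that have at least one reliable reading, so an interval with none is left out of the mean rather than counted as zero. The base R sketch below shows one possible way to fill such gaps before averaging. It is not part of the original workflow, and the data frames `reliable` and `all_intervals` are toy examples.

```r
# Toy reliable readings: nothing is recorded at 00:10
reliable <- data.frame(
  by10    = c("2018-12-15 00:00:00", "2018-12-15 00:00:00", "2018-12-15 00:20:00"),
  node_id = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Count distinct nodes per interval (intervals without data are absent)
counts <- aggregate(node_id ~ by10, data = unique(reliable), FUN = length)
names(counts)[2] <- "count"

# The full list of intervals that should exist
all_intervals <- data.frame(
  by10 = c("2018-12-15 00:00:00", "2018-12-15 00:10:00", "2018-12-15 00:20:00"),
  stringsAsFactors = FALSE
)

# Left join, then treat missing intervals as zero active nodes
filled <- merge(all_intervals, counts, by = "by10", all.x = TRUE)
filled$count[is.na(filled$count)] <- 0

mean_present <- mean(counts$count)  # averages over intervals with data only
mean_filled  <- mean(filled$count)  # averages over every interval
print(c(mean_present, mean_filled))
```

This only matters on days when some interval has no reliable reading at all; when every interval is present, the two means are the same.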

The density plot below shows the distribution of propActive (the proportion of active nodes) recorded at each time interval of the day, relative to the network average. This average is taken as Score 3, which represents the average proportion of the AoT network that is active for CO concentration data on 2018-12-15.

dfCO3%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we first begin by extracting the latitude and longitude locations of these nodes. This is done by obtaining all the locations at which every reliable data point is recorded, and compiling the unique latitude and longitude locations from this list.

The table below shows the first 41 rows of this result, ordered by time interval. Each row gives the latitude and longitude of a node that recorded reliable data in that interval on 2018-12-15; the rows shown cover the intervals from 00:00 to 00:20.

dfCO%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfCO4

dfCO4%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:00:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:00:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:00:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:00:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:00:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:00:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:00:00 001e06114503 41.66608 -87.53937
2018-12-15 00:10:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:10:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:10:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:10:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:10:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:10:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:10:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:10:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:10:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:10:00 001e06114503 41.66608 -87.53937
2018-12-15 00:20:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:20:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:20:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:20:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:20:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:20:00 001e06113107 41.75114 -87.71299
2018-12-15 00:20:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:20:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:20:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:20:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:20:00 001e0610ee43 41.78861 -87.59871

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval. Intervals with fewer than two active nodes are assigned an AreaProp of 0. To find the average proportion of Chicago area covered (Score 4), the mean of the proportions calculated for each time interval is obtained and expressed as a percentage. This is all presented in the table below.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

dfCO4a<-NULL

for(i in unique(dfCO4$by10)){
  
subset <- 
      dfCO4%>%
      filter(by10==i)

if(nrow(subset)>1){
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
  P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    #clip using chicago
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    dfCO4a<-rbind(dfCO4a, df1)
}else{
  df1<-NULL
  df1$by10<-i
  df1$AreaProp<-0
  df1<-as.data.frame(df1)
  dfCO4a<-rbind(dfCO4a, df1)
}
  
}

dfCO4a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfCO4a
dfCO4a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.5639259 52.29076
2018-12-15 00:10:00 0.5639259 52.29076
2018-12-15 00:20:00 0.6010409 52.29076
2018-12-15 00:30:00 0.5639259 52.29076
2018-12-15 00:40:00 0.5639259 52.29076
2018-12-15 00:50:00 0.5639259 52.29076
2018-12-15 01:00:00 0.6010409 52.29076
2018-12-15 01:10:00 0.5639259 52.29076
2018-12-15 01:20:00 0.5639259 52.29076
2018-12-15 01:30:00 0.5639259 52.29076
2018-12-15 01:40:00 0.5639259 52.29076
2018-12-15 01:50:00 0.5639259 52.29076
2018-12-15 02:00:00 0.5639259 52.29076
2018-12-15 02:10:00 0.5639259 52.29076
2018-12-15 02:20:00 0.5639259 52.29076
2018-12-15 02:30:00 0.4172133 52.29076
2018-12-15 02:40:00 0.5639259 52.29076
2018-12-15 02:50:00 0.5639259 52.29076
2018-12-15 03:00:00 0.5639259 52.29076
2018-12-15 03:10:00 0.5639259 52.29076
2018-12-15 03:20:00 0.5639259 52.29076
2018-12-15 03:30:00 0.5639259 52.29076
2018-12-15 03:40:00 0.6010409 52.29076
2018-12-15 03:50:00 0.5639259 52.29076
2018-12-15 04:00:00 0.5639259 52.29076
2018-12-15 04:10:00 0.5639259 52.29076
2018-12-15 04:20:00 0.5639259 52.29076
2018-12-15 04:30:00 0.6010409 52.29076
2018-12-15 04:40:00 0.6010409 52.29076
2018-12-15 04:50:00 0.6010409 52.29076
2018-12-15 05:00:00 0.5639259 52.29076
2018-12-15 05:10:00 0.6010409 52.29076
2018-12-15 05:20:00 0.6010409 52.29076
2018-12-15 05:30:00 0.6010409 52.29076
2018-12-15 05:40:00 0.6010409 52.29076
2018-12-15 05:50:00 0.6010409 52.29076
2018-12-15 06:00:00 0.6010409 52.29076
2018-12-15 06:10:00 0.6010409 52.29076
2018-12-15 06:20:00 0.4496355 52.29076
2018-12-15 06:30:00 0.6010409 52.29076
2018-12-15 06:40:00 0.6010409 52.29076
2018-12-15 06:50:00 0.6010409 52.29076
2018-12-15 07:00:00 0.6010409 52.29076
2018-12-15 07:10:00 0.6010409 52.29076
2018-12-15 07:20:00 0.6010409 52.29076
2018-12-15 07:30:00 0.6010409 52.29076
2018-12-15 07:40:00 0.6010409 52.29076
2018-12-15 07:50:00 0.6010409 52.29076
2018-12-15 08:00:00 0.6010409 52.29076
2018-12-15 08:10:00 0.5639259 52.29076
2018-12-15 08:20:00 0.6010409 52.29076
2018-12-15 08:30:00 0.6010409 52.29076
2018-12-15 08:40:00 0.6010409 52.29076
2018-12-15 08:50:00 0.6010409 52.29076
2018-12-15 09:00:00 0.5639259 52.29076
2018-12-15 09:10:00 0.6010409 52.29076
2018-12-15 09:20:00 0.6010409 52.29076
2018-12-15 09:30:00 0.5639259 52.29076
2018-12-15 09:40:00 0.6010409 52.29076
2018-12-15 09:50:00 0.6010409 52.29076
2018-12-15 10:00:00 0.6010409 52.29076
2018-12-15 10:10:00 0.6010409 52.29076
2018-12-15 10:20:00 0.6010409 52.29076
2018-12-15 10:30:00 0.6010409 52.29076
2018-12-15 10:40:00 0.6010409 52.29076
2018-12-15 10:50:00 0.5639259 52.29076
2018-12-15 11:00:00 0.5639259 52.29076
2018-12-15 11:10:00 0.5639259 52.29076
2018-12-15 11:20:00 0.5639259 52.29076
2018-12-15 11:30:00 0.5639259 52.29076
2018-12-15 11:40:00 0.6010409 52.29076
2018-12-15 11:50:00 0.5639259 52.29076
2018-12-15 12:00:00 0.6010409 52.29076
2018-12-15 12:10:00 0.6010409 52.29076
2018-12-15 12:20:00 0.5639259 52.29076
2018-12-15 12:30:00 0.5639259 52.29076
2018-12-15 12:40:00 0.4114511 52.29076
2018-12-15 12:50:00 0.6010409 52.29076
2018-12-15 13:00:00 0.6010409 52.29076
2018-12-15 13:10:00 0.6010409 52.29076
2018-12-15 13:20:00 0.5639259 52.29076
2018-12-15 13:30:00 0.5639259 52.29076
2018-12-15 13:40:00 0.5639259 52.29076
2018-12-15 13:50:00 0.5639259 52.29076
2018-12-15 14:00:00 0.5639259 52.29076
2018-12-15 14:10:00 0.5639259 52.29076
2018-12-15 14:20:00 0.3790289 52.29076
2018-12-15 14:30:00 0.6010409 52.29076
2018-12-15 14:40:00 0.5639259 52.29076
2018-12-15 14:50:00 0.4172133 52.29076
2018-12-15 15:00:00 0.5639259 52.29076
2018-12-15 15:10:00 0.5639259 52.29076
2018-12-15 15:20:00 0.3790289 52.29076
2018-12-15 15:30:00 0.5639259 52.29076
2018-12-15 15:40:00 0.5639259 52.29076
2018-12-15 15:50:00 0.3790195 52.29076
2018-12-15 16:00:00 0.3790289 52.29076
2018-12-15 16:10:00 0.3790289 52.29076
2018-12-15 16:20:00 0.3790289 52.29076
2018-12-15 16:30:00 0.3790289 52.29076
2018-12-15 16:40:00 0.3790289 52.29076
2018-12-15 16:50:00 0.3790258 52.29076
2018-12-15 17:00:00 0.5639259 52.29076
2018-12-15 17:10:00 0.3790289 52.29076
2018-12-15 17:20:00 0.3790289 52.29076
2018-12-15 17:30:00 0.5639222 52.29076
2018-12-15 17:40:00 0.3790289 52.29076
2018-12-15 17:50:00 0.3790289 52.29076
2018-12-15 18:00:00 0.4730930 52.29076
2018-12-15 18:10:00 0.3790195 52.29076
2018-12-15 18:20:00 0.2992375 52.29076
2018-12-15 18:30:00 0.4730930 52.29076
2018-12-15 18:40:00 0.5639222 52.29076
2018-12-15 18:50:00 0.3790164 52.29076
2018-12-15 19:00:00 0.2640792 52.29076
2018-12-15 19:10:00 0.2992375 52.29076
2018-12-15 19:20:00 0.2992469 52.29076
2018-12-15 19:30:00 0.2525195 52.29076
2018-12-15 19:40:00 0.2574762 52.29076
2018-12-15 19:50:00 0.2992375 52.29076
2018-12-15 20:00:00 0.3790164 52.29076
2018-12-15 20:10:00 0.4172133 52.29076
2018-12-15 20:20:00 0.3790195 52.29076
2018-12-15 20:30:00 0.5639222 52.29076
2018-12-15 20:40:00 0.3790195 52.29076
2018-12-15 20:50:00 0.2679517 52.29076
2018-12-15 21:00:00 0.3790195 52.29076
2018-12-15 21:10:00 0.6010409 52.29076
2018-12-15 21:20:00 0.3790195 52.29076
2018-12-15 21:30:00 0.5639222 52.29076
2018-12-15 21:40:00 0.3790195 52.29076
2018-12-15 21:50:00 0.3790195 52.29076
2018-12-15 22:00:00 0.4114417 52.29076
2018-12-15 22:10:00 0.5639259 52.29076
2018-12-15 22:20:00 0.4114511 52.29076
2018-12-15 22:30:00 0.4496355 52.29076
2018-12-15 22:40:00 0.6010409 52.29076
2018-12-15 22:50:00 0.6010409 52.29076
2018-12-15 23:00:00 0.6010409 52.29076
2018-12-15 23:10:00 0.5639259 52.29076
2018-12-15 23:20:00 0.6010409 52.29076
2018-12-15 23:30:00 0.6010409 52.29076
2018-12-15 23:40:00 0.4496355 52.29076
2018-12-15 23:50:00 0.6010409 52.29076

We can also plot the area covered relative to the whole of Chicago for a visual check. The 4 plots below are obtained for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the CO concentration network on this day, the area covered did not remain constant: in the table above, AreaProp is 0.6010409 at 13:00 and 0.2640792 at 19:00, so the plots differ wherever the bounding box changes.

p<-list()

by10<-as.data.frame(unique(dfCO4$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfCO4b<-merge(dfCO4, by10, by='by10', all.x=TRUE)

for(i in unique(dfCO4b$by10)){

subset <- 
      dfCO4b%>%
      filter(by10==i)
    
subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    
P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    
Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

#clip using chicago
Ps2<-gIntersection(Ps2, chig, byid=FALSE)

j<-unique(subset$no)

p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red', 
       sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

library(gridExtra)
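# p holds one plot per 10-minute interval, so 00:00, 07:00, 13:00 and 19:00
# are entries 1, 43, 79 and 115 (assuming all 144 intervals are present)
grid.arrange(p[[1]], p[[43]], p[[79]], p[[115]])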

In summary, for the AoT CO concentration network on 2018-12-15,

  • Score 3 = 17.1 : Averaged over the day’s 10-minute time intervals, 17.1% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 52.3 : Averaged over the day’s 10-minute time intervals, the bounding box around the nodes collecting reliable data covers 52.3% of Chicago’s area. Read together with Score 3, this indicates that the few active nodes are widely dispersed across Chicago.

H2S Concentration

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the time intervals of the day. The figure below shows how the number of nodes collecting reliable H2S concentration data in the AoT network varies over the course of the day.

dfH2S%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfH2S3

ggplot(data=dfH2S3, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfH2S3$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable H2S concentration data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable H2S concentration data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfH2S3%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 15 17.44186 19.80782
2018-12-15 00:10:00 16 18.60465 19.80782
2018-12-15 00:20:00 15 17.44186 19.80782
2018-12-15 00:30:00 15 17.44186 19.80782
2018-12-15 00:40:00 15 17.44186 19.80782
2018-12-15 00:50:00 16 18.60465 19.80782
2018-12-15 01:00:00 16 18.60465 19.80782
2018-12-15 01:10:00 15 17.44186 19.80782
2018-12-15 01:20:00 17 19.76744 19.80782
2018-12-15 01:30:00 16 18.60465 19.80782
2018-12-15 01:40:00 17 19.76744 19.80782
2018-12-15 01:50:00 17 19.76744 19.80782
2018-12-15 02:00:00 17 19.76744 19.80782
2018-12-15 02:10:00 17 19.76744 19.80782
2018-12-15 02:20:00 17 19.76744 19.80782
2018-12-15 02:30:00 16 18.60465 19.80782
2018-12-15 02:40:00 17 19.76744 19.80782
2018-12-15 02:50:00 16 18.60465 19.80782
2018-12-15 03:00:00 17 19.76744 19.80782
2018-12-15 03:10:00 16 18.60465 19.80782
2018-12-15 03:20:00 17 19.76744 19.80782
2018-12-15 03:30:00 17 19.76744 19.80782
2018-12-15 03:40:00 17 19.76744 19.80782
2018-12-15 03:50:00 17 19.76744 19.80782
2018-12-15 04:00:00 17 19.76744 19.80782
2018-12-15 04:10:00 17 19.76744 19.80782
2018-12-15 04:20:00 16 18.60465 19.80782
2018-12-15 04:30:00 16 18.60465 19.80782
2018-12-15 04:40:00 17 19.76744 19.80782
2018-12-15 04:50:00 17 19.76744 19.80782
2018-12-15 05:00:00 17 19.76744 19.80782
2018-12-15 05:10:00 16 18.60465 19.80782
2018-12-15 05:20:00 16 18.60465 19.80782
2018-12-15 05:30:00 17 19.76744 19.80782
2018-12-15 05:40:00 17 19.76744 19.80782
2018-12-15 05:50:00 17 19.76744 19.80782
2018-12-15 06:00:00 17 19.76744 19.80782
2018-12-15 06:10:00 17 19.76744 19.80782
2018-12-15 06:20:00 17 19.76744 19.80782
2018-12-15 06:30:00 17 19.76744 19.80782
2018-12-15 06:40:00 17 19.76744 19.80782
2018-12-15 06:50:00 16 18.60465 19.80782
2018-12-15 07:00:00 16 18.60465 19.80782
2018-12-15 07:10:00 17 19.76744 19.80782
2018-12-15 07:20:00 17 19.76744 19.80782
2018-12-15 07:30:00 17 19.76744 19.80782
2018-12-15 07:40:00 17 19.76744 19.80782
2018-12-15 07:50:00 16 18.60465 19.80782
2018-12-15 08:00:00 17 19.76744 19.80782
2018-12-15 08:10:00 17 19.76744 19.80782
2018-12-15 08:20:00 17 19.76744 19.80782
2018-12-15 08:30:00 16 18.60465 19.80782
2018-12-15 08:40:00 17 19.76744 19.80782
2018-12-15 08:50:00 17 19.76744 19.80782
2018-12-15 09:00:00 18 20.93023 19.80782
2018-12-15 09:10:00 18 20.93023 19.80782
2018-12-15 09:20:00 18 20.93023 19.80782
2018-12-15 09:30:00 18 20.93023 19.80782
2018-12-15 09:40:00 18 20.93023 19.80782
2018-12-15 09:50:00 17 19.76744 19.80782
2018-12-15 10:00:00 17 19.76744 19.80782
2018-12-15 10:10:00 17 19.76744 19.80782
2018-12-15 10:20:00 17 19.76744 19.80782
2018-12-15 10:30:00 17 19.76744 19.80782
2018-12-15 10:40:00 17 19.76744 19.80782
2018-12-15 10:50:00 18 20.93023 19.80782
2018-12-15 11:00:00 17 19.76744 19.80782
2018-12-15 11:10:00 18 20.93023 19.80782
2018-12-15 11:20:00 17 19.76744 19.80782
2018-12-15 11:30:00 17 19.76744 19.80782
2018-12-15 11:40:00 17 19.76744 19.80782
2018-12-15 11:50:00 17 19.76744 19.80782
2018-12-15 12:00:00 17 19.76744 19.80782
2018-12-15 12:10:00 18 20.93023 19.80782
2018-12-15 12:20:00 18 20.93023 19.80782
2018-12-15 12:30:00 18 20.93023 19.80782
2018-12-15 12:40:00 17 19.76744 19.80782
2018-12-15 12:50:00 17 19.76744 19.80782
2018-12-15 13:00:00 17 19.76744 19.80782
2018-12-15 13:10:00 17 19.76744 19.80782
2018-12-15 13:20:00 18 20.93023 19.80782
2018-12-15 13:30:00 18 20.93023 19.80782
2018-12-15 13:40:00 18 20.93023 19.80782
2018-12-15 13:50:00 17 19.76744 19.80782
2018-12-15 14:00:00 17 19.76744 19.80782
2018-12-15 14:10:00 18 20.93023 19.80782
2018-12-15 14:20:00 17 19.76744 19.80782
2018-12-15 14:30:00 18 20.93023 19.80782
2018-12-15 14:40:00 17 19.76744 19.80782
2018-12-15 14:50:00 17 19.76744 19.80782
2018-12-15 15:00:00 17 19.76744 19.80782
2018-12-15 15:10:00 16 18.60465 19.80782
2018-12-15 15:20:00 17 19.76744 19.80782
2018-12-15 15:30:00 16 18.60465 19.80782
2018-12-15 15:40:00 16 18.60465 19.80782
2018-12-15 15:50:00 17 19.76744 19.80782
2018-12-15 16:00:00 16 18.60465 19.80782
2018-12-15 16:10:00 17 19.76744 19.80782
2018-12-15 16:20:00 16 18.60465 19.80782
2018-12-15 16:30:00 17 19.76744 19.80782
2018-12-15 16:40:00 16 18.60465 19.80782
2018-12-15 16:50:00 17 19.76744 19.80782
2018-12-15 17:00:00 17 19.76744 19.80782
2018-12-15 17:10:00 17 19.76744 19.80782
2018-12-15 17:20:00 17 19.76744 19.80782
2018-12-15 17:30:00 17 19.76744 19.80782
2018-12-15 17:40:00 17 19.76744 19.80782
2018-12-15 17:50:00 17 19.76744 19.80782
2018-12-15 18:00:00 17 19.76744 19.80782
2018-12-15 18:10:00 18 20.93023 19.80782
2018-12-15 18:20:00 17 19.76744 19.80782
2018-12-15 18:30:00 17 19.76744 19.80782
2018-12-15 18:40:00 17 19.76744 19.80782
2018-12-15 18:50:00 17 19.76744 19.80782
2018-12-15 19:00:00 17 19.76744 19.80782
2018-12-15 19:10:00 17 19.76744 19.80782
2018-12-15 19:20:00 17 19.76744 19.80782
2018-12-15 19:30:00 18 20.93023 19.80782
2018-12-15 19:40:00 17 19.76744 19.80782
2018-12-15 19:50:00 17 19.76744 19.80782
2018-12-15 20:00:00 18 20.93023 19.80782
2018-12-15 20:10:00 17 19.76744 19.80782
2018-12-15 20:20:00 17 19.76744 19.80782
2018-12-15 20:30:00 18 20.93023 19.80782
2018-12-15 20:40:00 17 19.76744 19.80782
2018-12-15 20:50:00 18 20.93023 19.80782
2018-12-15 21:00:00 18 20.93023 19.80782
2018-12-15 21:10:00 18 20.93023 19.80782
2018-12-15 21:20:00 18 20.93023 19.80782
2018-12-15 21:30:00 18 20.93023 19.80782
2018-12-15 21:40:00 18 20.93023 19.80782
2018-12-15 21:50:00 18 20.93023 19.80782
2018-12-15 22:00:00 18 20.93023 19.80782
2018-12-15 22:10:00 18 20.93023 19.80782
2018-12-15 22:20:00 18 20.93023 19.80782
2018-12-15 22:30:00 18 20.93023 19.80782
2018-12-15 22:40:00 17 19.76744 19.80782
2018-12-15 22:50:00 18 20.93023 19.80782
2018-12-15 23:00:00 18 20.93023 19.80782
2018-12-15 23:10:00 18 20.93023 19.80782
2018-12-15 23:20:00 18 20.93023 19.80782
2018-12-15 23:30:00 18 20.93023 19.80782
2018-12-15 23:40:00 17 19.76744 19.80782
2018-12-15 23:50:00 18 20.93023 19.80782
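
The propActive and Score 3 columns in the table above can be checked by hand on a small example. The sketch below uses base R only. The toy readings data frame and its values are made up for illustration; the column names (node_id, by10, val_qual) and the network size of 86 follow the code above.

```r
# Toy readings (made-up values); column names follow the code above
readings <- data.frame(
  node_id  = c("a", "a", "b", "c", "a", "b", "c"),
  by10     = c("00:00", "00:00", "00:00", "00:00", "00:10", "00:10", "00:10"),
  val_qual = c(1, 1, 1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
network_size <- 86  # full network size used in the text

# Keep reliable readings, one row per (node, interval) pair
reliable <- unique(readings[readings$val_qual == 1, c("node_id", "by10")])

# Number of active nodes in each interval
counts <- as.data.frame(table(by10 = reliable$by10),
                        responseName = "count", stringsAsFactors = FALSE)

# Proportion of the full network that is active, in percent
counts$propActive <- 100 * counts$count / network_size

# Score 3 is the mean of propActive over all intervals
score3 <- mean(counts$propActive)

# The first table row can be checked the same way: 15 active nodes out of 86
round(100 * 15 / 86, 5)  # 17.44186
```

In the toy data, node c is excluded at 00:00 because its reading is not reliable, and the duplicate reading from node a is counted once, so the two intervals have 2 and 3 active nodes.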

The density plot below shows the distribution of propActive (Proportion of Active Nodes) recorded at each time interval of the day relative to the network average. This average is taken as Score 3, the average proportion of the AoT network that is active for H2S concentration data on 2018-12-15.

dfH2S3%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we begin by extracting the latitude and longitude locations of these nodes. This is done by obtaining all the locations at which every reliable data point is recorded, and compiling the unique combinations of time interval, node, latitude, and longitude from this list.

The table below shows this result. It lists the first 41 rows for 2018-12-15: the 15 nodes active at 00:00, the 16 nodes active at 00:10, and the first 10 nodes listed for 00:20.

dfH2S%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfH2S4

dfH2S4%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:00:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:00:00 001e06114503 41.66608 -87.53937
2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:00:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:00:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:00:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:00:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:10:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:10:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:10:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:10:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:10:00 001e06114503 41.66608 -87.53937
2018-12-15 00:10:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:10:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:10:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:10:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:10:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 00:10:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:10:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:20:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:20:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:20:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:20:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:20:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:20:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:20:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:20:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:20:00 001e06114503 41.66608 -87.53937
2018-12-15 00:20:00 001e06113107 41.75114 -87.71299

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval; an interval with fewer than two active nodes is assigned an AreaProp of 0. To find the average proportion of Chicago area covered (Score 4), the mean of the proportions calculated for each time interval is obtained and expressed as a percentage. This is all presented in the table below.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

dfH2S4a<-NULL

for(i in unique(dfH2S4$by10)){

  # Nodes collecting reliable data in this 10-minute interval
  subset <- 
      dfH2S4%>%
      filter(by10==i)

  if(nrow(subset)>1){
    # Project the node locations from EPSG:4326 to the CRS used for chig (EPSG:3435)
    subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
    subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    # Build a closed rectangle from the corners of the nodes' bounding box
    P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    # Clip the bounding box to the Chicago boundary
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
  }else{
    # A single node has no bounding-box area
    Ps2AreaProp<-0
  }

  # Append this interval's result
  dfH2S4a<-rbind(dfH2S4a, data.frame(by10=i, AreaProp=Ps2AreaProp))

}

dfH2S4a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfH2S4a
dfH2S4a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.5639259 57.91328
2018-12-15 00:10:00 0.5639259 57.91328
2018-12-15 00:20:00 0.5639259 57.91328
2018-12-15 00:30:00 0.5639259 57.91328
2018-12-15 00:40:00 0.5639259 57.91328
2018-12-15 00:50:00 0.5639259 57.91328
2018-12-15 01:00:00 0.5639259 57.91328
2018-12-15 01:10:00 0.5639259 57.91328
2018-12-15 01:20:00 0.5639259 57.91328
2018-12-15 01:30:00 0.5639259 57.91328
2018-12-15 01:40:00 0.5639259 57.91328
2018-12-15 01:50:00 0.5639259 57.91328
2018-12-15 02:00:00 0.5639259 57.91328
2018-12-15 02:10:00 0.5639259 57.91328
2018-12-15 02:20:00 0.5639259 57.91328
2018-12-15 02:30:00 0.5639259 57.91328
2018-12-15 02:40:00 0.5639259 57.91328
2018-12-15 02:50:00 0.5639259 57.91328
2018-12-15 03:00:00 0.5639259 57.91328
2018-12-15 03:10:00 0.5639259 57.91328
2018-12-15 03:20:00 0.5639259 57.91328
2018-12-15 03:30:00 0.5639259 57.91328
2018-12-15 03:40:00 0.5639259 57.91328
2018-12-15 03:50:00 0.5639259 57.91328
2018-12-15 04:00:00 0.5639259 57.91328
2018-12-15 04:10:00 0.5639259 57.91328
2018-12-15 04:20:00 0.5639259 57.91328
2018-12-15 04:30:00 0.5639259 57.91328
2018-12-15 04:40:00 0.5639259 57.91328
2018-12-15 04:50:00 0.5639259 57.91328
2018-12-15 05:00:00 0.5639259 57.91328
2018-12-15 05:10:00 0.5639259 57.91328
2018-12-15 05:20:00 0.5639259 57.91328
2018-12-15 05:30:00 0.5639259 57.91328
2018-12-15 05:40:00 0.5639259 57.91328
2018-12-15 05:50:00 0.5639259 57.91328
2018-12-15 06:00:00 0.5639259 57.91328
2018-12-15 06:10:00 0.5639259 57.91328
2018-12-15 06:20:00 0.5639259 57.91328
2018-12-15 06:30:00 0.5639259 57.91328
2018-12-15 06:40:00 0.5639259 57.91328
2018-12-15 06:50:00 0.5639259 57.91328
2018-12-15 07:00:00 0.5639259 57.91328
2018-12-15 07:10:00 0.5639259 57.91328
2018-12-15 07:20:00 0.5639259 57.91328
2018-12-15 07:30:00 0.5639259 57.91328
2018-12-15 07:40:00 0.5639259 57.91328
2018-12-15 07:50:00 0.5639259 57.91328
2018-12-15 08:00:00 0.5639259 57.91328
2018-12-15 08:10:00 0.5639259 57.91328
2018-12-15 08:20:00 0.5639259 57.91328
2018-12-15 08:30:00 0.5639259 57.91328
2018-12-15 08:40:00 0.5639259 57.91328
2018-12-15 08:50:00 0.5639259 57.91328
2018-12-15 09:00:00 0.6010409 57.91328
2018-12-15 09:10:00 0.6010409 57.91328
2018-12-15 09:20:00 0.6010409 57.91328
2018-12-15 09:30:00 0.6010409 57.91328
2018-12-15 09:40:00 0.6010409 57.91328
2018-12-15 09:50:00 0.6010409 57.91328
2018-12-15 10:00:00 0.6010409 57.91328
2018-12-15 10:10:00 0.6010409 57.91328
2018-12-15 10:20:00 0.6010409 57.91328
2018-12-15 10:30:00 0.6010409 57.91328
2018-12-15 10:40:00 0.6010409 57.91328
2018-12-15 10:50:00 0.6010409 57.91328
2018-12-15 11:00:00 0.6010409 57.91328
2018-12-15 11:10:00 0.6010409 57.91328
2018-12-15 11:20:00 0.6010409 57.91328
2018-12-15 11:30:00 0.6010409 57.91328
2018-12-15 11:40:00 0.6010409 57.91328
2018-12-15 11:50:00 0.6010409 57.91328
2018-12-15 12:00:00 0.6010409 57.91328
2018-12-15 12:10:00 0.6010409 57.91328
2018-12-15 12:20:00 0.6010409 57.91328
2018-12-15 12:30:00 0.6010409 57.91328
2018-12-15 12:40:00 0.6010409 57.91328
2018-12-15 12:50:00 0.5639259 57.91328
2018-12-15 13:00:00 0.5639259 57.91328
2018-12-15 13:10:00 0.5639259 57.91328
2018-12-15 13:20:00 0.6010409 57.91328
2018-12-15 13:30:00 0.6010409 57.91328
2018-12-15 13:40:00 0.6010409 57.91328
2018-12-15 13:50:00 0.6010409 57.91328
2018-12-15 14:00:00 0.6010409 57.91328
2018-12-15 14:10:00 0.6010409 57.91328
2018-12-15 14:20:00 0.5639259 57.91328
2018-12-15 14:30:00 0.6010409 57.91328
2018-12-15 14:40:00 0.6010409 57.91328
2018-12-15 14:50:00 0.5639259 57.91328
2018-12-15 15:00:00 0.6010409 57.91328
2018-12-15 15:10:00 0.5639259 57.91328
2018-12-15 15:20:00 0.6010409 57.91328
2018-12-15 15:30:00 0.5639259 57.91328
2018-12-15 15:40:00 0.5639259 57.91328
2018-12-15 15:50:00 0.6010409 57.91328
2018-12-15 16:00:00 0.5639259 57.91328
2018-12-15 16:10:00 0.6010409 57.91328
2018-12-15 16:20:00 0.5639259 57.91328
2018-12-15 16:30:00 0.6010409 57.91328
2018-12-15 16:40:00 0.5639259 57.91328
2018-12-15 16:50:00 0.5639259 57.91328
2018-12-15 17:00:00 0.5639259 57.91328
2018-12-15 17:10:00 0.5639259 57.91328
2018-12-15 17:20:00 0.5639259 57.91328
2018-12-15 17:30:00 0.5639259 57.91328
2018-12-15 17:40:00 0.5639259 57.91328
2018-12-15 17:50:00 0.5639259 57.91328
2018-12-15 18:00:00 0.5639259 57.91328
2018-12-15 18:10:00 0.6010409 57.91328
2018-12-15 18:20:00 0.5639259 57.91328
2018-12-15 18:30:00 0.5639259 57.91328
2018-12-15 18:40:00 0.5639259 57.91328
2018-12-15 18:50:00 0.5639259 57.91328
2018-12-15 19:00:00 0.5639259 57.91328
2018-12-15 19:10:00 0.5639259 57.91328
2018-12-15 19:20:00 0.5639259 57.91328
2018-12-15 19:30:00 0.6010409 57.91328
2018-12-15 19:40:00 0.5639259 57.91328
2018-12-15 19:50:00 0.5639259 57.91328
2018-12-15 20:00:00 0.6010409 57.91328
2018-12-15 20:10:00 0.5639259 57.91328
2018-12-15 20:20:00 0.5639259 57.91328
2018-12-15 20:30:00 0.6010409 57.91328
2018-12-15 20:40:00 0.5639259 57.91328
2018-12-15 20:50:00 0.6010409 57.91328
2018-12-15 21:00:00 0.6010409 57.91328
2018-12-15 21:10:00 0.6010409 57.91328
2018-12-15 21:20:00 0.6010409 57.91328
2018-12-15 21:30:00 0.6010409 57.91328
2018-12-15 21:40:00 0.6010409 57.91328
2018-12-15 21:50:00 0.6010409 57.91328
2018-12-15 22:00:00 0.6010409 57.91328
2018-12-15 22:10:00 0.6010409 57.91328
2018-12-15 22:20:00 0.6010409 57.91328
2018-12-15 22:30:00 0.6010409 57.91328
2018-12-15 22:40:00 0.6010409 57.91328
2018-12-15 22:50:00 0.6010409 57.91328
2018-12-15 23:00:00 0.6010409 57.91328
2018-12-15 23:10:00 0.6010409 57.91328
2018-12-15 23:20:00 0.6010409 57.91328
2018-12-15 23:30:00 0.6010409 57.91328
2018-12-15 23:40:00 0.6010409 57.91328
2018-12-15 23:50:00 0.6010409 57.91328
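
The AreaProp calculation above can be illustrated without the spatial libraries. The following is a simplified sketch in base R, not the method used to produce the table: it assumes the coordinates are already projected, and it replaces the Chicago boundary with a hypothetical rectangle so that clipping reduces to taking minima and maxima. The function name bbox_area_prop and all values are made up for illustration.

```r
# Hypothetical projected node coordinates (same units on both axes)
nodes <- data.frame(x = c(2, 5, 8), y = c(1, 4, 3))

# Stand-in rectangular "city" boundary (an assumption; the real boundary is a polygon)
city <- c(xmin = 0, xmax = 10, ymin = 0, ymax = 5)

bbox_area_prop <- function(nodes, city) {
  # Fewer than two nodes: no bounding-box area, as in the loop above
  if (nrow(nodes) < 2) return(0)

  # Bounding box of the nodes, clipped to the city rectangle
  xmin <- max(min(nodes$x), city[["xmin"]])
  xmax <- min(max(nodes$x), city[["xmax"]])
  ymin <- max(min(nodes$y), city[["ymin"]])
  ymax <- min(max(nodes$y), city[["ymax"]])

  box_area  <- max(0, xmax - xmin) * max(0, ymax - ymin)
  city_area <- (city[["xmax"]] - city[["xmin"]]) * (city[["ymax"]] - city[["ymin"]])
  box_area / city_area
}

# One interval with three nodes, one interval with a single node
area_props <- c(bbox_area_prop(nodes, city), bbox_area_prop(nodes[1, ], city))

# Score 4 is the mean proportion over intervals, in percent
score4 <- 100 * mean(area_props)
```

Here the three nodes span a 6 by 3 box inside a 10 by 5 city, giving a proportion of 18/50 for the first interval and 0 for the second.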

We can also plot the area covered relative to the whole of Chicago for a visual check. The 4 plots below are obtained for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the H2S concentration network on this day, the area covered took only two values, 0.5639259 and 0.6010409 of Chicago’s area. It is 0.5639259 at each of the four plotted times, so the 4 plots are identical.

p<-list()

by10<-as.data.frame(unique(dfH2S4$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfH2S4b<-merge(dfH2S4, by10, by='by10', all.x=TRUE)

for(i in unique(dfH2S4b$by10)){

subset <- 
      dfH2S4b%>%
      filter(by10==i)
    
subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    
P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    
Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

#clip using chicago
Ps2<-gIntersection(Ps2, chig, byid=FALSE)

j<-unique(subset$no)

p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red', 
       sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

library(gridExtra)
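# p holds one plot per 10-minute interval, so 00:00, 07:00, 13:00 and 19:00
# are entries 1, 43, 79 and 115 (assuming all 144 intervals are present)
grid.arrange(p[[1]], p[[43]], p[[79]], p[[115]])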
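
Because the list p is filled with one plot per 10-minute interval, the position of a given clock time in the list depends on how many intervals precede it. The helper below is a minimal sketch of that mapping; interval_index is a hypothetical name, and the formula assumes the list starts at 00:00 and that no interval is missing.

```r
# Position of a clock time in a list with one entry per 10-minute interval,
# assuming the list starts at 00:00 and no interval is missing
interval_index <- function(hour, minute = 0) {
  hour * 6 + minute %/% 10 + 1
}

# Positions of 00:00, 07:00, 13:00 and 19:00
interval_index(c(0, 7, 13, 19))  # 1 43 79 115
```

If some intervals are missing, looking the position up by label is safer, for example with match('2018-12-15 07:00:00', by10$by10) on the by10 lookup table built above.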

In summary, for the AoT H2S concentration network on 2018-12-15,

  • Score 3 = 19.8 : Averaged over the day’s 10-minute time intervals, 19.8% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 57.9 : Averaged over the day’s 10-minute time intervals, the bounding box around the nodes collecting reliable data covers 57.9% of Chicago’s area. Read together with Score 3, this indicates that the few active nodes are widely dispersed across Chicago.

NO2 Concentration

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the time intervals of the day. The figure below shows how the number of nodes collecting reliable NO2 concentration data in the AoT network varies over the course of the day.

dfNO2%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfNO23

ggplot(data=dfNO23, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfNO23$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable NO2 concentration data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable NO2 concentration data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.
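
One property of the counting step is worth keeping in mind when reusing it for other sensors or days. Filtering on val_qual==1 before grouping means that an interval in which no node reports reliable data produces no row at all, so it is left out of the average rather than counted as 0. In the rows of the table shown here every interval has between 16 and 18 active nodes, so this does not appear to affect the NO2 result. The sketch below, in base R with made-up counts, shows how the two averages can differ when intervals are missing; the data frames all_intervals and filled are hypothetical names.

```r
# Full grid of 10-minute intervals for the day (144 labels)
all_intervals <- data.frame(
  by10 = format(seq(as.POSIXct("2018-12-15 00:00:00", tz = "UTC"),
                    by = "10 min", length.out = 144),
                "%Y-%m-%d %H:%M:%S"),
  stringsAsFactors = FALSE
)

# Hypothetical counts: 17 active nodes everywhere, but two intervals are
# absent because no node reported reliable data in them
counts <- data.frame(by10 = all_intervals$by10[-c(5, 6)], count = 17,
                     stringsAsFactors = FALSE)

# Left-join onto the full grid and treat missing intervals as 0 active nodes
filled <- merge(all_intervals, counts, by = "by10", all.x = TRUE)
filled$count[is.na(filled$count)] <- 0

network_size <- 86
score3_observed <- mean(100 * counts$count / network_size)  # averages 142 intervals
score3_filled   <- mean(100 * filled$count / network_size)  # averages all 144
```

Whether missing intervals should count as 0 is a judgment about what Score 3 is meant to capture; the zero-filled version is the more conservative reading.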

dfNO23%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 16 18.60465 20.00162
2018-12-15 00:10:00 16 18.60465 20.00162
2018-12-15 00:20:00 17 19.76744 20.00162
2018-12-15 00:30:00 16 18.60465 20.00162
2018-12-15 00:40:00 16 18.60465 20.00162
2018-12-15 00:50:00 16 18.60465 20.00162
2018-12-15 01:00:00 16 18.60465 20.00162
2018-12-15 01:10:00 17 19.76744 20.00162
2018-12-15 01:20:00 16 18.60465 20.00162
2018-12-15 01:30:00 17 19.76744 20.00162
2018-12-15 01:40:00 16 18.60465 20.00162
2018-12-15 01:50:00 17 19.76744 20.00162
2018-12-15 02:00:00 17 19.76744 20.00162
2018-12-15 02:10:00 17 19.76744 20.00162
2018-12-15 02:20:00 16 18.60465 20.00162
2018-12-15 02:30:00 16 18.60465 20.00162
2018-12-15 02:40:00 17 19.76744 20.00162
2018-12-15 02:50:00 17 19.76744 20.00162
2018-12-15 03:00:00 16 18.60465 20.00162
2018-12-15 03:10:00 18 20.93023 20.00162
2018-12-15 03:20:00 18 20.93023 20.00162
2018-12-15 03:30:00 17 19.76744 20.00162
2018-12-15 03:40:00 17 19.76744 20.00162
2018-12-15 03:50:00 18 20.93023 20.00162
2018-12-15 04:00:00 17 19.76744 20.00162
2018-12-15 04:10:00 18 20.93023 20.00162
2018-12-15 04:20:00 16 18.60465 20.00162
2018-12-15 04:30:00 18 20.93023 20.00162
2018-12-15 04:40:00 18 20.93023 20.00162
2018-12-15 04:50:00 18 20.93023 20.00162
2018-12-15 05:00:00 17 19.76744 20.00162
2018-12-15 05:10:00 17 19.76744 20.00162
2018-12-15 05:20:00 18 20.93023 20.00162
2018-12-15 05:30:00 18 20.93023 20.00162
2018-12-15 05:40:00 18 20.93023 20.00162
2018-12-15 05:50:00 18 20.93023 20.00162
2018-12-15 06:00:00 17 19.76744 20.00162
2018-12-15 06:10:00 18 20.93023 20.00162
2018-12-15 06:20:00 18 20.93023 20.00162
2018-12-15 06:30:00 17 19.76744 20.00162
2018-12-15 06:40:00 18 20.93023 20.00162
2018-12-15 06:50:00 17 19.76744 20.00162
2018-12-15 07:00:00 18 20.93023 20.00162
2018-12-15 07:10:00 17 19.76744 20.00162
2018-12-15 07:20:00 18 20.93023 20.00162
2018-12-15 07:30:00 17 19.76744 20.00162
2018-12-15 07:40:00 17 19.76744 20.00162
2018-12-15 07:50:00 17 19.76744 20.00162
2018-12-15 08:00:00 17 19.76744 20.00162
2018-12-15 08:10:00 18 20.93023 20.00162
2018-12-15 08:20:00 17 19.76744 20.00162
2018-12-15 08:30:00 18 20.93023 20.00162
2018-12-15 08:40:00 18 20.93023 20.00162
2018-12-15 08:50:00 16 18.60465 20.00162
2018-12-15 09:00:00 16 18.60465 20.00162
2018-12-15 09:10:00 16 18.60465 20.00162
2018-12-15 09:20:00 16 18.60465 20.00162
2018-12-15 09:30:00 17 19.76744 20.00162
2018-12-15 09:40:00 16 18.60465 20.00162
2018-12-15 09:50:00 16 18.60465 20.00162
2018-12-15 10:00:00 16 18.60465 20.00162
2018-12-15 10:10:00 17 19.76744 20.00162
2018-12-15 10:20:00 16 18.60465 20.00162
2018-12-15 10:30:00 17 19.76744 20.00162
2018-12-15 10:40:00 16 18.60465 20.00162
2018-12-15 10:50:00 17 19.76744 20.00162
2018-12-15 11:00:00 16 18.60465 20.00162
2018-12-15 11:10:00 17 19.76744 20.00162
2018-12-15 11:20:00 16 18.60465 20.00162
2018-12-15 11:30:00 16 18.60465 20.00162
2018-12-15 11:40:00 18 20.93023 20.00162
2018-12-15 11:50:00 17 19.76744 20.00162
2018-12-15 12:00:00 18 20.93023 20.00162
2018-12-15 12:10:00 18 20.93023 20.00162
2018-12-15 12:20:00 18 20.93023 20.00162
2018-12-15 12:30:00 17 19.76744 20.00162
2018-12-15 12:40:00 17 19.76744 20.00162
2018-12-15 12:50:00 17 19.76744 20.00162
2018-12-15 13:00:00 17 19.76744 20.00162
2018-12-15 13:10:00 17 19.76744 20.00162
2018-12-15 13:20:00 17 19.76744 20.00162
2018-12-15 13:30:00 18 20.93023 20.00162
2018-12-15 13:40:00 17 19.76744 20.00162
2018-12-15 13:50:00 18 20.93023 20.00162
2018-12-15 14:00:00 18 20.93023 20.00162
2018-12-15 14:10:00 18 20.93023 20.00162
2018-12-15 14:20:00 18 20.93023 20.00162
2018-12-15 14:30:00 17 19.76744 20.00162
2018-12-15 14:40:00 17 19.76744 20.00162
2018-12-15 14:50:00 18 20.93023 20.00162
2018-12-15 15:00:00 17 19.76744 20.00162
2018-12-15 15:10:00 17 19.76744 20.00162
2018-12-15 15:20:00 17 19.76744 20.00162
2018-12-15 15:30:00 17 19.76744 20.00162
2018-12-15 15:40:00 16 18.60465 20.00162
2018-12-15 15:50:00 16 18.60465 20.00162
2018-12-15 16:00:00 16 18.60465 20.00162
2018-12-15 16:10:00 17 19.76744 20.00162
2018-12-15 16:20:00 17 19.76744 20.00162
2018-12-15 16:30:00 17 19.76744 20.00162
2018-12-15 16:40:00 17 19.76744 20.00162
2018-12-15 16:50:00 18 20.93023 20.00162
2018-12-15 17:00:00 18 20.93023 20.00162
2018-12-15 17:10:00 18 20.93023 20.00162
2018-12-15 17:20:00 18 20.93023 20.00162
2018-12-15 17:30:00 18 20.93023 20.00162
2018-12-15 17:40:00 18 20.93023 20.00162
2018-12-15 17:50:00 18 20.93023 20.00162
2018-12-15 18:00:00 18 20.93023 20.00162
2018-12-15 18:10:00 18 20.93023 20.00162
2018-12-15 18:20:00 18 20.93023 20.00162
2018-12-15 18:30:00 17 19.76744 20.00162
2018-12-15 18:40:00 18 20.93023 20.00162
2018-12-15 18:50:00 18 20.93023 20.00162
2018-12-15 19:00:00 18 20.93023 20.00162
2018-12-15 19:10:00 17 19.76744 20.00162
2018-12-15 19:20:00 18 20.93023 20.00162
2018-12-15 19:30:00 18 20.93023 20.00162
2018-12-15 19:40:00 17 19.76744 20.00162
2018-12-15 19:50:00 18 20.93023 20.00162
2018-12-15 20:00:00 18 20.93023 20.00162
2018-12-15 20:10:00 18 20.93023 20.00162
2018-12-15 20:20:00 17 19.76744 20.00162
2018-12-15 20:30:00 17 19.76744 20.00162
2018-12-15 20:40:00 18 20.93023 20.00162
2018-12-15 20:50:00 18 20.93023 20.00162
2018-12-15 21:00:00 18 20.93023 20.00162
2018-12-15 21:10:00 18 20.93023 20.00162
2018-12-15 21:20:00 18 20.93023 20.00162
2018-12-15 21:30:00 18 20.93023 20.00162
2018-12-15 21:40:00 17 19.76744 20.00162
2018-12-15 21:50:00 17 19.76744 20.00162
2018-12-15 22:00:00 17 19.76744 20.00162
2018-12-15 22:10:00 17 19.76744 20.00162
2018-12-15 22:20:00 17 19.76744 20.00162
2018-12-15 22:30:00 17 19.76744 20.00162
2018-12-15 22:40:00 17 19.76744 20.00162
2018-12-15 22:50:00 18 20.93023 20.00162
2018-12-15 23:00:00 17 19.76744 20.00162
2018-12-15 23:10:00 18 20.93023 20.00162
2018-12-15 23:20:00 18 20.93023 20.00162
2018-12-15 23:30:00 17 19.76744 20.00162
2018-12-15 23:40:00 17 19.76744 20.00162
2018-12-15 23:50:00 17 19.76744 20.00162

The density plot below shows the distribution of propActive (the proportion of active nodes) recorded at each time interval of the day, relative to the network average. This average is Score 3: the average proportion of the AoT network that was active for NO2 concentration data on 2018-12-15.

dfNO23%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we begin by extracting the latitude and longitude locations of these nodes. We take the location at which every reliable data point is recorded and keep the unique combinations of time interval, node, latitude and longitude.

The table below shows the first 41 rows of this result: the locations of the nodes active in the first time intervals of 2018-12-15, starting at 12 midnight.

dfNO2%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfNO24

dfNO24%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:00:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:00:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:00:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:00:00 001e06114503 41.66608 -87.53937
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:00:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:00:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:00:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:00:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 00:10:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:10:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:10:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:10:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:10:00 001e06114503 41.66608 -87.53937
2018-12-15 00:10:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:10:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:10:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:10:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:10:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:10:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:10:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:20:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:20:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:20:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:20:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:20:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:20:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:20:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:20:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:20:00 001e06113107 41.75114 -87.71299

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago is the proportion of Chicago area covered (AreaProp) for that time interval; intervals with fewer than two active nodes are assigned a proportion of 0. Score 4, the average proportion of Chicago area covered, is the mean of these proportions across all time intervals, expressed as a percentage. The table below presents these values.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

dfNO24a<-NULL

for(i in unique(dfNO24$by10)){

  # active nodes recorded in this time interval
  subset <-
    dfNO24%>%
    filter(by10==i)

  if(nrow(subset)>1){
    # convert the nodes to spatial points and project them to EPSG:3435
    subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
    subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    # trace the corners of the bounding box, closing the ring on the first corner
    P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    # clip the bounding box to the Chicago boundary
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    dfNO24a<-rbind(dfNO24a, df1)
  }else{
    # a single node has no bounding-box area, so coverage is recorded as 0
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-0
    df1<-as.data.frame(df1)
    dfNO24a<-rbind(dfNO24a, df1)
  }

}

dfNO24a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfNO24a
dfNO24a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.6010409 59.82058
2018-12-15 00:10:00 0.6010409 59.82058
2018-12-15 00:20:00 0.6010409 59.82058
2018-12-15 00:30:00 0.6010409 59.82058
2018-12-15 00:40:00 0.6010409 59.82058
2018-12-15 00:50:00 0.6010409 59.82058
2018-12-15 01:00:00 0.6010409 59.82058
2018-12-15 01:10:00 0.6010409 59.82058
2018-12-15 01:20:00 0.5639259 59.82058
2018-12-15 01:30:00 0.5639259 59.82058
2018-12-15 01:40:00 0.5639259 59.82058
2018-12-15 01:50:00 0.5639259 59.82058
2018-12-15 02:00:00 0.5639259 59.82058
2018-12-15 02:10:00 0.5639259 59.82058
2018-12-15 02:20:00 0.5639259 59.82058
2018-12-15 02:30:00 0.5639259 59.82058
2018-12-15 02:40:00 0.5639259 59.82058
2018-12-15 02:50:00 0.5639259 59.82058
2018-12-15 03:00:00 0.5639259 59.82058
2018-12-15 03:10:00 0.6010409 59.82058
2018-12-15 03:20:00 0.6010409 59.82058
2018-12-15 03:30:00 0.6010409 59.82058
2018-12-15 03:40:00 0.6010409 59.82058
2018-12-15 03:50:00 0.6010409 59.82058
2018-12-15 04:00:00 0.6010409 59.82058
2018-12-15 04:10:00 0.6010409 59.82058
2018-12-15 04:20:00 0.6010409 59.82058
2018-12-15 04:30:00 0.6010409 59.82058
2018-12-15 04:40:00 0.6010409 59.82058
2018-12-15 04:50:00 0.6010409 59.82058
2018-12-15 05:00:00 0.6010409 59.82058
2018-12-15 05:10:00 0.6010409 59.82058
2018-12-15 05:20:00 0.6010409 59.82058
2018-12-15 05:30:00 0.6010409 59.82058
2018-12-15 05:40:00 0.6010409 59.82058
2018-12-15 05:50:00 0.6010409 59.82058
2018-12-15 06:00:00 0.6010409 59.82058
2018-12-15 06:10:00 0.6010409 59.82058
2018-12-15 06:20:00 0.6010409 59.82058
2018-12-15 06:30:00 0.6010409 59.82058
2018-12-15 06:40:00 0.6010409 59.82058
2018-12-15 06:50:00 0.6010409 59.82058
2018-12-15 07:00:00 0.6010409 59.82058
2018-12-15 07:10:00 0.6010409 59.82058
2018-12-15 07:20:00 0.6010409 59.82058
2018-12-15 07:30:00 0.6010409 59.82058
2018-12-15 07:40:00 0.6010409 59.82058
2018-12-15 07:50:00 0.6010409 59.82058
2018-12-15 08:00:00 0.6010409 59.82058
2018-12-15 08:10:00 0.6010409 59.82058
2018-12-15 08:20:00 0.6010409 59.82058
2018-12-15 08:30:00 0.6010409 59.82058
2018-12-15 08:40:00 0.6010409 59.82058
2018-12-15 08:50:00 0.6010409 59.82058
2018-12-15 09:00:00 0.6010409 59.82058
2018-12-15 09:10:00 0.6010409 59.82058
2018-12-15 09:20:00 0.6010409 59.82058
2018-12-15 09:30:00 0.6010409 59.82058
2018-12-15 09:40:00 0.6010409 59.82058
2018-12-15 09:50:00 0.6010409 59.82058
2018-12-15 10:00:00 0.6010409 59.82058
2018-12-15 10:10:00 0.6010409 59.82058
2018-12-15 10:20:00 0.6010409 59.82058
2018-12-15 10:30:00 0.6010409 59.82058
2018-12-15 10:40:00 0.6010409 59.82058
2018-12-15 10:50:00 0.6010409 59.82058
2018-12-15 11:00:00 0.6010409 59.82058
2018-12-15 11:10:00 0.6010409 59.82058
2018-12-15 11:20:00 0.6010409 59.82058
2018-12-15 11:30:00 0.6010409 59.82058
2018-12-15 11:40:00 0.6010409 59.82058
2018-12-15 11:50:00 0.6010409 59.82058
2018-12-15 12:00:00 0.6010409 59.82058
2018-12-15 12:10:00 0.6010409 59.82058
2018-12-15 12:20:00 0.6010409 59.82058
2018-12-15 12:30:00 0.6010409 59.82058
2018-12-15 12:40:00 0.6010409 59.82058
2018-12-15 12:50:00 0.6010409 59.82058
2018-12-15 13:00:00 0.6010409 59.82058
2018-12-15 13:10:00 0.6010409 59.82058
2018-12-15 13:20:00 0.6010409 59.82058
2018-12-15 13:30:00 0.6010409 59.82058
2018-12-15 13:40:00 0.6010409 59.82058
2018-12-15 13:50:00 0.6010409 59.82058
2018-12-15 14:00:00 0.6010409 59.82058
2018-12-15 14:10:00 0.6010409 59.82058
2018-12-15 14:20:00 0.6010409 59.82058
2018-12-15 14:30:00 0.6010409 59.82058
2018-12-15 14:40:00 0.6010409 59.82058
2018-12-15 14:50:00 0.6010409 59.82058
2018-12-15 15:00:00 0.6010409 59.82058
2018-12-15 15:10:00 0.6010409 59.82058
2018-12-15 15:20:00 0.6010409 59.82058
2018-12-15 15:30:00 0.6010409 59.82058
2018-12-15 15:40:00 0.6010409 59.82058
2018-12-15 15:50:00 0.6010409 59.82058
2018-12-15 16:00:00 0.6010409 59.82058
2018-12-15 16:10:00 0.6010409 59.82058
2018-12-15 16:20:00 0.6010409 59.82058
2018-12-15 16:30:00 0.6010409 59.82058
2018-12-15 16:40:00 0.6010409 59.82058
2018-12-15 16:50:00 0.6010409 59.82058
2018-12-15 17:00:00 0.6010409 59.82058
2018-12-15 17:10:00 0.6010409 59.82058
2018-12-15 17:20:00 0.6010409 59.82058
2018-12-15 17:30:00 0.6010409 59.82058
2018-12-15 17:40:00 0.6010409 59.82058
2018-12-15 17:50:00 0.6010409 59.82058
2018-12-15 18:00:00 0.6010409 59.82058
2018-12-15 18:10:00 0.6010409 59.82058
2018-12-15 18:20:00 0.6010409 59.82058
2018-12-15 18:30:00 0.6010409 59.82058
2018-12-15 18:40:00 0.6010409 59.82058
2018-12-15 18:50:00 0.6010409 59.82058
2018-12-15 19:00:00 0.6010409 59.82058
2018-12-15 19:10:00 0.6010409 59.82058
2018-12-15 19:20:00 0.6010409 59.82058
2018-12-15 19:30:00 0.6010409 59.82058
2018-12-15 19:40:00 0.6010409 59.82058
2018-12-15 19:50:00 0.6010409 59.82058
2018-12-15 20:00:00 0.6010409 59.82058
2018-12-15 20:10:00 0.6010409 59.82058
2018-12-15 20:20:00 0.6010409 59.82058
2018-12-15 20:30:00 0.6010409 59.82058
2018-12-15 20:40:00 0.6010409 59.82058
2018-12-15 20:50:00 0.6010409 59.82058
2018-12-15 21:00:00 0.6010409 59.82058
2018-12-15 21:10:00 0.6010409 59.82058
2018-12-15 21:20:00 0.6010409 59.82058
2018-12-15 21:30:00 0.6010409 59.82058
2018-12-15 21:40:00 0.6010409 59.82058
2018-12-15 21:50:00 0.6010409 59.82058
2018-12-15 22:00:00 0.6010409 59.82058
2018-12-15 22:10:00 0.6010409 59.82058
2018-12-15 22:20:00 0.6010409 59.82058
2018-12-15 22:30:00 0.6010409 59.82058
2018-12-15 22:40:00 0.6010409 59.82058
2018-12-15 22:50:00 0.6010409 59.82058
2018-12-15 23:00:00 0.6010409 59.82058
2018-12-15 23:10:00 0.6010409 59.82058
2018-12-15 23:20:00 0.6010409 59.82058
2018-12-15 23:30:00 0.6010409 59.82058
2018-12-15 23:40:00 0.6010409 59.82058
2018-12-15 23:50:00 0.6010409 59.82058
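
The bounding-box calculation can be illustrated with a small base R sketch. It uses made-up planar coordinates rather than the AoT data; the helper name `bbox_area_prop`, the toy points and the `city_area` value are all hypothetical. It also skips the clipping step that `gIntersection()` performs in the loop above, so it only approximates the real calculation when the box lies entirely inside the city boundary.

```r
# Hypothetical helper: share of a city's area covered by the bounding box
# of a set of projected (planar) points. Intervals with fewer than two
# points get 0, mirroring the else-branch of the loop above.
bbox_area_prop <- function(x, y, city_area) {
  if (length(x) < 2) return(0)
  width  <- diff(range(x))
  height <- diff(range(y))
  (width * height) / city_area
}

# Toy data: three intervals with made-up coordinates (not AoT locations)
toy <- data.frame(
  by10 = c("00:00", "00:00", "00:00", "00:10", "00:10", "00:20"),
  x    = c(0, 4, 2, 0, 2, 1),
  y    = c(0, 3, 1, 0, 3, 1)
)
city_area <- 20  # assumed total area, in the same squared units

# One proportion per interval, then the day-level score as a percentage
area_prop <- sapply(split(toy, toy$by10),
                    function(d) bbox_area_prop(d$x, d$y, city_area))
score4 <- 100 * mean(area_prop)
```

In this toy case the interval with a single point contributes 0 to the mean, which is how sparse intervals pull Score 4 down.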

We can also plot the area covered relative to the whole of Chicago for a visual check. The four plots below are for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the NO2 concentration network on this day, the area covered was the same at all four of these times (AreaProp of about 0.60), so the four plots are identical. The table above shows that coverage dipped only between 01:20 and 03:00.

p<-list()

by10<-as.data.frame(unique(dfNO24$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfNO24b<-merge(dfNO24, by10, by='by10', all.x=TRUE)

for(i in unique(dfNO24b$by10)){

  # active nodes recorded in this time interval
  subset <-
    dfNO24b%>%
    filter(by10==i)

  # convert the nodes to spatial points and project them to EPSG:3435
  subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
  subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))

  # trace the corners of the bounding box, closing the ring on the first corner
  P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                 subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                 subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                 subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                 subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
               ncol = 2, byrow = TRUE) %>%
    Polygon()

  Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

  # clip the bounding box to the Chicago boundary
  Ps2<-gIntersection(Ps2, chig, byid=FALSE)

  # position of this interval in the day, used to index the list of plots
  j<-unique(subset$no)

  # Chicago in red with the covered area overlaid in blue
  p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red',
                 sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

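library(gridExtra)
# with 144 ten-minute intervals in the day, 00:00, 07:00, 13:00 and 19:00 are plots 1, 43, 79 and 115
grid.arrange(p[[1]], p[[43]], p[[79]], p[[115]])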
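
The list `p` holds one plot per 10-minute interval, so choosing the panels for particular clock times means converting each time into its position in the day. The sketch below shows one way to do that conversion, assuming all 144 intervals are present and ordered from 00:00 to 23:50; `interval_index` is a hypothetical helper and not part of the original code.

```r
# Position of a clock time among the day's ordered 10-minute intervals,
# assuming all 144 intervals (00:00 to 23:50) are present.
interval_index <- function(hour, minute = 0) {
  hour * 6 + minute / 10 + 1
}

# Positions of 00:00, 07:00, 13:00 and 19:00
plot_positions <- interval_index(c(0, 7, 13, 19))
```

If any interval were missing from the data, these positions would shift, so it would be safer to look plots up by their `by10` value instead.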

In summary, for the AoT NO2 concentration network on 2018-12-15,

  • Score 3 = 20.0 : At any given 10-minute time interval, an average of 20.0% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 59.8 : At any given 10-minute time interval, the nodes collecting reliable data span an average of 59.8% of Chicago’s area. Read together with Score 3, this indicates that the few active nodes are widely dispersed across Chicago.
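
The arithmetic behind Score 3 can be checked by hand. The sketch below reproduces the propActive values that appear in the table for counts of 16, 17 and 18 active nodes out of the full network of 86, and averages a short run of intervals. The function name `score3` is a hypothetical helper, and the four counts passed to it are only the 05:40 to 06:10 rows of the table, not the full day.

```r
network_size <- 86                 # full AoT network size used in the tables
counts <- c(16, 17, 18)            # active-node counts that occur in the table

# Proportion of the network that is active, as a percentage
prop_active <- 100 * counts / network_size
round(prop_active, 5)

# Score 3 is the mean of these proportions over the intervals considered
score3 <- function(counts, network_size = 86) {
  mean(100 * counts / network_size)
}
score3(c(18, 18, 17, 18))          # four intervals only, for illustration
```

Because the count never leaves the 16 to 18 range on this day, the daily average stays close to 20%.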

O3 Concentration

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the time intervals of the day. The figure below shows how the number of nodes collecting reliable O3 concentration data in the AoT network varies over the course of the day.

dfO3%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfO33

ggplot(data=dfO33, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfO33$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable O3 concentration data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable O3 concentration data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfO33%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 16 18.60465 20.59109
2018-12-15 00:10:00 16 18.60465 20.59109
2018-12-15 00:20:00 17 19.76744 20.59109
2018-12-15 00:30:00 17 19.76744 20.59109
2018-12-15 00:40:00 16 18.60465 20.59109
2018-12-15 00:50:00 17 19.76744 20.59109
2018-12-15 01:00:00 17 19.76744 20.59109
2018-12-15 01:10:00 17 19.76744 20.59109
2018-12-15 01:20:00 17 19.76744 20.59109
2018-12-15 01:30:00 17 19.76744 20.59109
2018-12-15 01:40:00 17 19.76744 20.59109
2018-12-15 01:50:00 17 19.76744 20.59109
2018-12-15 02:00:00 17 19.76744 20.59109
2018-12-15 02:10:00 17 19.76744 20.59109
2018-12-15 02:20:00 17 19.76744 20.59109
2018-12-15 02:30:00 17 19.76744 20.59109
2018-12-15 02:40:00 17 19.76744 20.59109
2018-12-15 02:50:00 17 19.76744 20.59109
2018-12-15 03:00:00 17 19.76744 20.59109
2018-12-15 03:10:00 18 20.93023 20.59109
2018-12-15 03:20:00 18 20.93023 20.59109
2018-12-15 03:30:00 18 20.93023 20.59109
2018-12-15 03:40:00 18 20.93023 20.59109
2018-12-15 03:50:00 18 20.93023 20.59109
2018-12-15 04:00:00 18 20.93023 20.59109
2018-12-15 04:10:00 18 20.93023 20.59109
2018-12-15 04:20:00 18 20.93023 20.59109
2018-12-15 04:30:00 18 20.93023 20.59109
2018-12-15 04:40:00 18 20.93023 20.59109
2018-12-15 04:50:00 18 20.93023 20.59109
2018-12-15 05:00:00 18 20.93023 20.59109
2018-12-15 05:10:00 18 20.93023 20.59109
2018-12-15 05:20:00 18 20.93023 20.59109
2018-12-15 05:30:00 18 20.93023 20.59109
2018-12-15 05:40:00 18 20.93023 20.59109
2018-12-15 05:50:00 18 20.93023 20.59109
2018-12-15 06:00:00 18 20.93023 20.59109
2018-12-15 06:10:00 18 20.93023 20.59109
2018-12-15 06:20:00 18 20.93023 20.59109
2018-12-15 06:30:00 18 20.93023 20.59109
2018-12-15 06:40:00 18 20.93023 20.59109
2018-12-15 06:50:00 18 20.93023 20.59109
2018-12-15 07:00:00 18 20.93023 20.59109
2018-12-15 07:10:00 18 20.93023 20.59109
2018-12-15 07:20:00 18 20.93023 20.59109
2018-12-15 07:30:00 18 20.93023 20.59109
2018-12-15 07:40:00 18 20.93023 20.59109
2018-12-15 07:50:00 18 20.93023 20.59109
2018-12-15 08:00:00 18 20.93023 20.59109
2018-12-15 08:10:00 18 20.93023 20.59109
2018-12-15 08:20:00 18 20.93023 20.59109
2018-12-15 08:30:00 18 20.93023 20.59109
2018-12-15 08:40:00 18 20.93023 20.59109
2018-12-15 08:50:00 18 20.93023 20.59109
2018-12-15 09:00:00 18 20.93023 20.59109
2018-12-15 09:10:00 18 20.93023 20.59109
2018-12-15 09:20:00 18 20.93023 20.59109
2018-12-15 09:30:00 18 20.93023 20.59109
2018-12-15 09:40:00 18 20.93023 20.59109
2018-12-15 09:50:00 18 20.93023 20.59109
2018-12-15 10:00:00 18 20.93023 20.59109
2018-12-15 10:10:00 18 20.93023 20.59109
2018-12-15 10:20:00 17 19.76744 20.59109
2018-12-15 10:30:00 17 19.76744 20.59109
2018-12-15 10:40:00 16 18.60465 20.59109
2018-12-15 10:50:00 17 19.76744 20.59109
2018-12-15 11:00:00 17 19.76744 20.59109
2018-12-15 11:10:00 17 19.76744 20.59109
2018-12-15 11:20:00 18 20.93023 20.59109
2018-12-15 11:30:00 18 20.93023 20.59109
2018-12-15 11:40:00 18 20.93023 20.59109
2018-12-15 11:50:00 17 19.76744 20.59109
2018-12-15 12:00:00 18 20.93023 20.59109
2018-12-15 12:10:00 17 19.76744 20.59109
2018-12-15 12:20:00 18 20.93023 20.59109
2018-12-15 12:30:00 18 20.93023 20.59109
2018-12-15 12:40:00 18 20.93023 20.59109
2018-12-15 12:50:00 18 20.93023 20.59109
2018-12-15 13:00:00 18 20.93023 20.59109
2018-12-15 13:10:00 18 20.93023 20.59109
2018-12-15 13:20:00 18 20.93023 20.59109
2018-12-15 13:30:00 18 20.93023 20.59109
2018-12-15 13:40:00 18 20.93023 20.59109
2018-12-15 13:50:00 18 20.93023 20.59109
2018-12-15 14:00:00 18 20.93023 20.59109
2018-12-15 14:10:00 18 20.93023 20.59109
2018-12-15 14:20:00 18 20.93023 20.59109
2018-12-15 14:30:00 18 20.93023 20.59109
2018-12-15 14:40:00 18 20.93023 20.59109
2018-12-15 14:50:00 18 20.93023 20.59109
2018-12-15 15:00:00 17 19.76744 20.59109
2018-12-15 15:10:00 17 19.76744 20.59109
2018-12-15 15:20:00 17 19.76744 20.59109
2018-12-15 15:30:00 17 19.76744 20.59109
2018-12-15 15:40:00 17 19.76744 20.59109
2018-12-15 15:50:00 17 19.76744 20.59109
2018-12-15 16:00:00 17 19.76744 20.59109
2018-12-15 16:10:00 17 19.76744 20.59109
2018-12-15 16:20:00 17 19.76744 20.59109
2018-12-15 16:30:00 17 19.76744 20.59109
2018-12-15 16:40:00 17 19.76744 20.59109
2018-12-15 16:50:00 18 20.93023 20.59109
2018-12-15 17:00:00 18 20.93023 20.59109
2018-12-15 17:10:00 18 20.93023 20.59109
2018-12-15 17:20:00 18 20.93023 20.59109
2018-12-15 17:30:00 18 20.93023 20.59109
2018-12-15 17:40:00 18 20.93023 20.59109
2018-12-15 17:50:00 18 20.93023 20.59109
2018-12-15 18:00:00 18 20.93023 20.59109
2018-12-15 18:10:00 18 20.93023 20.59109
2018-12-15 18:20:00 18 20.93023 20.59109
2018-12-15 18:30:00 18 20.93023 20.59109
2018-12-15 18:40:00 18 20.93023 20.59109
2018-12-15 18:50:00 18 20.93023 20.59109
2018-12-15 19:00:00 18 20.93023 20.59109
2018-12-15 19:10:00 18 20.93023 20.59109
2018-12-15 19:20:00 18 20.93023 20.59109
2018-12-15 19:30:00 18 20.93023 20.59109
2018-12-15 19:40:00 18 20.93023 20.59109
2018-12-15 19:50:00 18 20.93023 20.59109
2018-12-15 20:00:00 18 20.93023 20.59109
2018-12-15 20:10:00 18 20.93023 20.59109
2018-12-15 20:20:00 18 20.93023 20.59109
2018-12-15 20:30:00 18 20.93023 20.59109
2018-12-15 20:40:00 18 20.93023 20.59109
2018-12-15 20:50:00 18 20.93023 20.59109
2018-12-15 21:00:00 18 20.93023 20.59109
2018-12-15 21:10:00 18 20.93023 20.59109
2018-12-15 21:20:00 18 20.93023 20.59109
2018-12-15 21:30:00 18 20.93023 20.59109
2018-12-15 21:40:00 18 20.93023 20.59109
2018-12-15 21:50:00 18 20.93023 20.59109
2018-12-15 22:00:00 18 20.93023 20.59109
2018-12-15 22:10:00 18 20.93023 20.59109
2018-12-15 22:20:00 18 20.93023 20.59109
2018-12-15 22:30:00 18 20.93023 20.59109
2018-12-15 22:40:00 18 20.93023 20.59109
2018-12-15 22:50:00 18 20.93023 20.59109
2018-12-15 23:00:00 18 20.93023 20.59109
2018-12-15 23:10:00 18 20.93023 20.59109
2018-12-15 23:20:00 18 20.93023 20.59109
2018-12-15 23:30:00 18 20.93023 20.59109
2018-12-15 23:40:00 18 20.93023 20.59109
2018-12-15 23:50:00 18 20.93023 20.59109

The density plot below shows the distribution of propActive (the proportion of active nodes) recorded at each time interval of the day, relative to the network average. This average is Score 3: the average proportion of the AoT network that was active for O3 concentration data on 2018-12-15.

dfO33%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago area covered by the distribution of active nodes, we begin by extracting the latitude and longitude locations of these nodes. We take the location at which every reliable data point is recorded and keep the unique combinations of time interval, node, latitude and longitude.

The table below shows the first 41 rows of this result: the locations of the nodes active in the first time intervals of 2018-12-15, starting at 12 midnight.

dfO3%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfO34

dfO34%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:00:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:00:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:00:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:00:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:00:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:00:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:00:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:00:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:10:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:10:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 00:10:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:10:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:10:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:10:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:10:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:10:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:10:00 001e0610bc10 41.73631 -87.62418
2018-12-15 00:10:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:10:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:10:00 001e06113ace 41.83107 -87.61730
2018-12-15 00:20:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:20:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:20:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:20:00 001e0610eef2 41.96526 -87.66672
2018-12-15 00:20:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:20:00 001e061130f4 41.89616 -87.66239
2018-12-15 00:20:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:20:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:20:00 001e061144c0 41.76412 -87.72242
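
The deduplication step above also links this table back to Score 3: once each node appears at most once per interval, counting rows per interval gives the number of active nodes. The base R sketch below illustrates this with made-up records; the data frame `records` and its values are hypothetical and are not taken from the AoT data.

```r
# Toy records: a node may report several reliable readings per interval
records <- data.frame(
  by10    = c("00:00", "00:00", "00:00", "00:10", "00:10"),
  node_id = c("a", "a", "b", "a", "b"),
  lat     = c(41.79, 41.79, 41.76, 41.79, 41.76),
  lon     = c(-87.60, -87.60, -87.72, -87.60, -87.72)
)

# One row per node per interval, as unique() does in the pipeline above
node_locations <- unique(records)

# Number of distinct active nodes in each interval
active_per_interval <- table(node_locations$by10)
```

This relies on a node reporting the same coordinates within an interval. A node whose recorded location changed would appear twice after `unique()`.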

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago is the proportion of Chicago area covered (AreaProp) for that time interval; intervals with fewer than two active nodes are assigned a proportion of 0. Score 4, the average proportion of Chicago area covered, is the mean of these proportions across all time intervals, expressed as a percentage. The table below presents these values.

chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

dfO34a<-NULL

for(i in unique(dfO34$by10)){

  # active nodes recorded in this time interval
  subset <-
    dfO34%>%
    filter(by10==i)

  if(nrow(subset)>1){
    # convert the nodes to spatial points and project them to EPSG:3435
    subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
    subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    # trace the corners of the bounding box, closing the ring on the first corner
    P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    # clip the bounding box to the Chicago boundary
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    dfO34a<-rbind(dfO34a, df1)
  }else{
    # a single node has no bounding-box area, so coverage is recorded as 0
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-0
    df1<-as.data.frame(df1)
    dfO34a<-rbind(dfO34a, df1)
  }

}

dfO34a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfO34a
dfO34a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.4364638 59.04799
2018-12-15 00:10:00 0.4364638 59.04799
2018-12-15 00:20:00 0.6010409 59.04799
2018-12-15 00:30:00 0.6010409 59.04799
2018-12-15 00:40:00 0.4364638 59.04799
2018-12-15 00:50:00 0.6010409 59.04799
2018-12-15 01:00:00 0.6010409 59.04799
2018-12-15 01:10:00 0.6010409 59.04799
2018-12-15 01:20:00 0.5639259 59.04799
2018-12-15 01:30:00 0.5639259 59.04799
2018-12-15 01:40:00 0.5639259 59.04799
2018-12-15 01:50:00 0.5639259 59.04799
2018-12-15 02:00:00 0.5639259 59.04799
2018-12-15 02:10:00 0.5639259 59.04799
2018-12-15 02:20:00 0.5639259 59.04799
2018-12-15 02:30:00 0.5639259 59.04799
2018-12-15 02:40:00 0.5639259 59.04799
2018-12-15 02:50:00 0.5639259 59.04799
2018-12-15 03:00:00 0.5639259 59.04799
2018-12-15 03:10:00 0.6010409 59.04799
2018-12-15 03:20:00 0.6010409 59.04799
2018-12-15 03:30:00 0.6010409 59.04799
2018-12-15 03:40:00 0.6010409 59.04799
2018-12-15 03:50:00 0.6010409 59.04799
2018-12-15 04:00:00 0.6010409 59.04799
2018-12-15 04:10:00 0.6010409 59.04799
2018-12-15 04:20:00 0.6010409 59.04799
2018-12-15 04:30:00 0.6010409 59.04799
2018-12-15 04:40:00 0.6010409 59.04799
2018-12-15 04:50:00 0.6010409 59.04799
2018-12-15 05:00:00 0.6010409 59.04799
2018-12-15 05:10:00 0.6010409 59.04799
2018-12-15 05:20:00 0.6010409 59.04799
2018-12-15 05:30:00 0.6010409 59.04799
2018-12-15 05:40:00 0.6010409 59.04799
2018-12-15 05:50:00 0.6010409 59.04799
2018-12-15 06:00:00 0.6010409 59.04799
2018-12-15 06:10:00 0.6010409 59.04799
2018-12-15 06:20:00 0.6010409 59.04799
2018-12-15 06:30:00 0.6010409 59.04799
2018-12-15 06:40:00 0.6010409 59.04799
2018-12-15 06:50:00 0.6010409 59.04799
2018-12-15 07:00:00 0.6010409 59.04799
2018-12-15 07:10:00 0.6010409 59.04799
2018-12-15 07:20:00 0.6010409 59.04799
2018-12-15 07:30:00 0.6010409 59.04799
2018-12-15 07:40:00 0.6010409 59.04799
2018-12-15 07:50:00 0.6010409 59.04799
2018-12-15 08:00:00 0.6010409 59.04799
2018-12-15 08:10:00 0.6010409 59.04799
2018-12-15 08:20:00 0.6010409 59.04799
2018-12-15 08:30:00 0.6010409 59.04799
2018-12-15 08:40:00 0.6010409 59.04799
2018-12-15 08:50:00 0.6010409 59.04799
2018-12-15 09:00:00 0.6010409 59.04799
2018-12-15 09:10:00 0.6010409 59.04799
2018-12-15 09:20:00 0.6010409 59.04799
2018-12-15 09:30:00 0.6010409 59.04799
2018-12-15 09:40:00 0.6010409 59.04799
2018-12-15 09:50:00 0.6010409 59.04799
2018-12-15 10:00:00 0.6010409 59.04799
2018-12-15 10:10:00 0.6010409 59.04799
2018-12-15 10:20:00 0.6010409 59.04799
2018-12-15 10:30:00 0.6010409 59.04799
2018-12-15 10:40:00 0.4364638 59.04799
2018-12-15 10:50:00 0.4496355 59.04799
2018-12-15 11:00:00 0.4496355 59.04799
2018-12-15 11:10:00 0.4496355 59.04799
2018-12-15 11:20:00 0.6010409 59.04799
2018-12-15 11:30:00 0.6010409 59.04799
2018-12-15 11:40:00 0.6010409 59.04799
2018-12-15 11:50:00 0.6010409 59.04799
2018-12-15 12:00:00 0.6010409 59.04799
2018-12-15 12:10:00 0.6010409 59.04799
2018-12-15 12:20:00 0.6010409 59.04799
2018-12-15 12:30:00 0.6010409 59.04799
2018-12-15 12:40:00 0.6010409 59.04799
2018-12-15 12:50:00 0.6010409 59.04799
2018-12-15 13:00:00 0.6010409 59.04799
2018-12-15 13:10:00 0.6010409 59.04799
2018-12-15 13:20:00 0.6010409 59.04799
2018-12-15 13:30:00 0.6010409 59.04799
2018-12-15 13:40:00 0.6010409 59.04799
2018-12-15 13:50:00 0.6010409 59.04799
2018-12-15 14:00:00 0.6010409 59.04799
2018-12-15 14:10:00 0.6010409 59.04799
2018-12-15 14:20:00 0.6010409 59.04799
2018-12-15 14:30:00 0.6010409 59.04799
2018-12-15 14:40:00 0.6010409 59.04799
2018-12-15 14:50:00 0.6010409 59.04799
2018-12-15 15:00:00 0.6010409 59.04799
2018-12-15 15:10:00 0.6010409 59.04799
2018-12-15 15:20:00 0.6010409 59.04799
2018-12-15 15:30:00 0.6010409 59.04799
2018-12-15 15:40:00 0.6010409 59.04799
2018-12-15 15:50:00 0.6010409 59.04799
2018-12-15 16:00:00 0.6010409 59.04799
2018-12-15 16:10:00 0.6010409 59.04799
2018-12-15 16:20:00 0.6010409 59.04799
2018-12-15 16:30:00 0.6010409 59.04799
2018-12-15 16:40:00 0.6010409 59.04799
2018-12-15 16:50:00 0.6010409 59.04799
2018-12-15 17:00:00 0.6010409 59.04799
2018-12-15 17:10:00 0.6010409 59.04799
2018-12-15 17:20:00 0.6010409 59.04799
2018-12-15 17:30:00 0.6010409 59.04799
2018-12-15 17:40:00 0.6010409 59.04799
2018-12-15 17:50:00 0.6010409 59.04799
2018-12-15 18:00:00 0.6010409 59.04799
2018-12-15 18:10:00 0.6010409 59.04799
2018-12-15 18:20:00 0.6010409 59.04799
2018-12-15 18:30:00 0.6010409 59.04799
2018-12-15 18:40:00 0.6010409 59.04799
2018-12-15 18:50:00 0.6010409 59.04799
2018-12-15 19:00:00 0.6010409 59.04799
2018-12-15 19:10:00 0.6010409 59.04799
2018-12-15 19:20:00 0.6010409 59.04799
2018-12-15 19:30:00 0.6010409 59.04799
2018-12-15 19:40:00 0.6010409 59.04799
2018-12-15 19:50:00 0.6010409 59.04799
2018-12-15 20:00:00 0.6010409 59.04799
2018-12-15 20:10:00 0.6010409 59.04799
2018-12-15 20:20:00 0.6010409 59.04799
2018-12-15 20:30:00 0.6010409 59.04799
2018-12-15 20:40:00 0.6010409 59.04799
2018-12-15 20:50:00 0.6010409 59.04799
2018-12-15 21:00:00 0.6010409 59.04799
2018-12-15 21:10:00 0.6010409 59.04799
2018-12-15 21:20:00 0.6010409 59.04799
2018-12-15 21:30:00 0.6010409 59.04799
2018-12-15 21:40:00 0.6010409 59.04799
2018-12-15 21:50:00 0.6010409 59.04799
2018-12-15 22:00:00 0.6010409 59.04799
2018-12-15 22:10:00 0.6010409 59.04799
2018-12-15 22:20:00 0.6010409 59.04799
2018-12-15 22:30:00 0.6010409 59.04799
2018-12-15 22:40:00 0.6010409 59.04799
2018-12-15 22:50:00 0.6010409 59.04799
2018-12-15 23:00:00 0.6010409 59.04799
2018-12-15 23:10:00 0.6010409 59.04799
2018-12-15 23:20:00 0.6010409 59.04799
2018-12-15 23:30:00 0.6010409 59.04799
2018-12-15 23:40:00 0.6010409 59.04799
2018-12-15 23:50:00 0.6010409 59.04799

We can also plot the area covered relative to the whole of Chicago for visual inspection. The 4 plots below are for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the O3 concentration network on this day, the AreaProp column above shows that coverage stayed at about 60% of Chicago for most of the day, with brief drops to between 44% and 56%; plots for intervals with the same AreaProp are identical.

p<-list()

by10<-as.data.frame(unique(dfO34$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfO34b<-merge(dfO34, by10, by='by10', all.x=TRUE)

for(i in unique(dfO34b$by10)){

subset <- 
      dfO34b%>%
      filter(by10==i)
    
subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    
P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    
Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

#clip using chicago
Ps2<-gIntersection(Ps2, chig, byid=FALSE)

j<-unique(subset$no)

p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red', 
       sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

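library(gridExtra)
# Each list entry is one 10-minute interval, so 00:00, 07:00, 13:00 and 19:00
# are entries 1, 43, 79 and 115 (hour * 6 + 1), assuming all 144 intervals are present
grid.arrange(p[[1]], p[[43]], p[[79]], p[[115]])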
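
The panels passed to grid.arrange are picked by their position in the list p, which holds one plot per 10-minute interval. As a sketch only, assuming all 144 intervals of the day are present and ordered chronologically, the position of a clock time can be worked out as follows. The helper interval_index is hypothetical and not part of the original analysis.

```r
# Position of a clock time in a complete, chronologically ordered day of
# 10-minute intervals (00:00 is position 1, 23:50 is position 144).
interval_index <- function(hhmm, minutes_per_interval = 10) {
  parts <- as.integer(strsplit(hhmm, ":", fixed = TRUE)[[1]])
  (parts[1] * 60 + parts[2]) %/% minutes_per_interval + 1
}

# Positions of the four times named in the text
idx <- vapply(c("00:00", "07:00", "13:00", "19:00"), interval_index, numeric(1))
print(idx)
```

If any interval has no active nodes and is therefore missing from the data, these positions shift, and looking panels up by their by10 value would be safer than by position.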

In summary, for the AoT O3 concentration network on 2018-12-15,

  • Score 3 = 20.6 : At any given 10-minute time interval, an average of 20.6% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 59.0 : At any given 10-minute time interval, reliable data is collected for an average of 59.0% of Chicago’s area. Read together with Score 3, this indicates that the small number of active nodes is widely dispersed across Chicago.

SO2 Concentration

Constructing Score 3

Before we calculate the relevant proportions, it is useful to observe how the absolute number of nodes collecting reliable data in the network varies across the time intervals of the day. The figure below shows how the number of nodes collecting reliable SO2 concentration data in the AoT network varies over the day.

dfSO2%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))->dfSO23

ggplot(data=dfSO23, aes(x=by10, y=count))+
  scale_x_discrete(breaks=c('2018-12-15 00:00:00', 
                            '2018-12-15 04:00:00',
                            '2018-12-15 08:00:00', 
                            '2018-12-15 12:00:00', 
                            '2018-12-15 16:00:00', 
                            '2018-12-15 20:00:00'), 
                   labels=c('00:00',
                            '04:00',
                            '08:00',
                            '12:00',
                            '16:00',
                            '20:00'
                            ))+
  geom_col(fill='indianred', col=NA)+
  ylim(0, 86)+
  geom_hline(yintercept=86, col='black', size=1)+
  geom_hline(yintercept=c(1:85), col='white')+
  geom_vline(aes(xintercept=as.numeric(by10)), col='white', size=2)+
  geom_hline(aes(yintercept=mean(count)), col='black', size=1)+
  geom_text(aes(y=86, x=0), label='Full network size: 86 nodes', size=4, hjust=-1, vjust=-1)+
  geom_text(aes(y=mean(count), x=0), label=paste('Average network size:', round(mean(dfSO23$count)), 'nodes'), size=4, hjust=-1, vjust=-1)+
  labs(x='Time', y='Number of Active Nodes', 
       title='Number of nodes collecting reliable SO2 concentration data throughout the day',
       subtitle='Each x-axis tick represents a 10-minute time interval')+
  theme(plot.title=element_text(face='bold', size=20),
        text=element_text(size=20),
        legend.position = 'bottom', 
        axis.text.x=element_text(angle=90, hjust=1))

The table below shows the number of nodes collecting reliable SO2 concentration data at each time interval during the day, the proportion of these active nodes in relation to the full network of 86 nodes, and the average proportion during the full day. This average proportion is Score 3.

dfSO23%>%
  arrange(by10)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
by10 count propActive Score 3
2018-12-15 00:00:00 12 13.95349 14.76906
2018-12-15 00:10:00 12 13.95349 14.76906
2018-12-15 00:20:00 13 15.11628 14.76906
2018-12-15 00:30:00 13 15.11628 14.76906
2018-12-15 00:40:00 13 15.11628 14.76906
2018-12-15 00:50:00 13 15.11628 14.76906
2018-12-15 01:00:00 12 13.95349 14.76906
2018-12-15 01:10:00 12 13.95349 14.76906
2018-12-15 01:20:00 12 13.95349 14.76906
2018-12-15 01:30:00 12 13.95349 14.76906
2018-12-15 01:40:00 12 13.95349 14.76906
2018-12-15 01:50:00 12 13.95349 14.76906
2018-12-15 02:00:00 12 13.95349 14.76906
2018-12-15 02:10:00 12 13.95349 14.76906
2018-12-15 02:20:00 12 13.95349 14.76906
2018-12-15 02:30:00 12 13.95349 14.76906
2018-12-15 02:40:00 12 13.95349 14.76906
2018-12-15 02:50:00 12 13.95349 14.76906
2018-12-15 03:00:00 12 13.95349 14.76906
2018-12-15 03:10:00 13 15.11628 14.76906
2018-12-15 03:20:00 12 13.95349 14.76906
2018-12-15 03:30:00 13 15.11628 14.76906
2018-12-15 03:40:00 13 15.11628 14.76906
2018-12-15 03:50:00 12 13.95349 14.76906
2018-12-15 04:00:00 12 13.95349 14.76906
2018-12-15 04:10:00 12 13.95349 14.76906
2018-12-15 04:20:00 13 15.11628 14.76906
2018-12-15 04:30:00 12 13.95349 14.76906
2018-12-15 04:40:00 12 13.95349 14.76906
2018-12-15 04:50:00 13 15.11628 14.76906
2018-12-15 05:00:00 12 13.95349 14.76906
2018-12-15 05:10:00 13 15.11628 14.76906
2018-12-15 05:20:00 12 13.95349 14.76906
2018-12-15 05:30:00 13 15.11628 14.76906
2018-12-15 05:40:00 13 15.11628 14.76906
2018-12-15 05:50:00 13 15.11628 14.76906
2018-12-15 06:00:00 13 15.11628 14.76906
2018-12-15 06:10:00 13 15.11628 14.76906
2018-12-15 06:20:00 13 15.11628 14.76906
2018-12-15 06:30:00 13 15.11628 14.76906
2018-12-15 06:40:00 13 15.11628 14.76906
2018-12-15 06:50:00 12 13.95349 14.76906
2018-12-15 07:00:00 13 15.11628 14.76906
2018-12-15 07:10:00 12 13.95349 14.76906
2018-12-15 07:20:00 13 15.11628 14.76906
2018-12-15 07:30:00 12 13.95349 14.76906
2018-12-15 07:40:00 12 13.95349 14.76906
2018-12-15 07:50:00 12 13.95349 14.76906
2018-12-15 08:00:00 12 13.95349 14.76906
2018-12-15 08:10:00 12 13.95349 14.76906
2018-12-15 08:20:00 12 13.95349 14.76906
2018-12-15 08:30:00 12 13.95349 14.76906
2018-12-15 08:40:00 12 13.95349 14.76906
2018-12-15 08:50:00 13 15.11628 14.76906
2018-12-15 09:00:00 12 13.95349 14.76906
2018-12-15 09:10:00 12 13.95349 14.76906
2018-12-15 09:20:00 11 12.79070 14.76906
2018-12-15 09:30:00 11 12.79070 14.76906
2018-12-15 09:40:00 11 12.79070 14.76906
2018-12-15 09:50:00 11 12.79070 14.76906
2018-12-15 10:00:00 11 12.79070 14.76906
2018-12-15 10:10:00 10 11.62791 14.76906
2018-12-15 10:20:00 12 13.95349 14.76906
2018-12-15 10:30:00 10 11.62791 14.76906
2018-12-15 10:40:00 11 12.79070 14.76906
2018-12-15 10:50:00 11 12.79070 14.76906
2018-12-15 11:00:00 11 12.79070 14.76906
2018-12-15 11:10:00 11 12.79070 14.76906
2018-12-15 11:20:00 11 12.79070 14.76906
2018-12-15 11:30:00 11 12.79070 14.76906
2018-12-15 11:40:00 12 13.95349 14.76906
2018-12-15 11:50:00 12 13.95349 14.76906
2018-12-15 12:00:00 12 13.95349 14.76906
2018-12-15 12:10:00 13 15.11628 14.76906
2018-12-15 12:20:00 12 13.95349 14.76906
2018-12-15 12:30:00 12 13.95349 14.76906
2018-12-15 12:40:00 11 12.79070 14.76906
2018-12-15 12:50:00 11 12.79070 14.76906
2018-12-15 13:00:00 11 12.79070 14.76906
2018-12-15 13:10:00 13 15.11628 14.76906
2018-12-15 13:20:00 12 13.95349 14.76906
2018-12-15 13:30:00 13 15.11628 14.76906
2018-12-15 13:40:00 13 15.11628 14.76906
2018-12-15 13:50:00 13 15.11628 14.76906
2018-12-15 14:00:00 13 15.11628 14.76906
2018-12-15 14:10:00 12 13.95349 14.76906
2018-12-15 14:20:00 14 16.27907 14.76906
2018-12-15 14:30:00 13 15.11628 14.76906
2018-12-15 14:40:00 13 15.11628 14.76906
2018-12-15 14:50:00 14 16.27907 14.76906
2018-12-15 15:00:00 11 12.79070 14.76906
2018-12-15 15:10:00 12 13.95349 14.76906
2018-12-15 15:20:00 13 15.11628 14.76906
2018-12-15 15:30:00 12 13.95349 14.76906
2018-12-15 15:40:00 13 15.11628 14.76906
2018-12-15 15:50:00 13 15.11628 14.76906
2018-12-15 16:00:00 13 15.11628 14.76906
2018-12-15 16:10:00 13 15.11628 14.76906
2018-12-15 16:20:00 13 15.11628 14.76906
2018-12-15 16:30:00 13 15.11628 14.76906
2018-12-15 16:40:00 13 15.11628 14.76906
2018-12-15 16:50:00 14 16.27907 14.76906
2018-12-15 17:00:00 14 16.27907 14.76906
2018-12-15 17:10:00 14 16.27907 14.76906
2018-12-15 17:20:00 14 16.27907 14.76906
2018-12-15 17:30:00 14 16.27907 14.76906
2018-12-15 17:40:00 14 16.27907 14.76906
2018-12-15 17:50:00 14 16.27907 14.76906
2018-12-15 18:00:00 14 16.27907 14.76906
2018-12-15 18:10:00 14 16.27907 14.76906
2018-12-15 18:20:00 14 16.27907 14.76906
2018-12-15 18:30:00 14 16.27907 14.76906
2018-12-15 18:40:00 14 16.27907 14.76906
2018-12-15 18:50:00 14 16.27907 14.76906
2018-12-15 19:00:00 14 16.27907 14.76906
2018-12-15 19:10:00 14 16.27907 14.76906
2018-12-15 19:20:00 14 16.27907 14.76906
2018-12-15 19:30:00 14 16.27907 14.76906
2018-12-15 19:40:00 13 15.11628 14.76906
2018-12-15 19:50:00 14 16.27907 14.76906
2018-12-15 20:00:00 14 16.27907 14.76906
2018-12-15 20:10:00 14 16.27907 14.76906
2018-12-15 20:20:00 14 16.27907 14.76906
2018-12-15 20:30:00 14 16.27907 14.76906
2018-12-15 20:40:00 14 16.27907 14.76906
2018-12-15 20:50:00 13 15.11628 14.76906
2018-12-15 21:00:00 14 16.27907 14.76906
2018-12-15 21:10:00 14 16.27907 14.76906
2018-12-15 21:20:00 15 17.44186 14.76906
2018-12-15 21:30:00 14 16.27907 14.76906
2018-12-15 21:40:00 14 16.27907 14.76906
2018-12-15 21:50:00 14 16.27907 14.76906
2018-12-15 22:00:00 15 17.44186 14.76906
2018-12-15 22:10:00 14 16.27907 14.76906
2018-12-15 22:20:00 15 17.44186 14.76906
2018-12-15 22:30:00 14 16.27907 14.76906
2018-12-15 22:40:00 13 15.11628 14.76906
2018-12-15 22:50:00 13 15.11628 14.76906
2018-12-15 23:00:00 14 16.27907 14.76906
2018-12-15 23:10:00 12 13.95349 14.76906
2018-12-15 23:20:00 12 13.95349 14.76906
2018-12-15 23:30:00 13 15.11628 14.76906
2018-12-15 23:40:00 13 15.11628 14.76906
2018-12-15 23:50:00 14 16.27907 14.76906
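
As a rough illustration of the arithmetic behind propActive and Score 3, the sketch below recomputes both for a hypothetical three-interval day. The data frame toy and its values are made up for illustration; only the network size of 86 nodes and the formula come from the code above.

```r
# Hypothetical records: one row per reliable reading (node, 10-minute interval).
toy <- data.frame(
  node_id = c("a", "b", "c", "a", "b", "a", "a"),
  by10    = c("00:00", "00:00", "00:00", "00:10", "00:10", "00:20", "00:20"),
  stringsAsFactors = FALSE
)

network_size <- 86  # full network size used in the text

# Count distinct active nodes per interval (duplicate readings count once).
active <- unique(toy[, c("node_id", "by10")])
count  <- tapply(active$node_id, active$by10, length)

# Proportion of the network active in each interval, in percent.
prop_active <- 100 * count / network_size

# Score 3 is the mean of the per-interval proportions.
score3 <- mean(prop_active)

print(round(prop_active, 5))
print(round(score3, 5))
```

The same formula reproduces the table: an interval with 12 active nodes gives 100 * 12 / 86, or about 13.95.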

The density plot below shows the distribution of propActive (Proportion of Active Nodes) recorded at each time interval of the day, relative to the network average. This average is taken as Score 3, the average proportion of the AoT network that is active for SO2 concentration data on 2018-12-15.

dfSO23%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 3`), size = 1)+
  geom_text(aes(x= `Score 3`, y=0), label='Score 3:\n Average Proportion of\nNetwork Active', size = 4, vjust= -2, hjust=1.4)+
  labs(x='Proportions of active nodes',
       y='Density',
       title='Distribution of Proportions of Active Nodes')+
  xlim(0, 100)+
  plotTheme()

Constructing Score 4

To calculate the average proportion of Chicago’s area covered by the active nodes, we begin by extracting the latitude and longitude of these nodes. This is done by taking the location at which every reliable data point is recorded and compiling the unique latitude and longitude pairs for each time interval.

The table below shows this result. It lists the first 41 rows, which cover the nodes active in the first four 10-minute intervals of 2018-12-15; 12 nodes were active at midnight.

dfSO2%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> dfSO24

dfSO24%>%
  arrange(by10)%>%
  head(41)%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped'))%>%
  scroll_box(height='300px')
by10 node_id lat lon
2018-12-15 00:00:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:00:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:00:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:00:00 001e06113107 41.75114 -87.71299
2018-12-15 00:00:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:00:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:00:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:00:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:00:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:00:00 001e06114503 41.66608 -87.53937
2018-12-15 00:00:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:00:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:10:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:10:00 001e06113107 41.75114 -87.71299
2018-12-15 00:10:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:10:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:10:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:10:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:10:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:10:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:10:00 001e06114503 41.66608 -87.53937
2018-12-15 00:10:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:10:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:10:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:20:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:20:00 001e0610ba15 41.72246 -87.57535
2018-12-15 00:20:00 001e0610e537 41.96162 -87.66595
2018-12-15 00:20:00 001e06113cf1 41.88469 -87.62786
2018-12-15 00:20:00 001e061144c0 41.76412 -87.72242
2018-12-15 00:20:00 001e061146bc 41.91873 -87.66826
2018-12-15 00:20:00 001e06113107 41.75114 -87.71299
2018-12-15 00:20:00 001e06114503 41.66608 -87.53937
2018-12-15 00:20:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:20:00 001e0610ee43 41.78861 -87.59871
2018-12-15 00:20:00 001e0610f6db 41.79133 -87.59868
2018-12-15 00:20:00 001e0610ba46 41.87838 -87.62768
2018-12-15 00:20:00 001e06114fd4 41.79448 -87.61596
2018-12-15 00:30:00 001e0610ba13 41.75124 -87.71299
2018-12-15 00:30:00 001e06113107 41.75114 -87.71299
2018-12-15 00:30:00 001e0610f05c 41.92490 -87.68770
2018-12-15 00:30:00 001e0610ba15 41.72246 -87.57535

Using the latitude and longitude locations, we can obtain a point distribution of the active nodes in the network at any time interval. However, as we are interested in the area covered by these nodes rather than their point locations, we construct a spatial bounding box around the active node points at every time interval and clip it to the Chicago boundary. The ratio of the area of this clipped bounding box to the whole area of Chicago gives the proportion of Chicago area covered (AreaProp) for each time interval; intervals with fewer than two active nodes are assigned an AreaProp of 0. The average proportion of Chicago area covered (Score 4) is the mean of these proportions, expressed as a percentage. This is all presented in the table below.
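
The bounding-box calculation can be sketched in base R under a strong simplifying assumption: the city is a rectangle, so that clipping reduces to a rectangle-rectangle intersection instead of the polygon clipping done with gIntersection in the analysis. The function bbox_area_prop, the toy city extent, and the node coordinates are all hypothetical.

```r
# Share of a rectangular "city" covered by the bounding box of a set of points.
# city = c(xmin, ymin, xmax, ymax); all coordinates are planar (e.g. feet).
bbox_area_prop <- function(x, y, city) {
  # Mirror the rule used in the analysis: fewer than two points cover no area.
  if (length(x) < 2) return(0)

  # Bounding box of the active nodes.
  box <- c(min(x), min(y), max(x), max(y))

  # Clip the box to the city extent (rectangle-rectangle intersection).
  width  <- max(0, min(box[3], city[3]) - max(box[1], city[1]))
  height <- max(0, min(box[4], city[4]) - max(box[2], city[2]))

  city_area <- (city[3] - city[1]) * (city[4] - city[2])
  (width * height) / city_area
}

city <- c(0, 0, 10, 10)  # toy 10 x 10 city

# Three hypothetical intervals with different sets of active nodes.
area_prop <- c(
  bbox_area_prop(c(1, 9, 5), c(2, 8, 4), city),  # 8 x 6 box inside the city
  bbox_area_prop(c(4, 12), c(0, 5), city),       # box clipped at x = 10
  bbox_area_prop(3, 3, city)                     # single node, no area
)

# Score 4 is the mean proportion, expressed as a percentage.
score4 <- 100 * mean(area_prop)
print(area_prop)
print(score4)
```

Because a bounding box spans the extremes of the node locations, a few widely separated nodes can produce a high Score 4 even when Score 3 is low.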

# Project the Chicago boundary to EPSG:3435 so that areas are computed in planar units
chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

# One row per 10-minute interval: the interval and the proportion of Chicago covered
dfSO24a<-NULL

for(i in unique(dfSO24$by10)){

  # active nodes in this interval
  subset <- 
    dfSO24%>%
    filter(by10==i)

  if(nrow(subset)>1){
    # project node locations from EPSG:4326 (lon/lat) to EPSG:3435
    subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
    subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    # bounding box of the active nodes as a closed polygon ring
    P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    #clip using chicago
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    dfSO24a<-rbind(dfSO24a, df1)
  }else{
    # fewer than two active nodes: no area is covered
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-0
    df1<-as.data.frame(df1)
    dfSO24a<-rbind(dfSO24a, df1)
  }

}

dfSO24a%>%
  mutate(`Score 4`= 100*mean(AreaProp))->dfSO24a
dfSO24a%>%
  head(144)%>%
  kable()%>%
  kable_styling(bootstrap_options = 'striped')%>%
  scroll_box(height = "300px")
by10 AreaProp Score 4
2018-12-15 00:00:00 0.5961605 57.4354
2018-12-15 00:10:00 0.5961605 57.4354
2018-12-15 00:20:00 0.5961605 57.4354
2018-12-15 00:30:00 0.5961605 57.4354
2018-12-15 00:40:00 0.5961605 57.4354
2018-12-15 00:50:00 0.5961605 57.4354
2018-12-15 01:00:00 0.5961605 57.4354
2018-12-15 01:10:00 0.5961605 57.4354
2018-12-15 01:20:00 0.5595557 57.4354
2018-12-15 01:30:00 0.5595557 57.4354
2018-12-15 01:40:00 0.5595557 57.4354
2018-12-15 01:50:00 0.5595557 57.4354
2018-12-15 02:00:00 0.5595557 57.4354
2018-12-15 02:10:00 0.5595557 57.4354
2018-12-15 02:20:00 0.5595557 57.4354
2018-12-15 02:30:00 0.5595557 57.4354
2018-12-15 02:40:00 0.5595557 57.4354
2018-12-15 02:50:00 0.5595557 57.4354
2018-12-15 03:00:00 0.5595557 57.4354
2018-12-15 03:10:00 0.5961605 57.4354
2018-12-15 03:20:00 0.5961605 57.4354
2018-12-15 03:30:00 0.5961605 57.4354
2018-12-15 03:40:00 0.5961605 57.4354
2018-12-15 03:50:00 0.5961605 57.4354
2018-12-15 04:00:00 0.5961605 57.4354
2018-12-15 04:10:00 0.5961605 57.4354
2018-12-15 04:20:00 0.5961605 57.4354
2018-12-15 04:30:00 0.5961605 57.4354
2018-12-15 04:40:00 0.5961605 57.4354
2018-12-15 04:50:00 0.5961605 57.4354
2018-12-15 05:00:00 0.5961605 57.4354
2018-12-15 05:10:00 0.5961605 57.4354
2018-12-15 05:20:00 0.5961605 57.4354
2018-12-15 05:30:00 0.5961605 57.4354
2018-12-15 05:40:00 0.5961605 57.4354
2018-12-15 05:50:00 0.5961605 57.4354
2018-12-15 06:00:00 0.5961605 57.4354
2018-12-15 06:10:00 0.5961605 57.4354
2018-12-15 06:20:00 0.5961605 57.4354
2018-12-15 06:30:00 0.5961605 57.4354
2018-12-15 06:40:00 0.5961605 57.4354
2018-12-15 06:50:00 0.5961605 57.4354
2018-12-15 07:00:00 0.5961605 57.4354
2018-12-15 07:10:00 0.5961605 57.4354
2018-12-15 07:20:00 0.5961605 57.4354
2018-12-15 07:30:00 0.5961605 57.4354
2018-12-15 07:40:00 0.5961605 57.4354
2018-12-15 07:50:00 0.5961605 57.4354
2018-12-15 08:00:00 0.5961605 57.4354
2018-12-15 08:10:00 0.5961605 57.4354
2018-12-15 08:20:00 0.5961605 57.4354
2018-12-15 08:30:00 0.5961605 57.4354
2018-12-15 08:40:00 0.5961605 57.4354
2018-12-15 08:50:00 0.5961605 57.4354
2018-12-15 09:00:00 0.5961605 57.4354
2018-12-15 09:10:00 0.5961605 57.4354
2018-12-15 09:20:00 0.5595557 57.4354
2018-12-15 09:30:00 0.5595557 57.4354
2018-12-15 09:40:00 0.5595557 57.4354
2018-12-15 09:50:00 0.5595557 57.4354
2018-12-15 10:00:00 0.5595557 57.4354
2018-12-15 10:10:00 0.3996714 57.4354
2018-12-15 10:20:00 0.4315833 57.4354
2018-12-15 10:30:00 0.3996714 57.4354
2018-12-15 10:40:00 0.5595557 57.4354
2018-12-15 10:50:00 0.5595557 57.4354
2018-12-15 11:00:00 0.5595557 57.4354
2018-12-15 11:10:00 0.5595557 57.4354
2018-12-15 11:20:00 0.5595557 57.4354
2018-12-15 11:30:00 0.5595557 57.4354
2018-12-15 11:40:00 0.5595557 57.4354
2018-12-15 11:50:00 0.5961605 57.4354
2018-12-15 12:00:00 0.5961605 57.4354
2018-12-15 12:10:00 0.5961605 57.4354
2018-12-15 12:20:00 0.5961605 57.4354
2018-12-15 12:30:00 0.5961605 57.4354
2018-12-15 12:40:00 0.4315833 57.4354
2018-12-15 12:50:00 0.4315833 57.4354
2018-12-15 13:00:00 0.4315833 57.4354
2018-12-15 13:10:00 0.5961605 57.4354
2018-12-15 13:20:00 0.4315833 57.4354
2018-12-15 13:30:00 0.5961605 57.4354
2018-12-15 13:40:00 0.5961605 57.4354
2018-12-15 13:50:00 0.5961605 57.4354
2018-12-15 14:00:00 0.5961605 57.4354
2018-12-15 14:10:00 0.4315833 57.4354
2018-12-15 14:20:00 0.6010409 57.4354
2018-12-15 14:30:00 0.5961605 57.4354
2018-12-15 14:40:00 0.5961605 57.4354
2018-12-15 14:50:00 0.6010409 57.4354
2018-12-15 15:00:00 0.4315833 57.4354
2018-12-15 15:10:00 0.5961605 57.4354
2018-12-15 15:20:00 0.6010409 57.4354
2018-12-15 15:30:00 0.5961605 57.4354
2018-12-15 15:40:00 0.6010409 57.4354
2018-12-15 15:50:00 0.6010409 57.4354
2018-12-15 16:00:00 0.6010409 57.4354
2018-12-15 16:10:00 0.6010409 57.4354
2018-12-15 16:20:00 0.6010409 57.4354
2018-12-15 16:30:00 0.6010409 57.4354
2018-12-15 16:40:00 0.6010409 57.4354
2018-12-15 16:50:00 0.6010409 57.4354
2018-12-15 17:00:00 0.6010409 57.4354
2018-12-15 17:10:00 0.6010409 57.4354
2018-12-15 17:20:00 0.6010409 57.4354
2018-12-15 17:30:00 0.6010409 57.4354
2018-12-15 17:40:00 0.6010409 57.4354
2018-12-15 17:50:00 0.6010409 57.4354
2018-12-15 18:00:00 0.6010409 57.4354
2018-12-15 18:10:00 0.6010409 57.4354
2018-12-15 18:20:00 0.6010409 57.4354
2018-12-15 18:30:00 0.6010409 57.4354
2018-12-15 18:40:00 0.6010409 57.4354
2018-12-15 18:50:00 0.6010409 57.4354
2018-12-15 19:00:00 0.6010409 57.4354
2018-12-15 19:10:00 0.6010409 57.4354
2018-12-15 19:20:00 0.6010409 57.4354
2018-12-15 19:30:00 0.6010409 57.4354
2018-12-15 19:40:00 0.5961605 57.4354
2018-12-15 19:50:00 0.6010409 57.4354
2018-12-15 20:00:00 0.6010409 57.4354
2018-12-15 20:10:00 0.6010409 57.4354
2018-12-15 20:20:00 0.6010409 57.4354
2018-12-15 20:30:00 0.6010409 57.4354
2018-12-15 20:40:00 0.6010409 57.4354
2018-12-15 20:50:00 0.4364638 57.4354
2018-12-15 21:00:00 0.6010409 57.4354
2018-12-15 21:10:00 0.6010409 57.4354
2018-12-15 21:20:00 0.6010409 57.4354
2018-12-15 21:30:00 0.6010409 57.4354
2018-12-15 21:40:00 0.6010409 57.4354
2018-12-15 21:50:00 0.6010409 57.4354
2018-12-15 22:00:00 0.6010409 57.4354
2018-12-15 22:10:00 0.6010409 57.4354
2018-12-15 22:20:00 0.6010409 57.4354
2018-12-15 22:30:00 0.6010409 57.4354
2018-12-15 22:40:00 0.4364638 57.4354
2018-12-15 22:50:00 0.4364638 57.4354
2018-12-15 23:00:00 0.6010409 57.4354
2018-12-15 23:10:00 0.4315833 57.4354
2018-12-15 23:20:00 0.4315833 57.4354
2018-12-15 23:30:00 0.5961605 57.4354
2018-12-15 23:40:00 0.4315833 57.4354
2018-12-15 23:50:00 0.6010409 57.4354

We can also plot the area covered relative to the whole of Chicago for visual inspection. The 4 plots below are for 00:00 (top left), 07:00 (top right), 13:00 (bottom left), and 19:00 (bottom right). For the SO2 concentration network on this day, the AreaProp column above shows that coverage ranged between about 40% and 60% of Chicago, so plots for intervals with different AreaProp values differ.

p<-list()

by10<-as.data.frame(unique(dfSO24$by10))
by10$no<-row_number(by10)
colnames(by10)<-c('by10', 'no')

dfSO24b<-merge(dfSO24, by10, by='by10', all.x=TRUE)

for(i in unique(dfSO24b$by10)){

subset <- 
      dfSO24b%>%
      filter(by10==i)
    
subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    
P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    
Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))

#clip using chicago
Ps2<-gIntersection(Ps2, chig, byid=FALSE)

j<-unique(subset$no)

p[[j]]<-spplot(chig, colorkey=FALSE, col.regions='red', 
       sp.layout=list(list(Ps2, fill='blue', first=FALSE)))
}

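library(gridExtra)
# Each list entry is one 10-minute interval, so 00:00, 07:00, 13:00 and 19:00
# are entries 1, 43, 79 and 115 (hour * 6 + 1), assuming all 144 intervals are present
grid.arrange(p[[1]], p[[43]], p[[79]], p[[115]])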

In summary, for the AoT SO2 concentration network on 2018-12-15,

  • Score 3 = 14.8 : At any given 10-minute time interval, an average of 14.8% of the nodes in the network are collecting reliable data. This is a low score.
  • Score 4 = 57.4 : At any given 10-minute time interval, reliable data is collected for an average of 57.4% of Chicago’s area. Read together with Score 3, this indicates that the small number of active nodes is widely dispersed across Chicago.

3.5 Scoring Temporal Reliability

In this section, the method of scoring Temporal Reliability is presented for each data parameter type for the day of 2018-12-15. One score is obtained for this criterion. In summary, this section determines the number of 10-minute time intervals during which each node collects reliable data, and scores it as a proportion of the whole day, which consists of 144 such intervals. The average proportion of the day for which a node is active (Score 5) is the mean of the proportions calculated for all the nodes in the network.

The flowchart below illustrates the scoring process in this section:

The scores constructed for each data parameter are presented below.

Temperature

Constructing Score 5

The figure below presents the activity of each node during the day of 2018-12-15. Each tile represents a 10-minute time interval. The figure shows that most nodes collect reliable data throughout all the time intervals of the day. However, some experience periods of inactivity, during which no reliable data is collected.

dfTemp%>%
  filter(val_qual==1)%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values='cornflowerblue',
                      name="Collecting Reliable Data",
                      labels=c("Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions. The table then presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), the same duration as a percentage of the day (propActive), and the average of these percentages across nodes (Score 5).

dfTemp%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfTemp5

dfTemp5%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 144 24.00000 100.00000 99.15312
001e0610ba15 144 24.00000 100.00000 99.15312
001e0610ba46 144 24.00000 100.00000 99.15312
001e0610bbf9 144 24.00000 100.00000 99.15312
001e0610bc10 144 24.00000 100.00000 99.15312
001e0610bc12 144 24.00000 100.00000 99.15312
001e0610e532 144 24.00000 100.00000 99.15312
001e0610e537 144 24.00000 100.00000 99.15312
001e0610e538 134 22.33333 93.05556 99.15312
001e0610e835 144 24.00000 100.00000 99.15312
001e0610ee33 144 24.00000 100.00000 99.15312
001e0610ee36 144 24.00000 100.00000 99.15312
001e0610ee43 144 24.00000 100.00000 99.15312
001e0610ee5d 144 24.00000 100.00000 99.15312
001e0610eef2 144 24.00000 100.00000 99.15312
001e0610eef4 144 24.00000 100.00000 99.15312
001e0610ef27 144 24.00000 100.00000 99.15312
001e0610f05c 144 24.00000 100.00000 99.15312
001e0610f6db 144 24.00000 100.00000 99.15312
001e0610f703 144 24.00000 100.00000 99.15312
001e0610f732 144 24.00000 100.00000 99.15312
001e0610f8f4 144 24.00000 100.00000 99.15312
001e0610fb4c 144 24.00000 100.00000 99.15312
001e061130f4 144 24.00000 100.00000 99.15312
001e06113107 133 22.16667 92.36111 99.15312
001e061135cb 144 24.00000 100.00000 99.15312
001e06113a48 144 24.00000 100.00000 99.15312
001e06113ace 144 24.00000 100.00000 99.15312
001e06113cf1 144 24.00000 100.00000 99.15312
001e06113d22 134 22.33333 93.05556 99.15312
001e06113dbc 144 24.00000 100.00000 99.15312
001e06113f54 144 24.00000 100.00000 99.15312
001e061144c0 133 22.16667 92.36111 99.15312
001e06114500 136 22.66667 94.44444 99.15312
001e06114503 144 24.00000 100.00000 99.15312
001e0611462f 144 24.00000 100.00000 99.15312
001e061146ba 144 24.00000 100.00000 99.15312
001e061146bc 144 24.00000 100.00000 99.15312
001e06114fd4 144 24.00000 100.00000 99.15312
001e0611536c 144 24.00000 100.00000 99.15312
001e0611537d 144 24.00000 100.00000 99.15312

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for temperature data on 2018-12-15.

dfTemp5%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT temperature network on 2018-12-15:

  • Score 5 = 99.1: On average, a node collects reliable data for 99.1% of the day.
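
The arithmetic behind the table can be checked by hand: a node with 133 reliable 10-minute intervals is active for 133 × 10 / 60 ≈ 22.17 hours, or about 92.36% of a 24-hour day. The sketch below reproduces the same steps in base R on a small made-up data frame. The data and the names toy, reliable, intervals and score5 are illustrative assumptions, not part of the AoT data set, and base R is used here only so that the example runs without additional packages.

```r
# Made-up readings: node "a" reports reliably in 3 distinct intervals,
# node "b" in 1 (its second reading is flagged unreliable).
toy <- data.frame(
  node_id  = c("a", "a", "a", "a", "b", "b"),
  by10     = c("00:00", "00:00", "00:10", "00:20", "00:00", "00:10"),
  val_qual = c(1, 1, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep reliable readings, then one row per node and interval.
reliable <- unique(toy[toy$val_qual == 1, c("node_id", "by10")])

# Number of distinct reliable 10-minute intervals per node.
intervals <- table(reliable$node_id)

# Convert intervals to hours, then to a percentage of a 24-hour day.
durationActive <- as.numeric(intervals) * 10 / 60
propActive     <- 100 * durationActive / 24
names(propActive) <- names(intervals)

# Score 5 is the unweighted mean of propActive across nodes.
score5 <- mean(propActive)

# The same conversion for a node with 133 reliable intervals.
check133 <- round(100 * (133 * 10 / 60) / 24, 5)
check133   # 92.36111
```

Because the toy day contains only a handful of intervals, the resulting percentages are tiny; the point is the sequence of steps, not the values.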

Humidity

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Most nodes collect reliable data in every interval of the day, but some experience periods of inactivity in which no data is collected at all.

dfHumidity%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfHumidity%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfHumidity5
## Adding missing grouping variables: `date`
dfHumidity5%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 144 24.00000 100.00000 99.57562
001e0610ba46 144 24.00000 100.00000 99.57562
001e0610bbf9 144 24.00000 100.00000 99.57562
001e0610bc12 144 24.00000 100.00000 99.57562
001e0610e532 144 24.00000 100.00000 99.57562
001e0610e537 144 24.00000 100.00000 99.57562
001e0610ee33 144 24.00000 100.00000 99.57562
001e0610ee36 144 24.00000 100.00000 99.57562
001e0610ee43 144 24.00000 100.00000 99.57562
001e0610ee5d 144 24.00000 100.00000 99.57562
001e0610f6db 144 24.00000 100.00000 99.57562
001e0610f732 144 24.00000 100.00000 99.57562
001e061130f4 144 24.00000 100.00000 99.57562
001e06113107 133 22.16667 92.36111 99.57562
001e06113a48 144 24.00000 100.00000 99.57562
001e06113cf1 144 24.00000 100.00000 99.57562
001e06113dbc 144 24.00000 100.00000 99.57562
001e06113f54 144 24.00000 100.00000 99.57562

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for humidity data on 2018-12-15.

dfHumidity5%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT humidity network on 2018-12-15:

  • Score 5 = 99.6: On average, a node collects reliable data for 99.6% of the day.
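
Score 5 reports how long each node was active, but not when its gaps occurred. As a hedged illustration, the base R sketch below lists the missing 10-minute intervals for a single made-up node by comparing its observed intervals against the full set of 144 intervals in a day. The "HH:MM" label format for by10 and the names all_by10, observed and missing_by10 are assumptions made for this example; the actual by10 values may be encoded differently.

```r
# All 144 ten-minute labels in a day (the label format is an assumption).
day_start <- as.POSIXct("2018-12-15 00:00:00", tz = "UTC")
all_by10  <- format(seq(day_start, by = "10 min", length.out = 144), "%H:%M")

# Made-up node that is silent for 11 consecutive intervals from 03:00.
gap_start <- match("03:00", all_by10)
silent    <- seq_along(all_by10) %in% gap_start:(gap_start + 10)
observed  <- all_by10[!silent]

# Intervals with no reliable reading, and the implied propActive.
missing_by10 <- setdiff(all_by10, observed)
propActive   <- 100 * length(observed) / length(all_by10)

length(missing_by10)   # 11 intervals missing
range(missing_by10)    # first and last missing interval
round(propActive, 5)   # 92.36111, i.e. 133 of 144 intervals
```

A node with 133 reliable intervals, such as the one in the table above, could be inspected in the same way to see whether its 11 missing intervals form one outage or several short ones.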

Pressure

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Most nodes collect reliable data in every interval of the day, but some experience periods of inactivity in which no data is collected at all.

dfPressure%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfPressure%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfPressure5
## Adding missing grouping variables: `date`
dfPressure5%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 144 24.00000 100.00000 99.07407
001e0610ba15 144 24.00000 100.00000 99.07407
001e0610ba46 144 24.00000 100.00000 99.07407
001e0610bbf9 144 24.00000 100.00000 99.07407
001e0610bc10 144 24.00000 100.00000 99.07407
001e0610e532 144 24.00000 100.00000 99.07407
001e0610e537 144 24.00000 100.00000 99.07407
001e0610e538 134 22.33333 93.05556 99.07407
001e0610ee33 144 24.00000 100.00000 99.07407
001e0610ee36 144 24.00000 100.00000 99.07407
001e0610ee43 144 24.00000 100.00000 99.07407
001e0610ee5d 144 24.00000 100.00000 99.07407
001e0610eef4 144 24.00000 100.00000 99.07407
001e0610f05c 144 24.00000 100.00000 99.07407
001e0610f6db 144 24.00000 100.00000 99.07407
001e0610f732 144 24.00000 100.00000 99.07407
001e061130f4 144 24.00000 100.00000 99.07407
001e06113107 133 22.16667 92.36111 99.07407
001e06113a48 144 24.00000 100.00000 99.07407
001e06113cf1 144 24.00000 100.00000 99.07407
001e06113dbc 144 24.00000 100.00000 99.07407
001e06113f54 144 24.00000 100.00000 99.07407
001e061144c0 133 22.16667 92.36111 99.07407
001e0611537d 144 24.00000 100.00000 99.07407

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for pressure data on 2018-12-15.

dfPressure5%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT pressure network on 2018-12-15:

  • Score 5 = 99.1: On average, a node collects reliable data for 99.1% of the day.

PM2.5 Concentration

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Unlike the meteorological parameters, and as the table further below shows, only two nodes record reliable PM2.5 data on this day, and neither does so in every interval.

dfPM25%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfPM25%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfPM255

dfPM255%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610bc10 68 11.33333 47.22222 69.79167
001e06113107 133 22.16667 92.36111 69.79167

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for PM2.5 concentration data on 2018-12-15.

dfPM255%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT PM2.5 concentration network on 2018-12-15:

  • Score 5 = 69.8: On average, a node collects reliable data for 69.8% of the day.
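
One property of this calculation is worth keeping in mind when only a few nodes appear in the table. Because the pipeline filters on val_qual==1 before counting, a node whose readings are all flagged unreliable contributes no row at all, rather than a row with propActive equal to 0. The base R sketch below uses made-up data to show the difference between averaging over the nodes that remain after the filter and averaging over every node that reported anything. The names toy, day_intervals, mean_kept and mean_all are assumptions for illustration, and the second average is not part of the original scoring.

```r
toy <- data.frame(
  node_id  = c("a", "a", "b", "b", "c", "c"),
  by10     = c("00:00", "00:10", "00:00", "00:10", "00:00", "00:10"),
  val_qual = c(1, 1, 1, 0, 0, 0),   # node "c" never reports a reliable value
  stringsAsFactors = FALSE
)
day_intervals <- 2   # pretend the "day" has only two intervals

reliable <- unique(toy[toy$val_qual == 1, c("node_id", "by10")])

# Average over nodes that survive the filter (what the pipeline does).
counts_kept <- table(reliable$node_id)
mean_kept   <- mean(100 * as.numeric(counts_kept) / day_intervals)

# Average over every node that reported anything, counting "c" as 0.
all_nodes  <- unique(toy$node_id)
counts_all <- table(factor(reliable$node_id, levels = all_nodes))
mean_all   <- mean(100 * as.numeric(counts_all) / day_intervals)

c(kept = mean_kept, all = mean_all)   # 75 versus 50
```

Whether excluding such nodes is appropriate depends on whether Score 5 is meant to describe only the nodes that produced usable data or the whole deployed network.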

CO Concentration

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. As the table further below shows, only one node collects reliable CO data in every interval of the day; the others experience periods of inactivity of varying length.

dfCO%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfCO%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfCO5

dfCO5%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 127 21.166667 88.19444 81.79012
001e0610ba15 58 9.666667 40.27778 81.79012
001e0610ba46 44 7.333333 30.55556 81.79012
001e0610bc10 134 22.333333 93.05556 81.79012
001e0610e537 134 22.333333 93.05556 81.79012
001e0610ee43 142 23.666667 98.61111 81.79012
001e0610eef2 144 24.000000 100.00000 81.79012
001e0610f05c 137 22.833333 95.13889 81.79012
001e0610f6db 94 15.666667 65.27778 81.79012
001e061130f4 135 22.500000 93.75000 81.79012
001e06113107 114 19.000000 79.16667 81.79012
001e06113ace 142 23.666667 98.61111 81.79012
001e06113cf1 140 23.333333 97.22222 81.79012
001e061144c0 56 9.333333 38.88889 81.79012
001e06114500 136 22.666667 94.44444 81.79012
001e06114503 105 17.500000 72.91667 81.79012
001e061146bc 141 23.500000 97.91667 81.79012
001e06114fd4 137 22.833333 95.13889 81.79012

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for CO concentration data on 2018-12-15.

dfCO5%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT CO concentration network on 2018-12-15:

  • Score 5 = 81.8: On average, a node collects reliable data for 81.8% of the day.

H2S Concentration

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Most nodes collect reliable data in every interval of the day, but some experience periods of inactivity in which no data is collected at all.

dfH2S%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfH2S%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfH2S5

dfH2S5%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 144 24.000000 100.00000 94.63735
001e0610ba15 110 18.333333 76.38889 94.63735
001e0610ba46 144 24.000000 100.00000 94.63735
001e0610bc10 144 24.000000 100.00000 94.63735
001e0610e537 144 24.000000 100.00000 94.63735
001e0610ee43 144 24.000000 100.00000 94.63735
001e0610eef2 144 24.000000 100.00000 94.63735
001e0610f05c 144 24.000000 100.00000 94.63735
001e0610f6db 144 24.000000 100.00000 94.63735
001e061130f4 144 24.000000 100.00000 94.63735
001e06113107 132 22.000000 91.66667 94.63735
001e06113ace 144 24.000000 100.00000 94.63735
001e06113cf1 144 24.000000 100.00000 94.63735
001e061144c0 59 9.833333 40.97222 94.63735
001e06114500 136 22.666667 94.44444 94.63735
001e06114503 144 24.000000 100.00000 94.63735
001e061146bc 144 24.000000 100.00000 94.63735
001e06114fd4 144 24.000000 100.00000 94.63735

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for H2S concentration data on 2018-12-15.

dfH2S5%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT H2S concentration network on 2018-12-15:

  • Score 5 = 94.6: On average, a node collects reliable data for 94.6% of the day.

NO2 Concentration

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Most nodes collect reliable data in every interval of the day, but some experience periods of inactivity in which no data is collected at all.

dfNO2%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfNO2%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfNO25

dfNO25%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 142 23.66667 98.61111 95.56327
001e0610ba15 144 24.00000 100.00000 95.56327
001e0610ba46 144 24.00000 100.00000 95.56327
001e0610bc10 127 21.16667 88.19444 95.56327
001e0610e537 78 13.00000 54.16667 95.56327
001e0610ee43 144 24.00000 100.00000 95.56327
001e0610eef2 144 24.00000 100.00000 95.56327
001e0610f05c 144 24.00000 100.00000 95.56327
001e0610f6db 144 24.00000 100.00000 95.56327
001e061130f4 144 24.00000 100.00000 95.56327
001e06113107 133 22.16667 92.36111 95.56327
001e06113ace 144 24.00000 100.00000 95.56327
001e06113cf1 144 24.00000 100.00000 95.56327
001e061144c0 133 22.16667 92.36111 95.56327
001e06114500 136 22.66667 94.44444 95.56327
001e06114503 144 24.00000 100.00000 95.56327
001e061146bc 144 24.00000 100.00000 95.56327
001e06114fd4 144 24.00000 100.00000 95.56327

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for NO2 concentration data on 2018-12-15.

dfNO25%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT NO2 concentration network on 2018-12-15:

  • Score 5 = 95.6: On average, a node collects reliable data for 95.6% of the day.

O3 Concentration

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Most nodes collect reliable data in every interval of the day, but some experience periods of inactivity in which no data is collected at all.

dfO3%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfO3%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfO35

dfO35%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 144 24.00000 100.00000 98.37963
001e0610ba15 144 24.00000 100.00000 98.37963
001e0610ba46 144 24.00000 100.00000 98.37963
001e0610bc10 144 24.00000 100.00000 98.37963
001e0610e537 144 24.00000 100.00000 98.37963
001e0610ee43 144 24.00000 100.00000 98.37963
001e0610eef2 144 24.00000 100.00000 98.37963
001e0610f05c 144 24.00000 100.00000 98.37963
001e0610f6db 144 24.00000 100.00000 98.37963
001e061130f4 144 24.00000 100.00000 98.37963
001e06113107 133 22.16667 92.36111 98.37963
001e06113ace 144 24.00000 100.00000 98.37963
001e06113cf1 144 24.00000 100.00000 98.37963
001e061144c0 133 22.16667 92.36111 98.37963
001e06114500 131 21.83333 90.97222 98.37963
001e06114503 137 22.83333 95.13889 98.37963
001e061146bc 144 24.00000 100.00000 98.37963
001e06114fd4 144 24.00000 100.00000 98.37963

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for O3 concentration data on 2018-12-15.

dfO35%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT O3 concentration network on 2018-12-15:

  • Score 5 = 98.4: On average, a node collects reliable data for 98.4% of the day.

SO2 Concentration

Constructing Score 5

The figure below presents the activity of each node over the course of 2018-12-15. Each tile represents a 10-minute interval. Most nodes collect reliable data in every interval of the day, but some experience periods of inactivity in which no data is collected at all.

dfSO2%>%
  group_by(node_id, by10)%>%
  mutate(Active=ifelse(sum(val_qual)>0, 1, 0))%>%
  ggplot()+
  geom_tile(aes(x=by10, y=node_id, fill=as.factor(Active)), col='grey90')+
  scale_fill_manual(values=c('indianred1','cornflowerblue'),
                      name="Collecting Reliable Data",
                      labels=c("No", "Yes"))+
  labs(x = "Time", title= paste('2018-12-15 | Node activity by time')) +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))+
  theme(
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank())

The code block below translates the figure above into numerical proportions (propActive). The resulting table presents, for each node, the number of 10-minute intervals with reliable data (duration), the equivalent duration in hours (durationActive), that duration as a percentage of the day (propActive), and the network average of these percentages (Score 5).

dfSO2%>%
  filter(val_qual==1)%>%
  select(node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))->dfSO25

dfSO25%>%
  kable()%>%
  kable_styling(bootstrap_options = c('striped', 'hover'))%>%
  scroll_box(height = "300px")
node_id duration durationActive propActive Score 5
001e0610ba13 144 24.0000000 100.000000 84.67593
001e0610ba15 144 24.0000000 100.000000 84.67593
001e0610ba46 144 24.0000000 100.000000 84.67593
001e0610e537 144 24.0000000 100.000000 84.67593
001e0610ee43 144 24.0000000 100.000000 84.67593
001e0610eef2 48 8.0000000 33.333333 84.67593
001e0610f05c 144 24.0000000 100.000000 84.67593
001e0610f6db 144 24.0000000 100.000000 84.67593
001e061130f4 4 0.6666667 2.777778 84.67593
001e06113107 133 22.1666667 92.361111 84.67593
001e06113cf1 144 24.0000000 100.000000 84.67593
001e061144c0 119 19.8333333 82.638889 84.67593
001e06114503 129 21.5000000 89.583333 84.67593
001e061146bc 144 24.0000000 100.000000 84.67593
001e06114fd4 100 16.6666667 69.444444 84.67593

The density plot below shows the distribution of propActive (the proportion of the day for which each node is active) relative to the network average. This average is taken as Score 5: the average proportion of the day for which a node in the AoT network is active, here for SO2 concentration data on 2018-12-15.

dfSO25%>%
  ggplot()+
  geom_density(aes(propActive), fill='indianred', col='indianred', alpha=0.1)+
  geom_vline(aes(xintercept = `Score 5`), size = 1)+
  geom_text(aes(x= `Score 5`, y=0), label='Score 5:\nAverage Proportion of\nactive duration', size = 4, vjust= -2, hjust=1.1)+
  labs(x='Proportions of active duration', 
       y='Density',
       title='Distribution of proportions of active duration')+
  xlim(0, 100)+
  plotTheme()

In summary, for the AoT SO2 concentration network on 2018-12-15:

  • Score 5 = 84.7: On average, a node collects reliable data for 84.7% of the day.

3.6 Scoring Overall Reliability

In this section, the five component scores, which together evaluate the three reliability criteria, are averaged into an Overall Reliability score for the AoT network on 2018-12-15. The overall score is computed separately for each data parameter.

Click on the tabs below to view the overall scores for each data parameter.

Temperature

tempScore<-as.data.frame(c(mean(dfTemp1$NodeMeanX), 
                           dfTemp2$`Score 2`, 
                           dfTemp3$`Score 3`, 
                           dfTemp4a$`Score 4`, 
                           dfTemp5$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(tempScore)<-'Component Scores'
tempScore<-rbind(tempScore, mean(tempScore$`Component Scores`))

tempScore$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")

tempScore%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - Temperature', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()
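
A small caveat about the construction above: calling unique() on the combined column relies on the five component scores all being different from one another. If two of them happened to coincide exactly, one would be dropped and the six labels assigned afterwards would no longer match the number of rows. The base R sketch below illustrates this with made-up score columns and shows one possible alternative, which takes a single value from each component before combining them. The objects s1 to s5 and the helper first_value are hypothetical and stand in for the score columns used above.

```r
# Made-up component scores; Score 3 and Score 4 happen to be equal.
s1 <- 90
s2 <- rep(80, 3)   # a constant column, as mutate(mean(...)) would produce
s3 <- rep(70, 3)
s4 <- rep(70, 3)
s5 <- rep(60, 3)

# Approach used above: combine everything, then deduplicate.
collapsed <- unique(c(s1, s2, s3, s4, s5))
length(collapsed)   # 4, not 5, because the two 70s merge

# Alternative: reduce each component to one value first.
first_value <- function(x) x[1]
components  <- c(s1, first_value(s2), first_value(s3),
                 first_value(s4), first_value(s5))
overall     <- mean(components)

data.frame(Score = c(paste("Score", 1:5), "Overall"),
           Value = c(components, overall))
```

With distinct component scores both approaches give the same six rows, so the results shown in this section are unaffected; the alternative only guards against an exact tie.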

Humidity

HumidityScore<-as.data.frame(c(mean(dfHumidity1$NodeMeanX), 
                           dfHumidity2$`Score 2`, 
                           dfHumidity3$`Score 3`, 
                           dfHumidity4a$`Score 4`, 
                           dfHumidity5$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(HumidityScore)<-'Component Scores'
HumidityScore<-rbind(HumidityScore, mean(HumidityScore$`Component Scores`))

HumidityScore$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")

HumidityScore%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - Humidity', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

Pressure

PressureScore<-as.data.frame(c(mean(dfPressure1$NodeMeanX), 
                           dfPressure2$`Score 2`, 
                           dfPressure3$`Score 3`, 
                           dfPressure4a$`Score 4`, 
                           dfPressure5$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(PressureScore)<-'Component Scores'
PressureScore<-rbind(PressureScore, mean(PressureScore$`Component Scores`))

PressureScore$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")


PressureScore%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - Pressure', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

PM2.5 Concentration

PM25Score<-as.data.frame(c(mean(dfPM251$NodeMeanX), 
                           dfPM252$`Score 2`, 
                           dfPM253$`Score 3`, 
                           dfPM254a$`Score 4`, 
                           dfPM255$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(PM25Score)<-'Component Scores'
PM25Score<-rbind(PM25Score, mean(PM25Score$`Component Scores`))

PM25Score$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")

PM25Score%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - PM 2.5 Concentration', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

CO Concentration

COScore<-as.data.frame(c(mean(dfCO1$NodeMeanX), 
                           dfCO2$`Score 2`, 
                           dfCO3$`Score 3`, 
                           dfCO4a$`Score 4`, 
                           dfCO5$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(COScore)<-'Component Scores'
COScore<-rbind(COScore, mean(COScore$`Component Scores`))

COScore$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")


COScore%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - CO concentration', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

H2S Concentration

H2SScore<-as.data.frame(c(mean(dfH2S1$NodeMeanX), 
                           dfH2S2$`Score 2`, 
                           dfH2S3$`Score 3`, 
                           dfH2S4a$`Score 4`, 
                           dfH2S5$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(H2SScore)<-'Component Scores'
H2SScore<-rbind(H2SScore, mean(H2SScore$`Component Scores`))

H2SScore$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")


H2SScore%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - H2S Concentration', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

NO2 Concentration

NO2Score<-as.data.frame(c(mean(dfNO21$NodeMeanX), 
                           dfNO22$`Score 2`, 
                           dfNO23$`Score 3`, 
                           dfNO24a$`Score 4`, 
                           dfNO25$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(NO2Score)<-'Component Scores'
NO2Score<-rbind(NO2Score, mean(NO2Score$`Component Scores`))

NO2Score$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")


NO2Score%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - NO2 Concentration', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

O3 Concentration

O3Score<-as.data.frame(c(mean(dfO31$NodeMeanX), 
                           dfO32$`Score 2`, 
                           dfO33$`Score 3`, 
                           dfO34a$`Score 4`, 
                           dfO35$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(O3Score)<-'Component Scores'
O3Score<-rbind(O3Score, mean(O3Score$`Component Scores`))

O3Score$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")


O3Score%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - O3 Concentration', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()

SO2 Concentration

SO2Score<-as.data.frame(c(mean(dfSO21$NodeMeanX), 
                           dfSO22$`Score 2`, 
                           dfSO23$`Score 3`, 
                           dfSO24a$`Score 4`, 
                           dfSO25$`Score 5`))%>%
  unique()%>%
  as.data.frame()

colnames(SO2Score)<-'Component Scores'
SO2Score<-rbind(SO2Score, mean(SO2Score$`Component Scores`))

SO2Score$Score<-c('Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5', 'Overall')

caption<-"Indicator of Sensor Value Reliability - Score 1: Average proportion of reliable data collected by each node\n
Indicator of Sensor Value Reliability - Score 2: Overall consistency in sensor value reliability\n
Indicator of Spatial Coverage - Score 3: Average proportion of network 'active'\n
Indicator of Spatial Coverage - Score 4: Average proportion of Chicago area covered\n
Indicator of Temporal Coverage - Score 5: Average proportion of day-duration node is 'active'"
caption <- paste0(strwrap(caption, 50), sep="", collapse="\n")


SO2Score%>%
  arrange(Score)%>%
  ggplot()+
  geom_col(aes(y=`Component Scores`, x=Score), 
           fill=c('indianred4', 
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred',
                  'indianred'))+
  geom_text(aes(y=`Component Scores`, x=Score, label=round(`Component Scores`, digits=1)), vjust=-0.5)+
  labs(x='Score type',
       y='Score value',
       title='Distribution of overall score and its 5 component scores - SO2 Concentration', 
       caption=caption)+
  geom_hline(yintercept=100, col='grey50', size=1)+
  theme_classic()
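The blocks above repeat the same table-building steps for every data type. As a minimal sketch, assuming each component score has already been reduced to a single number, the table could be assembled by a small helper. The name build_score_table is hypothetical and the input values are toy numbers, not results from the AoT data. Unlike the unique() step used above, this version keeps all five rows even when two component scores happen to be equal.

```r
# Hypothetical helper: assemble the five component scores and their mean.
# `scores` is assumed to be a numeric vector of length 5 (Score 1 to Score 5).
build_score_table <- function(scores) {
  stopifnot(is.numeric(scores), length(scores) == 5)
  data.frame(
    Score = c(paste("Score", 1:5), "Overall"),
    `Component Scores` = c(scores, mean(scores)),
    check.names = FALSE
  )
}

# Toy values for illustration only.
# Two components are deliberately equal: a unique() call would collapse them.
toy <- build_score_table(c(60, 60, 75, 80, 90))
print(toy)
```

The resulting data frame has the same two columns as the tables plotted above, so it could be passed to the same ggplot code.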

3.6 The scores for December 2018

We apply the scoring method above to the data collected by the AoT network for the whole month of December. In the code block below, we have provided the general code required to apply the scoring method for a month’s worth of data.

This section is computed separately for each data type, and the scores for each data type are saved to a separate file. To obtain a full set of scores for the whole of December for all the data types, we then combine these separate data files into dfScoresAll.rds. The plots in the sections below use this compiled set of scores.

Here, df is the dataset retrieved using the retrieval function provided in Section 3.2.
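To make the expected shape of df concrete, the sketch below runs the Score 1 logic on a toy dataset in base R. The column names date, node_id, by10 and val_qual follow the scoring code in this section; the toy values, the string formats, and the intervals_per_day parameter are assumptions for illustration. The full scoring code divides by 144 (24 hours x 6 ten-minute intervals); the toy day here has only 2 intervals.

```r
# Toy readings: two nodes, one day, up to two 10-minute intervals each.
# val_qual is 1 for a reliable reading and 0 otherwise.
df_toy <- data.frame(
  date     = "2018-12-01",
  node_id  = c("A", "A", "A", "A", "B", "B"),
  by10     = c("00:00", "00:00", "00:10", "00:10", "00:00", "00:00"),
  val_qual = c(1, 1, 1, 0, 0, 0)
)

# X: percentage of reliable readings per node and 10-minute interval.
x <- aggregate(val_qual ~ date + node_id + by10, data = df_toy,
               FUN = function(v) 100 * mean(v))
names(x)[names(x) == "val_qual"] <- "X"

# NodeMeanX: sum of X divided by the number of intervals in a full day,
# so intervals with no readings at all count as 0.
intervals_per_day <- 2
node_mean <- aggregate(X ~ date + node_id, data = x,
                       FUN = function(v) sum(v) / intervals_per_day)
names(node_mean)[names(node_mean) == "X"] <- "NodeMeanX"

# Score 1: mean of NodeMeanX across nodes for each date.
score1 <- aggregate(NodeMeanX ~ date, data = node_mean, FUN = mean)
names(score1)[2] <- "Score 1"
print(score1)
```

Node B reports nothing in the second interval, and the fixed denominator treats that missing interval as fully unreliable.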

1. Scoring

#score 1
df%>%
  group_by(date, node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(date, node_id, by10, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  mutate(NodeMeanX = sum(X)/144)%>% #144 = number of 10-minute intervals in a day
  select(date, node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date)%>%
  mutate(`Score 1`=mean(NodeMeanX))%>%
  select(date,`Score 1` )%>%
  unique()%>%
  as.data.frame()->score1


#score 2
df%>%
  group_by(date, node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(date, node_id, by10, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  mutate(NodeMeanX = sum(X)/144)%>% #144 = number of 10-minute intervals in a day
  select(date, node_id, by10, NodeMeanX, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  mutate(NodeSD=ifelse(abs(sd(X))>NodeMeanX, 
                       NodeMeanX,
                       abs(sd(X))))%>%
  group_by(date, node_id)%>%
  mutate(NodeSDScore= ifelse(NodeMeanX==0, 
                             0, 
                             ifelse(NodeMeanX<50, 
                                    abs(100-abs(100-100*(NodeSD/NodeMeanX))),
                                    abs(100-100*(NodeSD/NodeMeanX)))))%>%
  select(date, node_id, NodeSDScore)%>%
  na.omit()%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date)%>%
  mutate(`Score 2`= mean(NodeSDScore))%>%
  select(date,`Score 2`)%>%
  unique()%>%
  as.data.frame()->score2

#Score 3

df%>%
  filter(val_qual==1)%>%
  select(date, node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))%>%
  select(date, `Score 3`)%>%
  unique()%>%
  as.data.frame()->score3

#Score 4
df%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> df4

chig<-readOGR('.', 'chigBound')
chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

df4a<-NULL
for(i in unique(df4$by10)){
  
  subset <- 
    df4%>%
    filter(by10==i)
  
  if(nrow(subset)>1){
    subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
    subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    #clip using chicago
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    df4a<-rbind(df4a, df1)
  }else{
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-0
    df1<-as.data.frame(df1)
    df4a<-rbind(df4a, df1)
  }
  
}

df4a%>%
  mutate(date=date(by10))%>%
  group_by(date)%>%
  mutate(`Score 4`= 100*mean(AreaProp))%>%
  select(date, `Score 4`)%>%
  unique()%>%
  as.data.frame()->score4



#Score 5
df%>%
  filter(val_qual==1)%>%
  select(date,node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))%>%
  select(date, `Score 5`)%>%
  unique()%>%
  as.data.frame()->score5

scoreDec<-merge(score1, score2, by='date')
scoreDec<-merge(scoreDec, score3, by='date')
scoreDec<-merge(scoreDec, score4, by='date')
scoreDec<-merge(scoreDec, score5, by='date')
colnames(scoreDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
scoreDec$Overall<-(scoreDec$`Score 1` + scoreDec$`Score 2`+ scoreDec$`Score 3` + scoreDec$`Score 4`+ scoreDec$`Score 5`)/5
saveRDS(scoreDec, 'temperatureScoresDecember.rds') #change name of file according to the data type
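The per-node consistency calculation behind Score 2 is the least obvious step in the block above. The sketch below restates it for a single node on a single day as a standalone base R function. The function name node_sd_score and the intervals_per_day argument are hypothetical; the formula mirrors the nested ifelse() calls above, with the standard deviation capped at the node mean.

```r
# Hypothetical helper: consistency score for one node on one day.
# `x` holds the per-interval percentages of reliable readings (the X values).
node_sd_score <- function(x, intervals_per_day = 144) {
  node_mean <- sum(x) / intervals_per_day
  if (node_mean == 0) return(0)
  # Cap the standard deviation at the mean so the ratio stays within 0 to 100.
  node_sd <- min(abs(sd(x)), node_mean)
  ratio <- 100 * node_sd / node_mean
  if (node_mean < 50) abs(100 - abs(100 - ratio)) else abs(100 - ratio)
}

always_reliable   <- node_sd_score(rep(100, 144))
never_reliable    <- node_sd_score(rep(0, 144))
half_day_reliable <- node_sd_score(c(rep(100, 72), rep(0, 72)))
```

A node with a single interval has an undefined standard deviation, so this function returns NA for it; the block above drops such nodes with na.omit().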

2. Compiling scores for different data types

Temp<-readRDS('temperatureScoresDecember.rds')
Temp$Data<-'Temperature'
Humidity<-readRDS('humidityScoresDecember.rds')
Humidity$Data<-'Humidity'
Pressure<-readRDS('pressureScoresDecember.rds')
Pressure$Data<-'Pressure'
PM25<-readRDS('pm25ScoresDecember.rds')
PM25$Data<-'PM25'
CO<-readRDS('coScoresDecember.rds')
CO$Data<-'CO'
H2S<-readRDS('h2sDecember.rds')
H2S$Data<-'H2S'
NO2<-readRDS('no2sDecember.rds')
NO2$Data<-'NO2'
O3<-readRDS('o3December.rds')
O3$Data<-'O3'
SO2<-readRDS('so2December.rds')
SO2$Data<-'SO2'

dfScoresAll<-rbind(Temp, Humidity, Pressure, PM25, CO, H2S, NO2, O3, SO2)
saveRDS(dfScoresAll, 'dfScoresAll.rds')
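The compilation step above can also be written as a loop over a named vector of files, which avoids repeating the read-and-label pair for every data type. This is a minimal sketch: compile_scores is a hypothetical helper, and the file names and score values below are toy stand-ins written to a temporary directory, not the project files.

```r
# Hypothetical helper: read one scores file per data type and stack them.
# `files` is a named character vector: names are data type labels, values are paths.
compile_scores <- function(files) {
  parts <- lapply(names(files), function(label) {
    scores <- readRDS(files[[label]])
    scores$Data <- label
    scores
  })
  do.call(rbind, parts)
}

# Self-contained demonstration with two toy files.
tmp <- tempdir()
toy_scores <- data.frame(date = as.Date("2018-12-01") + 0:1, Overall = c(55, 65))
saveRDS(toy_scores, file.path(tmp, "toyTemperature.rds"))
saveRDS(toy_scores, file.path(tmp, "toyHumidity.rds"))

files <- c(Temperature = file.path(tmp, "toyTemperature.rds"),
           Humidity    = file.path(tmp, "toyHumidity.rds"))
compiled <- compile_scores(files)
print(compiled)
```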

For the sections below, we use the final compiled set of scores we obtained separately using the code blocks presented above in this section.

dfScoresAll<-readRDS('dfScoresAll.rds')

Below, we visualise how the scores for different data types vary during the month of December. The calendar plot was produced with openair’s calendarPlot function, which was created specifically to plot variations in air pollutant concentrations in a calendar format. This is why the parameter that names the variable to plot is called pollutant. In our case, since we are plotting the overall score for each day, pollutant is set to Overall, the variable that holds the overall score for the day.

Click on each tab to view how the scores vary for different days of the month for different data types.
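The calendar plots use five reliability categories separated by breaks at 0, 20, 40, 60, 80 and 100. As a rough sketch of that mapping outside of the plot, base R's cut() can assign a label to each overall score. The function name classify_score is hypothetical, and how values falling exactly on a break are assigned here is an assumption that may differ from calendarPlot's own handling.

```r
reliability_breaks <- c(0, 20, 40, 60, 80, 100)
reliability_labels <- c('Highly unreliable', 'Unreliable', 'Slightly unreliable',
                        'Reliable', 'Highly reliable')

# Hypothetical helper: map overall scores (0 to 100) to reliability categories.
# Intervals are closed on the right, e.g. a score of 60 is 'Slightly unreliable'.
classify_score <- function(score) {
  cut(score, breaks = reliability_breaks, labels = reliability_labels,
      include.lowest = TRUE)
}

categories <- as.character(classify_score(c(10, 55, 95)))
print(categories)
```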

Temperature

Based on the plot, it seems that temperature data collected by the AoT network is slightly unreliable for all the days of December 2018.

dfScoresAll%>%
  filter(Data=='Temperature')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='Temperature') 

Looking at how the component scores vary throughout the month, the temporal reliability of temperature data is consistently high, and spatial coverage is also reliable. However, the overall score is visibly lowered by poor sensor value reliability (Score 1 and Score 2).

dfScoresAll%>%
  filter(Data=='Temperature')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - Temperature')+
  theme_bw()

Humidity

Based on the plot, it seems that humidity data collected by the AoT network is slightly unreliable for all the days of December 2018.

dfScoresAll%>%
  filter(Data=='Humidity')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
            labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='Humidity')

Looking at how the component scores vary throughout the month, the temporal reliability of humidity data is consistently high. However, the overall score is visibly lowered by poor sensor value reliability (Score 1 and Score 2) and poor spatial reliability (Score 3 and Score 4).

dfScoresAll%>%
  filter(Data=='Humidity')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - Humidity')+
  theme_bw()

Pressure

Based on the plot, it seems that pressure data collected by the AoT network is reliable for most of the days of December 2018, with the exception of 4 days during the first 2 weeks when the data is slightly unreliable.

dfScoresAll%>%
  filter(Data=='Pressure')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='Pressure')

Looking at how the component scores vary throughout the month, the temporal reliability of pressure data is consistently high. However, the overall score is visibly lowered by poor spatial reliability (Score 3).

dfScoresAll%>%
  filter(Data=='Pressure')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - Pressure')+
  theme_bw()

PM2.5 Concentration

Based on the plot, it seems that PM 2.5 Concentration data collected by the AoT network is unreliable or slightly unreliable for most of the days in December 2018. For 5 days of the month, the data is even worse and is classified as highly unreliable.

dfScoresAll%>%
  filter(Data=='PM25')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='PM 2.5')

Looking at how the component scores vary throughout the month, the spatial reliability of PM2.5 concentration data is consistently low. Sensor value reliability and temporal reliability vary more widely throughout the month.

dfScoresAll%>%
  filter(Data=='PM25')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - PM2.5 Concentration')+
  theme_bw()

CO Concentration

Based on the plot, it seems that CO concentration data collected by the AoT network is slightly unreliable for most days of December 2018. However, there are 5 consecutive days of reliable data collected.

dfScoresAll%>%
  filter(Data=='CO')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='CO')

Looking at how the component scores vary throughout the month, sensor value, spatial, and temporal reliability all vary erratically across the whole month.

dfScoresAll%>%
  filter(Data=='CO')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - CO Concentration')+
  theme_bw()

H2S Concentration

Based on the plot, it seems that H2S concentration data collected by the AoT network is slightly unreliable for most days of December 2018.

dfScoresAll%>%
  filter(Data=='H2S')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
              labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='H2S')

Looking at how the component scores vary throughout the month, the temporal reliability of H2S concentration data is consistently high. However, the overall score is visibly lowered by poor spatial reliability (Score 3).

dfScoresAll%>%
  filter(Data=='H2S')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - H2S Concentration')+
  theme_bw()

NO2 Concentration

Based on the plot, it seems that NO2 Concentration data collected by the AoT network is reliable for all the days of December 2018, except one.

dfScoresAll%>%
  filter(Data=='NO2')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='NO2')

Looking at how the component scores vary throughout the month, the temporal reliability of NO2 concentration data is consistently high. However, the overall score is visibly lowered by poor spatial reliability (Score 3).

dfScoresAll%>%
  filter(Data=='NO2')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - NO2 Concentration')+
  theme_bw()

O3 Concentration

Based on the plot, it seems that O3 Concentration data collected by the AoT network is reliable for all the days of December 2018.

dfScoresAll%>%
  filter(Data=='O3')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='O3')

Looking at how the component scores vary throughout the month, the temporal reliability of O3 concentration data is consistently high. However, the overall score is visibly lowered by poor spatial reliability (Score 3).

dfScoresAll%>%
  filter(Data=='O3')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - O3 Concentration')+
  theme_bw()

SO2 Concentration

Based on the plot, it seems that SO2 concentration data collected by the AoT network is slightly unreliable for all the days of December 2018.

dfScoresAll%>%
  filter(Data=='SO2')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='SO2')

Looking at how the component scores vary throughout the month, the temporal reliability of SO2 concentration data is consistently high. However, the overall score is visibly lowered by poor sensor value reliability (Score 1 and Score 2) and poor spatial reliability (Score 3 and Score 4).

dfScoresAll%>%
  filter(Data=='SO2')%>%
  select(-Overall, -Data)%>%
  gather(key='Score Component', value='value', -date)%>%
  ggplot()+
  geom_line(aes(x=date, y=value, col=`Score Component`), size=2, alpha=0.5)+
  geom_area(aes(x=date, y=value, fill=`Score Component`), col=NA,alpha=0.25)+
  geom_point(aes(x=date, y=value, col=`Score Component`), size=2)+
    facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  scale_color_brewer(palette = 'Dark2')+
  scale_fill_brewer(palette='Dark2')+
  labs(x='Date', y='Score value',
       title='Variations in component score values for December - SO2 Concentration')+
  theme_bw()
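To make the link between component scores and the overall score concrete, the sketch below averages five component scores per day, in the same way the compilation code in Section 4 computes Overall. The score values are made up for illustration; they are not results from the AoT data.

```r
# Hypothetical component scores for two days (illustrative values only)
scores <- data.frame(
  date = as.Date(c('2018-12-01', '2018-12-02')),
  `Score 1` = c(40, 50),
  `Score 2` = c(35, 40),
  `Score 3` = c(30, 30),
  `Score 4` = c(25, 30),
  `Score 5` = c(95, 100),
  check.names = FALSE
)

# Overall score is the unweighted mean of the five components
component_cols <- paste('Score', 1:5)
scores$Overall <- unname(rowMeans(scores[, component_cols]))

scores[, c('date', 'Overall')]
```

In this toy case a very high Score 5 cannot offset four low components: the overall scores are 45 and 50, which fall in the 'Slightly unreliable' band.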

4. Scoring Data Reliability After Imputation

To assess imputability, we compare the reliability scores of the data before and after we apply the imputation method described in Section 2.5.

The flow chart below illustrates the workflow we adopt in this section for scoring data reliability for each day. Note that the imputation procedure is incorporated after data retrieval.

The codeblocks below contain the code required to carry out the workflow above.

1. Impute and update dataset with imputed values

#imputation

maindf<-dfSO2 #SO2 is shown here as an example; variable names containing 'temp' below are generic
maindf$value_hrf[maindf$val_qual==0]<-NA #convert invalid values to NA

maindf%>%
  as.data.frame()%>%
  select(by10, node_id, lat, lon, value_hrf)%>%
  group_by(by10, node_id, lat, lon)%>%
  summarise(temp=mean(value_hrf, na.rm = TRUE))%>% #one row per node and interval; NaN (treated as NA) if no valid value
  as.data.frame()->maindf #aggregate valid values for each 10-minute interval


maindf %>%
  mutate(time = ymd_hms(by10)) %>%
  select(-by10)->maindf


#create unique timestamps
timestamps <- unique(maindf$time) %>% as.data.frame() %>% rename('time' = '.')

#create empty data frame 
complete_temp_group <- c()

#fill in empty data frame
for (i in unique(maindf$node_id)) {
  longitude = maindf[maindf$node_id == i, ][1, 'lon'] %>% as.numeric()
  latitude = maindf[maindf$node_id == i, ][1, 'lat'] %>% as.numeric()
  
  sample_node <-
    maindf %>%
    filter(node_id == i) %>%
    right_join(., timestamps, by = 'time')%>%
    mutate(node_id = i,
           lon = longitude,
           lat = latitude)
  
  complete_temp_group <-
    rbind(complete_temp_group, sample_node)
}

rm(timestamps)

#set up
all_temp_nodes <-
  complete_temp_group %>%
  group_by(node_id) %>%
  summarise(lat = first(lat),
            lon = first(lon)) %>%
  st_as_sf(coords = c('lon', 'lat'), crs = 4326, agr = 'constant')

all_temp_nodes_harn <-
  all_temp_nodes %>%
  st_transform(crs = 3435) #EPSG:3435 (NAD83 / Illinois East, US feet), the same projected CRS used for Score 4

all_temp_nodes_harn_xy <-
  all_temp_nodes_harn %>%
  cbind(.,st_coordinates(all_temp_nodes_harn))  %>%
  st_set_geometry(NULL) %>%
  dplyr::select(X,Y) %>%
  as.matrix()


nn6 <-   
  get.knnx(all_temp_nodes_harn_xy, all_temp_nodes_harn_xy, 2)$nn.dist %>%
  as.data.frame() %>%
  rename(distance = V2) %>%
  select(distance)

buffer_dist <- max(nn6$distance) + 10 

# Know which nodes are inside buffer
all_temp_nodes_harn_buffer_intersect <-
  st_buffer(all_temp_nodes_harn, buffer_dist) %>%
  st_join(all_temp_nodes_harn, join = st_intersects) %>%
  st_set_geometry(NULL) %>%
  select(node_id.x, node_id.y) %>%
  rename(node_id = node_id.x, 
         inside_buffer = node_id.y) %>%
  filter(node_id != inside_buffer)

# Join temperature 10 minutes ago of nodes in buffer
df_buffer <-
  left_join(complete_temp_group, all_temp_nodes_harn_buffer_intersect, by = 'node_id') %>%
  left_join(complete_temp_group %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag,temp),
            by = c('time' = 'time_lag',
                   'inside_buffer' = 'node_id')) %>%
  mutate(buffer_temp = temp.y,
         temp = temp.x) %>%
  select(node_id, inside_buffer, time, lon, lat, temp, buffer_temp)


df_buffer <-
  df_buffer %>%
  group_by(node_id, time) %>%
  summarise(lon = first(lon),
            lat = first(lat),
            temp = first(temp),
            avg_buffer_temp = mean(buffer_temp, na.rm = TRUE))



#identify 5 closest neighbours
# Get the node IDs of nearest 5
nn7 <-   
  get.knnx(all_temp_nodes_harn_xy, all_temp_nodes_harn_xy, 6)$nn.index %>%
  as.data.frame() %>%
  rename(N1 = V2, N2 = V3, N3 = V4, N4 = V5, N5 = V6) %>%
  left_join(all_temp_nodes_harn %>%
              mutate(index = as.numeric(row.names(.))), 
            by = c('V1' = 'index')) %>%
  select(node_id, V1, N1, N2, N3, N4, N5) 

nn8 <-
  left_join(nn7, nn7 %>%
              select(node_id, V1),
            by = c('N1' = 'V1')) %>%
  left_join(nn7 %>%
              select(node_id, V1),
            by = c('N2' = 'V1')) %>%
  left_join(nn7 %>%
              select(node_id, V1),
            by = c('N3' = 'V1')) %>%
  left_join(nn7 %>%
              select(node_id, V1),
            by = c('N4' = 'V1')) %>%
  left_join(nn7 %>%
              select(node_id, V1),
            by = c('N5' = 'V1')) %>%
  select(node_id.x, node_id.y, node_id.x.x, node_id.y.y, 
         node_id.x.x.x, node_id.y.y.y) %>%
  rename(node_id = node_id.x,
         nearest_1 = node_id.y,
         nearest_2 = node_id.x.x,
         nearest_3 = node_id.y.y,
         nearest_4 = node_id.x.x.x,
         nearest_5 = node_id.y.y.y)

# Get the average temperature 10 minutes ago of nearest five
dat_buffer_nearest5 <-
  left_join(df_buffer, nn8, by = 'node_id')%>%
  left_join(df_buffer %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag, temp),
            by = c('time' = 'time_lag',
                   'nearest_1' = 'node_id')) %>%
  left_join(df_buffer %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag, temp),
            by = c('time' = 'time_lag',
                   'nearest_2' = 'node_id')) %>%
  left_join(df_buffer %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag, temp),
            by = c('time' = 'time_lag',
                   'nearest_3' = 'node_id')) %>%
  left_join(df_buffer %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag, temp),
            by = c('time' = 'time_lag',
                   'nearest_4' = 'node_id')) %>%
  left_join(df_buffer %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag, temp),
            by = c('time' = 'time_lag',
                   'nearest_5' = 'node_id')) %>%
  select(node_id, lon, lat, time, temp.x, avg_buffer_temp, temp.y, temp.x.x, 
         temp.y.y, temp.x.x.x, temp.y.y.y) %>%
  rename(temp = temp.x,
         nearest_1 = temp.y,
         nearest_2 = temp.x.x,
         nearest_3 = temp.y.y,
         nearest_4 = temp.x.x.x,
         nearest_5 = temp.y.y.y)

dat_buffer_nearest5 <-
  dat_buffer_nearest5 %>%
  gather(nearest, value, nearest_1:nearest_5) %>%
  group_by(node_id, time) %>%
  summarise(lon = first(lon),
            lat = first(lat), 
            temp = first(temp),
            avg_buffer_temp = first(avg_buffer_temp),
            avg_nearby_temp = mean(value, na.rm = TRUE))

#identifying from 10 minutes
# Get temps from 10 minutes ago
dat_whole <-
  left_join(dat_buffer_nearest5,
            dat_buffer_nearest5 %>%
              mutate(time_lag = time + 600) %>%
              select(node_id, time_lag, temp),
            by = c('time' = 'time_lag',
                   'node_id' = 'node_id')) %>%
  rename(temp = temp.x,
         temp_10m = temp.y)

rm(dat_buffer_nearest5, nn6, nn7, nn8, all_temp_nodes_harn_buffer_intersect, all_temp_nodes_harn_xy, all_temp_nodes_harn)

#fit models
dat_timemodel <- dat_whole[!(is.na(dat_whole$temp) | is.na(dat_whole$temp_10m)), ]
dat_nbormodel <- dat_whole[!(is.na(dat_whole$temp) | is.na(dat_whole$avg_nearby_temp)), ]

# Time model
mod7 <- lm(temp ~ temp_10m, data = dat_timemodel)

# Nearest 5 model
mod9 <- lm(temp ~ avg_nearby_temp, data = dat_nbormodel)


#metrics
mod7PredValues <-
  data.frame(node_id = dat_whole$node_id,
             lon = dat_whole$lon,
             lat = dat_whole$lat,
             time = dat_whole$time,
             observed = dat_whole$temp,
             predicted = predict(mod7, dat_whole)) 
mod9PredValues <-
  data.frame(node_id = dat_whole$node_id,
             lon = dat_whole$lon,
             lat = dat_whole$lat,
             time = dat_whole$time,
             observed = dat_whole$temp,
             predicted = predict(mod9, dat_whole))
#predict
## Create a complete dataset which includes all imputed values
#use the time model prediction where available, otherwise fall back on the nearest 5 model
predicted_set <- ifelse(!is.na(mod7PredValues$predicted),
                        mod7PredValues$predicted,
                        mod9PredValues$predicted)

complete_set <-
  data.frame(node_id = dat_whole$node_id,
             time = dat_whole$time,
             lon = dat_whole$lon,
             lat = dat_whole$lat,
             observed = dat_whole$temp,
             predicted = predicted_set)


#sub in the values
complete_set$final<-ifelse(is.na(complete_set$observed), 
                           complete_set$predicted, 
                           complete_set$observed)

#compute valqual
complete_set$val_qual<-ifelse(is.na(complete_set$final), 0,1)
complete_set$date<-date(complete_set$time)
complete_set$by10<-cut(complete_set$time, breaks='10 min')
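The fallback logic used above can be seen more easily on a toy example. The sketch below uses made-up readings and made-up predictions (the column names pred_time and pred_neighbours are hypothetical stand-ins for the predictions from the time model and the nearest 5 model) to show the order of precedence: observed value first, then the time model, then the neighbour model.

```r
# Hypothetical readings for one node at 10-minute steps; NA marks a missing value
obs <- data.frame(
  observed        = c(10, NA, 12, NA, NA),
  pred_time       = c(10.1, 10.2, 11.8, NA, NA),  # stand-in for time model predictions
  pred_neighbours = c(9.8, 10.5, 11.9, 12.4, NA)  # stand-in for nearest 5 model predictions
)

# Prefer the time model; fall back on the neighbour model when it has no prediction
obs$predicted <- ifelse(!is.na(obs$pred_time), obs$pred_time, obs$pred_neighbours)

# Observed values always take precedence over predictions
obs$final <- ifelse(is.na(obs$observed), obs$predicted, obs$observed)

# A value counts as valid if anything (observed or imputed) is available
obs$val_qual <- ifelse(is.na(obs$final), 0, 1)

obs
```

In this toy case the last interval stays invalid because neither model can produce a prediction, which is why imputation does not necessarily bring every score to 100.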

2. Apply scoring procedures on updated dataset

#define parameters


df<-complete_set

#score 1
df%>%
  group_by(date, node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(date, node_id, by10, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  mutate(NodeMeanX = sum(X)/144)%>%
  select(date, node_id, NodeMeanX)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date)%>%
  mutate(`Score 1`=mean(NodeMeanX))%>%
  select(date,`Score 1` )%>%
  unique()%>%
  as.data.frame()->temp1


#score 2
df%>%
  group_by(date, node_id, by10)%>%
  mutate(X = 100*sum(val_qual)/n())%>%
  select(date, node_id, by10, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  mutate(NodeMeanX = sum(X)/144)%>%
  select(date, node_id, by10, NodeMeanX, X)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  mutate(NodeSD=ifelse(abs(sd(X))>NodeMeanX, 
                       NodeMeanX,
                       abs(sd(X))))%>%
  group_by(date, node_id)%>%
  mutate(NodeSDScore= ifelse(NodeMeanX==0, 
                             0, 
                             ifelse(NodeMeanX<50, 
                                    abs(100-abs(100-100*(NodeSD/NodeMeanX))),
                                    abs(100-100*(NodeSD/NodeMeanX)))))%>%
  select(date, node_id, NodeSDScore)%>%
  na.omit()%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date)%>%
  mutate(`Score 2`= mean(NodeSDScore))%>%
  select(date,`Score 2`)%>%
  unique()%>%
  as.data.frame()->temp2

#Score 3

df%>%
  filter(val_qual==1)%>%
  select(date, node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, by10)%>%
  summarise(count=n())%>%
  mutate(propActive=(count*100)/86)%>%
  mutate(`Score 3`=mean(propActive))%>%
  select(date, `Score 3`)%>%
  unique()%>%
  as.data.frame()->temp3

#Score 4
df%>%
  filter(val_qual==1)%>%
  select(by10, node_id, lat, lon)%>%
  mutate(lat=as.numeric(lat), 
         lon=as.numeric(lon))%>%
  unique()%>%
  as.data.frame()-> df4

chig<-readOGR('.', 'chigBound')
chig<-spTransform(chig, CRS('+init=EPSG:3435'))
chigArea<-gArea(chig)

df4a<-NULL
for(i in unique(df4$by10)){
  
  subset <- 
    df4%>%
    filter(by10==i)
  
  if(nrow(subset)>1){
    subset_nodes_sp <- SpatialPointsDataFrame(subset[, c('lon','lat')], subset, proj4string = CRS('+init=EPSG:4326'))
    subset_nodes_trnsfrmd <- spTransform(subset_nodes_sp, CRS('+init=EPSG:3435'))
    P2 <- matrix(c(subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,2],
                   subset_nodes_trnsfrmd@bbox[1,2], subset_nodes_trnsfrmd@bbox[2,1],
                   subset_nodes_trnsfrmd@bbox[1,1], subset_nodes_trnsfrmd@bbox[2,1]),
                 ncol = 2, byrow = TRUE) %>%
      Polygon()
    Ps2 <- SpatialPolygons(list(Polygons(list(P2), ID = "a")), proj4string=CRS('+init=EPSG:3435'))
    #clip using chicago
    Ps2<-gIntersection(Ps2, chig, byid=FALSE)
    Ps2AreaProp<-gArea(Ps2)/chigArea
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-Ps2AreaProp
    df1<-as.data.frame(df1)
    df4a<-rbind(df4a, df1)
  }else{
    df1<-NULL
    df1$by10<-i
    df1$AreaProp<-0
    df1<-as.data.frame(df1)
    df4a<-rbind(df4a, df1)
  }
  
}

df4a%>%
  mutate(date=date(by10))%>%
  group_by(date)%>%
  mutate(`Score 4`= 100*mean(AreaProp))%>%
  select(date, `Score 4`)%>%
  unique()%>%
  as.data.frame()->temp4

#Score 5
df%>%
  filter(val_qual==1)%>%
  select(date,node_id, by10)%>%
  unique()%>%
  as.data.frame()%>%
  group_by(date, node_id)%>%
  summarise(duration=n())%>%
  mutate(durationActive = (duration*10)/60)%>%
  mutate(propActive=100*durationActive/24)%>%
  mutate(`Score 5` = mean(propActive))%>%
  select(date, `Score 5`)%>%
  unique()%>%
  as.data.frame()->temp5


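#compile and save the scores for each data type
#note: temp1 to temp5 hold the scores for whichever dataset maindf was set to above,
#so run each block below only after re-running the preceding steps for that data type
#temperature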
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedtemperatureScoresDecember.rds')

#humidity
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedhumidityScoresDecember.rds')

#pressure
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedpressureScoresDecember.rds')

#pm25
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedpm25ScoresDecember.rds')

#CO
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedcoDecember.rds')

#h2s
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedh2sDecember.rds')

#no2
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedno2December.rds')

#o3
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedo3December.rds')

#so2
tempDec<-merge(temp1, temp2, by='date')
tempDec<-merge(tempDec, temp3, by='date')
tempDec<-merge(tempDec, temp4, by='date')
tempDec<-merge(tempDec, temp5, by='date')
colnames(tempDec)<-c('date', 'Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5')
tempDec$Overall<-(tempDec$`Score 1` + tempDec$`Score 2`+ tempDec$`Score 3` + tempDec$`Score 4`+ tempDec$`Score 5`)/5
saveRDS(tempDec, 'imputedso2December.rds')


Temp<-readRDS('imputedtemperatureScoresDecember.rds')
Temp$Data<-'Temperature'
Humidity<-readRDS('imputedhumidityScoresDecember.rds')
Humidity$Data<-'Humidity'
Pressure<-readRDS('imputedpressureScoresDecember.rds')
Pressure$Data<-'Pressure'
PM25<-readRDS('imputedpm25ScoresDecember.rds')
PM25$Data<-'PM25'
CO<-readRDS('imputedcoDecember.rds')
CO$Data<-'CO'
H2S<-readRDS('imputedh2sDecember.rds')
H2S$Data<-'H2S'
NO2<-readRDS('imputedno2December.rds')
NO2$Data<-'NO2'
O3<-readRDS('imputedo3December.rds')
O3$Data<-'O3'
SO2<-readRDS('imputedso2December.rds')
SO2$Data<-'SO2'

dfScoresImputed<-rbind(Temp, Humidity, Pressure, PM25, CO, H2S, NO2, O3, SO2) #separate name so the pre-imputation dfScoresAll is not overwritten
saveRDS(dfScoresImputed, 'imputeddfScoresAll.rds')
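The compile-and-save step above is repeated once per data type. One possible way to reduce the repetition is a small helper such as the one sketched below. The function name compile_scores and the toy inputs are hypothetical and not part of the original code; the helper only assumes a list of five data frames that each hold a date column and one score column, in order from Score 1 to Score 5.

```r
# Hypothetical helper: merges per-score tables by date and adds the overall mean
compile_scores <- function(score_list, data_name) {
  compiled <- Reduce(function(x, y) merge(x, y, by = 'date'), score_list)
  score_cols <- paste('Score', seq_along(score_list))
  names(compiled) <- c('date', score_cols)
  compiled$Overall <- unname(rowMeans(compiled[, score_cols]))
  compiled$Data <- data_name
  compiled
}

# Toy inputs standing in for temp1 ... temp5 (made-up values)
dates <- as.Date(c('2018-12-01', '2018-12-02'))
toy_scores <- lapply(1:5, function(k) {
  out <- data.frame(date = dates, value = c(10 * k, 20 * k))
  names(out)[2] <- paste('Score', k)
  out
})

toy <- compile_scores(toy_scores, 'Temperature')
toy
# The result could then be saved as before, for example:
# saveRDS(toy, 'imputedtemperatureScoresDecember.rds')
```

With the real data, the call would take list(temp1, temp2, temp3, temp4, temp5) after the scoring steps have been run for the data type in question.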

For the sections below, we use the final compiled set of imputed scores, which we obtained separately using the code blocks presented above in this section.

dfScoresImputed<-readRDS('imputeddfScoresAll.rds')
dfScoresImputed$Type<-'After imputation'

dfScoresAll$Type<-'Before imputation'
dfScoresImputed%>%
  bind_rows(dfScoresAll)%>%
  select(-Overall)%>%
  gather(key='Score Component', value='value', -date, -Data, -Type)%>%
  spread(key=Type, value=value)%>%
  mutate(segment=findInterval(`Before imputation`, c(!is.na(`After imputation`))))%>%
  mutate(change=ifelse(`After imputation`-`Before imputation`>0, 
                       'Improved', 
                       ifelse(`After imputation`-`Before imputation`==0, 
                              'No change','Worsened')))->dfScoresImputedAll
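The classification of each score into 'Improved', 'No change' or 'Worsened' can be checked on a toy table. The sketch below uses made-up before and after values, with plain column names before and after standing in for the `Before imputation` and `After imputation` columns.

```r
# Hypothetical before/after scores for three days (illustrative values only)
scores <- data.frame(
  date   = as.Date('2018-12-01') + 0:2,
  before = c(40, 55, 70),
  after  = c(65, 55, 60)
)

# Same rule as above: the sign of (after - before) decides the label
score_diff <- scores$after - scores$before
scores$change <- ifelse(score_diff > 0, 'Improved',
                        ifelse(score_diff == 0, 'No change', 'Worsened'))

scores
```

Because the rule tests for exact equality, scores that differ only by floating-point noise would be labelled 'Improved' or 'Worsened' rather than 'No change'; rounding the scores first is one way to avoid that.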

Below, we visualise how the scores for different data types vary during the month of December after imputation.

Click on each tab to view how the scores vary for different days of the month for different data types, and a discussion of how they compare to the scores obtained before imputation.

Temperature

Based on the plot, the imputed temperature data is reliable for all the days of December 2018. This is a marked improvement over the scores obtained before imputation, when the data was slightly unreliable for all the days.

dfScoresImputed%>%
  filter(Data=='Temperature')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='Temperature')

The plots below show how imputation has affected the individual reliability metrics to produce this overall improvement. Imputation has substantially improved the sensor value reliability scores (Score 1 and Score 2), but slightly worsened the other scores.

ggplot(data=subset(dfScoresImputedAll, Data=='Temperature'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - Temperature')+
  theme_bw()
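The same plotting code is repeated for every data type in this section. As a sketch of how it could be wrapped into a reusable function, the example below defines a hypothetical helper, plot_imputation_effect, with hypothetical arguments scores, data_name and title_name. It assumes an input shaped like dfScoresImputedAll and is demonstrated on a small made-up table rather than the real scores.

```r
library(ggplot2)

# Hypothetical helper (not part of the original code): draws the before/after
# ribbon plot for one data type from a table shaped like dfScoresImputedAll
plot_imputation_effect <- function(scores, data_name, title_name = data_name) {
  plot_data <- scores[scores$Data == data_name, ]
  ggplot(plot_data,
         aes(x = date, ymin = `Before imputation`, ymax = `After imputation`)) +
    geom_ribbon(aes(fill = factor(change)), alpha = 0.25) +
    # Named values keep each colour tied to its label even when a level is absent
    scale_fill_manual('Imputation effect',
                      values = c('Improved' = 'cornflowerblue',
                                 'No change' = 'grey90',
                                 'Worsened' = 'indianred1')) +
    geom_path(aes(y = `Before imputation`), colour = 'red', size = 2, alpha = 0.5) +
    geom_path(aes(y = `After imputation`), colour = 'blue', size = 2, alpha = 0.5) +
    geom_point(aes(y = `Before imputation`), colour = 'red', size = 2) +
    geom_point(aes(y = `After imputation`), colour = 'blue', size = 2) +
    facet_wrap(~`Score Component`, nrow = 3) +
    ylim(0, 100) +
    labs(x = 'Date', y = 'Score value',
         title = paste('Effect of imputation on data reliability component -',
                       title_name)) +
    theme_bw()
}

# Toy input with made-up values, only to show the expected columns
toy <- data.frame(
  date = rep(as.Date('2018-12-01') + 0:2, times = 2),
  Data = 'Temperature',
  `Score Component` = rep(c('Score 1', 'Score 2'), each = 3),
  `Before imputation` = c(40, 45, 50, 60, 55, 50),
  `After imputation` = c(70, 72, 75, 58, 55, 52),
  check.names = FALSE
)
toy$change <- ifelse(toy$`After imputation` > toy$`Before imputation`, 'Improved',
                     ifelse(toy$`After imputation` == toy$`Before imputation`,
                            'No change', 'Worsened'))

p <- plot_imputation_effect(toy, 'Temperature')
```

With the real scores, a call such as plot_imputation_effect(dfScoresImputedAll, 'PM25', 'PM 2.5 Concentration') would be expected to reproduce the corresponding plot in this section.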

Humidity

Based on the plot, the imputed humidity data remains slightly unreliable across December 2018, although reliability improved on 3 separate days of the month. This is a slight improvement over the scores obtained before imputation, when the data was slightly unreliable for all the days.

dfScoresImputed%>%
  filter(Data=='Humidity')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='Humidity')

The plots below show how imputation has affected the individual reliability metrics to produce this overall improvement. It has slightly improved sensor value reliability (Score 1 and Score 2) and spatial reliability (Score 3 and Score 4), but slightly worsened temporal reliability (Score 5).

ggplot(data=subset(dfScoresImputedAll, Data=='Humidity'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - Humidity')+
  theme_bw()

Pressure

Based on the plot, the imputed pressure data is reliable for all the days of December 2018. This is a slight improvement over the scores obtained before imputation, when the data was slightly unreliable for 3 of the days.

dfScoresImputed%>%
  filter(Data=='Pressure')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='Pressure')

The plots below show how imputation has affected the individual reliability metrics. It has slightly improved sensor value reliability (Score 1 and Score 2) and spatial reliability (Score 3).

ggplot(data=subset(dfScoresImputedAll, Data=='Pressure'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - Pressure')+
  theme_bw()

PM 2.5 Concentration

Based on the plot, the imputed PM 2.5 concentration data is unreliable or highly unreliable for all the days of December 2018. Reliability therefore worsened relative to the scores obtained before imputation, when the data was no worse than slightly unreliable for 8 of the days.

dfScoresImputed%>%
  filter(Data=='PM25')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='PM 2.5')

The plots below show how imputation has affected the individual reliability metrics to produce this overall worsening. It has substantially worsened sensor value reliability (Score 1 and Score 2) for the second half of the month.

ggplot(data=subset(dfScoresImputedAll, Data=='PM25'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - PM 2.5 Concentration')+
  theme_bw()

CO Concentration

Based on the plot, the imputed CO concentration data is reliable for half of the days of December 2018. This is a marked improvement over the scores obtained before imputation, when the data was slightly unreliable for most of the days. However, reliability for one of the days worsened after imputation.

dfScoresImputed%>%
  filter(Data=='CO')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='CO')

The plots below show how imputation has affected the individual reliability metrics. It has slightly improved sensor value reliability as measured by Score 1, while its effect on Score 2 varies by day of the month. It has slightly worsened temporal reliability (Score 5).

ggplot(data=subset(dfScoresImputedAll, Data=='CO'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - CO Concentration')+
  theme_bw()

H2S Concentration

Based on the plot, the imputed H2S concentration data is reliable for most days of December 2018. This is a marked improvement over the scores obtained before imputation, when the data was slightly unreliable for all days except three.

dfScoresImputed%>%
  filter(Data=='H2S')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='H2S')

The plots below show how imputation has affected the individual reliability metrics. It has slightly improved sensor value reliability (Score 1 and Score 2).

ggplot(data=subset(dfScoresImputedAll, Data=='H2S'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - H2S Concentration')+
  theme_bw()

NO2 Concentration

Based on the plot, the imputed NO2 concentration data is still reliable for most days of December 2018.

dfScoresImputed%>%
  filter(Data=='NO2')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='NO2')

The plots below show how imputation has affected the individual reliability metrics. It has slightly worsened sensor value reliability (Score 2).

ggplot(data=subset(dfScoresImputedAll, Data=='NO2'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - NO2 Concentration')+
  theme_bw()

O3 Concentration

Based on the plot, the imputed O3 concentration data is still reliable for all the days of December 2018.

dfScoresImputed%>%
  filter(Data=='O3')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='O3')

The plots below show how imputation has affected the individual reliability metrics. It has slightly worsened sensor value reliability (Score 1 and Score 2).

ggplot(data=subset(dfScoresImputedAll, Data=='O3'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - O3 Concentration')+
  theme_bw()

SO2 Concentration

Based on the plot, the imputed SO2 concentration data is reliable for half of the days of December 2018. This is an improvement over the scores obtained before imputation, when the data was slightly unreliable for all the days.

dfScoresImputed%>%
  filter(Data=='SO2')%>%
  select(date, Overall)%>%
  unique()%>%
  as.data.frame()%>%
  calendarPlot(pollutant = 'Overall', year=2018, month=12,
             annotate = 'date', cols='RdYlBu', limits=c(0, 100), 
             labels=c('Highly unreliable', 'Unreliable', 'Slightly unreliable','Reliable', 'Highly reliable'),
             breaks = c(0,20,40,60,80,100),
             main='SO2')

The plots below show how imputation has affected the individual reliability metrics. It has slightly improved sensor value reliability (Score 1 and Score 2).

ggplot(data=subset(dfScoresImputedAll, Data=='SO2'), 
       aes(x=date, ymin=`Before imputation`, ymax=`After imputation`))+
  geom_ribbon(aes(fill=factor(change)), alpha=0.25)+
  scale_fill_manual('Imputation effect',
                    values=c('Improved'='cornflowerblue', 'No change'='grey90', 'Worsened'='indianred1'))+
  geom_path(aes(y=`Before imputation`), colour='red', size=2, alpha=0.5)+
  geom_path(aes(y=`After imputation`), colour='blue', size=2, alpha=0.5)+
  geom_point(aes(y=`Before imputation`), colour='red', size=2)+
  geom_point(aes(y=`After imputation`), colour='blue', size=2)+
  facet_wrap(~`Score Component`, nrow=3)+
  ylim(0,100)+
  labs(x='Date', y='Score value',
       title='Effect of imputation on data reliability component - SO2 Concentration')+
  theme_bw()