ATQ: Assessing Evaluation Metrics for Timely Epidemic Detection Models

Background

Madeline A. Ward published a paper on methods for detecting seasonal influenza epidemics using school absenteeism data. The premise is that a school absenteeism surveillance system has been established in Wellington-Dufferin-Guelph which uses a threshold-based approach (an alarm is raised when 10% of students are absent) to detect school illness outbreaks so that mitigating measures can be implemented. Two metrics, the false alarm rate (FAR) and accumulated days delayed (ADD), were proposed in that study and found to select more accurate alarm models than the simple threshold approach.

Building on Ward's work, Kayla Vanderkruk, along with Drs. Deeth and Feng, wrote a paper in 2021 on improved metrics, namely alert time quality (ATQ), which select models that are more accurate and timely than FAR-selected models. This package is based on Kayla's work, which can be found here. The ATQ study assessed alarms on a gradient, where alarms raised incrementally before or after an optimal date were penalized for their lack of timeliness.

This ATQ package allows users to perform simulation studies and to evaluate ATQ- and FAR-based metrics on them. The simulation study requires regional census information, such as the distribution of the number of household members, the proportions of households with and without children, and age categories.
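For illustration, the census-derived quantities passed to the simulation functions might look like the following. These values are made up and are shown only to indicate the expected format; they mirror the arguments used in the example further below.

# Illustrative census-derived inputs (made-up values)
prop_parent_couple   <- 0.7                           # households with children headed by a couple
prop_children_couple <- c(0.3, 0.5, 0.2)              # assumed: couple households with 1, 2 and 3+ children
prop_house_size      <- c(0.2, 0.3, 0.25, 0.15, 0.1)  # assumed: household sizes 1 through 5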

This package is still a work in progress; future considerations include streamlining the simulation study workflow and generalizing the evaluation of metrics to accommodate real data sets.

Installation

You can install the development version of ATQ from GitHub:

# install.packages("devtools")
library(devtools)
install_github("vjoshy/ATQ_Surveillance_Package")

Key Functions

The ATQ package includes the following main functions:

  1. catchment_sim()
  2. elementary_pop()
  3. subpop_children()
  4. subpop_noChildren()
  5. simulate_households()
  6. ssir()
  7. compile_epi()
  8. eval_metrics()

Additionally, the package implements S3 methods for generic functions:

  1. plot() and summary()

These functions and methods work together to facilitate comprehensive epidemic simulation and the evaluation of detection models.

Please see the example below:

# Load the ATQ package
library(ATQ)

# Simulate the number of elementary schools in each catchment area
catch_df <- catchment_sim(16, 5, shape = 4.68, rate = 3.01)

# Simulate elementary school populations for each catchment area
elementary_df <- elementary_pop(catch_df, shape = 5.86, rate = 0.01)

# Simulate households with children
house_children <- subpop_children(elementary_df, n = 2,
                                  prop_parent_couple = 0.7,
                                  prop_children_couple = c(0.3, 0.5, 0.2),
                                  prop_children_lone = c(0.4, 0.4, 0.2),
                                  prop_elem_age = 0.6)

# Simulate households without children
house_nochildren <- subpop_noChildren(house_children, elementary_df,
                                      prop_house_size = c(0.2, 0.3, 0.25, 0.15, 0.1),
                                      prop_house_Children = 0.3)

# Combine household simulations and generate individual-level data
simulation <- simulate_households(house_children, house_nochildren)

# Extract individual-level data
individuals <- simulation$individual_sim

# Simulate epidemic using SSIR (Stochastic Susceptible-Infectious-Recovered) model
epidemic <- ssir(nrow(individuals), T = 300, alpha = 0.298, inf_init = 32, rep = 10)
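# (Assumed interpretation of the ssir() arguments above, based on the model
# parameters echoed in the summary output below: the first argument is the
# population size N, T is the number of days simulated, alpha is the infection
# rate parameter, inf_init is the number of initially infectious individuals,
# and rep is the number of stochastic realizations.)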

# Summarize and plot the epidemic simulation results
summary(epidemic)
#> SSIR Epidemic Summary (Multiple Simulations):
#> Number of simulations: 10 
#> 
#> Average total infected: 53980.5 
#> Average total reported cases: 1067.4 
#> Average peak infected: 4007.8 
#> 
#> Model parameters:
#> $N
#> [1] 175730
#> 
#> $T
#> [1] 300
#> 
#> $alpha
#> [1] 0.298
#> 
#> $inf_period
#> [1] 4
#> 
#> $inf_init
#> [1] 32
#> 
#> $report
#> [1] 0.02
#> 
#> $lag
#> [1] 7
#> 
#> $rep
#> [1] 10
plot(epidemic)


# Compile absenteeism data based on epidemic simulation and individual data
absent_data <- compile_epi(epidemic, individuals)

# Display structure of absenteeism data
dplyr::glimpse(absent_data)
#> Rows: 3,000
#> Columns: 28
#> $ Date        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
#> $ ScYr        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ pct_absent  <dbl> 0.05009972, 0.05229680, 0.05238570, 0.04842380, 0.04934321…
#> $ absent      <dbl> 800, 839, 816, 770, 767, 838, 811, 811, 765, 789, 792, 802…
#> $ absent_sick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ new_inf     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ lab_conf    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Case        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ sinterm     <dbl> 0.01720158, 0.03439806, 0.05158437, 0.06875541, 0.08590610…
#> $ costerm     <dbl> 0.9998520, 0.9994082, 0.9986686, 0.9976335, 0.9963032, 0.9…
#> $ window      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ ref_date    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ lag0        <dbl> 0.05009972, 0.05229680, 0.05238570, 0.04842380, 0.04934321…
#> $ lag1        <dbl> NA, 0.05009972, 0.05229680, 0.05238570, 0.04842380, 0.0493…
#> $ lag2        <dbl> NA, NA, 0.05009972, 0.05229680, 0.05238570, 0.04842380, 0.…
#> $ lag3        <dbl> NA, NA, NA, 0.05009972, 0.05229680, 0.05238570, 0.04842380…
#> $ lag4        <dbl> NA, NA, NA, NA, 0.05009972, 0.05229680, 0.05238570, 0.0484…
#> $ lag5        <dbl> NA, NA, NA, NA, NA, 0.05009972, 0.05229680, 0.05238570, 0.…
#> $ lag6        <dbl> NA, NA, NA, NA, NA, NA, 0.05009972, 0.05229680, 0.05238570…
#> $ lag7        <dbl> NA, NA, NA, NA, NA, NA, NA, 0.05009972, 0.05229680, 0.0523…
#> $ lag8        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 0.05009972, 0.05229680, 0.…
#> $ lag9        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05009972, 0.05229680…
#> $ lag10       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05009972, 0.0522…
#> $ lag11       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05009972, 0.…
#> $ lag12       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05009972…
#> $ lag13       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0500…
#> $ lag14       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.…
#> $ lag15       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

# Evaluate alarm metrics for epidemic detection
alarm_metrics <- eval_metrics(absent_data, thres = seq(0.1,0.6,by = 0.05),
                              ScYr = c(2:10), yr.weights = c(1:9)/sum(c(1:9)))
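# (Assumed interpretation of the eval_metrics() arguments above: thres is the
# grid of absenteeism thresholds evaluated, ScYr selects the simulated school
# years used for evaluation, and yr.weights gives the per-year weights used by
# the weighted metrics, WFATQ and WAATQ.)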

# Plot various alarm metrics
plot(alarm_metrics$metrics, "FAR")   # False Alert Rate

plot(alarm_metrics$metrics, "FATQ")  # First Alert Time Quality

plot(alarm_metrics$metrics, "AATQ")  # Average Alert Time Quality

plot(alarm_metrics$metrics, "WAATQ") # Weighted Average Alert Time Quality

plot(alarm_metrics$metrics, "WFATQ") # Weighted First Alert Time Quality

plot(alarm_metrics$metrics, "ADD")   # Accumulated Delay Days



# Summarize alarm metrics
summary(alarm_metrics$results)
#> Alarm Metrics Summary
#> =====================
#> 
#> FAR :
#>   Mean: 0.7644 
#>   Variance: 0.0056 
#>   Best lag: 1 
#>   Best threshold: 0.2 
#>   Best value: 0.5429 
#> 
#> ADD :
#>   Mean: 13.6054 
#>   Variance: 8.2501 
#>   Best lag: 1 
#>   Best threshold: 0.1 
#>   Best value: 3.5556 
#> 
#> AATQ :
#>   Mean: 0.5082 
#>   Variance: 0.004 
#>   Best lag: 1 
#>   Best threshold: 0.2 
#>   Best value: 0.2871 
#> 
#> FATQ :
#>   Mean: 0.7039 
#>   Variance: 0.0055 
#>   Best lag: 1 
#>   Best threshold: 0.2 
#>   Best value: 0.5468 
#> 
#> WAATQ :
#>   Mean: 0.5184 
#>   Variance: 0.0068 
#>   Best lag: 1 
#>   Best threshold: 0.2 
#>   Best value: 0.2177 
#> 
#> WFATQ :
#>   Mean: 0.7298 
#>   Variance: 0.0112 
#>   Best lag: 1 
#>   Best threshold: 0.2 
#>   Best value: 0.5214 
#> 
#> Reference Dates:
#>    epidemic_years ref_dates
#> 1               1        41
#> 2               2        27
#> 3               3        39
#> 4               4        72
#> 5               5        33
#> 6               6        39
#> 7               7        20
#> 8               8        74
#> 9               9        88
#> 10             10        30
#> 
#> Best Prediction Dates:
#> FAR :
#>  [1] NA NA 33 30 13  6  6 10 34 18
#> 
#> ADD :
#>  [1] NA NA 27 30  1  6  6  7 21  1
#> 
#> AATQ :
#>  [1] NA NA 33 30 13  6  6 10 34 18
#> 
#> FATQ :
#>  [1] NA NA 33 30 13  6  6 10 34 18
#> 
#> WAATQ :
#>  [1] NA NA 33 30 13  6  6 10 34 18
#> 
#> WFATQ :
#>  [1] NA NA 33 30 13  6  6 10 34 18

# Generate and display plots for alarm metrics across epidemic years
alarm_plots <- plot(alarm_metrics$plot_data)
for(i in seq_along(alarm_plots)) { 
  print(alarm_plots[[i]]) 
}

#> Warning: Removed 35 rows containing missing values or values outside the scale range
#> (`geom_col()`).

#> Warning: Removed 18 rows containing missing values or values outside the scale range
#> (`geom_col()`).

#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_col()`).

#> Warning: Removed 105 rows containing missing values or values outside the scale range
#> (`geom_col()`).

#> Warning: Removed 164 rows containing missing values or values outside the scale range
#> (`geom_col()`).

The final output, referred to as region_metric (stored above as alarm_metrics), is a list of 6 matrices and 6 data frames. The matrices contain the value of each metric for every combination of lag and threshold.

An optimal lag and threshold value that minimizes each metric is selected, and these optimal parameters are used to generate the 6 data frames associated with the metrics. Simulated information, such as the number of lab-confirmed cases and the number of students absent, is also included in the output.
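As a minimal sketch of how these matrices can be inspected by hand, assuming each metric is stored as a lag-by-threshold matrix and that the FAR matrix can be accessed as alarm_metrics$metrics$FAR (a hypothetical accessor inferred from the plotting calls above):

# Hypothetical accessor: a lag-by-threshold matrix of FAR values
far_mat <- alarm_metrics$metrics$FAR

# Position of the smallest FAR value; with arr.ind = TRUE, which() returns the
# (row, column) index, i.e. the lag/threshold combination minimizing the metric
# (the row-as-lag, column-as-threshold orientation is assumed here)
best_idx <- which(far_mat == min(far_mat, na.rm = TRUE), arr.ind = TRUE)
best_idx

The result can be compared with the "Best lag" and "Best threshold" values reported by summary(alarm_metrics$results) above.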