ATQ Guide

Vinay Joshy

ATQ: Using Absenteeism Data to Detect Onset of Epidemics

Introduction

The ATQ package provides tools for public health institutions to detect epidemics using school absenteeism data. It offers functions to simulate regional populations of households, elementary schools, and epidemics, and to calculate alarm metrics from these simulations.

This package builds on the work of Ward et al. and Vanderkruk et al. It introduces the Alert Time Quality (ATQ) metrics, such as the Average ATQ (AATQ) and First ATQ (FATQ), to evaluate the timeliness and accuracy of epidemic alerts. This vignette demonstrates the package’s use through a simulation study based on Vanderkruk et al., modeling yearly influenza epidemics and their alarm metrics in the Wellington-Dufferin-Guelph public health unit, Canada.

To use the package, install and load it with:

library(devtools)
install_github("vjoshy/ATQ_Surveillance_Package")

library(ATQ)

The following sections will guide you through population simulation, epidemic modeling, and alarm metric calculation using the ATQ package.

Methods

ATQ provides a simulation model that consists of three sequential parts: 1) a population of individuals, 2) annual influenza epidemics, and 3) school absenteeism and laboratory-confirmed influenza case data. The final part of this section covers the evaluation of alarm metrics.

Population simulation

To simulate the population of the Wellington-Dufferin-Guelph (WDG) region in Ontario, Canada, the package offers the following functions:

  • catchment_sim: Simulates catchment areas using a default gamma distribution for the number of schools in each area. The dist_func argument allows for specifying other distributions.
  • elementary_pop: Simulates elementary school enrollment and assigns students to catchments using a default gamma distribution. This function requires the output of catchment_sim. The dist_func argument can be modified for other distributions.
  • subpop_children: Simulates households with children using the output of elementary_pop. It requires specifying population proportions such as coupled parents, number of children per household type, and proportion of elementary school-age children. Distributions for parent, child, and age simulations can be specified.
  • subpop_noChildren: Simulates households without children using the outputs of subpop_children and elementary_pop. It requires specifying proportions of household sizes and the overall proportion of households with children.
  • simulate_households: Creates a list containing two simulated populations: households and individuals.

If population proportions are not provided to subpop_children and subpop_noChildren, the functions will prompt the user for input.


# Simulate 16 catchments of 80x80 squares and the number of schools they contain
catchment <- catchment_sim(16, 80, dist_func = stats::rgamma, shape = 4.313, rate = 3.027)

# Simulate population size of elementary schools
elementary <- elementary_pop(catchment, dist_func = stats::rgamma, shape = 5.274, rate = 0.014)

# Simulate households with children
house_children <- subpop_children(elementary, n = 5,
                                  prop_parent_couple = 0.7668901,
                                  prop_children_couple = c(0.3634045, 0.4329440, 0.2036515),
                                  prop_children_lone = c(0.5857832, 0.3071523, 0.1070645),
                                  prop_elem_age = 0.4976825)

# Simulate households without children using pre-specified proportions
house_noChild <- subpop_noChildren(house_children, elementary,
                                   prop_house_size = c(0.23246269, 0.34281716, 0.16091418, 0.16427239, 0.09953358),
                                   prop_house_Children = 0.4277052)

# Combine household simulations and generate individual-level data
households <- simulate_households(house_children, house_noChild)
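
To get a feel for the inputs used above, the following base R sketch works out the mean implied by each gamma distribution (shape divided by rate) and confirms that each proportion vector sums to one. It only restates the numbers from the calls above. How the package turns gamma draws into whole schools or students is not shown here, and the variable names are illustrative.

```r
# Mean of a gamma distribution is shape / rate
schools_mean <- 4.313 / 3.027   # implied mean of the draw used by catchment_sim
enrol_mean   <- 5.274 / 0.014   # implied mean of the draw used by elementary_pop

# Empirical check of the enrollment mean with a large number of draws
set.seed(1)
draws <- stats::rgamma(1e5, shape = 5.274, rate = 0.014)
c(theoretical = enrol_mean, empirical = mean(draws))

# Each proportion vector passed to the household functions should sum to 1
prop_children_couple <- c(0.3634045, 0.4329440, 0.2036515)
prop_children_lone   <- c(0.5857832, 0.3071523, 0.1070645)
prop_house_size      <- c(0.23246269, 0.34281716, 0.16091418, 0.16427239, 0.09953358)
sapply(list(prop_children_couple, prop_children_lone, prop_house_size), sum)
```

If other distributions or proportions are supplied, the same two checks are a quick way to catch mis-specified inputs before running the full simulation.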

Epidemic and Laboratory Confirmed Cases simulation

The package simulates epidemics using a stochastic Susceptible-Infected-Removed (SIR) framework. This approach differs from Vanderkruk et al., who used a spatial and network-based individual-level model.

Simulation Process

  • Initialization: The population is divided into S (Susceptible), I (Infectious), and R (Removed) compartments. Initially, most individuals are susceptible, a few are infectious, and none are removed.
  • Start Date: A random start date for the epidemic is chosen based on specified average and minimum start dates.
  • Time Steps: The simulation proceeds in discrete time steps. For each step:
    1. Transmission Probability (p_inf): Calculated as \(1 - e^{-\alpha {\frac{I[t-1]}{N}}}\), where \(\alpha\) is the transmission rate, \(I[t-1]\) is the number of infectious individuals at the previous time step, and \(N\) is the total population.

    2. New Infections (new_inf): Determined by drawing from a binomial distribution with parameters n (number of susceptible individuals) and p (transmission probability).

    3. Compartment Updates:

      • Susceptible (S): Decreases by new infections.
      • Infectious (I): Increases by new infections, decreases by recoveries/deaths.
      • Removed (R): Increases by recoveries/deaths.
    4. Reported Cases: A subset of new infections is reported based on the reporting rate, with delays added using an exponential distribution to reflect reporting lag.
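
The steps above can be sketched in a few lines of base R. This is an illustration, not the source of ssir: the function name sir_sketch is hypothetical, the random start date is omitted, each infection is assumed to last exactly inf_period days, and the reporting delay is assumed to be exponential with mean equal to lag.

```r
# Minimal stochastic SIR sketch (illustrative; not the package's ssir()).
# Assumption: every infection lasts exactly `inf_period` days before removal.
sir_sketch <- function(N, T, alpha, inf_period, inf_init) {
  S <- I <- R <- new_inf <- numeric(T)
  S[1] <- N - inf_init
  I[1] <- inf_init
  new_inf[1] <- inf_init
  for (t in 2:T) {
    # Transmission probability: 1 - exp(-alpha * I[t-1] / N)
    p_inf <- 1 - exp(-alpha * I[t - 1] / N)
    # New infections: one binomial draw over the susceptible pool
    new_inf[t] <- rbinom(1, size = S[t - 1], prob = p_inf)
    # Removals: individuals infected `inf_period` days earlier
    removed <- if (t > inf_period) new_inf[t - inf_period] else 0
    S[t] <- S[t - 1] - new_inf[t]
    I[t] <- I[t - 1] + new_inf[t] - removed
    R[t] <- R[t - 1] + removed
  }
  data.frame(day = seq_len(T), S = S, I = I, R = R, new_inf = new_inf)
}

set.seed(1)
sim <- sir_sketch(N = 83350, T = 300, alpha = 0.298, inf_period = 4, inf_init = 32)

# Reported cases: thin new infections by the reporting rate, then add an
# exponential delay (assumption: mean delay of 7 days, rounded up to whole days)
reported   <- rbinom(nrow(sim), size = sim$new_inf, prob = 0.02)
report_day <- rep(sim$day, times = reported) + ceiling(rexp(sum(reported), rate = 1 / 7))
```

Because every update moves individuals between compartments, S + I + R stays equal to N at every time step, which is a useful sanity check on any variant of this loop.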

The summary and plot methods can be used to visualize and summarize the simulated epidemics:


# isolate individuals data
individuals <- households$individual_sim

# simulate epidemics for 10 years, each with a period of 300 days and 32 individuals infected initially
# infection period of 4 days 
epidemic <- ssir(nrow(individuals), T = 300, alpha = 0.298, avg_start = 45, 
                 min_start = 20, inf_period = 4, inf_init = 32, report = 0.02, lag = 7, rep = 10)

# Summarize and plot the epidemic simulation results
summary(epidemic)
#> SSIR Epidemic Summary (Multiple Simulations):
#> Number of simulations: 10 
#> 
#> Average total infected: 25012.3 
#> Average total reported cases: 490.1 
#> Average peak infected: 1888.5 
#> 
#> Model parameters:
#> $N
#> [1] 83350
#> 
#> $T
#> [1] 300
#> 
#> $alpha
#> [1] 0.298
#> 
#> $inf_period
#> [1] 4
#> 
#> $inf_init
#> [1] 32
#> 
#> $report
#> [1] 0.02
#> 
#> $lag
#> [1] 7
#> 
#> $rep
#> [1] 10

plot(epidemic)

Absenteeism simulation

The compile_epi function compiles and processes the epidemic data and simulates school absenteeism from the epidemic and individual-level data. It creates a data set of actual cases, absenteeism, and laboratory-confirmed cases. This data set also includes a “True Alarm Window”, reference dates for each epidemic year, seasonal terms, and lagged absenteeism values.

Absenteeism data is simulated as follows:

  • For each day, the proportion of infected individuals is calculated from new infections over the past few days.
  • Whether each child is absent is then determined as follows:
    • 95% of infected children stay home.
    • 5% of healthy children are absent for reasons other than sickness.

The data is aggregated across all schools.
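
As a rough illustration of the absence logic, the sketch below draws daily absences for a single aggregated school population. The enrollment and infection counts are hypothetical, and drawing absences with two binomial draws per day is an assumption about how the 95% and 5% rules might be applied, not the code inside compile_epi.

```r
set.seed(42)
n_children <- 9000            # hypothetical number of enrolled children
n_infected <- c(0, 50, 400)   # hypothetical infected children on three days

# 95% of infected children stay home
absent_sick  <- rbinom(length(n_infected), size = n_infected, prob = 0.95)
# 5% of healthy children are absent for other reasons
absent_other <- rbinom(length(n_infected), size = n_children - n_infected, prob = 0.05)

absent     <- absent_sick + absent_other
pct_absent <- absent / n_children
data.frame(n_infected, absent_sick, absent, pct_absent)
```

With no infections, the absence percentage hovers around the 5% baseline, which matches the early values of pct_absent in the package output.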


absent_data <- compile_epi(epidemic, individuals)

dplyr::glimpse(absent_data)
#> Rows: 3,000
#> Columns: 28
#> $ Date        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
#> $ ScYr        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ pct_absent  <dbl> 0.05106053, 0.04938854, 0.04847492, 0.05357633, 0.05066380…
#> $ absent      <dbl> 466, 465, 463, 484, 467, 461, 483, 440, 478, 437, 473, 458…
#> $ absent_sick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ new_inf     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ lab_conf    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Case        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ sinterm     <dbl> 0.01720158, 0.03439806, 0.05158437, 0.06875541, 0.08590610…
#> $ costerm     <dbl> 0.9998520, 0.9994082, 0.9986686, 0.9976335, 0.9963032, 0.9…
#> $ window      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ ref_date    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ lag0        <dbl> 0.05106053, 0.04938854, 0.04847492, 0.05357633, 0.05066380…
#> $ lag1        <dbl> NA, 0.05106053, 0.04938854, 0.04847492, 0.05357633, 0.0506…
#> $ lag2        <dbl> NA, NA, 0.05106053, 0.04938854, 0.04847492, 0.05357633, 0.…
#> $ lag3        <dbl> NA, NA, NA, 0.05106053, 0.04938854, 0.04847492, 0.05357633…
#> $ lag4        <dbl> NA, NA, NA, NA, 0.05106053, 0.04938854, 0.04847492, 0.0535…
#> $ lag5        <dbl> NA, NA, NA, NA, NA, 0.05106053, 0.04938854, 0.04847492, 0.…
#> $ lag6        <dbl> NA, NA, NA, NA, NA, NA, 0.05106053, 0.04938854, 0.04847492…
#> $ lag7        <dbl> NA, NA, NA, NA, NA, NA, NA, 0.05106053, 0.04938854, 0.0484…
#> $ lag8        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 0.05106053, 0.04938854, 0.…
#> $ lag9        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05106053, 0.04938854…
#> $ lag10       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05106053, 0.0493…
#> $ lag11       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05106053, 0.…
#> $ lag12       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.05106053…
#> $ lag13       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0510…
#> $ lag14       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.…
#> $ lag15       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
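
The lag and seasonal columns in this output can be reproduced with base R. The sketch below uses a made-up absenteeism series and a hypothetical helper, lag_vec. The 365.25-day period is an inference from the sinterm and costerm values printed above rather than a documented parameter.

```r
set.seed(3)
day <- 1:20
pct_absent <- runif(length(day), min = 0.04, max = 0.06)  # made-up series

# Shift a vector back by k days, padding the start with NA
lag_vec <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

lags <- sapply(0:3, function(k) lag_vec(pct_absent, k))
colnames(lags) <- paste0("lag", 0:3)

# Seasonal terms; day 1 gives about 0.0172 and 0.99985, as in the output above
sinterm <- sin(2 * pi * day / 365.25)
costerm <- cos(2 * pi * day / 365.25)

head(cbind(day, lags, sinterm, costerm), 4)
```

The first k rows of each lagk column are NA because no earlier observation exists, which is the staircase pattern of NA values visible in the glimpse output.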

Alarm Metrics Evaluation

The eval_metrics function assesses the performance of epidemic alarm systems across various lags and thresholds using school absenteeism data. It evaluates the following key metrics:

  • False Alarm Rate (FAR): Proportion of alarms raised outside the true alarm window.
  • Accumulated Days Delayed (ADD): Measures how many days after the optimal alarm day the first true alarm was raised.
  • Average Alert Time Quality (AATQ): Mean quality of all alarms raised, considering their timing relative to the optimal alarm day.
  • First Alert Time Quality (FATQ): Quality of the first alarm raised, based on its timing.
  • Weighted versions (WAATQ, WFATQ): Apply year-specific weights to AATQ and FATQ.
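
As a small worked example of the first metric, the sketch below computes a false alarm rate for a single made-up year by reading the definition literally: the share of raised alarms that fall outside the true alarm window. The vectors are hypothetical, and eval_metrics may aggregate across years differently.

```r
# Hypothetical daily indicators for one short season
alarm  <- c(0, 1, 0, 0, 1, 1, 0, 1)  # 1 = alarm raised that day
window <- c(0, 0, 0, 1, 1, 1, 1, 0)  # 1 = day lies in the true alarm window

false_alarms <- sum(alarm == 1 & window == 0)  # alarms on days 2 and 8
far <- false_alarms / sum(alarm == 1)          # 2 of 4 alarms, so 0.5
far
```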

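The underlying model is a logistic regression with lagged absenteeism and fixed seasonal terms, given by:

\[\operatorname{logit}(\theta_{tj}) = \beta_0 + \beta_1 x_{tj} + \beta_2 x_{(t-1)j} + \dots + \beta_{l+1} x_{(t-l)j} + \beta_{l+2} \sin\left(\frac{2\pi t}{T^*}\right) + \beta_{l+3} \cos\left(\frac{2\pi t}{T^*}\right)\]

where \(\theta_{tj}\) is the modelled probability on day \(t\) of school year \(j\), \(x_{tj}\) is the absenteeism on that day, \(t\) is time, \(j\) is the school year, \(l\) is the lag, and \(T^*\) is the period for seasonal terms.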
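
A model of this form can be fitted with stats::glm. The sketch below does so for a lag of one day on synthetic data, then raises an alarm when the fitted probability exceeds a threshold. The data-generating numbers, the 0.3 threshold, and the rule "alarm when probability exceeds the threshold" are assumptions for illustration and are not taken from eval_metrics.

```r
set.seed(7)
n   <- 300
day <- seq_len(n)

# Synthetic absenteeism: 5% baseline with a mid-season bump plus noise
bump       <- exp(-(day - 150)^2 / (2 * 30^2))
pct_absent <- 0.05 + 0.03 * bump + rnorm(n, sd = 0.005)

# Synthetic case indicator that becomes more likely as absenteeism rises
case <- rbinom(n, size = 1, prob = plogis(-3 + 150 * (pct_absent - 0.05)))

dat <- data.frame(
  case    = case,
  lag0    = pct_absent,
  lag1    = c(NA, head(pct_absent, -1)),
  sinterm = sin(2 * pi * day / 365.25),
  costerm = cos(2 * pi * day / 365.25)
)

# Logistic regression with lags 0 and 1 and the two seasonal terms
fit  <- glm(case ~ lag0 + lag1 + sinterm + costerm, family = binomial, data = dat)
prob <- predict(fit, newdata = dat, type = "response")

# Assumed alarm rule: fitted probability above the threshold
alarm <- prob > 0.3
```

The first day has no lag1 value, so its fitted probability is NA. Longer lags lose correspondingly more days at the start of each series.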

eval_metrics also identifies the best model parameters (lag and threshold) for each metric. The output is a list with three main components:

  • metrics: An object containing:
    • Matrices of each metric (FAR, ADD, AATQ, FATQ, WAATQ, WFATQ) for all lag and threshold combinations.
    • Best models according to each metric, including lag and threshold values.
  • plot_data: A plot object to visualize epidemic data and the best model for each metric.
  • results: Provides summary statistics.

In the example provided, alarms are calculated for school years 2 to 10, considering lags up to 15 days and threshold values ranging from 0.1 to 0.6 in 0.05 increments. Year weights are assigned proportionally to the school year number.
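
The threshold grid and year weights described here can be inspected directly in base R before they are passed to eval_metrics. The sketch only evaluates the same expressions used in the call.

```r
# Threshold grid: 0.1 to 0.6 in steps of 0.05
thres <- seq(0.1, 0.6, by = 0.05)
length(thres)        # 11 candidate thresholds

# Year weights proportional to the school year number, normalised to sum to 1
yr.weights <- (2:10) / sum(2:10)
round(yr.weights, 3) # later years receive more weight
sum(yr.weights)
```

Normalising by sum(2:10) makes the weights sum to one, so the weighted metrics are weighted averages across school years.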

# Evaluate alarm metrics for epidemic detection
# lag of 15
alarms <- eval_metrics(absent_data, ScYr = 2:10, maxlag = 15, thres = seq(0.1,0.6,by = 0.05), 
                      yr.weights = c(2:10)/sum(c(2:10)))

summary(alarms$results)
#> Alarm Metrics Summary
#> =====================
#> 
#> FAR :
#>   Mean: 0.5795 
#>   Variance: 0.006 
#>   Best lag: 1 
#>   Best threshold: 0.35 
#>   Best value: 0.3175 
#> 
#> ADD :
#>   Mean: 20.8525 
#>   Variance: 20.6562 
#>   Best lag: 1 
#>   Best threshold: 0.1 
#>   Best value: 9.1111 
#> 
#> AATQ :
#>   Mean: 0.5706 
#>   Variance: 0.0134 
#>   Best lag: 1 
#>   Best threshold: 0.3 
#>   Best value: 0.2937 
#> 
#> FATQ :
#>   Mean: 0.5762 
#>   Variance: 0.0099 
#>   Best lag: 1 
#>   Best threshold: 0.35 
#>   Best value: 0.2972 
#> 
#> WAATQ :
#>   Mean: 0.5304 
#>   Variance: 0.022 
#>   Best lag: 1 
#>   Best threshold: 0.3 
#>   Best value: 0.1893 
#> 
#> WFATQ :
#>   Mean: 0.5432 
#>   Variance: 0.0163 
#>   Best lag: 1 
#>   Best threshold: 0.35 
#>   Best value: 0.1943 
#> 
#> Reference Dates:
#>    epidemic_years ref_dates
#> 1               1        56
#> 2               2        34
#> 3               3        52
#> 4               4        23
#> 5               5        28
#> 6               6        47
#> 7               7        77
#> 8               8        47
#> 9               9        35
#> 10             10        52
#> 
#> Best Prediction Dates:
#> FAR :
#>  [1] NA NA 45 NA 28 41 51 41 32 46
#> 
#> ADD :
#>  [1] NA NA 30 NA 24 14 20 33  6 15
#> 
#> AATQ :
#>  [1] NA NA 34 NA 28 37 50 41 32 46
#> 
#> FATQ :
#>  [1] NA NA 45 NA 28 41 51 41 32 46
#> 
#> WAATQ :
#>  [1] NA NA 34 NA 28 37 50 41 32 46
#> 
#> WFATQ :
#>  [1] NA NA 45 NA 28 41 51 41 32 46

# Plot various alarm metrics values
plot(alarms$metrics, "FAR")    # False Alert Rate

plot(alarms$metrics, "ADD")    # Accumulated Days Delayed

plot(alarms$metrics, "FATQ")   # First Alert Time Quality

plot(alarms$metrics, "AATQ")   # Average ATQ

plot(alarms$metrics, "WFATQ")  # Weighted FATQ

plot(alarms$metrics, "WAATQ")  # Weighted Average ATQ


# visualization of epidemics with alarms raised.
alarm_plots <- plot(alarms$plot_data)
for(i in seq_along(alarm_plots)) { 
  print(alarm_plots[[i]]) 
}


References

Vanderkruk, K.R., Deeth, L.E., Feng, Z. et al. ATQ: alert time quality, an evaluation metric for assessing timely epidemic detection models within a school absenteeism-based surveillance system. BMC Public Health 23, 850 (2023). https://doi.org/10.1186/s12889-023-15747-z