| Title: | A Comprehensive Tool for Early Warning in Infectious Disease |
|---|---|
| Description: | Infectious disease surveillance requires early outbreak detection. This package provides statistical tools for analyzing time-series monitoring data through three core methods: (a) EWMA (Exponentially Weighted Moving Average), (b) Modified-CUSUM (Modified Cumulative Sum), and (c) Adjusted-Serfling models. Methodologies are based on Wang et al. (2010) <doi:10.1016/j.jbi.2009.08.003> and Wang et al. (2015) <doi:10.1371/journal.pone.0119923>. Designed for epidemiologists and public health researchers working with disease surveillance systems. |
| Authors: | Xiaoli Wang [aut], Mingyue Pan [aut, cre] |
| Maintainer: | Mingyue Pan <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.0 |
| Built: | 2026-05-24 09:46:06 UTC |
| Source: | https://github.com/pan-mingyue/warnepi |
Adjusted Serfling regression for periodic disease surveillance, automating epidemic baseline estimation through iterative threshold optimization. Enhances traditional Serfling models by objectively determining epidemic periods and improving peak detection accuracy.
aSerfling(data, col_name, cycles)

| Argument | Description |
|---|---|
| `data` | A data frame containing the warning indicator columns, arranged in time-based order. |
| `col_name` | A column name for the warning indicator (character). |
| `cycles` | A numeric vector of disease cycles (e.g., `c(52, 26)` for weekly data with annual and semi-annual patterns). |

Implements an iterative periodic regression for time series with at least 2 full cycles. Key features:

- Dynamic Epidemic Filtering:
  - Automatically excludes outbreak points via iterative prediction-CI comparison
  - Terminates when adjusted R-squared stabilizes (maximized model fit)
- Flexible Seasonality Modeling:
  - Supports multiple cycles via the `cycles` parameter (e.g., `c(52, 26)` for weekly data with annual and semi-annual patterns)
  - Self-adapts to pathogen seasonality shifts
- Peak-Centric Alerting:
  - Flags peaks via optimized threshold (final model's 95% CI upper bound)
  - Avoids subjective epidemic-onset definitions

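The iterative filtering idea described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper names `harmonic_terms()` and `serfling_sketch()`, the linear trend term, the use of a 95% prediction interval, and the `max_iter` cap are all assumptions made for the example.

```r
# Hypothetical helper: sine/cosine columns for each cycle length in `cycles`.
harmonic_terms <- function(t, cycles) {
  cols <- lapply(cycles, function(p) cbind(sin(2 * pi * t / p), cos(2 * pi * t / p)))
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(rep(c("sin_", "cos_"), times = length(cycles)),
                        rep(cycles, each = 2))
  X
}

# Hypothetical sketch of an iterative Serfling-type fit (not warnepi's code).
serfling_sketch <- function(y, cycles, level = 0.95, max_iter = 20) {
  t <- seq_along(y)
  df <- data.frame(y = y, t = t, harmonic_terms(t, cycles))
  keep <- rep(TRUE, length(y))      # points currently treated as non-epidemic
  best_fit <- NULL
  best_r2 <- -Inf
  n_fit <- 0
  for (i in seq_len(max_iter)) {
    fit <- lm(y ~ ., data = df[keep, ])
    r2 <- summary(fit)$adj.r.squared
    if (r2 <= best_r2) break        # adjusted R-squared no longer improves
    best_fit <- fit
    best_r2 <- r2
    n_fit <- i
    upper <- predict(fit, newdata = df, interval = "prediction", level = level)[, "upr"]
    new_keep <- keep & (y <= upper) # drop points above the upper bound
    if (identical(new_keep, keep)) break
    keep <- new_keep
  }
  upper <- predict(best_fit, newdata = df, interval = "prediction", level = level)[, "upr"]
  list(warning = as.integer(y > upper), best_fit = best_fit, fit_times = n_fit)
}

# Simulated weekly series (3 years) with an injected three-week outbreak.
set.seed(1)
t <- 1:156
y <- 50 + 20 * sin(2 * pi * t / 52) + rnorm(156, sd = 3)
y[60:62] <- y[60:62] + 60
res <- serfling_sketch(y, cycles = c(52, 26))
which(res$warning == 1)
```

Because excluded points never re-enter the baseline, each refit is less influenced by epidemic weeks; the loop stops as soon as the adjusted R-squared fails to improve or no further points are excluded.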
A list containing:

- `output`: Full dataset with warning flags (1 = alert, 0 = normal)
- `best_fit`: Final `lm` model object
- `fit_times`: Iteration count for convergence
- `cycles`: Input cycle parameters

Wang X, Wu S, MacIntyre CR, et al. Using an adjusted Serfling regression model to improve the early warning at the arrival of peak timing of influenza in Beijing. PLoS One, 2015,10(3):e0119923.

    ## modeling
    data(sample_ili)
    sf <- aSerfling(data = sample_ili, 'case', cycles = c(52, 26))
    sf

    ## visualize alerts
    output <- sf$output
    plot(output$date, output$case, type = "l")
    points(output$date[output$warning == 1],
           output$case[output$warning == 1], col = "red")

Projects an existing Serfling model onto new temporally contiguous data to detect epidemic signals. Requires test data to immediately follow training data chronologically to maintain periodicity.
aSerfling_predict(sf, df_test)

| Argument | Description |
|---|---|
| `sf` | Model object returned by `aSerfling`. |
| `df_test` | New data frame with identical structure to the training data, containing subsequent time points. Must include the response variable column used in the original modeling. |

This function extends the surveillance capability of an established aSerfling model by:

- Automatically generating time indices continuing from the training set
- Preserving all terms from the original model fit
- Calculating prediction intervals using the trained coefficients
- Flagging values exceeding the 95% upper prediction bound as warnings

Critical requirements:

- Test data must maintain the same time resolution (weekly/monthly) as the training data
- The first test observation must be the immediate next time point after the last training observation
- Column names and cycle parameters must match the original model specification

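Why the test data must directly follow the training data can be seen in a small base-R sketch. It is not the package's code: the helper `make_design()`, the single 52-week harmonic, and the simulated series are assumptions made for illustration. The key point is that the time index for the new rows continues at `n_train + 1`, so the sine and cosine terms stay in phase with the fitted model.

```r
set.seed(2)
n_train <- 104
n_test <- 26
t_all <- seq_len(n_train + n_test)
y_all <- 40 + 15 * cos(2 * pi * t_all / 52) + rnorm(length(t_all), sd = 2)
y_all[115] <- y_all[115] + 40   # injected spike inside the test period

# Hypothetical helper: trend plus one annual harmonic for a given time index.
make_design <- function(t, y = NULL) {
  d <- data.frame(t = t, s52 = sin(2 * pi * t / 52), c52 = cos(2 * pi * t / 52))
  if (!is.null(y)) d$y <- y
  d
}

train <- make_design(1:n_train, y_all[1:n_train])
fit <- lm(y ~ t + s52 + c52, data = train)

# The test index continues from the training set: n_train + 1, n_train + 2, ...
t_test <- n_train + seq_len(n_test)
test <- make_design(t_test, y_all[t_test])

# Flag observations above the 95% upper prediction bound.
pred <- predict(fit, newdata = test, interval = "prediction", level = 0.95)
test$upper <- pred[, "upr"]
test$warning <- as.integer(test$y > test$upper)
test[test$warning == 1, c("t", "y", "upper")]
```

If the test index restarted at 1, or skipped time points, the harmonic terms would be evaluated at the wrong phase and the predicted baseline would be shifted relative to the true seasonal pattern.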
A data frame containing warning results. The value of the warning column is 1 for warning and 0 for no warning.
Wang X, Wu S, MacIntyre CR, et al. Using an adjusted Serfling regression model to improve the early warning at the arrival of peak timing of influenza in Beijing. PLoS One, 2015,10(3):e0119923.

    data(sample_ili)

    ## Split into sequential training/test sets
    df_train <- sample_ili[1:150, ]
    df_test <- sample_ili[151:200, ]

    ## modeling
    sf <- aSerfling(df_train, 'case', cycles = c(52, 26))

    ## apply the model to test set
    pre <- aSerfling_predict(sf, df_test)

    ## visualize alerts
    plot(pre$date, pre$case, type = "l")
    points(pre$date[pre$warning == 1],
           pre$case[pre$warning == 1], col = "red")

Detects anomalies in infectious disease surveillance data using an Exponentially Weighted Moving Average (EWMA) algorithm. Designed for time series data, it flags potential outbreaks by smoothing past observations with decayed weights and comparing against control thresholds.
EWMA(data, column, lambda = 0.5, k = 3, move_t, ignore_t = 2)

| Argument | Description |
|---|---|
| `data` | A data frame containing the warning indicator columns, arranged in time-based order. |
| `column` | A column name or column number, used to specify the warning indicator. |
| `lambda` | The weight factor. |
| `k` | The standard deviation coefficient. |
| `move_t` | The moving period. |
| `ignore_t` | The number of nearest time units to be ignored by the model. |

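Let $X = \{X_t\}$ be an observed time series of disease case counts, where $X_t$ represents the aggregated counts at time $t$ (e.g., daily, weekly, or monthly observations). We assume $X_t \sim N(\mu, \sigma^2)$ for the underlying distribution.

The EWMA (Exponentially Weighted Moving Average) model is defined as:

$$Z_t = \lambda X_t + (1 - \lambda) Z_{t-1}$$

$$UCL_t = \mu_t + k \sigma_t \sqrt{\frac{\lambda}{2 - \lambda}}$$

where:

- $Z_t$: The EWMA statistic at time $t$, representing an exponentially weighted average of current and past observations
- $\lambda$: Weight factor ($0 < \lambda \le 1$); higher values prioritize recent observations
- $k$: Standard deviation coefficient (typically 2-3)
- $UCL_t$: Upper Control Limit at time $t$, forming a dynamic threshold for anomaly detection
- $\mu_t, \sigma_t$: Estimated from the moving window (`move_t` time units, excluding the `ignore_t` nearest ones)

An alarm is triggered when $Z_t > UCL_t$, with the alarm set defined as:

$$A = \{\, t : Z_t > UCL_t \,\}$$
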
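A minimal base-R sketch of the EWMA alarm rule follows. It is an illustration under stated assumptions, not the package's implementation: the function name `ewma_sketch()`, the initialization `z[1] = x[1]`, and the exact window indices (the `move_t` observations ending `ignore_t + 1` steps before the current time) are choices made for the example.

```r
# Hypothetical sketch of an EWMA detector (not warnepi's code).
ewma_sketch <- function(x, lambda = 0.5, k = 3, move_t = 4, ignore_t = 2) {
  n <- length(x)
  z <- numeric(n)
  z[1] <- x[1]                              # assumed initial value
  for (t in seq_len(n)[-1]) {
    z[t] <- lambda * x[t] + (1 - lambda) * z[t - 1]
  }
  ucl <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    end <- t - ignore_t - 1                 # skip the nearest `ignore_t` points
    start <- end - move_t + 1
    if (start >= 1) {
      w <- x[start:end]                     # assumed baseline window
      ucl[t] <- mean(w) + k * sd(w) * sqrt(lambda / (2 - lambda))
    }
  }
  data.frame(x = x, z = z, ucl = ucl,
             warning = as.integer(!is.na(ucl) & z > ucl))
}

# Deterministic toy series: stable baseline, then a rise and fall.
x <- c(10, 11, 9, 10, 10, 11, 9, 10, 10, 11, 12, 15, 18, 21, 15, 10, 5)
out <- ewma_sketch(x, lambda = 0.5, k = 3, move_t = 4, ignore_t = 2)
out[out$warning == 1, ]
```

Time points without a full baseline window get `NA` for the control limit and are never flagged. Ignoring the nearest observations keeps the early weeks of an outbreak from inflating the baseline mean and standard deviation.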
A data frame containing warning results. The value of the warning column is 1 for warning and 0 for no warning.
Wang X, Zeng D, Seale H, et al. Comparing early outbreak detection algorithms based on their optimized parameter values. J Biomed Inform, 2010,43(1):97-103.

    ## simulate reported cases
    set.seed(123)
    cases <- c(round(rnorm(10, 10, 1)), seq(12, 21, 3), seq(15, 5, -5))
    dates <- seq(as.Date("2025-01-01"), by = "7 days", length.out = length(cases))
    data_frame <- data.frame(date = dates, case = cases)

    ## modeling
    output <- EWMA(data_frame, 'case', lambda = 0.5, k = 3, move_t = 4, ignore_t = 2)
    output

    ## visualize alerts
    plot(output$date, output$case, type = "l")
    points(output$date[output$warning == 1],
           output$case[output$warning == 1], col = "red")

Modified CUSUM method for outbreak detection in infectious disease surveillance data. Implements three variants (C1', C2', C3') with dynamic thresholds for time series analysis.
mCUSUM(data, column, k = 1, h = 2, move_t)

| Argument | Description |
|---|---|
| `data` | A data frame containing the warning indicator columns, arranged in time-based order. |
| `column` | A column name or column number, used to specify the warning indicator. |
| `k` | The standard deviation coefficient. |
| `h` | The threshold coefficient. |
| `move_t` | The moving period. |

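Let $X = \{X_t\}$ be an observed time series of disease case counts, where $X_t$ represents the aggregated counts at time $t$ (e.g., daily, weekly, or monthly observations). We assume $X_t \sim N(\mu, \sigma^2)$ for the underlying distribution.

The modified CUSUM models accumulate excess cases beyond control limits:

$$C_t = \max\{0,\; X_t - (\mu_t + k \sigma_t) + C_{t-1}\}$$

$$H_t = h \sigma_t$$

where:

- $k$: Standard deviation coefficient (typical range 0.5–1.5), adjusts sensitivity to deviations
- $h$: Threshold coefficient (typical range 2–5), controls alarm stringency
- $H_t$: Threshold

Model specifications:

- C1': Baseline ($\mu_t$, $\sigma_t$) estimated from the moving window of `move_t` time units immediately preceding $t$
- C2': Baseline estimated from an earlier moving window that skips the most recent time units, to avoid recent outbreaks
- C3': 3-day cumulative sum of C2' values

Alarms trigger when $C_t^{(x)} > H_t$ for each model Cx' ($x = 1, 2, 3$).
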
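The accumulation rule can be sketched in base R as follows. This is a hedged illustration, not the package's implementation: the function name `cusum_sketch()`, the `lag` argument, the two-step lag used for the C2'-like variant, and the rolling three-point sum used for the C3'-like variant are assumptions borrowed from common CUSUM practice.

```r
# Hypothetical sketch of a modified CUSUM detector (not warnepi's code).
# `lag = 0` gives a C1'-like baseline; `lag = 2` gives a C2'-like baseline.
cusum_sketch <- function(x, k = 1, h = 2, move_t = 4, lag = 0) {
  n <- length(x)
  cs <- numeric(n)
  H <- rep(NA_real_, n)
  prev <- 0
  for (t in seq_len(n)) {
    end <- t - 1 - lag
    start <- end - move_t + 1
    if (start < 1) next                     # not enough history yet
    w <- x[start:end]                       # assumed baseline window
    mu <- mean(w)
    s <- sd(w)
    cs[t] <- max(0, x[t] - (mu + k * s) + prev)
    H[t] <- h * s
    prev <- cs[t]
  }
  data.frame(x = x, cusum = cs, threshold = H,
             warning = as.integer(!is.na(H) & cs > H))
}

# Deterministic toy series: stable baseline, then a rise and fall.
x <- c(10, 11, 9, 10, 10, 11, 9, 10, 10, 11, 12, 15, 18, 21, 15, 10, 5)
c1 <- cusum_sketch(x, k = 1, h = 2, move_t = 4, lag = 0)
c2 <- cusum_sketch(x, k = 1, h = 2, move_t = 4, lag = 2)

# C3'-like statistic: sum of the C2'-like values at t, t - 1 and t - 2.
c3_stat <- c2$cusum + c(0, head(c2$cusum, -1)) + c(0, 0, head(c2$cusum, -2))

which(c1$warning == 1)
```

Lagging the baseline window makes the statistic more sensitive to gradual rises, because the first weeks of an outbreak are not yet part of the mean and standard deviation being compared against.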
A data frame containing C1', C2' and C3' warning results. The value of the warning column is 1 for warning and 0 for no warning.
Wang X, Zeng D, Seale H, et al. Comparing early outbreak detection algorithms based on their optimized parameter values. J Biomed Inform, 2010,43(1):97-103.

    ## simulate reported cases
    set.seed(123)
    cases <- c(round(rnorm(10, 10, 1)), seq(12, 21, 3), seq(15, 5, -5))
    dates <- seq(as.Date("2025-01-01"), by = "7 days", length.out = length(cases))
    data_frame <- data.frame(date = dates, case = cases)

    ## modeling
    output <- mCUSUM(data_frame, 'case', k = 1, h = 2.5, move_t = 4)
    output

    ## visualize alerts
    ### C1'
    plot(output$date, output$case, type = "l")
    points(output$date[output$C1_prime_warning == 1],
           output$case[output$C1_prime_warning == 1], col = "red")

    ### C2'
    plot(output$date, output$case, type = "l")
    points(output$date[output$C2_prime_warning == 1],
           output$case[output$C2_prime_warning == 1], col = "red")

    ### C3'
    plot(output$date, output$case, type = "l")
    points(output$date[output$C3_prime_warning == 1],
           output$case[output$C3_prime_warning == 1], col = "red")

A dataset containing 200 weeks of simulated influenza-like illness case counts.
data(sample_ili)
A data frame with 200 rows and 2 variables:

- `date`: Date of observation (weekly)
- `case`: Integer count of reported cases