--- title: "Automated Detection of Seasonal Epidemic Onset and Burden Levels in R" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Automated Detection of Seasonal Epidemic Onset and Burden Levels in R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, warning=FALSE, message=FALSE} library(aedseo) ``` ## Introduction The `aedseo` package performs automated and early detection of seasonal epidemic onsets and estimates the breakpoints for burden levels from time series data stratified by season. The seasonal onset (`seasonal_onset()`) estimates growth rates for consecutive time intervals and calculates the sum of cases. The burden levels (`seasonal_burden_levels()`) use the previous seasons to estimate the burden levels of the current season. The algorithm allows for surveillance of pathogens, by alarming when the observations have significant growth in the selected time interval and based on the disease_specific threshold, while also evaluating the burden of current observations based on previous seasons. ### Seasonal data To apply the `aedseo` algorithm, data needs to be transformed into a `tsd` object. If you have your own data, the `to_time_series()` function can be used with the arguments: `observation`, `time`, `time_interval`. In the following section, the application of the algorithm is shown with simulated data created with the `generate_seasonal_data()`function. More information about the function can be found in the `vignette("generate_seasonal_wave")` ```{r, include = FALSE} withr::local_seed(222) # Construct an 'tsd' object with time series data tsd_data <- generate_seasonal_data( years = 3, start_date = as.Date("2021-10-18"), amplitude = 2000, mean = 2000, phase = 0, trend_rate = 1.001, noise_sd = 100, time_interval = "week" ) ``` In the following figure simulated data (solid circles) are visualised as individual observations. The solid line connects these points, representing the underlying mean trend over three years of weekly data. ```{r, dpi=300} plot(tsd_data) ``` ### Determining season Respiratory viruses can circulate in different seasons based on the location. In the nordic hemisphere they mostly circulate in the fall and winter seasons, hence surveillance is intensified from week 40 to week 20 in the following year. To include all data, the season in the example is set from week 21 to week 20 in the following year. ### Determining the disease specific threshold When observations are low there is a risk that randomness will result in significant growth estimates in isolated periods. To increase the robustness of the method a disease_specific threshold is introduced. It should be set such that subsequent estimates of growth are likely to be significant as well. The disease_specific threshold can be determined by examining continuous periods with sustained significant growth, and determine at what number of observations these events occur. In this example the disease_specific threshold is determined based on consecutive significant observations from all available previous seasons. Significant observations are defined as those with a significant positive growth rate. To capturing short-term changes and fluctuations in the data, a rolling window of size $k = 5$ is used to create subsets of the data for model fitting, and the `quasipoisson` family is used to account for overdispersion. The `seasonal_onset()` function can be used for this purpose, without providing the disease-specific threshold. ```{r, dpi=300} growth_warning_algo <- seasonal_onset( tsd = tsd_data, k = 5, family = "quasipoisson", na_fraction_allowed = 0.4, season_start = 21, # Season starts in week 21 season_end = 20, # Season ends in week 20 the following year only_current_season = FALSE ) significant_vs_obs <- growth_warning_algo |> dplyr::mutate( Counter = cumsum(growth_warning == TRUE & !is.na(growth_warning)) * growth_warning ) |> dplyr::mutate(Counter = dplyr::if_else(growth_warning, Counter, NA)) |> # Identify where NA occurs in the Counter column dplyr::mutate( ChangeFlag = is.na(Counter), GroupID = cumsum(ChangeFlag) # Make a Group for each NA ) |> dplyr::ungroup() |> dplyr::group_by(GroupID) |> dplyr::mutate( SequentialCounter = dplyr::if_else( growth_warning == TRUE, # Increment of numbers for Significant == TRUE rev(cumsum(rev(!is.na(Counter)))), NA # Reverse numbers for visibility on plot ) ) |> dplyr::ungroup() significant_vs_obs$season <- factor( significant_vs_obs$season, levels = sort(unique(significant_vs_obs$season), decreasing = FALSE, method = "auto") ) significant_vs_obs |> dplyr::filter(!is.na(SequentialCounter)) |> ggplot2::ggplot(mapping = ggplot2::aes(x = sum_of_cases)) + ggplot2::geom_line( mapping = ggplot2::aes( y = SequentialCounter, group = GroupID, linetype = "Consecutive Weeks", color = season ) ) + ggplot2::scale_color_manual( values = c("#ebac23", "#006e00", "#008cf9", "#b80058"), breaks = unique(significant_vs_obs$season), labels = unique(significant_vs_obs$season) ) + ggplot2::scale_x_log10( breaks = scales::log_breaks(base = 10, n = 10), labels = scales::label_comma() ) + ggplot2::labs( y = "Number of subsequent significant observations", linetype = "", x = "Rolling sum of positive cases (5 weeks)" ) + ggplot2::guides(linetype = "none") + ggplot2::theme_bw() + ggplot2::theme( legend.text = ggplot2::element_text(size = 11, color = "black"), axis.text = ggplot2::element_text(size = 9, color = "black"), axis.title.x = ggplot2::element_text(size = 11, color = "black"), axis.title.y = ggplot2::element_text(size = 11, color = "black") ) ``` To declare a seasonal onset, the number of cases must exceed a threshold within the five week window. This threshold is determined by dividing the sum of cases at the onset of continuous significant observations by five, resulting in a disease_specific threshold for each time step. From the analysis plot above, it is observed that the significant growth rate begins at approximately 700 observations. Therefore, the disease_specific threshold is calculated as: $\frac{700}{5} = 140$. In other words, a five week average observation count surpassing 140 alongside with a significant positive growth rate defines the onset of the season. Inspect the exact conditions around each detected season start ```{r} significant_vs_obs |> dplyr::filter(!is.na(SequentialCounter)) |> dplyr::group_by(season) |> dplyr::filter(SequentialCounter == max(SequentialCounter)) |> dplyr::mutate(disease_threshold = sum_of_cases / 5, week = ISOweek::ISOweek(reference_time)) |> dplyr::select(season, week, disease_threshold) ``` By inspecting the output from the above code, we observe that the first season initiates significantly later compared to the subsequent seasons. This delayed start was also evident in the plot, hence we determine the disease_specific threshold based on the complete seasons spanning from *2022/2023* to *2024/2025*. As a result, the disease_specific threshold is established at `140`. ## Applying the main algorithm The primary function of the `aedseo` package is the `combined_seasonal_output()` which integrates the `seasonal_onset()` and `seasonal_burden_levels()` functions to deliver a comprehensive seasonal analysis. Detailed information about each function and their respective arguments can be found in the `vignette("seasonal_onset")` and `vignette("burden_levels")`. ```{r} seasonal_output <- combined_seasonal_output( tsd = tsd_data, disease_threshold = 140, method = "intensity_levels", family = "quasipoisson" ) ``` The default function estimates onset and burden levels for the current season. If it is desired to see calculations for all previous seasons, the `only_current_season` argument should be set to `FALSE`. *Note: *Burden levels can not be estimated for the first season and needs at least two seasons of data as the estimations are based on data from previous seasons.\\ The `aedseo` package implements S3 methods including the `plot()`, `predict()` and `summary()` functions specifically designed for objects of the `aedseo` package. `predict()` is only relevant for `tsd_onset` objects. An example of using the `summary()` S3 method with `tsd_onset` and `tsd_burden_level` objects is shown here. Seasonal onset output can be extracted by: ```{r} summary(seasonal_output$onset_output) ``` Seasonal burden output can be extracted by: ```{r} summary(seasonal_output$burden_output) ``` ### Plot the comprehensive seasonal analysis The `plot()` S3 method for `tsd_combined_seasonal_output` objects allows you to get a complete visualisation of the `combined_seasonal_output()` analysis of the current season. ```{r, dpi=300} # Adjust y_lower_bound dynamically to remove noisy small values disease_threshold <- 140 y_lower_bound <- ifelse(disease_threshold < 10, 1, 5) plot( x = seasonal_output, y_lower_bound = y_lower_bound, time_interval = "3 weeks" ) ``` Using the `intensity_levels` method to define burden levels, the seasonal onset is likely to fall within the `low` or `medium` category. This is because the `very low` breakpoint is the disease-specific threshold, and season onset is only identified if the five-week average of the observations exceed this threshold along with a significant positive growth rate.