To provide a concise overview of how the
seasonal_burden_levels()
algorithm operates, we utilize the
same example data presented in the vignette("aedseo")
. The
plot below illustrates the two methods available for estiming burden
levels in the combined_seasonal_output()
function:
intensity_levels
: Intended for
within-season classification of observations.peak_levels
: Intended for comparing
the height of peaks between seasons.The disease-specific threshold is the very low
breakpoint for both methods. Breakpoints are named as the upper bounds
for the burden levels, and are visualised on following plot.
The methodology used to define the burden levels of seasonal epidemics is based on observations from previous seasons. Historical data from all available seasons is used to establish the levels for the current season. This is done by:
n
highest (peak) observations from each
season.n
peak
observations. The selected distribution with the fitted parameters is
used to calculate percentiles to be used as breakpoints.peak_levels
which models the height of the seasonal
peaks. Using the log-normal distribution without weights is similar to
the default in mem.intensity_levels
which models the within season levels.
The highest breakpoint is identical with the peak_levels
method. Intermediate breakpoints are evenly distributed on a logaritmic
scale, between the very low
and high
breakpoints, to give the same relative difference between the
breakpoints.The model is implemented in the seasonal_burden_levels()
function of the aedseo
package. In the following sections
we will describe the arguments for the function and how the model is
build.
The n_peak
argument defines the number of highest
observations to be included from each season. The default of
n_peak
is 6
- corresponding with the
mem
defaults of using 30 observations across the latest
five seasons.
The decay_factor
argument is implemented to give more
weight to recent seasons, as they are often more indicative of current
and future trends. As time progresses, the relevance of older seasons
may decrease due to changes in factors like testing recommendations,
population immunity, virus mutations, or intervention strategies.
Weighting older seasons less reflects this reduced relevance. From
time-series analysis, $\frac{1}{1-\text{decay_factor}}$ is often
used as an approximate “effective memory”. Hence, with the default
decay_factor
= 0.8 the effective memory is five seasons.
(See mentioned by Hyndman
& Athanasopoulos for an introduction to simple exponential
smoothing) The default decay_factor
allows the model to be
responsive to recent changes without being overly sensitive to
short-term fluctuations. The optimal decay_factor
can vary
depending on the variability and trends within the data. For datasets
where seasonal patterns are highly stable, a higher
decay_factor
(i.e. longer memory) may be appropriate.
Conversely, if the data exhibit dramatic shifts from one season to the
next, a lower decay_factor
may improve predictions.
The family
argument is used to select which distribution
the n_peak
observations should be fitted to, users can
choose between lnorm
, weibull
and
exp
distributions. The log-normal distribution
theoretically aligns well with the nature of epidemic data, which often
exhibits multiplicative growth patterns. In our optimisation process, we
evaluated the distributions to determine their performance in fitting
Danish non-sentinel cases and hospitalisation data for RSV, SARS-CoV-2
and Influenza (A and B). All three distributions had comparable weighted
likelihood values during optimisation, hence we did not see any
statistical significant difference in their performance.
The model uses the fit_percentiles()
function which
employs the stats::optim
for estimating the parameters that
maximizes the weighted likelihood. The optim_method
argument can be passed to seasonal_burden_levels()
, default
is Nelder-Mead
but other methods can be selected, see
?fit_percentiles
.
The method
argument is used to select one of the two
methods intensity_levels
(default) and
peak_levels
. Both methods return percentile(s) from the
fitted distribution which are used to define the breakpoins for the
burden levels. Breakpoints are named very low
,
low
, medium
and high
and define
the upper bound of the corresponding burden level.
intensity_levels
takes one percentile as argument,
representing the highest breakpoint. The default is set at a 95%
percentile. The disease-specific threshold determines the
very low
breakpoint. The low
and
medium
breakpoints are calculated to give identical
relative increases between the very low
and
high
breakpoints.
peak_levels
takes three percentiles as argument,
representing the low
, medium
and
high
breakpoints. The default percentiles are 40%, 90%, and
97.5% to align with the parameters used in mem
. The
disease-specific threshold defines the very low
breakpoint.
seasonal_burden_levels()
algorithmThe same data is created with following combinations:
These combinations are selected as it is realistic for real world data to have noise, and differentiation between trend can occur declining or inclining between seasons. Breakpoints for season 2024/2025 are calculated based on the three previous seasons.
To apply the seasonal_burden_levels()
algorithm, data
needs to be transformed into a tsd
object. The
disease-specific threshold is determined for all data combinations with
use of the method described in vignette("aedseo")
. The
disease-specific threshold should be revised before a new season starts,
especially if the data has a trend.
The seasonal_burden_levels()
function provides a
tsd_burden_level
object, which can be used in the
summary()
S3 method. This provides a concise summary of
your comprehensive seasonal burden level analysis, including breakpoints
for the current season.
intensity_levels_n <- seasonal_burden_levels(
tsd = tsd_data_noise,
disease_threshold = 10,
method = "intensity_levels",
conf_levels = 0.975
)
summary(intensity_levels_n)
#> Summary of tsd_burden_levels object
#>
#> Breakpoint estimates:
#> very low : 10.000000
#> low: 52.439546
#> medium: 274.990603
#> high: 1442.038250
#>
#> The season for the burden levels:
#> 2025/2026
#>
#> Model settings:
#> Disease specific threshold: 10
#> Called using distributional family: lnorm
mem
uses the n
highest observations from
each previous epidemic period to fit the parameters of the distribution,
where n = 30/seasons
. The data has four previous seasons,
to align with mem, we use n_peak = 8
peak_levels_n <- seasonal_burden_levels(
tsd = tsd_data_noise,
disease_threshold = 10,
method = "peak_levels",
conf_levels = c(0.4, 0.9, 0.975),
n_peak = 8
)
summary(peak_levels_n)
#> Summary of tsd_burden_levels object
#>
#> Breakpoint estimates:
#> very low : 10.000000
#> low: 1215.540917
#> medium: 1356.324373
#> high: 1423.638454
#>
#> The season for the burden levels:
#> 2025/2026
#>
#> Model settings:
#> Disease specific threshold: 10
#> Called using distributional family: lnorm
mem is run with default arguments.
# Remove current season such as previous seasons predict for newest season
previous_seasons <- tsd_data_all |>
dplyr::mutate(season = epi_calendar(time)) |>
dplyr::filter(season != "2024/2025") |>
dplyr::select(-season)
# Run mem algorithm
mem_thresholds <- previous_seasons |>
dplyr::group_by(Data) |>
dplyr::group_modify(~ {
mem_data <- .x |>
dplyr::mutate(season = aedseo::epi_calendar(time),
week = lubridate::isoweek(time)) |>
dplyr::select(-time) |>
tidyr::pivot_wider(names_from = season, values_from = observation) |>
dplyr::select(-week)
# Run mem
mem_result <- mem::memmodel(mem_data)
# Extract thresholds
mem_thresholds <- tibble::tibble(
`epidemic threshold \n (mem)` = mem_result$epidemic.thresholds[1],
`medium` = mem_result$intensity.thresholds[1],
`high` = mem_result$intensity.thresholds[2],
`very high` = mem_result$intensity.thresholds[3]
)
})
In the plots:
Upon examining all methods and data combinations, it becomes clear
that the intensity_levels
approach establishes levels
covering the entire set of observations from previous seasons. In
contrast, the peak_levels
and mem
methods
define levels solely based on the highest observations within each
season, and are thus only relevant for comparing the height of peaks
between seasons.
The highest observations for the 2024/2025 season for each data set are:
Data | Observation |
---|---|
Noise | 1471 |
Noise and positive trend | 2065 |
Noise and negative trend | 535 |
In relation to these highest observations and upon further examination, we observe the following:
Plots with Noise and Noise with Positive Trend:
peak_levels
and mem
estimate very
high breakpoints. This occurs because observations remain consistently
elevated across all three seasons, causing these methods to overlook the
remaining observations.Data with Noise and Positive Trend:
Data with Noise and Negative Trend:
As observations exponentially decrease between seasons (with the
highest observation this season being 781), we expect the breakpoints to
be the lowest of these examples. This expectation is met across all
three methods. However, the weighting of seasons in
intensity_levels
and peak_levels
leads to
older seasons having less impact on the breakpoints, as we progress
forward in time. On the other hand, mem
includes all high
observations from the previous 10 seasons without diminishing the
importance of older seasons, which results in sustained very high
breakpoints.
Notably, in the mem
method, the epidemic threshold
is positioned above the medium
burden level. This means
that the epidemic period begins only when observations reach the height
of the seasonal peaks observed in previous seasons.
In conclusion, the peak_levels
and mem
methods allows us to compare the height of peaks between seasons,
whereas the intensity_levels
method supports continuous
monitoring of observations in the current season.