Introduction
Dengue fever is one of the most significant mosquito-borne viral infections worldwide, with an estimated 390 million infections occurring annually, of which nearly 96 million manifest clinically [1]. Over the past 2 decades, the global incidence of dengue has increased markedly, driven by rapid urbanization, globalization, climate variability, and the expanding geographic range of Aedes aegypti and Aedes albopictus mosquitoes [2,3]. The World Health Organization now regards dengue as one of the fastest-spreading arboviral diseases, with more than 100 countries reporting endemic transmission [4].
In Malaysia, dengue continues to pose a substantial public health burden, with recurrent epidemics occurring despite longstanding control efforts. Between 2015 and 2023, the country recorded hundreds of thousands of cases annually, and dengue remains a leading cause of morbidity and hospitalization, particularly in highly urbanized states such as Selangor, Johor, and the federal territories [
5]. Multiple factors contribute to sustained dengue transmission, including unplanned urban growth, human mobility, rainfall-driven vector proliferation, and cyclical variations in population immunity [
6,
7]. The pronounced spatial and temporal heterogeneity of dengue across Malaysia underscores the importance of surveillance systems capable of capturing state-level dynamics, rather than relying solely on national aggregates.
Time-series methods have been widely applied in dengue research to characterize epidemic cycles, quantify seasonality, and forecast future outbreaks. Traditional approaches, including autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA) models, remain among the most commonly used due to their interpretability and robustness for short-term forecasting [
8]. For example, ARIMA/SARIMA models have been successfully applied in endemic settings such as Singapore, Thailand, and Brazil to predict dengue incidence with reasonable accuracy [
9,
10]. In recent years, hybrid approaches and machine learning models have increasingly been explored to enhance predictive performance [
11,
12]. However, despite their promise, many such studies focus on national-level summaries or single high-burden districts. This emphasis on aggregated data risks obscuring important state-level differences, where local climatic conditions, demographic structures, and vector ecology may shape dengue transmission in distinct ways [
13].
To better understand these localized variations, decomposition approaches provide a valuable complement to classical time-series models. Seasonal-trend decomposition using LOESS (STL) is particularly useful because it separates observed data into trend, seasonal, and irregular components, thereby clarifying the relative contributions of long-term changes and recurrent cycles [
14]. When combined with ARIMA/SARIMA modeling, STL enables a more comprehensive assessment of dengue dynamics by supporting both descriptive interpretation of temporal structure and predictive modeling of case counts.
This study therefore applied STL decomposition alongside ARIMA and SARIMA models to weekly dengue case data from Malaysia spanning 2022 to 2024. By examining both national- and state-level time series, the analysis seeks to capture heterogeneity in temporal patterns and to provide evidence supporting localized forecasting and vector control strategies. Such insights are essential for strengthening early warning systems and optimizing resource allocation in Malaysia’s ongoing efforts to combat dengue.
Materials and Methods
Study Design and Data Sources
This cross-sectional study utilized weekly dengue case counts for each state in Malaysia from January 1, 2022 to December 31, 2024, obtained from the Dengue Case Registry maintained by the Ministry of Health Malaysia. This registry systematically compiles all laboratory- and clinically confirmed dengue cases, providing a reliable data source for monitoring disease burden at the state level. Prior to analysis, the data were cleaned to remove duplicate records, correct inconsistencies, and assess the presence of missing values. Weekly case counts were then aggregated to generate a continuous time series for each state.
Study Area
This study was conducted in Malaysia, a country located in Southeast Asia with an approximate land area of 329,847 km². Malaysia comprises 13 states and 3 federal territories, which are further subdivided into a total of 160 administrative districts. Geographically, the nation is divided into Peninsular Malaysia, which borders Thailand, and East Malaysia, located on the island of Borneo. The country typically experiences a tropical rainforest climate, with environmental conditions influenced by 2 seasonal monsoons [
15].
Data Analysis
The temporal structure of dengue cases was first examined using STL, which decomposes a time series into 3 additive components:
$$Y_t = T_t + S_t + R_t,$$
where $Y_t$ represents the observed number of dengue cases at week $t$, $T_t$ is the trend component reflecting long-term changes, $S_t$ denotes the seasonal component representing recurrent cycles in transmission (e.g., monsoon-driven peaks), and $R_t$ is the irregular component capturing short-term fluctuations not explained by trend or seasonality [
14]. STL decomposition was performed using the STL() function from the feasts package in R software [16]. The seasonal component was extracted using a periodic seasonal window (season(window="periodic")), which constrains the seasonal pattern to be identical in every year of the series. All other STL parameters followed the default feasts implementation, including automatic LOESS span selection for trend extraction, standard low-pass filtering of the seasonal component, and iterative robustness updates to reduce the influence of outliers [14,17]. This approach ensured reproducibility and consistency across all state-level analyses.
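The additive structure described above can be illustrated with a short Python sketch. This is not the STL algorithm and not the authors' R code: it swaps in a simplified classical decomposition (centred moving-average trend plus one fixed seasonal value per week of the cycle), which mimics the effect of a periodic seasonal window. The function name, the synthetic series, and the 156-week length are assumptions made for illustration only.

```python
import math
import random


def decompose_additive(y, period):
    """Split y into trend, periodic seasonal and remainder so that y = T + S + R.

    Simplified classical decomposition, not STL: the trend is a centred moving
    average and the seasonal component is one fixed value per position in the cycle.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n  # undefined at both ends, where the window does not fit
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x period moving average: half weight on the two end points
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values by position in the cycle, then centre them
    phase_means = []
    for phase in range(period):
        values = [y[t] - trend[t] for t in range(phase, n, period)
                  if trend[t] is not None]
        phase_means.append(sum(values) / len(values))
    centre = sum(phase_means) / period
    seasonal = [phase_means[t % period] - centre for t in range(n)]

    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder


# Synthetic weekly counts (assumed values): level + slow rise + annual cycle + noise
rng = random.Random(42)
cases = [200 + 0.5 * t + 40 * math.sin(2 * math.pi * t / 52) + rng.gauss(0, 5)
         for t in range(156)]
trend, seasonal, remainder = decompose_additive(cases, period=52)
```

Because the seasonal component is built from one value per week of the cycle, it repeats exactly every 52 weeks, which is the behaviour a periodic seasonal window imposes. STL itself replaces the moving average and the per-week means with LOESS smoothers.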
ARIMA and SARIMA models were applied as linear Gaussian time-series approximations to weekly dengue case counts, an approach commonly used in epidemiological surveillance analyses when counts are sufficiently large to justify approximate normality assumptions [
18]. Model identification followed a systematic diagnostic process. Prior to model estimation, stationarity and linearity properties of each time series were evaluated using the augmented Dickey-Fuller (ADF), Phillips-Perron (PP), and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests for mean stationarity; Goldfeld-Quandt and Breusch-Pagan tests for variance stationarity; and Ramsey Regression Equation Specification Error Test (RESET), Terasvirta, and McLeod-Li tests to assess linearity and Autoregressive Conditional Heteroscedasticity (ARCH) effects [
14]. Based on these diagnostics, appropriate differencing orders (d and D) or variance-stabilizing transformations were applied as needed. Parameter exploration was conducted using the auto.arima() algorithm from the
forecast package, which performs stepwise model selection across the ARIMA/SARIMA parameter space and evaluates competing models using the Akaike information criterion (AIC), AIC with finite-sample correction (AICc), and Bayesian information criterion (BIC). This procedure ensured a systematic and reproducible model search while maintaining transparency in the identification process [
14]. The best-fitting model for each state was selected based on the lowest information criterion values. Given that the study period from 2022 to 2024 yielded a moderate number of weekly observations per state, the AICc was prioritized as the primary determinant of the optimal model. AICc adjusts for finite sample sizes and reduces the risk of overfitting associated with conventional AIC. BIC was used as a secondary reference to ensure that model parsimony was not compromised, as BIC applies a stronger penalty for model complexity. Final model selection therefore balanced goodness-of-fit, sample-size adjustment, and simplicity, with AICc serving as the principal selection metric [
14].
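The relationship between the three criteria can be made concrete with a small Python sketch using the standard textbook definitions. The log-likelihoods, parameter counts, and sample size below are hypothetical values chosen for illustration, not results from this study, and software packages may count parameters slightly differently (for example, by including the innovation variance).

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc and BIC for a model with k parameters fitted to n points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    # The finite-sample correction grows as k approaches n
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    # BIC penalises each parameter by log(n) instead of 2
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Two hypothetical candidate models fitted to the same 104 observations
simple = information_criteria(log_likelihood=-500.0, k=2, n=104)
complex_ = information_criteria(log_likelihood=-498.5, k=5, n=104)
preferred = "simple" if simple["AICc"] < complex_["AICc"] else "complex"
```

In this hypothetical comparison the richer model fits slightly better, but its extra parameters cost more than the gain in likelihood, so all three criteria favor the simpler model, with BIC penalizing the richer model most heavily.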
To assess the adequacy of the fitted ARIMA/SARIMA models, post-estimation residual diagnostics were performed. Residual autocorrelation was evaluated using the Ljung-Box test, applied at 24 lags to capture short- and medium-term dependence in weekly data while avoiding the loss of statistical power associated with testing a full 52-week seasonal cycle. This choice aligns with methodological practice in weekly epidemiological time-series analyses, where approximately half-season windows provide a practical balance between sensitivity and parsimony. Residual autocorrelation was further examined through visual inspection of residual autocorrelation function (ACF) plots [
14]. Homoscedasticity was assessed using the Breusch-Pagan test in conjunction with residual-versus-fitted value plots. Residual normality was evaluated using the Shapiro-Wilk test and supported by inspection of Q-Q plots to assess distributional shape. These diagnostic procedures were used as supplementary verification tools to determine whether key modeling assumptions were reasonably satisfied, rather than as formal proofs of model sufficiency [
14].
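As a rough illustration of the residual autocorrelation check, the Ljung-Box statistic at 24 lags can be computed directly from its definition, Q = n(n+2) Σ r_k²/(n−k). The Python sketch below is not the authors' R code; the two simulated residual series and their length of 156 are assumptions used only to contrast white noise with strongly autocorrelated residuals.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag=24):
    """Ljung-Box Q statistic summed over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Simulated residuals (assumed data): white noise versus an AR(1) process
rng = random.Random(1)
white = [rng.gauss(0, 1) for _ in range(156)]
persistent = [white[0]]
for shock in white[1:]:
    persistent.append(0.9 * persistent[-1] + shock)

q_white = ljung_box_q(white)
q_persistent = ljung_box_q(persistent)
```

Q is referred to a chi-square distribution whose degrees of freedom are usually taken as the number of lags minus the number of estimated ARMA parameters. A large Q, as for the persistent series here, indicates that temporal dependence remains in the residuals.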
All analyses were performed using R ver. 4.5.1 (R Foundation for Statistical Computing). Data wrangling and preprocessing were conducted using the tidyverse suite, and date handling employed the lubridate package. STL decomposition was carried out using the feasts and fabletools packages, while ARIMA and SARIMA models were implemented using the forecast package. Visualization of decomposition results and model outputs was performed with ggplot2 and arranged using patchwork.
Ethics Statement
The research was conducted in accordance with the principles of the Declaration of Helsinki and received ethical clearance from the Human Research Ethics Committee of Universiti Sains Malaysia under approval code USM/JEPeM/KK/25010107. Additional approval was granted by the Malaysia Medical Research and Ethics Committee under the code NMRR ID-25-00144-OVW. This research was also supported by a Universiti Sains Malaysia Medical Research Grant 2025 (Project No: R501-LR-GPP001-0000002020-0000).
Results
Figure 1 summarizes the monthly distribution of dengue cases in Malaysia from 2022 to 2024 and demonstrates clear temporal variability across states. Overall, most states exhibited distinct epidemic peaks in 2023, with the state of Selangor consistently recording the highest number of cases, surpassing 10,000 cases at its peak. Other states, including Johor, Sabah, and Terengganu, showed pronounced surges in mid-2023, whereas smaller states such as Perlis displayed lower case counts with less marked fluctuations. Seasonal recurrence of dengue cases was evident in nearly all states, with multiple peaks observed across the study period.
STL Decomposition
The STL decomposition illustrated in Figure 2 quantified the relative contributions of trend and seasonality to dengue transmission dynamics for each state (Table 1). Seasonal strength values ranged from 0.27 in Sarawak, indicating weak seasonal influence, to 0.68 in Kelantan and Kedah, where seasonality played a more prominent role. In contrast, trend strength values were generally high across most states, exceeding 0.80 in Johor, Melaka, Selangor, and Perak, which suggests that long-term changes in transmission contributed more strongly to observed case counts than seasonal patterns. The timing of seasonal peaks and troughs varied between states, although many states exhibited recurrent peaks around mid-year. STL decomposition plots for all states are provided in Figure S1.
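The text does not state how trend and seasonal strength were calculated. A common definition, and the one used by the STL feature functions in the feasts package, is max(0, 1 − Var(R)/Var(C + R)), where C is the trend or seasonal component and R is the remainder. The Python sketch below assumes that definition; the simulated components are illustrative and are not the study data.

```python
import math
import random


def variance(x):
    """Sample variance with an n - 1 denominator."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)


def strength(component, remainder):
    """Strength of a component: max(0, 1 - Var(R) / Var(component + R))."""
    combined = [c + r for c, r in zip(component, remainder)]
    denom = variance(combined)
    if denom == 0:
        return 0.0
    return max(0.0, 1.0 - variance(remainder) / denom)


# Assumed components: the same noise paired with a large and a tiny annual cycle
rng = random.Random(7)
noise = [rng.gauss(0, 1) for _ in range(156)]
strong_cycle = [10 * math.sin(2 * math.pi * t / 52) for t in range(156)]
weak_cycle = [0.1 * math.sin(2 * math.pi * t / 52) for t in range(156)]

strong = strength(strong_cycle, noise)  # close to 1: the cycle dominates the noise
weak = strength(weak_cycle, noise)      # close to 0: the cycle is lost in the noise
```

Under this definition, a value near 1 means the component accounts for almost all of the variation that remains once the other component is removed, while a value near 0 means the remainder dominates.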
ARIMA/SARIMA Models
Stationarity and linearity diagnostics were conducted prior to model estimation to verify that ARIMA and SARIMA assumptions were satisfied (
Table S1). Mean stationarity was assessed using the ADF, PP, and KPSS tests. The states of Johor, Melaka, Negeri Sembilan, Perak, Terengganu, and Wilayah Persekutuan demonstrated mean stationarity (
p<0.05 for ADF and PP tests and
p≥0.05 for KPSS) and therefore did not require additional non-seasonal differencing (d=0). Variance stationarity, evaluated using the Goldfeld-Quandt, Breusch-Pagan, Bartlett, and Levene tests, indicated heteroscedasticity in most states. Linearity diagnostics, including the Ramsey RESET, Terasvirta neural network, and McLeod-Li tests, detected significant nonlinearity or ARCH effects in the majority of states (
p<0.05), particularly those experiencing rapid epidemic fluctuations, such as Kelantan, Sabah, and Selangor. These findings supported the use of differencing and seasonal modeling to stabilize residual dynamics. Given the annual epidemiological cycle of dengue in Malaysia and the consistent seasonal components identified through STL decomposition, seasonal differencing (D=1 with a 52-week seasonal period) was applied uniformly across all states. This adjustment is supported by strong STL seasonal strength values in several states, including Kelantan, Kedah, and Pulau Pinang, and ensures removal of yearly recurrence in transmission patterns. Collectively, these diagnostic outcomes informed the differencing strategy, summarized as follows:
d=1 for 9 states with non-stationary mean,
d=0 for the remaining 5 states, and
D=1 for all states due to consistent annual seasonal patterns identified in STL decomposition.
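The effect of these differencing orders can be sketched in a few lines of Python. The series below is synthetic (a linear rise plus an exact 52-week cycle) and its length of 156 weeks is an assumption standing in for roughly 3 years of weekly data. The sketch shows that seasonal differencing removes the annual cycle and that each differencing step shortens the series.

```python
import math


def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic weekly series (assumed): linear rise of 2 per week plus an annual cycle
weekly = [100 + 2 * t + 30 * math.sin(2 * math.pi * t / 52) for t in range(156)]

seasonal_diff = difference(weekly, lag=52)   # D = 1 with a 52-week period
both_diff = difference(seasonal_diff, lag=1)  # then d = 1

lengths = (len(weekly), len(seasonal_diff), len(both_diff))
```

Seasonal differencing at lag 52 leaves only the year-on-year change (a constant 104 in this synthetic example), and the subsequent first difference removes that as well. It also discards the first 52 observations, which is a substantial share of a 3-year weekly series and is worth bearing in mind when interpreting the fitted models.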
Post-differencing diagnostics demonstrated that ADF and PP tests were significant for all states (p=0.01), confirming mean stationarity after differencing (Table S2). KPSS tests were non-significant (p=0.10) across all states, further supporting stationarity. No ARCH effects were detected, indicating stable residual variance and no requirement for GARCH-type model extensions. Overall, these findings confirm that the selected differencing orders were appropriate and that all state-level series satisfied the assumptions required for SARIMA modeling. The non-seasonal differencing order (d), seasonal differencing order (D), and post-differencing stationarity diagnostics are summarized in Table 2.
Model selection using ARIMA and SARIMA approaches identified the most suitable specification for each state based on AIC, AICc, and BIC values, as summarized in
Table 3. Residual diagnostics revealed departures from ideal distributional assumptions in several states. Shapiro-Wilk tests indicated statistically significant deviations from residual normality, and Breusch-Pagan tests identified evidence of heteroscedasticity in some states (
Table S3). Inspection of Q-Q plots (
Figure S2) demonstrated mild tail departures from normality. Importantly, residual independence was preserved across all fitted models. Ljung-Box tests at lag 24 showed no significant residual autocorrelation (
Table 4), and residual ACF plots confirmed white-noise behavior across all states (
Figure S3), indicating that the ARIMA/SARIMA models adequately captured the temporal dependence structure of dengue incidence. Although residual non-normality and heteroscedasticity may affect the precision of prediction intervals, these features are commonly observed in infectious disease surveillance time series and do not invalidate model consistency or short-term forecasting performance. Accordingly, forecast uncertainty should be interpreted with appropriate caution in states exhibiting greater residual variability.
Discussion
This study provides a state-level temporal analysis of dengue incidence in Malaysia from 2022 to 2024, revealing pronounced heterogeneity in transmission dynamics across states. STL decomposition demonstrated that both long-term trends and seasonal fluctuations contributed to observed dengue patterns, although the relative influence of these components varied substantially between states.
Most states exhibited high trend strength values (>0.80), indicating that sustained structural determinants such as urbanization, demographic growth, and vector ecological adaptation exerted greater influence on dengue dynamics than short-term seasonal drivers. For example, Johor, Melaka, Perak, and Selangor all demonstrated strong trend dominance, consistent with prior studies showing that long-term environmental change and urban expansion increasingly shape dengue risk across Malaysia and Southeast Asia [
13,
15,
19].
In contrast, states such as Kelantan displayed strong seasonal signatures, with seasonal strength values reaching 0.68. This state exhibited recurrent epidemic peaks during specific months, particularly around mid-year, consistent with rainfall-driven mosquito breeding cycles associated with the southwest monsoon [
20–
22]. In Kelantan, for instance, seasonal peaks were most prominent during the third quarter of each year, with troughs occurring earlier in the year. Such seasonal cycles echo findings from Thailand and Vietnam, where rainfall and temperature seasonality remain critical triggers for epidemic onset [
23,
24].
The weak seasonality observed in the Borneo region (Sabah and Sarawak) provides a contrasting pattern. In these states, dengue incidence was less tightly linked to seasonal climatic cycles and more influenced by gradual long-term increases. This pattern suggests that factors such as population mobility, land-use change, or sustained vector adaptation may play stronger roles than short-term rainfall variability. Studies from the Borneo region and eastern Indonesia have similarly documented weaker seasonal dependence, reflecting ecological differences in forested and peri-urban environments [
25–
27]. Taken together, the STL results emphasize that while dengue dynamics in Malaysia are broadly trend-dominated, seasonal cycles continue to exert strong influence in a subset of states. This variability reinforces the need for locally tailored early warning systems that integrate both long-term and seasonal signals.
ARIMA and SARIMA modeling provided further insight into the temporal complexity of dengue transmission across Malaysian states. Model selection indicated that no single model structure was adequate for all states, confirming that dengue dynamics in Malaysia cannot be represented within a single temporal framework. The observed variation in optimal model structures likely reflects underlying environmental, demographic, and epidemiological heterogeneity across regions. States such as Kelantan (seasonal strength=0.679) and Pahang (0.523) demonstrated strong recurring annual cycles, with dengue case peaks typically occurring between June and September. These peak periods coincide with Malaysia’s inter-monsoon transition, which is characterized by intermittent rainfall, reduced water flushing, and warmer temperatures that collectively enhance Aedes breeding productivity [
28]. In contrast to heavy rainfall during major monsoon seasons, which can wash away larvae and disrupt breeding habitats, inter-monsoon conditions promote the persistence of clean, stagnant water environments that favor
A. aegypti proliferation and viral transmission [
19,
21,
22]. This ecological mechanism supports the seasonal dependence detected in these states and justifies the inclusion of seasonal terms in the optimal models: ARIMA(0,1,1)(0,0,1) (s=7) for Kelantan and ARIMA(0,0,0)(1,0,0) (s=7) for Pahang. These findings align with previous studies in Malaysia demonstrating dengue amplification during inter-monsoon periods, reflecting predictable rainfall–breeding dynamics that sustain annual dengue resurgence [
21,
28].
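For readers less familiar with the notation, the two seasonal specifications reported above can be written out in backshift form, where $B Y_t = Y_{t-1}$ and $\varepsilon_t$ is white noise. This is a sketch under assumptions not stated in the text: the moving-average terms use the plus-sign convention common in R output, $s$ is the seasonal period as reported, a non-zero mean $\mu$ is assumed for the undifferenced Pahang model, and no drift term is assumed for Kelantan.

```latex
% Kelantan: ARIMA(0,1,1)(0,0,1)_s
(1 - B)\,Y_t = (1 + \theta_1 B)(1 + \Theta_1 B^{s})\,\varepsilon_t
\;\Longrightarrow\;
Y_t = Y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
      + \Theta_1 \varepsilon_{t-s} + \theta_1 \Theta_1 \varepsilon_{t-s-1}

% Pahang: ARIMA(0,0,0)(1,0,0)_s
(1 - \Phi_1 B^{s})(Y_t - \mu) = \varepsilon_t
\;\Longrightarrow\;
Y_t = \mu + \Phi_1 (Y_{t-s} - \mu) + \varepsilon_t
```

Written this way, the Kelantan model describes week-to-week changes driven by the current shock, the previous week's shock, and shocks one seasonal period earlier, whereas the Pahang model relates each week's count only to the count one seasonal period earlier.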
Conversely, highly urbanized states such as Johor, Penang, and Selangor, despite recording high dengue case burdens, demonstrated weaker seasonal structuring. Continuous human mobility, extensive built environments, and relatively stable microclimatic conditions in dense urban settings reduce the dependence of Aedes proliferation on seasonal rainfall cycles, thereby attenuating predictable intra-annual fluctuations [
28,
29]. Consequently, dengue transmission in these states may be sustained year-round through persistent artificial breeding sites, including water storage containers, construction areas, and drainage systems, as well as uninterrupted mosquito–human contact [
2,
9]. This epidemiological profile supports the selection of non-SARIMA models for these settings.
Several states were best represented by highly parsimonious non-seasonal models, including ARIMA(0,0,0) for Melaka, Perak, Penang, and Terengganu. These simple model structures suggest that short-term dengue variability in these states is dominated by irregular fluctuations rather than persistent autocorrelation or strong seasonal cycles. Importantly, such models do not imply an absence of epidemiological drivers but instead reflect relatively weak temporal structure in the weekly surveillance data after differencing. In contrast, states with stronger seasonal or autocorrelated dynamics required more complex ARIMA or SARIMA specifications, underscoring heterogeneity in temporal dependence across Malaysia. While parsimony yielded the lowest information criterion values in the present analysis, future research incorporating cross-validation or out-of-sample predictive assessments would be valuable for evaluating the robustness and forecasting stability of these simpler models.
These findings mirror evidence from other Asian settings in which substantial subnational variability limits the applicability of a single forecasting framework and underscores the need for localized modeling approaches [
30,
31]. Although ARIMA and SARIMA models provide robust representations of historical temporal patterns and perform well for short-term forecasting, their reliance on past observations constrains predictive performance under conditions of abrupt structural change, such as climate anomalies or major public health interventions. In addition, prediction intervals derived from ARIMA/SARIMA models should be interpreted as approximate, particularly in states exhibiting greater residual variance, as uncertainty may not be fully captured by univariate specifications. Although extensive diagnostic procedures were conducted, these checks are intended to support model adequacy rather than to establish sufficiency, and the findings should therefore be interpreted in light of the known limitations of linear time-series models applied to surveillance data. Consequently, while these models remain valuable tools for near-term surveillance and planning, they should be complemented by integrated modeling approaches that incorporate climatic, entomological, and mobility-related covariates to improve outbreak prediction and uncertainty characterization [
32].
This study advances dengue surveillance research in Malaysia by combining STL decomposition with ARIMA/SARIMA modeling to jointly characterize long-term trends and seasonal cycles, and by analyzing all 14 states individually to capture granular heterogeneity that is often obscured in national-level analyses. These state-specific insights can inform more targeted intervention strategies. For example, states with pronounced seasonal peaks, such as Kelantan and Pahang, may benefit from strengthened vector control measures implemented ahead of predictable surges, whereas states characterized by year-round transmission may require sustained, structurally oriented vector management and urban sanitation efforts.
Nevertheless, several limitations should be acknowledged. Dengue case notifications may be affected by underreporting and diagnostic misclassification [
33]. In addition, the models applied were univariate and did not incorporate climatic, entomological, or sociodemographic covariates known to influence dengue transmission. Previous studies have demonstrated that rainfall, temperature, humidity, vector indices, and population dynamics strongly shape dengue patterns in Malaysia and across the region [
2,
6]. The ARIMA/SARIMA framework adopted in this study assumes linearity and approximately Gaussian errors and therefore represents a statistical approximation when applied to count-based dengue surveillance data. While this approach is widely used to model temporal dependence and support short-term forecasting in settings with moderate to high case counts, it may not fully capture the mean–variance relationships intrinsic to count processes [
34]. Future research should therefore explore variance-adaptive and count-based modeling approaches, including Poisson or negative binomial time-series models, generalized linear autoregressive models, or Bayesian state-space frameworks, which may better accommodate overdispersion and nonlinearity in dengue transmission dynamics.
Despite these limitations, the findings have important practical implications for dengue surveillance and control. STL-derived seasonal signatures and ARIMA/SARIMA-based forecasts can be integrated into operational decision-support systems and automated early warning dashboards to trigger alerts when observed or projected case counts exceed expected thresholds. Such systems could facilitate the timely mobilization of vector control resources, hospital preparedness, and targeted community engagement ahead of anticipated transmission peaks. Embedding these analytical tools within district- and state-level surveillance infrastructure would further support more proactive, data-driven dengue control strategies across Malaysia.
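One simple way such a threshold rule could work is sketched below in Python. This is a hypothetical illustration rather than a method described in this study: the function name, the use of the forecast plus a multiple of the residual standard deviation as the threshold, and all numbers are assumptions.

```python
def flag_alerts(observed, expected, residual_sd, z=1.96):
    """Return indices of weeks where observed cases exceed expected + z * sd."""
    alerts = []
    for week, (obs, exp) in enumerate(zip(observed, expected)):
        threshold = exp + z * residual_sd  # approximate upper forecast bound
        if obs > threshold:
            alerts.append(week)
    return alerts


# Hypothetical weekly counts and model forecasts for one state
observed = [100, 120, 180, 95]
expected = [100, 110, 115, 100]
alert_weeks = flag_alerts(observed, expected, residual_sd=20)
```

Only the third week (index 2) exceeds its threshold in this example. Because the residuals in several states departed from normality and constant variance, a fixed Gaussian multiplier of this kind would be approximate at best, and an operational system would need thresholds calibrated against historical alert performance.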