Introduction
The rapid spread of influenza-like illnesses (ILIs) poses a significant public health challenge worldwide. According to the World Health Organization (WHO), approximately 1 billion cases of ILIs occur annually, including 3 to 5 million severe cases, resulting in an estimated 290,000 to 650,000 respiratory deaths globally [1, 2]. Developing countries account for 99% of deaths among children under 5 years of age with influenza-related lower respiratory tract infections. These illnesses typically spread as seasonal epidemics during winter; however, in tropical regions, they circulate year-round. Common symptoms include cough, headache, severe malaise, and runny nose, and transmission occurs primarily through respiratory droplets [3]. Each year, over 2 million individuals gather in confined spaces for the Hajj pilgrimage, which substantially increases the risk of infection transmission during this period [4]. Seasonal variability and mass gatherings, especially during the Hajj season, complicate the monitoring and forecasting of ILIs in Saudi Arabia [5–8].
Traditional Forecasting
Forecasting is essential for combating the spread of ILIs because it enables the timely implementation of appropriate public health interventions. Accurate forecasting reduces uncertainty by predicting flu activity in advance, including when and where outbreaks are likely to occur. This predictive capability supports efficient resource allocation and improved planning before flu-related hospitalizations surge [
9]. Previously, influenza surveillance data collected from clinics, diagnostic laboratories, general practitioners, and public health organizations were used to track ILIs and other respiratory infections. However, this approach often led to delays of 1 week or more in report issuance, with frequent retrospective modifications [
10]. Additionally, unreported and unnoticed cases posed significant challenges in accurately predicting influenza incidence. The 2014 WHO Global Epidemiological Surveillance Standards for Influenza propose a “sentinel surveillance” approach, involving the regular collection of epidemiological and virological data from selected monitoring sites [
11,
12]. Although this approach is typically efficient in gathering high-quality data, it has several inherent limitations. First, it only captures influenza cases in individuals who seek medical attention, representing merely a fraction of the actual disease burden. Additionally, there is typically a delay between symptom onset and the decision to seek healthcare, compounded by a further delay of 1 to 2 weeks between data collection and its reporting. Moreover, the effectiveness of sentinel surveillance heavily depends on factors such as the availability of quality laboratory resources and skilled personnel, which may not always be accessible. Lastly, the selection of sentinel sites often prioritizes locations serving large, easily accessible populations, introducing potential sampling bias [
13]. Mathematical models for predicting infectious disease outbreaks generally fall into 2 primary categories. The first category includes mechanistic models, which predict the future course of an ongoing epidemic. These models rely on understanding disease transmission mechanisms and their influencing factors, such as the depletion of susceptible populations or climate-related impacts on transmission rates. Mechanistic models may function at the population level (e.g., compartmental models) or the individual level (e.g., agent-based models) [
14]. In the United States, Ginsberg et al. [
15] were the first to use Google query data to predict ILI rates, creating an influenza trend model that monitored real-time data from millions of search engine queries. However, this model exhibited limitations common to many previous surveillance systems. Other research teams have subsequently utilized various online platforms, including Twitter, Weibo, Baidu, Yahoo, and Google, to develop models for improved accuracy in identifying influenza trends [
16]. These limitations of traditional methods highlight the need for more advanced approaches, particularly in managing complex patterns and nonlinear dynamics.
Advancing Machine Learning Models
Recent advancements in machine learning have demonstrated exceptional performance in predicting influenza trends, particularly through models that analyze high-dimensional and nonlinear data. Notable examples include recurrent neural networks (RNNs), random forests, and support vector machines [
17–
19]. Prior research has confirmed the effectiveness of machine learning methods for predicting various epidemic diseases across different countries. Among these, deep learning techniques—especially long short-term memory (LSTM) networks—have emerged as powerful tools for forecasting complex time series data [
20]. LSTM models excel at modeling long-term temporal dependencies and irregular patterns, significantly surpassing traditional forecasting methods like seasonal autoregressive integrated moving average (SARIMA) and Holt-Winters in accurately predicting ILI trends [
21,
22]. Bidirectional LSTM networks extend the capabilities of LSTM by capturing patterns in both forward and backward temporal directions, providing superior predictive accuracy for both short- and long-term trends, particularly when temporal correlations are complex [
23]. However, achieving optimal performance with these advanced models requires data of sufficient quality and quantity, which can be a notable limitation.
Despite these technological advancements, Saudi Arabia and other Middle Eastern countries continue to underutilize deep learning models for ILI forecasting. Most research in the region still relies heavily on conventional statistical methods, which effectively identify seasonal trends but fall short of capturing intricate asymmetrical dynamics [
24]. Moreover, earlier studies generally overlooked exogenous variables—such as vaccination rates, climatic conditions, and population mobility—which are crucial for enhancing predictive accuracy. One study in the Asia-Pacific region incorporated migration trends and demographic data to significantly improve predictive performance [
25]. Additionally, other researchers have demonstrated the advantages of employing hybrid forecasting models, combining statistical and machine learning methods for predicting ILI trends across various regions [
26,
27]. Gomez-Cravioto et al. [
28] and Budiharto [
18] highlighted the efficiency of LSTM models in predicting nonlinear patterns associated with seasonal variability. By integrating key factors such as climate variations, demographic patterns, population migration, and vaccination rates, the current study addresses existing knowledge gaps and aims to enhance the forecasting accuracy of ILIs in Saudi Arabia.
Emerging Machine Learning in the Middle East
Middle Eastern countries have applied various methods for forecasting ILIs, but these efforts remain at an early stage of development [
24]. A limited number of studies have employed traditional forecasting methods, such as autoregressive integrated moving average (ARIMA) and SARIMA models, particularly during pandemics. Recently, a novel extreme gradient boosting (XGBoost) model demonstrated improved accuracy in forecasting monthly influenza case numbers in Saudi Arabia, surpassing traditional methods in pandemic scenarios [
29]. Another study conducted in Syria compared multiple forecasting models using weekly ILI data collected via the Early Warning Alert and Response System. Although this study observed forecasting improvements, it noted significant limitations regarding data quality and availability [
30].
Table 1 provides a comparative overview of traditional and machine learning methods. During the coronavirus disease 2019 (COVID-19) pandemic, Saudi Arabia leveraged machine-learning models for outbreak prediction, enhancing early disease detection, screening, prognosis, and facilitating the development of proactive strategies to mitigate disease impacts [
8]. These findings underscore the underutilized potential of advanced machine-learning approaches in Middle Eastern countries, highlighting the need to address this gap.
Hybrid models, which integrate the strengths of traditional statistical approaches and machine learning techniques, offer the potential to develop more accurate and interpretable forecasting systems. Despite advancements in ILI forecasting, there has been limited comparative analysis between traditional statistical methods—such as the Holt-Winters model—and advanced deep learning techniques, particularly LSTM networks, in Saudi Arabia. Most research has focused exclusively on individual modeling approaches, often neglecting the influence of seasonal fluctuations and mass gatherings, such as Hajj and Umrah, on forecasting performance. An improved understanding of the factors that improve predictive accuracy could significantly optimize public health interventions and enhance early warning systems.
The aims of this study were as follows: (1) To compare the performance of the traditional Holt-Winters statistical model and the deep learning-based LSTM model in forecasting weekly seasonal ILI incidences in Saudi Arabia. (2) To evaluate the predictive accuracy of both models using the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) metrics. (3) To enhance public health awareness by identifying an adaptable, data-driven forecasting model tailored to the unique epidemiological trends and public health challenges in Saudi Arabia.
Materials and Methods
Study Design and Data Collection
This study evaluates the performance of the Holt-Winters and LSTM models for forecasting weekly ILI cases in Saudi Arabia. Data covering a 6-year period from 2017 to 2022 were obtained from the Saudi Ministry of Health and the WHO. Weekly data were collected from all regions of Saudi Arabia. Exogenous variables, such as climatic conditions, population mobility trends, and vaccination rates, were included to improve predictive accuracy.
Holt-Winters Model
The Holt-Winters forecasting method enhances univariate time series modeling [
31–
33]. As a statistical approach, the Holt-Winters method effectively captures seasonal and trend components, making it particularly suitable for short-term forecasting of seasonal data due to its computational efficiency and ease of implementation. Depending on the characteristics of the seasonal component, either additive or multiplicative Holt-Winters models are utilized [
34–
36]. Additive models are appropriate for data exhibiting linear or stable exponential trends without significant growth over time [
37].
The additive Holt-Winters model is mathematically represented as follows [
29,
30]:
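Forecasting:
$$\hat{x}_{t+h|t} = L_t + h\,T_t + S_{t-m+1+(h-1) \bmod m}$$
where:
$\hat{x}_{t+h|t}$: Forecasted value for time $t+h$,
$L_t$: Level component at time $t$,
$T_t$: Trend component at time $t$,
$S_{t-m+1+(h-1) \bmod m}$: Seasonal component for the forecasted period,
$m$: Length of the seasonal cycle.
Level ($L_t$):
$$L_t = \alpha\,(x_t - S_{t-m}) + (1-\alpha)\,(L_{t-1} + T_{t-1})$$
Trend ($T_t$):
$$T_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\,T_{t-1}$$
Seasonality ($S_t$):
$$S_t = \gamma\,(x_t - L_t) + (1-\gamma)\,S_{t-m}$$
where:
$x_t$: Observed value at time $t$,
$S_{t-m}$: Seasonal component from $m$ periods ago,
$\alpha, \beta, \gamma$: Smoothing parameters for level, trend, and seasonality, $0 < \alpha, \beta, \gamma < 1$.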
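The multiplicative model is formally expressed as follows [24, 25]:
Forecasting:
$$\hat{x}_{t+h|t} = (L_t + h\,T_t)\,S_{t-m+1+(h-1) \bmod m}$$
where:
$\hat{x}_{t+h|t}$: Forecasted value for time $t+h$,
$L_t$: Level component at time $t$,
$T_t$: Trend component at time $t$,
$S_{t-m+1+(h-1) \bmod m}$: Seasonal component for the forecasted period,
$m$: Length of the seasonal cycle.
Level ($L_t$):
$$L_t = \alpha\,\frac{x_t}{S_{t-m}} + (1-\alpha)\,(L_{t-1} + T_{t-1})$$
Trend ($T_t$):
$$T_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\,T_{t-1}$$
Seasonality ($S_t$):
$$S_t = \gamma\,\frac{x_t}{L_t} + (1-\gamma)\,S_{t-m}$$
where:
$x_t$: Observed value at time $t$,
$S_{t-m}$: Seasonal component from $m$ periods ago,
$\alpha, \beta, \gamma$: Smoothing parameters for level, trend, and seasonality, $0 < \alpha, \beta, \gamma < 1$.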
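The level, trend, and seasonality recursions can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: the function name `holt_winters`, the initialisation scheme (first-cycle mean as the level, average change between the first two cycles as the trend), and the classical seasonal update based on the current level are all assumptions made for illustration.

```python
from typing import List, Sequence


def holt_winters(x: Sequence[float], m: int, alpha: float, beta: float,
                 gamma: float, horizon: int,
                 multiplicative: bool = False) -> List[float]:
    """Forecast `horizon` steps ahead with additive or multiplicative Holt-Winters."""
    if len(x) < 2 * m:
        raise ValueError("need at least two full seasonal cycles")

    # Assumed initialisation (other schemes exist): level = mean of the first
    # cycle, trend = average per-step change between the first two cycles.
    first = sum(x[:m]) / m
    second = sum(x[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    # The multiplicative form divides by level and seasonal terms, so it
    # assumes a strictly positive series.
    if multiplicative:
        season = [x[i] / first for i in range(m)]
    else:
        season = [x[i] - first for i in range(m)]

    # Recursive updates; season[t % m] holds S_{t-m} until it is overwritten.
    for t in range(m, len(x)):
        s_old = season[t % m]
        prev_level = level
        if multiplicative:
            level = alpha * x[t] / s_old + (1 - alpha) * (prev_level + trend)
        else:
            level = alpha * (x[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        if multiplicative:
            season[t % m] = gamma * x[t] / level + (1 - gamma) * s_old
        else:
            season[t % m] = gamma * (x[t] - level) + (1 - gamma) * s_old

    # h-step-ahead forecasts reuse the latest seasonal estimate of each phase.
    n = len(x)
    forecasts = []
    for h in range(1, horizon + 1):
        s = season[(n + h - 1) % m]
        base = level + h * trend
        forecasts.append(base * s if multiplicative else base + s)
    return forecasts
```

For weekly ILI counts, `m` would plausibly be set to 52, and the smoothing parameters would be chosen by minimising a forecast error such as RMSE rather than fixed by hand.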
Long Short-Term Memory
LSTM is a specific type of RNN. It enhances traditional RNNs through an improved structure containing 3 gate mechanisms—forget, input, and output gates—as well as a cell state [
38]. This enhanced structure is crucial for epidemiological forecasting, as it effectively manages abrupt changes and irregular patterns driven by climatic variations, public health interventions, or mass gatherings. LSTM models overcome many limitations of traditional statistical methods by better capturing long-term dependencies and nonlinear interactions.
Figure 1 illustrates the architecture of the LSTM model. The following paragraphs detail the architecture and gate mechanisms of the LSTM model [
25].
The forget gate reviews the current time step’s input information, denoted as $x_t$, and the previous time step’s output information, denoted as $h_{t-1}$. When $f_t = 0$, the gate discards the read information. Conversely, when $f_t = 1$, it retains the read information. The formula for $f_t$ is as follows [39]:
$$f_t = \sigma\left(w_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $w_f$ is the weight matrix for the forget gate, $b_f$ is the bias coefficient for the forget gate, and $\sigma$ is the sigmoid function.
The input gate determines which new input information to store in the neuron. It creates a candidate cell state $\bar{C}_t$, which the input gate $i_t$ scales before the new information is added to the cell state. The specific formula is as follows [40]:
$$i_t = \sigma\left(w_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\bar{C}_t = \tanh\left(w_c \cdot [h_{t-1}, x_t] + b_c\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \bar{C}_t$$
In the above formulas, $w_c$ is the weight matrix for the cell state, $b_c$ is the bias coefficient for the cell state, $w_i$ is the weight matrix for the input gate, and $b_i$ is the bias coefficient for the input gate.
The output gate determines the final output $h_t$ using the cell state. It starts by processing the current input information $x_t$ and the previous output information $h_{t-1}$. Then, it multiplies these values by the cell state processed by the tanh layer to obtain the final output $h_t$. The specific formula is as follows [41]:
$$o_t = \sigma\left(w_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
In these formulas, $w_o$ is the weight matrix for the output gate, and $b_o$ is the bias coefficient for the output gate.
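To make the gate mechanics concrete, the sketch below computes a single LSTM time step in plain Python, following the standard formulation of the forget, input, and output gates. The function names, the `params` dictionary layout, and the use of Python lists instead of a tensor library are illustrative assumptions; a practical model would rely on a deep learning framework.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Return w @ v + b for plain Python lists."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi
            for row, bi in zip(w, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM step; `params` maps 'f', 'i', 'c', 'o' to (weights, bias)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]

    def gate(name: str, act: Callable[[float], float]) -> Vector:
        w, b = params[name]
        return [act(a) for a in affine(w, z, b)]

    f_t = gate("f", sigmoid)      # forget gate: how much of C_{t-1} to keep
    i_t = gate("i", sigmoid)      # input gate: how much new information to admit
    c_bar = gate("c", math.tanh)  # candidate cell state
    o_t = gate("o", sigmoid)      # output gate

    c_t = [f * c + i * cb for f, c, i, cb in zip(f_t, c_prev, i_t, c_bar)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step. This gives a quick sanity check of the update rule.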
Both models are consistently employed throughout this study, leveraging combined statistical and machine learning strengths. Combining Holt-Winters and LSTM methods potentially enhances forecasting accuracy by better adjusting for seasonal variations. Optimizing hyperparameters is crucial for enhancing model performance, particularly when predicting unseen or new data [
42,
43]. Hyperparameter tuning involves selecting and adjusting parameters such as the optimizer type, number of epochs, number of neurons, activation functions, and loss functions. Appropriate optimization levels significantly improve the accuracy and generalizability of forecasting outcomes.
Cross-Validation for Model Optimization
Cross-validation is a systematic method used to explore combinations of hyperparameters, helping to identify the optimal model configuration. This approach is especially valuable for time-sensitive applications such as disease forecasting [
44]. It is widely employed to estimate prediction errors and focuses explicitly on preventing model overfitting. A commonly used cross-validation method for optimizing LSTM models is Monte Carlo cross-validation [
45]. For LSTM models, the learning rate of the optimizer (e.g., Adam or RMSprop) regulates the step size for updating model parameters (
Figure 2). Overfitting is typically mitigated by incorporating a dropout rate, which randomly disables neurons during training. Additionally, the number of epochs and batch size play crucial roles: increasing epochs can capture complex patterns but risk overfitting, whereas batch size influences both training duration and model generalization [
42].
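As a rough illustration of Monte Carlo cross-validation, the sketch below draws repeated random train/validation partitions and averages a user-supplied score over them. The function names, the default of 10 repeats, and the 20% validation fraction are assumptions rather than settings reported in this study. For time series, a forward-chaining variant that never validates on data preceding the training window may be preferable.

```python
import random
from typing import Callable, Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def monte_carlo_splits(n_samples: int, n_repeats: int, val_fraction: float,
                       seed: int = 0) -> Iterator[Split]:
    """Yield `n_repeats` random (train_indices, validation_indices) pairs."""
    rng = random.Random(seed)
    n_val = max(1, int(round(n_samples * val_fraction)))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        val = set(rng.sample(indices, n_val))
        train = [i for i in indices if i not in val]
        yield train, sorted(val)


def monte_carlo_cv(n_samples: int,
                   score_fn: Callable[[List[int], List[int]], float],
                   n_repeats: int = 10, val_fraction: float = 0.2,
                   seed: int = 0) -> float:
    """Average `score_fn(train, val)` over the random partitions."""
    scores = [score_fn(train, val)
              for train, val in monte_carlo_splits(n_samples, n_repeats,
                                                   val_fraction, seed)]
    return sum(scores) / len(scores)
```

In a hyperparameter search, `score_fn` would train a model with one candidate configuration on the training indices and return its validation error. The configuration with the lowest average error would then be retained.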
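Frequently used metrics for evaluating prediction accuracy include MAE, mean squared error (MSE), RMSE, and the coefficient of determination (R²) [42, 43]. The MAE quantifies the average magnitude of prediction errors, representing the typical error magnitude expected from forecasts [46–50]:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\chi_i - \hat{\chi}_i\right|$$
where $n$ is the number of observations and $\left|\chi_i - \hat{\chi}_i\right|$ is the absolute error of the $i$th forecast.
Eq. (15) defines the RMSE, which represents the standard deviation of the prediction errors:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\chi_i - \hat{\chi}_i\right)^2} \tag{15}$$
R² quantifies the proportion of the variance in observed weekly ILI cases that is explained by the model, as defined in Eq. (16):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\chi_i - \hat{\chi}_i\right)^2}{\sum_{i=1}^{n}\left(\chi_i - \bar{\chi}\right)^2} \tag{16}$$
where $\hat{\chi}_i$ represents the predicted value of the $i$th sample, $\chi_i$ represents the corresponding observed value, and $\bar{\chi}$ is the mean of the $n$ observed values.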
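The three metrics can be computed directly from paired observed and predicted values. The following is a minimal sketch using standard textbook definitions; the function names are illustrative and are not taken from the study's code.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares each error before averaging, a single large miss (for example, during a Hajj-season peak) raises RMSE more than it raises MAE. This is one reason to report both.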
PBIAS (%)
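The percent bias (PBIAS) quantifies the amount of bias in the model by assessing the cumulative discrepancy between observed and predicted values relative to the total observed values:
$$\mathrm{PBIAS} = 100 \times \frac{\sum_{i=1}^{n}\left(\chi_i - \hat{\chi}_i\right)}{\sum_{i=1}^{n}\chi_i}$$
where,
$\chi_i$ is the observed value,
$\hat{\chi}_i$ is the predicted value,
$n$ is the number of observations.
PBIAS thus indicates the average tendency of the forecasts to fall above or below the observed values; with the form given here, positive values indicate underestimation and negative values indicate overestimation [51–53].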
Willmott’s Index of Agreement
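Willmott’s index of agreement (WI) is a dimensionless statistic, bounded between 0 (no agreement) and 1 (perfect agreement), that measures how closely model predictions match observations [52, 54, 55]. It is particularly useful for comparing predictive accuracy between models:
$$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{n}\left(\chi_i - \hat{\chi}_i\right)^2}{\sum_{i=1}^{n}\left(\left|\hat{\chi}_i - \bar{\chi}\right| + \left|\chi_i - \bar{\chi}\right|\right)^2}$$
where:
$\chi_i$ is the observed value,
$\hat{\chi}_i$ is the predicted value,
$n$ is the number of observations,
$\bar{\chi}$ is the mean of observed values.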
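PBIAS and WI can be sketched in a few lines of Python. The sign convention used for PBIAS (observed minus predicted, so that positive values mean underestimation) and the original bounded form of Willmott's index are assumptions, since several variants of both statistics are in use. The function names are illustrative.

```python
from typing import Sequence


def pbias(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Percent bias; positive means forecasts are too low on aggregate."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)


def willmott_index(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement in its original 0-to-1 form."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - p) ** 2 for o, p in zip(obs, pred))
    denominator = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, p in zip(obs, pred))
    return 1.0 - numerator / denominator
```

A perfect forecast yields a PBIAS of 0 and a WI of 1, which provides a simple check when validating a metrics pipeline.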
Results
Data Characterization
Temporal epidemiological analysis was performed on ILI trends, highlighting variability in disease incidence across different seasons and years. Weekly data on ILIs in Saudi Arabia, spanning from January 2017 to December 2022, were collected from the Saudi Ministry of Health and the WHO. The dataset encompasses all regions of Saudi Arabia, providing comprehensive national-level insight into ILI dynamics.
Data Preparation
To ensure data accuracy, consistency, and readiness for analysis, the following pre-processing steps were implemented: (1) Missing data points were imputed using interpolation. (2) ILI case counts were normalized to the range [0, 1] using min-max scaling, which ensures compatibility with the LSTM algorithm and supports faster convergence during training. (3) Data were split into training (80%), validation (10%), and testing (10%) sets.
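The three pre-processing steps can be sketched as follows. The study does not specify the interpolation method or whether the split was chronological; linear interpolation and a time-ordered split are assumed here, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def interpolate_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(values[left] + weight * (values[right] - values[left]))
    return filled


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(values: Sequence[float], train: float = 0.8,
                        val: float = 0.1) -> Tuple[List[float], List[float], List[float]]:
    """Split in time order into training, validation, and test segments."""
    n = len(values)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return (list(values[:n_train]),
            list(values[n_train:n_train + n_val]),
            list(values[n_train + n_val:]))
```

A common alternative is to compute the scaling minimum and maximum from the training segment only, which avoids leaking information from the validation and test periods into training.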
Characteristics of weekly ILI cases
Figure 4 illustrates clear, recurrent peaks at regular intervals, demonstrating seasonal patterns in weekly ILI incidence. The peaks correspond with defined seasons, particularly winter, and significant mass events such as the Hajj pilgrimage season. A noticeable decline in ILI cases occurred between 2020 and 2021, reflecting stringent preventive measures implemented in response to COVID-19. This decline reflects the impact of public health interventions on disease spread.
Forecasting weekly ILI cases utilizing the Holt-Winters model
The Holt-Winters forecasting models were developed and assessed using additive and multiplicative seasonal components based on the training dataset. A systematic grid search identified the optimal model configuration, selecting the model with the lowest RMSE for the test dataset. Comprehensive evaluation identified the Holt-Winters multiplicative model as the best-performing configuration, and multiple metrics were computed to evaluate its performance (
Table 2). These metrics affirmed the model’s capability to capture seasonal trends effectively. The smoothing parameters for level, trend, and seasonality were automatically optimized based on data characteristics.
Figure 5 presents the actual and predicted values generated by the Holt-Winters model. The graph demonstrates that the model effectively captured general seasonal patterns and long-term trends; however, deviations are evident during periods of high volatility. Notably, the model substantially underestimated ILI cases during seasonal peaks, such as the Hajj season, and struggled to adapt to the sudden case declines during 2020–2021 attributed to COVID-19 prevention measures.
Forecasting weekly ILI cases utilizing the LSTM model
For the LSTM model, data were normalized using a min-max scaler, and input sequences were generated using a 10-week timestep to provide adequate temporal context. The model was trained for 20 epochs with a batch size of 32, using the Adam optimizer and MSE as the loss function. Model performance is summarized in Table 2.
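The 10-week timestep corresponds to a sliding-window transformation in which each training example consists of 10 consecutive weekly values and the target is the value for the following week. A minimal sketch is shown below; the function name `make_sequences` and the one-step-ahead target are assumptions, as the forecast horizon is not stated explicitly.

```python
from typing import List, Sequence, Tuple


def make_sequences(series: Sequence[float],
                   timesteps: int = 10) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, next value) pairs from a univariate series."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - timesteps):
        inputs.append(list(series[start:start + timesteps]))
        targets.append(series[start + timesteps])
    return inputs, targets
```

A series of length n yields n − 10 examples under this scheme, so roughly six years of weekly data would produce about 300 windows before the train, validation, and test split.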
Figures 5 and
6 emphasize discrepancies between observed and predicted values, clearly showing that the Holt-Winters model struggled with high-variability periods, while the LSTM model maintained greater stability.
Figure 5 demonstrates the Holt-Winters model’s forecast for weekly ILI cases from 2017 to 2023, comparing training and test datasets with predictions. Although the Holt-Winters model captured seasonality and general trends reasonably well, it struggled with sudden spikes during events such as the Hajj season and significant variability during the COVID-19 pandemic. These deviations reflect the model’s inability to fully account for nonlinear patterns, suggesting that incorporating exogenous variables or using hybrid modeling approaches could enhance forecast performance.
Figure 6 shows that the LSTM model effectively captured both seasonal variations and long-term trends in ILI incidence. The graph compares observed data from the training set, test set, and LSTM predictions, clearly illustrating the model’s capacity to discern patterns accurately and generate reliable predictions during the testing period. While the model generally exhibited strong performance, minor inconsistencies were observed during periods of rapid variation in ILI incidence, indicating opportunities for further improvement through feature engineering or additional hyperparameter optimization. LSTM predictions closely tracked actual values, showing only slight discrepancies during abrupt incidence changes. Incorporating exogenous variables such as mobility patterns, climatic factors, and vaccination rates, along with hyperparameter fine-tuning, could further enhance predictive accuracy. The LSTM model notably demonstrated superior accuracy in handling temporal dependencies and complex epidemiological patterns.
Figure 7 highlights a sudden peak during the Hajj season, illustrating that the LSTM model predicted these spikes more accurately than the Holt-Winters model. The LSTM’s superior performance is attributed to its advanced capacity for handling temporal dependencies and retaining historical information. However, brief periods of high volatility showed slight deviations in LSTM predictions, suggesting that additional feature engineering may be beneficial (
Table 2).
To comprehensively assess forecasting performance, statistical metrics were used, including RMSE, MAPE, R
2, PBIAS [111], and the WI [111]. RMSE and MAPE assess prediction accuracy by quantifying absolute and relative errors, respectively, while R² indicates the proportion of variance explained by the model. PBIAS provides insights into the direction and magnitude of systematic model bias, determining if predictions are generally overestimated or underestimated relative to observations, particularly beneficial for long-term forecasts. The WI evaluates overall predictive performance, quantifying the proximity between observed and predicted values, further illustrated by the Taylor diagram and box plot of prediction errors (
Figures 8,
9). As summarized in
Table 2, the LSTM model significantly outperformed the Holt-Winters model, with notably lower RMSE (34.07 vs. 82.57), reduced MAPE (0.18 vs. 0.38), higher R
2 (0.93 vs. 0.58), and substantially improved PBIAS (+5.8% vs. +14.2%) and WI (0.48 vs. 105.79) scores.
These findings strongly support the conclusion that LSTM models are more suitable than traditional statistical approaches for modeling complex, nonlinear, and seasonally varying time-series data related to ILI trends.
Discussion
This study compared the predictive performance of Holt-Winters and LSTM models for forecasting weekly ILI cases in Saudi Arabia, emphasizing the inclusion of exogenous variables such as climatic conditions, population mobility, and vaccination coverage. The results demonstrated that the LSTM model, particularly when incorporating exogenous variables, consistently outperformed the Holt-Winters model across all evaluation metrics. Specifically, the LSTM model achieved an RMSE of 28.55 and an R² of 0.96, highlighting its superior capability to capture nonlinear trends and significant peaks in disease incidence, particularly during high-variability periods like the Hajj season. Both additive and multiplicative seasonal components were methodically examined using a grid search methodology to determine the optimal model configuration. This systematic evaluation allowed exploration of multiple parameter combinations, ensuring the selection of the most effective model configuration by balancing computational efficiency and accuracy. Although the Holt-Winters model provides an intuitive framework for addressing seasonality, differences emerged in forecasts during periods of elevated ILI variability. These discrepancies highlight that, despite its proficiency in capturing general trends and predictable seasonal patterns, the Holt-Winters model struggles with nonlinear dynamics and abrupt changes typical of epidemiological data. A previous study indicated similar challenges, noting that the model may fail to predict sudden spikes or drops in disease incidence due to unforeseen events or interventions. Moreover, Holt-Winters models inherently assume consistent seasonal patterns, an assumption frequently invalid in epidemiological contexts where seasonality may vary considerably [
34].
In contrast, the LSTM model demonstrated enhanced performance across all assessment criteria, including a substantially lower RMSE (34.07 vs. 82.57), reduced MAPE (0.18 vs. 0.38), and increased R
2 (0.93 vs. 0.58). These metrics underscore the practical advantage of LSTM, facilitating more accurate disease forecasts and enabling timely public health interventions. The improved metrics also reflect the LSTM model's strong capability in learning sequential dependencies and complex nonlinear interactions inherent in epidemiological time series data. By utilizing 2 stacked LSTM layers combined with fully connected layers, the model effectively captured intricate temporal patterns. Stacked LSTM architecture enhances the model’s ability to recognize complex temporal dynamics and nonlinear data dependencies. Each additional layer of LSTM allows for deeper, more abstract representations of the data, significantly improving performance on forecasting tasks [
56]. Despite these advancements, the LSTM model encountered challenges when handling sudden fluctuations in ILI incidence rates. Occasional discrepancies during periods of rapid change likely reflect issues such as limited data availability or inherent randomness in epidemiological data, indicating room for further model refinement.
Compared to related research, the current study offers several methodological improvements. For instance, Khan et al. [
57] introduced MSDII-FFNN, a cloud-based modeling system that applies a feed-forward propagation neural network to Internet of Things-generated data for pandemic influenza forecasting [
57]. While comparable in its real-time monitoring objective, their approach differed methodologically by focusing more on infrastructure integration rather than methodological optimization. Unlike our study, Khan et al. [
57] did not utilize standardized performance metrics or baseline controls, thus limiting the rigorous evaluation of their model’s reliability and accuracy. Similarly, Alzahrani and Guma [
58] utilized ARIMA, SARIMA, and XGBoost models to predict monthly influenza cases in Saudi Arabia. Their findings align with our results in demonstrating machine learning’s superiority over traditional statistical methods. However, their study differed significantly in terms of scope and temporal resolution, employing monthly data and omitting exogenous variables, thus restricting their model’s applicability for predicting rapid epidemiological changes such as those associated with the Hajj pilgrimage or unexpected outbreaks.
Meanwhile, the study by Olukanmi et al. [
59] is similar to ours in employing deep learning techniques for direct weekly ILI predictions. Their approach, however, was distinguished by using digital behavior data from Google Trends rather than structured environmental data. While both studies highlight the strengths of LSTM models, they demonstrated the effectiveness of real-time public data in identifying early symptom-related searches, whereas our study emphasized epidemiological and environmental proxies to further improve predictive performance. These comparisons illustrate how different data inputs—such as digital search trends in Olukanmi et al. [
59], cloud-embedded monitoring in Khan et al., and monthly aggregated case data in Alzahrani and Guma [
58]—influence model outcomes. Our research positions itself between these methodologies by utilizing formal, real-world weekly data with additional exogenous variables, achieving a balance of timeliness, precision, and policy relevance. Developing interpretable hybrid models capable of integration into public health systems will be crucial for timely and accurate epidemic forecasting across diverse epidemiological contexts.
However, other limitations must be considered. Training deep learning models such as LSTM is resource-intensive, demanding substantial computational power, specialized hardware, and memory [
43,
60]. Conversely, the Holt-Winters model is more suited to real-time applications due to its computational efficiency and lower resource requirements. Nonetheless, this computational efficiency compromises prediction accuracy when dealing with complex, nonlinear epidemiological data.
Integrating advanced characteristics—such as population mobility patterns, climatic variations, and vaccination rates—can significantly enhance a model’s ability to predict seasonal peaks and fluctuations associated with mass gatherings, notably the Hajj pilgrimage, thereby deepening our understanding of influenza transmission dynamics [
61,
62]. Certain models can dynamically adjust real-time forecasts according to changing environmental factors. Improved epidemiological models may focus more closely on population migration patterns to enhance disease transmission predictions [
61,
63]. Moreover, climatic factors like temperature and humidity have previously improved the accuracy of epidemiological models for COVID-19 transmission projections [
64]. Similarly, predictive models in the United States have enhanced performance by explicitly incorporating population migration rates as exogenous variables [
64]. Concentrating on these factors could allow machine learning models to predict changes in ILI incidence more accurately, ultimately facilitating more effective public health strategies.
When comparing the two approaches, interpretability and simplicity are key strengths of the Holt-Winters model, although its performance declines in complex scenarios characterized by long temporal dependencies or sudden shifts. Conversely, deep learning models such as LSTM demonstrate robust capabilities in representing these complexities but require careful hyperparameter tuning, substantial computational resources, and strategies to mitigate overfitting. These findings underscore the appropriateness of advanced machine learning methods, like LSTM, for predicting influenza and related illnesses within dynamic epidemic settings. In stable conditions, the Holt-Winters model remains beneficial due to its efficiency and interpretability. Future studies could explore hybrid methodologies combining statistical and machine learning models, capitalizing on their complementary strengths to enhance predictive performance. Previous research has demonstrated the effectiveness of hybrid models; for instance, combining LSTM with ARIMA in time-series forecasting has yielded improvements in predictive accuracy and adaptability within epidemiological modeling [
62]. Additionally, hybrid deep learning approaches integrating convolutional neural networks with LSTMs have effectively predicted disease trends, enhancing both feature extraction and sequential learning capabilities [
63]. These hybrid models could achieve even greater accuracy when enriched with exogenous variables such as population migration, vaccination rates, or climatic conditions.
Limitations and Future Work
This study’s models had limited capacity to fully adjust for external factors affecting disease transmission, focusing predominantly on historical ILI case data. Incorporating exogenous variables such as population mobility, vaccination coverage, and environmental conditions (temperature and humidity) may further improve forecasting accuracy. Still, the models faced challenges accommodating abrupt shifts and inherent nonlinearities in the data, illustrating the broader challenge of relying solely on statistical approaches in highly volatile datasets. Nevertheless, the LSTM model consistently performed better across all evaluation metrics, reflecting its strong capability to capture temporal dependencies and nonlinear interactions. Lower RMSE and MAPE values combined with a higher R
2 indicate that LSTM is particularly suitable and effective for predicting ILI cases in dynamic and complex epidemic scenarios. Predictions occasionally deviated during rapid data fluctuations, indicating potential improvements through additional feature enrichment and hyperparameter tuning. Previous studies have shown that incorporating mobility data into COVID-19 models and environmental variables into epidemiological forecasts enhances predictive performance [
61,
63,
64]. Future research efforts should focus on developing hybrid models combining statistical and machine learning approaches to leverage their respective strengths. Moreover, integrating exogenous variables such as meteorological data, migration patterns, vaccination coverage, and public health interventions could yield more precise and practically valuable predictions. The findings from this study contribute to improved methods for forecasting ILI cases, facilitating public health planning and enhancing intervention strategies for seasonal and emerging infectious diseases. For instance, the ARIMAX model successfully incorporates external variables into epidemiological predictions. Combining ARIMAX with LSTM models could further enhance predictive accuracy while retaining interpretability. Future work might also involve public health experts in model development, thereby improving adoption and utilization within healthcare systems. Additionally, the integration of real-time data streams and adaptive modeling techniques warrants further investigation to better accommodate evolving epidemiological trends.