Investigating the Accuracy of Hybrid Models with Wavelet Transform in the Forecast of Watershed Runoff
Abstract
Within the hydrological cycle, the rainfall-runoff process is one of the most significant and complex phenomena, and many modeling approaches have been proposed to develop and improve predictive models of it. Artificial intelligence techniques are well suited to modeling hydrological processes. In this study, the runoff of the Leilanchai watershed was simulated using artificial neural networks (ANNs), the M5 model tree, and hybrids of both with the wavelet transform. The data were collected in this watershed from 2000 to 2021; 70% were used in the train state and 30% in the test state. Simulated and observed results were compared at both daily and monthly scales. First, the rainfall and runoff time series were decomposed into multiple sub-series using the wavelet transform to handle non-stationarity. The resulting sub-series were then used as inputs to the ANN and the M5 model tree. The results demonstrated that hybridization with the wavelet transform improved the ANN model's daily accuracy by 4% and its monthly accuracy by 26%, and improved the M5 model tree's daily and monthly accuracy by 4% and 41%, respectively. As the forecast horizon lengthens, the accuracy of the wavelet-M5 model does not diminish to the same degree as that of the wavelet-ANN (WANN) model, which suggests that the Leilanchai watershed has a relatively stable behavior pattern. Overall, hybrid models that incorporate the wavelet transform improve forecast accuracy.
1 Introduction
Water resources engineers need accurate surface runoff predictions for a variety of purposes. Although several models have been created to predict rainfall-runoff, precise prediction is challenging because of the intricate and nonlinear interactions between the components that influence the rainfall-runoff transformation. Hydrological research has long sought to describe the rainfall-runoff interaction as influenced by precipitation patterns and watershed geomorphologic characteristics (Adnan et al. 2021; Fayaz et al. 2022). The interplay of climatic variables such as temperature, precipitation, evaporation, and wind with hydrological variables such as streamflow, time of concentration, and infiltration results in a nonlinear and indeterminate relationship between rainfall and runoff (Shoaib et al. 2018).
Hydrological models are crucial instruments for water and environmental resource monitoring. They are also used to analyze urban and ecological modeling situations, such as land use, flood control, and watershed monitoring (Mohammadi et al. 2019). It is common to model the interaction between precipitation and runoff with a series of calculations that describe runoff as a function of precipitation and other variables reflecting basin features. Although the two processes have a cause-and-effect relationship, the nonlinear behavior of the water cycle's complex features makes accurate rainfall-runoff modeling difficult (Okkan et al. 2021). Classic time series models are commonly used for hydrological time series forecasting. They are, however, essentially linear models that assume the data are stationary, and they have little capability to capture the non-stationarities and nonlinearities in hydrological data (Feng et al. 2020).
Numerous studies and method comparisons have addressed forecasting of the rainfall-runoff process. Lallahem and Mania (2003) proposed a solution based on artificial neural networks and compared it to other methods for large-scale problems with longer intervals. Solomatine and Dulal (2003) analyzed the performance of the M5 model tree in rainfall-runoff conversion and found that it could produce reasonable forecasts. Solomatine and Xue (2004) applied the M5 model tree to flood prediction and found it accurate. Bhattacharya and Solomatine (2005) used the M5 model tree together with ANN to construct a relationship between water level and flow. Classical models are considered inferior to the ANN and the M5 model tree. Dastorani et al. (2009) used an artificial neural network and other models to reconstruct discharge data and found that the neural network results were superior to those of the correlation and normal ratio methods. Asati and Rathore (2012) built regression, MLR, and ANN models of the complex nonlinear relationship between rainfall (input) and runoff (output), without considering other factors, and evaluated the results of each method. Wei et al. (2013) applied WNN and ANN models to monthly streamflow forecasting up to 48 months ahead for the Weihe River in China; compared with the classic ANN method, the hybrid WNN method improved forecasting capability. Rezaeianzadeh et al. (2015) found that ANN models predicted the standardized streamflow index better than most other methods. Nourani et al. (2019) proposed a wavelet-based method for SSL modeling; comparing the wavelet-M5 tree method with the wavelet-ANN (WANN) and M5 tree methods showed that the wavelet-M5 tree model estimates SSL with great accuracy. Gao et al. (2020) used machine learning methods to evaluate runoff amounts in a watershed in southeastern China and found that the GRU method performed better over short durations than most other methods. Khan et al. (2020) compared the performance of three machine learning techniques, SVM, ANN, and k-Nearest Neighbour (KNN), in simulating Pakistan's SPIE; the SVM outperformed the ANN and KNN. Zamrane et al. (2021) applied the wavelet technique to gain a clearer understanding of hydrologic variability in Morocco. Liu et al. (2021) examined several artificial intelligence methods for multi-step-ahead prediction of daily flood frequency up to 7 days ahead; compared with the other methods, the DGDNN proved more effective. The literature review indicates that ANN and M5 models, as well as the application of the wavelet transform to them, are beneficial for studying rainfall-runoff in a watershed (Prasad et al. 2017).
Figure 1 depicts the stages of the research and the role of each model component. The ANN and M5 models were first applied to the available data. Next, the wavelet transform was implemented to remove the trend from the main time series of the analyzed watershed and to minimize the multiscale effects of the rainfall-runoff dataset. The daily precipitation and runoff time series of the Leilanchai watershed were decomposed into sub-signals with varying resolutions. The ANN and M5 models were then used to reconstruct the predicted main time series from these sub-signals. Finally, the proposed models were compared to evaluate their efficacy.
Figure 1 Different stages of the research.
Given the high cost and time commitment of laboratory methods and physical models for investigating rainfall-runoff phenomena, machine learning methods have grown in popularity in recent years. Furthermore, given the complexities of the rainfall-runoff phenomenon, it is necessary to assess each model's accuracy, benefits, and drawbacks. Accordingly, the current study investigates the accuracy of the ANN and M5 model tree, as well as the effect of hybridizing them with the wavelet transform. The present study is unique in that it examines the accuracy of a nonlinear model and a multi-linear model together with the effect of hybridization with the wavelet transform.
2 Materials and methods
2.1 Study area
The Leilanchai River, a tributary of the Zarrineh River, is one of the major rivers on the eastern side of Lake Urmia in northwestern Iran. Its watershed is located in the province of East Azerbaijan, encompasses portions of the cities of Maragheh and Malekan, and is an essential source of water for residents of the region (Figure 2). The watershed area is approximately 393 square kilometers, and its highest and lowest elevations are 3919 and 9132 meters, respectively. Daily and monthly data from this watershed for 2000 to 2021 were used. Seventy percent of the data was used in the training state and 30% in the testing state.
Figure 2 Location of the study area.
2.2 ANNs
Artificial neural networks (ANNs), as self-learning and self-adaptive function approximators, have proven extraordinarily effective in simulating and predicting nonlinear hydrological datasets (Ba et al. 2017). An ANN can be described as a rudimentary model of the human brain. In recent decades, ANNs have been used as an AI approach to mimic the rainfall-runoff process. The neural network model connects inputs with outputs through a mapping learned from the data. Since the early nineties, artificial intelligence has been applied alongside several statistical methodologies. The key advantages of ANNs in this application are that they can simulate the watershed system from only a few observations, provide substantially more flexible simulation through nonlinear mapping, and compensate for limited hydrological knowledge. In many studies, the ANN is combined with wavelet-processed data, which captures seasonality in the data and improves performance over single-layer ANNs. An ANN is a nonlinear mathematical structure capable of representing the nonlinear relationship between any system's inputs and outputs. The network is trained on existing data during the learning process and can then be used to predict future values (Faghih et al. 2022). Weights connect the neurons of each layer to those of the next layer. The ANN model is therefore suitable for data-driven time series modeling. Figure 3 shows a schematic of the ANN model.
Figure 3 Schematic figure of ANN model.
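The layered structure described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: a three-layer network with a tanh hidden layer and a single linear output neuron, randomly initialized weights, and hypothetical layer sizes and input values. It is not the trained network used in the study.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, hidden, output):
    """Forward pass: inputs -> tanh hidden layer -> one linear output neuron."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]

rng = random.Random(0)
hidden = init_layer(3, 4, rng)   # 3 inputs, 4 hidden neurons (illustrative sizes)
output = init_layer(4, 1, rng)   # single output: the predicted runoff
q_next = forward([0.2, 0.5, 0.1], hidden, output)  # hypothetical scaled inputs
```

Training would adjust the weights so that the output matches observed runoff; only the mapping from inputs to output is shown here.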
2.3 M5 model tree
Machine learning researchers have also examined tree-based regression models (Kisi 2021). One of the achievements of this line of work is the use of multiple models in the leaves of trees (Gholami et al. 2018). During forecasting, a smoothing process can correct discontinuities between adjacent linear models. The splitting criterion of a decision tree aims to improve classification performance, whereas the splitting criterion of a model tree reduces the variability of the target quantity (Lee et al. 2019).
The M5 model tree (M5 tree) is a data-driven machine learning method that works on subsets of the data. Its tree structure is built from input and output databases as a data-driven procedure (Londhe and Charhate 2010), and linear equations are used within that structure. Like a real tree, an M5 tree has a root, nodes, branches, and leaves, which are constructed from the input database. Pruning the grown tree controls overfitting by trimming branches and replacing them with linear functions. As shown in Equation 1, the nodes are chosen with a split criterion that maximizes the standard deviation reduction (SDR).
(1) |
Where:
N | = | number of data points, |
Q | = | subset of a node's samples, |
Qi | = | subset of potential test samples, and |
sd | = | input data standard deviation. |
When no further standard deviation reduction is possible at a node, the node is not split and instead becomes a terminal node, or leaf. Because the M5 classification process yields a lower standard deviation and greater homogeneity, offspring nodes make more accurate predictions than parent nodes (Kisi et al. 2022). When all feasible nodes and branches are considered in determining the simulation procedure, M5 can provide a relation with minimal error and high accuracy. Therefore, the M5 model tree is suitable for data-driven time series modeling. Figure 4 shows how the M5 model tree works.
Figure 4 M5 model tree performance.
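The split selection described for Equation 1 can be sketched as follows. The sketch assumes the commonly cited form of the criterion, SDR = sd(Q) - sum(|Qi| / |Q| * sd(Qi)), uses the population standard deviation, and searches thresholds on a single input. The data values and the helper names `sdr` and `best_split` are hypothetical.

```python
from statistics import pstdev

def sdr(parent, subsets):
    """Standard deviation reduction: sd(Q) - sum(|Qi| / |Q| * sd(Qi))."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)

def best_split(x, y):
    """Scan thresholds on one input x; return (threshold, SDR) with maximum SDR."""
    pairs = sorted(zip(x, y))
    best = (None, 0.0)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold exists between equal input values
        left = [q for _, q in pairs[:k]]
        right = [q for _, q in pairs[k:]]
        gain = sdr(y, [left, right])
        if gain > best[1]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (threshold, gain)
    return best

rain = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]   # hypothetical input values
flow = [0.5, 0.6, 0.4, 3.0, 3.2, 2.9]    # hypothetical target values
threshold, gain = best_split(rain, flow)
```

In a full model tree this search would be repeated recursively on each subset, and a linear regression would be fitted at every leaf.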
2.4 Wavelet transform
Wavelet transform analysis has recently gained popularity for revealing the spectral and seasonal content of signals. This popularity stems from a fundamental flaw of Fourier analysis: the Fourier spectrum provides only globally averaged information. As a result, decomposing datasets into their constituent parts with the wavelet transform can be used to pre-process data (Alizadeh et al. 2021). The wavelet transform helps forecasting models by capturing critical information at multiple resolution levels. With only a few coefficients, wavelet decomposition of a nonstationary time series into multiple scales permits an interpretation of the series structure and provides a substantial amount of information about its history. This is why the technique is frequently employed to analyze nonstationary time series (Zhang et al. 2018).
The wavelet transform represents a signal's data in the time-scale (period) domain. During processing, the noise components are eliminated, and the signal is decomposed into high- and low-frequency components using high-pass and low-pass filters. In time-series prediction, the wavelet transform is vital for capturing inherent and hidden features and patterns and for recognizing localized and non-stationary events. This work proposes that, by encoding the input signals as low- and high-frequency datasets, the wavelet transform can detect non-stationary events in rainfall and runoff data (Ouma et al. 2021). To define the wavelet function, a mother wavelet ψ(t) is constructed such that ψ(t) is a square-integrable function, ψ(t) ∈ L2(R) (R = the real domain), and its Fourier transform meets the admissibility condition (Equation 2):
(2) |
The mother wavelet is denoted ψ(t), where t represents time. The wavelet's translation and scale factors are combined to obtain the scaled and translated wavelet function, as shown in Equation 3:
(3) |
Where:
= | continuous wavelet |
The inner product of the input signal x(t) and the wavelet function is determined as in Equation 4, and its Fourier transform is obtained as in Equation 5.
(4) |
(5) |
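For reference, the textbook forms of the admissibility condition, the scaled and translated wavelet, and the continuous wavelet transform are commonly written as below. The symbols a (scale), b (translation), the hat for the Fourier transform, and the asterisk for the complex conjugate are assumed here for illustration and may differ from the notation of Equations 2 to 4.

```latex
% Admissibility condition on the mother wavelet (hat denotes the Fourier transform)
C_\psi = \int_{-\infty}^{+\infty} \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty

% Scaled (a) and translated (b) version of the mother wavelet
\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t-b}{a}\right), \qquad a \neq 0

% Continuous wavelet transform of x(t) as an inner product with \psi_{a,b}
W_x(a,b) = \int_{-\infty}^{+\infty} x(t)\, \psi_{a,b}^{*}(t)\, dt
```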
Time series are identified, denoised, and smoothed using the wavelet transform. The wavelet transform divides the original data set into multiple time series with oscillatory waveforms, which are then fed into the neural network. The final output is a set of periodic components representing the original signal at various scales and resolutions. The real benefit of decomposing the datasets is that it reveals the hidden frequency content of the data, making it easier to identify features such as mode variations and temporal change. Figure 5 shows how the wavelet transform works.
Figure 5 Wavelet transform performance.
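The decomposition into low- and high-frequency sub-series can be illustrated with a short multi-level discrete wavelet transform. For simplicity, this sketch uses the Haar wavelet rather than the Daubechies wavelets adopted in the study, and the runoff values are hypothetical. It is meant only to show how one series becomes an approximation plus several detail sub-series that still reconstruct the original.

```python
import math

def haar_step(signal):
    """One DWT level: low-pass (approximation) and high-pass (detail) halves."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return [a_L, d_L, ..., d_1]; len(signal) must be divisible by 2**levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.insert(0, d)  # keep the coarsest detail first
    return [approx] + details

def haar_reconstruct(coeffs):
    """Invert haar_decompose by undoing each level from coarse to fine."""
    s = 1 / math.sqrt(2)
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) * s, (a - d) * s])
        approx = rebuilt
    return approx

runoff = [1.2, 1.0, 0.8, 0.9, 3.5, 4.1, 2.2, 1.6]  # hypothetical daily values
coeffs = haar_decompose(runoff, levels=2)
restored = haar_reconstruct(coeffs)
```

Here `coeffs` holds the level-2 approximation followed by the level-2 and level-1 details; in a hybrid model, sub-series of this kind would serve as the model inputs.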
2.5 Wavelet-ANN (WANN) and Wavelet-M5 model tree (W-MT)
The wavelet-based artificial neural network (WANN) model, which couples a wavelet transform with an artificial neural network (ANN) to identify various process features and anticipate runoff amounts, is a valuable tool in rainfall-runoff forecasting. When the WANN approach is used to simulate a hydrologic system with long-term historical data, many inputs must be supplied to the ANN (Bajirao et al. 2021). Using many inputs without considering their significance in the modeling may significantly degrade the WANN simulation results, so practical approaches for determining the dominant inputs are required. The rainfall and runoff subseries produced by wavelet decomposition are the inputs to the WANN and W-MT models. Wavelet decomposition handles the different time scales. The wavelet transform determines not only which frequencies are present in the signal but also when those frequencies occur in the signal, and it accomplishes this by working at various scales. The signal is first considered at a large scale, and its large features are analyzed. The signal is then treated at small scales, and its small features are obtained.
The wavelet decomposition produces large-scale sub-signals [Ia(t) or Qa(t)] and small-scale sub-signals [Idi(t) or Qdj(t), the i-th and j-th detail sub-signals], where I denotes the input and Q the output. Different mother wavelets are used depending on the type of process. In the current study, the db4 and db7 mother wavelets were selected by trial and error for the daily and monthly scales, respectively (Nourani et al. 2014). Here db abbreviates the Daubechies wavelet, and db4 and db7 denote the fourth- and seventh-order Daubechies wavelets, respectively. Daubechies wavelets of various orders are included in the software library. In the W-MT model, the wavelet-based sub-time series are classified by the M5 model tree, and suitable regressors are then fitted to the classes.
According to Figure 5, the suggested hybrid models have four stages. In the first stage, the rainfall-runoff data are obtained. Any data-driven procedure can be made more efficient with an appropriate pre-processing tool, and for seasonal, multi-resolution datasets the wavelet transform is a useful data preparation strategy; one of its essential features is its capacity to divide the primary data set into numerous sub-time series. In the third stage, the data are categorized into homogeneous clusters to optimize the structure of the model. In the fourth stage, the trends in the data are finally selected.
As a result, a hybrid wavelet-ANN model provides a more reliable prediction. The wavelet-ANN (WANN) combines the wavelet transform and the ANN, with the wavelet transform splitting the input data set into approximation and detail parts. The WANN method is very close to the ANN model in its essential concepts. The ANN and WANN models, which have three layers, are trained with the error back-propagation (BP) algorithm (Nourani et al. 2019; Tiwari et al. 2022). As an innovation that draws on the strengths of both the M5 and wavelet methods in treating hydrological processes, this paper also presents a hybrid approach that combines the wavelet transform with the predictive characteristics of the M5 model tree for rainfall-runoff modeling. Instead of sophisticated nonlinear modeling, a model built from a system of linear regressions (a multi-linear model) that takes advantage of wavelet-based techniques may be more reliable. The proposed hybrid method is used to assess the reliability of the wavelet-M5 model in the presence of diverse hydrological phenomena. The daily and monthly data are used to evaluate the model's ability to account for the system's autoregressive and seasonal characteristics (Curceac et al. 2021).
2.6 Statistical criteria
The following three statistical criteria were used to evaluate the models: Nash–Sutcliffe efficiency (NSE), root mean squared error (RMSE), and mean absolute error (MAE).
(6) |
(7) |
(8) |
(9) |
Where:
N | = | number of data points, and |
= | observed, simulated, mean observed, and mean simulated values, respectively. |
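The three criteria named above can be computed as in the following sketch, which uses the standard definitions of NSE, RMSE, and MAE. The function names and the sample values are hypothetical and serve only to illustrate the calculation.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

observed = [1.0, 2.0, 3.0, 4.0]     # hypothetical discharges
simulated = [1.5, 2.0, 2.5, 4.0]    # hypothetical model outputs
scores = (nse(observed, simulated), rmse(observed, simulated), mae(observed, simulated))
```

An NSE of 1 indicates a perfect fit, while values near or below 0 mean the model predicts no better than the mean of the observations; RMSE and MAE are in the units of the data.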
2.7 Evaluation and distribution of data
The rainfall-runoff phenomenon involves the effects of individual parameters such as rainfall, evaporation, and transpiration, so choosing the correct input parameters is essential in rainfall-runoff analysis. According to earlier research, the rainfall-runoff phenomenon is a Markovian process (its current state is most strongly related to its previous states). Consequently, previous discharge rates can implicitly account for the influence of the abovementioned parameters, and the present flow discharge (Qt) can be expressed as a function of previous precipitation (It-m) and discharge (Qt-n) rates, as in Equation 10 (Sharghi et al. 2019), where f denotes the function.
(10) |
Because the impacts of different previous precipitation values could be considered indirectly in the prior runoff flow, Equation 10 can be outlined as Equation 11. The duration of the training data iteration time was centered on the behavior of the catchment’s reaction.
(11) |
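The input-output structure described for Equations 10 and 11 can be sketched by pairing lagged values with a future discharge. The indexing convention, the helper name `make_samples`, and the data are assumptions for illustration: inputs at time t are the last m rainfall values and the last n discharge values, and the target is the discharge `horizon` steps ahead. Setting m = 0 gives the discharge-only form.

```python
def make_samples(rain, flow, m, n, horizon=1):
    """Pair lagged rainfall/discharge inputs with the discharge `horizon` steps ahead.

    Inputs at time t: rain[t-m+1..t] followed by flow[t-n+1..t].
    Target: flow[t + horizon].
    """
    if n < 1 or m < 0 or horizon < 1:
        raise ValueError("need n >= 1, m >= 0 and horizon >= 1")
    start = max(m, n) - 1
    inputs, targets = [], []
    for t in range(start, len(flow) - horizon):
        lagged_rain = rain[t - m + 1:t + 1]   # empty list when m == 0
        lagged_flow = flow[t - n + 1:t + 1]
        inputs.append(lagged_rain + lagged_flow)
        targets.append(flow[t + horizon])
    return inputs, targets

rain = [0, 5, 2, 0, 0, 1]                  # hypothetical rainfall series
flow = [1.0, 3.0, 4.0, 2.0, 1.5, 1.2]      # hypothetical discharge series
inputs, targets = make_samples(rain, flow, m=1, n=2, horizon=1)
flow_only_inputs, flow_only_targets = make_samples(rain, flow, m=0, n=2, horizon=2)
```

Increasing `horizon` shifts the target further ahead while leaving the inputs unchanged, which corresponds to multi-step-ahead prediction with the same input combination.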
The ANN, seasonal-based WANN, M5, and wavelet-M5 methods were used to predict the rainfall-runoff phenomenon. The calibration data were used to train the methods, and the verification data were used to test them: 70% of the data were used in the training state and 30% in the testing state. Multi-step-ahead prediction used the same input combinations as single-step-ahead prediction. Table 1 shows the statistical measures of the training and testing datasets.
Table 1 Statistical measures of the training and testing datasets for daily and monthly scales.
Scale | State | Minimum | Maximum | Mean±SD* |
Daily | Train | 0 | 6.39 | 1.29±2.56 |
Daily | Test | 0 | 5.40 | 1.19±3.17 |
Monthly | Train | 0.23 | 5.24 | 1.22±1.83 |
Monthly | Test | 0.27 | 4.64 | 1.39±2.41 |
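The 70/30 division and the summary statistics of the kind reported in Table 1 can be reproduced with a few lines. The sketch assumes a chronological split (the earliest 70% for training) and the sample standard deviation; neither choice is stated in the text, and the discharge values are hypothetical.

```python
from statistics import mean, stdev

def chronological_split(series, train_percent=70):
    """Split a time-ordered series into leading train and trailing test parts."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]

def summarize(values):
    """Minimum, maximum, mean and sample standard deviation of a dataset."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "sd": stdev(values)}

flows = [0.4, 0.9, 1.3, 2.8, 6.1, 3.0, 1.7, 1.1, 0.8, 0.6]  # hypothetical data
train, test = chronological_split(flows)
train_stats = summarize(train)
test_stats = summarize(test)
```

Keeping the split chronological avoids leaking future values into training, which matters for forecasting tasks.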
3 Results and discussion
3.1 Results of ANN
The neural networks in the ANN and WANN techniques were trained using the backpropagation (BP) algorithm. The numbers of neurons in the input and hidden layers and the number of training epochs were determined by trial and error. Too few training iterations can lead to undertraining, whereas too many can lead to overfitting. The ANN was trained using the Levenberg-Marquardt variant of the BP algorithm because of its higher convergence rate (Sharghi et al. 2018). Training was terminated whenever the error on the verification data began to grow. In this study, the activation function of the neural networks was the tangent sigmoid. Table 2 displays the daily and monthly ANN results; only the results of the best structures are shown.
Table 2 Daily and monthly results of the ANN model.
Scale | Output | NSE (Train) | NSE (Test) | RMSE, m3/s (Train) | RMSE, m3/s (Test) | MAE, m3/s (Train) | MAE, m3/s (Test) |
Daily | Qt+1 | 0.94 | 0.87 | 0.02 | 0.02 | 0.11 | 0.54 |
Daily | Qt+2 | 0.88 | 0.70 | 0.03 | 0.04 | 0.39 | 0.94 |
Daily | Qt+4 | 0.79 | 0.56 | 0.04 | 0.04 | 0.34 | 0.79 |
Daily | Qt+7 | 0.71 | 0.51 | 0.05 | 0.06 | 0.82 | 1.31 |
Monthly | Qt+1 | 0.76 | 0.70 | 0.06 | 0.09 | 3.07 | 3.42 |
Monthly | Qt+2 | 0.68 | 0.47 | 0.07 | 0.08 | 4.13 | 5.96 |
Monthly | Qt+4 | 0.53 | 0.22 | 0.09 | 0.12 | 5.38 | 7.02 |
Monthly | Qt+7 | 0.44 | 0.15 | 0.10 | 0.14 | 5.84 | 8.16 |
As per the findings in Table 2, the precision of the classic ANN diminishes markedly in multi-step-ahead predictions. This is because the error is amplified nonlinearly at each time step whenever the predicted value (which carries a small error) is used as the input at the next time step. For example, relative to the 1-day-ahead forecast, the ANN model's train-state NSE decreased by 6%, 16%, and 24% for runoff predictions 2, 4, and 7 days ahead, respectively. On the monthly scale, the corresponding declines rose to 11%, 30%, and 42%.
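The percentage declines quoted above follow from the train-state NSE values in Table 2 when each horizon is compared with the 1-step-ahead value. A short check, with the hypothetical helper name `relative_decline`:

```python
def relative_decline(nse_by_horizon):
    """Percent drop in NSE relative to the first (1-step-ahead) value."""
    base = nse_by_horizon[0]
    return [round(100 * (base - v) / base) for v in nse_by_horizon[1:]]

# Train-state NSE of the ANN for Qt+1, Qt+2, Qt+4 and Qt+7 (Table 2)
daily_train = [0.94, 0.88, 0.79, 0.71]
monthly_train = [0.76, 0.68, 0.53, 0.44]

daily_decline = relative_decline(daily_train)       # [6, 16, 24]
monthly_decline = relative_decline(monthly_train)   # [11, 30, 42]
```

The same calculation applied to the train-state NSE values of the other models reproduces the declines reported for them.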
The daily scale offers many more input samples than the monthly scale, which improves training and helps explain why the model is more efficient on the daily scale than on the monthly scale. This points to another weakness of the ANN model: its performance depends directly on the amount of input data. The ANN model's train and test state results are shown in Figures 6 and 7 for daily and monthly data.
Figure 6 Results of ANN model in daily scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figure 7 Results of ANN model in monthly scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figures 6 and 7 demonstrate that the ANN model is sensitive to the amount of input data; consequently, the daily results are more accurate than the monthly results. It is also important to note that the train state is more accurate than the test state.
3.2 Results of WANN
The ANN model accounted for the Markovian characteristic of the rainfall-runoff phenomenon (Equation 11) but neglected seasonality. The WANN method was therefore used to simulate the seasonal characteristics of the system. The rainfall-runoff data set was decomposed using wavelet decomposition to better capture the seasonal patterns at various scales. Because rainfall and runoff are closely related, both data sets had the same frequencies and were therefore decomposed at the same level. The resulting subseries were then examined as potential inputs to the ANN model. Table 3 shows the daily and monthly WANN results.
Table 3 Daily and monthly results of the WANN model.
Scale | Output | NSE (Train) | NSE (Test) | RMSE, m3/s (Train) | RMSE, m3/s (Test) | MAE, m3/s (Train) | MAE, m3/s (Test) |
Daily | Qt+1 | 0.98 | 0.93 | 0.01 | 0.01 | 0.04 | 0.18 |
Daily | Qt+2 | 0.95 | 0.86 | 0.01 | 0.02 | 0.07 | 0.34 |
Daily | Qt+4 | 0.92 | 0.81 | 0.02 | 0.02 | 0.21 | 0.73 |
Daily | Qt+7 | 0.87 | 0.64 | 0.03 | 0.04 | 1.13 | 1.40 |
Monthly | Qt+1 | 0.96 | 0.85 | 0.01 | 0.03 | 0.32 | 1.06 |
Monthly | Qt+2 | 0.89 | 0.79 | 0.02 | 0.05 | 0.84 | 1.67 |
Monthly | Qt+4 | 0.83 | 0.64 | 0.04 | 0.07 | 1.26 | 2.58 |
Monthly | Qt+7 | 0.78 | 0.59 | 0.06 | 0.11 | 2.16 | 4.59 |
As with the ANN method, the efficiency of the WANN method declined as the forecast horizon increased, because the error is amplified nonlinearly and affects the method's overall results. Compared to the 1-day-ahead prediction, the train-state accuracy of runoff predictions 2, 4, and 7 days ahead was lower by 3%, 6%, and 11%, respectively. On the monthly scale, the corresponding decreases were 7%, 14%, and 19%.
Another issue is the disparity in NSE between the training and testing states. Although the quality of the WANN model improved slightly in the testing stage, it remains a considerable distance from the level reached in the training state (because nonlinear models, and the WANN method in particular, depend heavily on the input data, accuracy in the test state is typically lower than in the train state). The training and testing results of the WANN are shown in Figures 8 and 9 for daily and monthly data.
Figure 8 Results of WANN model in daily scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figure 9 Results of WANN model in monthly scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figures 8 and 9 show that the hybrid WANN model improves accuracy on both the daily and monthly scales and in both the training and testing states. Nonetheless, the dependence on the quantity of input data remains.
3.3 Results of M5 model tree
After the process was modeled using the nonlinear models (ANN and WANN), the multilinear M5 model tree was employed in the analysis. The M5 model tree divides the nonlinear input space into multiple classes (clusters), each of which can be characterized by a simple linear regression. The initial stage of the M5 analysis was to choose the dominant input parameters. Rather than fitting one complex nonlinear regression to all of the input data, the M5 method splits the data into several groups and then fits a linear regression to each class (multilinear model). Table 4 shows the daily and monthly results of the M5 model tree analysis.
Table 4 Daily and monthly results of the M5 model.
Scale | Output | NSE (Train) | NSE (Test) | RMSE, m3/s (Train) | RMSE, m3/s (Test) | MAE, m3/s (Train) | MAE, m3/s (Test) |
Daily | Qt+1 | 0.93 | 0.88 | 0.02 | 0.03 | 0.08 | 0.67 |
Daily | Qt+2 | 0.86 | 0.73 | 0.05 | 0.07 | 0.27 | 0.58 |
Daily | Qt+4 | 0.76 | 0.58 | 0.06 | 0.09 | 0.56 | 1.08 |
Daily | Qt+7 | 0.68 | 0.53 | 0.07 | 0.10 | 1.04 | 1.23 |
Monthly | Qt+1 | 0.66 | 0.65 | 0.08 | 0.09 | 2.41 | 2.87 |
Monthly | Qt+2 | 0.61 | 0.59 | 0.11 | 0.13 | 3.52 | 4.72 |
Monthly | Qt+4 | 0.54 | 0.56 | 0.13 | 0.16 | 4.17 | 7.46 |
Monthly | Qt+7 | 0.47 | 0.42 | 0.17 | 0.21 | 6.03 | 8.64 |
As per the findings in Table 4, the M5 model's train-state efficiency decreased by 8%, 18%, and 27% for runoff predictions 2, 4, and 7 days ahead, respectively, relative to the 1-day-ahead prediction. On the monthly scale, the corresponding declines were 8%, 18%, and 29%. Given these values, the accuracy drop of the M5 model is almost the same on the daily and monthly scales and does not depend on the amount of input data. The train and test results of the M5 model are shown in Figures 10 and 11 for daily and monthly data.
Figure 10 Results of M5 model in daily scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figure 11 Results of M5 model in monthly scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figures 10 and 11 show that the difference between the train and test states is small in the M5 model tree, so the amount of input data has little bearing on the performance of this model. In addition, forecast accuracy is higher on the daily scale than on the monthly scale in both the train and test states.
3.4 Results of Wavelet-M5
In the wavelet-M5 analysis, the multiscale rainfall-runoff dataset was divided into short- and long-term temporal sub-signals by wavelet decomposition to handle the patterns contained in the primary data set. Each sub-time series was fed into the M5 model tree as an input. The M5 model tree classifies the dataset by applying its standard-deviation-based splitting criterion at the root node and branches. A linear regression model is then fitted to each resulting subset of the data. Table 5 shows the daily and monthly results of the wavelet-M5 analysis.
Table 5 Daily and monthly results of the wavelet-M5 model.
Scale | Output | NSE (Train) | NSE (Test) | RMSE, m3/s (Train) | RMSE, m3/s (Test) | MAE, m3/s (Train) | MAE, m3/s (Test) |
Daily | Qt+1 | 0.97 | 0.92 | 0.01 | 0.02 | 0.07 | 0.11 |
Daily | Qt+2 | 0.94 | 0.87 | 0.03 | 0.05 | 0.09 | 0.29 |
Daily | Qt+4 | 0.91 | 0.83 | 0.05 | 0.06 | 0.18 | 0.31 |
Daily | Qt+7 | 0.84 | 0.68 | 0.07 | 0.09 | 0.94 | 1.27 |
Monthly | Qt+1 | 0.93 | 0.91 | 0.03 | 0.06 | 0.38 | 0.86 |
Monthly | Qt+2 | 0.87 | 0.86 | 0.05 | 0.07 | 1.04 | 1.51 |
Monthly | Qt+4 | 0.83 | 0.82 | 0.07 | 0.11 | 2.53 | 3.06 |
Monthly | Qt+7 | 0.77 | 0.75 | 0.10 | 0.13 | 1.64 | 1.82 |
As per the findings in Table 5, the wavelet-M5 model's train-state efficiency decreased by 3%, 6%, and 13% for runoff predictions 2, 4, and 7 days ahead, respectively, relative to the 1-day-ahead prediction. On the monthly scale, the corresponding declines were 6%, 11%, and 17%. These numbers indicate that wavelet decomposition has increased forecast precision. The train and test results of the wavelet-M5 model are shown in Figures 12 and 13 for daily and monthly data.
Figure 12 Results of wavelet-M5 model in daily scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figure 13 Results of wavelet-M5 model in monthly scale: (a) Time series of discharge for train state; (b) Scatter plot for train state; (c) Time series of discharge for test state; (d) Scatter plot for test state.
Figures 12 and 13 indicate that the application of the wavelet transform has enhanced modeling precision. In addition, the wavelet-M5 model's accuracy does not diminish as much as that of the WANN model as the forecast horizon lengthens. The Leilanchai watershed exhibits a relatively consistent pattern of behavior, and multilinear models perform better than nonlinear models in such watersheds (Nourani et al. 2019).
3.5 Comparison of models
Table 6 presents the effect of hybrid models with wavelet transform on increasing the forecast accuracy.
Table 6 Effect of hybrid models with wavelet transform on increasing the forecast accuracy.
Models | Output | Daily, train | Daily, test | Monthly, train | Monthly, test
WANN vs ANN | Qt+1 | 4% | 7% | 26% | 21%
WANN vs ANN | Qt+2 | 8% | 23% | 31% | 68%
WANN vs ANN | Qt+4 | 16% | 45% | 57% | 191%
WANN vs ANN | Qt+7 | 23% | 25% | 77% | 293%
wavelet-M5 vs M5 | Qt+1 | 4% | 5% | 41% | 40%
wavelet-M5 vs M5 | Qt+2 | 9% | 19% | 43% | 46%
wavelet-M5 vs M5 | Qt+4 | 20% | 43% | 54% | 46%
wavelet-M5 vs M5 | Qt+7 | 24% | 28% | 64% | 79%
The WANN method performed better than the ANN method in multi-step-ahead forecasting. For example, wavelet-based decomposition enhanced the ANN's efficiency for 7-day-ahead forecasting by 23% in the training state and 25% in the testing state. The differences between daily and monthly datasets must be considered when modeling rainfall-runoff. The monthly dataset has fewer samples than the daily dataset, and its seasonal pattern differs significantly from the Markovian feature. WANN could manage both the Markovian and seasonal aspects of the process and therefore performed well for both daily and monthly forecasts. Although the WANN model outperformed the ANN model, the computational burden of the forecast rose considerably because of the expansion in input data.
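The entries in Table 6 read as relative gains of the hybrid model over its base model. A minimal Python sketch of that arithmetic follows, assuming the gain is computed on NSE as (hybrid − base) / base; the paper does not state the formula in this section, and the function name and the example values are hypothetical.

```python
def percent_improvement(base_nse, hybrid_nse):
    """Relative gain of the hybrid model over the base model, in percent.

    Assumed definition: (hybrid - base) / base * 100, valid for a positive base NSE.
    """
    if base_nse <= 0:
        raise ValueError("relative improvement is undefined for a non-positive base NSE")
    return (hybrid_nse - base_nse) / base_nse * 100.0


# Hypothetical NSE values, chosen only to illustrate the arithmetic.
example_base_nse = 0.50
example_hybrid_nse = 0.75
example_gain = percent_improvement(example_base_nse, example_hybrid_nse)
```

Under this definition, a gain above 100% arises whenever the hybrid NSE is more than twice the base NSE, which is one way the large monthly test-state entries for Qt+4 and Qt+7 could occur.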
Wavelet-based data preparation dramatically enhanced the prediction accuracy of the M5 model tree, bringing the accuracy of the multilinear wavelet-M5 method closer to that of the nonlinear WANN method. Another consideration is the closeness of the NSE values in training and testing. Wavelet-M5 is not sensitive to the amount of data and is therefore appropriate for processes that lack a large data volume. Because the wavelet-M5 method is based on the M5 tree, it inherits all of the M5 tree's favorable characteristics. These include:
- an easily understood structure,
- no magnification of errors,
- use of the superposition principle,
- equal ability in the training and testing steps,
- minor variations in accuracy across different data-division schemes, and
- appropriate efficiency in multi-step prediction.
In contrast, the qualities of the ANN and WANN models allow users to employ many input variables without affecting accuracy.
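To illustrate the wavelet-based data preparation discussed above, the following Python sketch splits a series into additive sub-series that can then serve as model inputs. It uses a causal Haar à trous scheme, which is an assumption: this section does not name the mother wavelet, the decomposition level, or the boundary handling used in the study. The function name `haar_a_trous` and the sample values are hypothetical.

```python
def haar_a_trous(series, levels):
    """Split a series into `levels` detail sub-series plus one smooth sub-series.

    At level j the smooth series is the average of the previous smooth value
    and the value 2**j steps earlier; the detail is what the averaging removed.
    Indices before the start of the series are clamped to 0 (an assumption).
    """
    smooth = list(series)
    details = []
    for j in range(levels):
        lag = 2 ** j
        next_smooth = [
            0.5 * (smooth[i] + smooth[max(i - lag, 0)])
            for i in range(len(smooth))
        ]
        details.append([s - n for s, n in zip(smooth, next_smooth)])
        smooth = next_smooth
    return details, smooth


# Hypothetical runoff values, used only to exercise the function.
runoff = [1.0, 3.0, 2.0, 6.0, 4.0, 5.0, 2.0, 1.0]
details, smooth = haar_a_trous(runoff, levels=2)

# The sub-series are additive: details plus the smooth part rebuild the input.
rebuilt = [sum(d[i] for d in details) + smooth[i] for i in range(len(runoff))]
```

Each sub-series has the same length as the original series, so the detail and smooth values at time t can be passed to an ANN or M5 model tree as separate input columns.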
The limitations of the current study include the shortcomings inherent in each of the numerical models and the large number of input data that results from combining them with the wavelet transform. Other limitations are that the study was conducted on a single watershed and that the results were not compared with those of other watersheds or of physical models. Future work should investigate the performance of the wavelet-M5 and WANN models in forecasting other hydrological events and compare their abilities across those events. In runoff simulation and forecasting, special attention should be paid to peak and timing errors, so that significant time shifts do not appear in graphs comparing observed and simulated time series.
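One simple way to quantify the peak and timing errors mentioned above is sketched below in Python. The definitions used here (relative error of the maximum discharge, and the shift in time steps between the simulated and observed maxima) are assumptions rather than the study's own criteria; the function name and sample values are hypothetical.

```python
def peak_and_timing_error(observed, simulated):
    """Return (relative peak error, peak time shift in steps) for one event.

    A positive time shift means the simulated peak arrives later than observed.
    """
    obs_peak_idx = max(range(len(observed)), key=observed.__getitem__)
    sim_peak_idx = max(range(len(simulated)), key=simulated.__getitem__)
    obs_peak = observed[obs_peak_idx]
    peak_error = (simulated[sim_peak_idx] - obs_peak) / obs_peak
    timing_error = sim_peak_idx - obs_peak_idx
    return peak_error, timing_error


# Hypothetical discharge values for a single event.
observed_flow = [1.0, 2.0, 8.0, 4.0, 2.0]
simulated_flow = [1.0, 1.5, 3.0, 6.0, 2.5]

peak_error, timing_error = peak_and_timing_error(observed_flow, simulated_flow)
# peak_error == -0.25 (peak underestimated by 25%), timing_error == 1 (one step late)
```

Reporting these two values alongside NSE makes a time-shifted hydrograph visible even when the overall fit looks acceptable.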
4 Conclusion
In this study, the rainfall-runoff process in the Leilanchai watershed was investigated. The ANN, WANN, M5 model tree, and wavelet-M5 models were applied and their results compared to determine the effect of hybridizing the models with the wavelet transform. Multiresolution analysis of the rainfall-runoff time series affected both the nonlinear and the multilinear models. According to the results, the wavelet transform boosted the ANN model's efficiency by 4% on a daily scale and 26% on a monthly scale. The wavelet decomposition improved the efficiency of the M5 model tree by 4% on a daily scale and 41% on a monthly scale. The efficiency of the proposed wavelet-M5 method is satisfactory and comparable to that of the nonlinear WANN method.
The findings indicate that model performance diminishes as the forecast horizon increases. The error in nonlinear approaches expands nonlinearly, whereas the error in linear methods increases only slightly. As a result, multilinear techniques can produce better results in multi-step-ahead forecasts than nonlinear techniques. In such modeling, the efficiency of the predictions is strongly connected to the behavior of the watershed. An additional advantage of the hybrid multilinear wavelet-M5 method is that it can be trained effectively with small datasets. In contrast, as with other nonlinear forecasting approaches, a small training dataset can affect the results of the WANN.
Author Declarations
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
References
- Adnan, R.M., A. Petroselli, S. Heddam, C.A. Guimarães Santos, and O. Kisi. 2021. “Comparison of different methodologies for rainfall–runoff modeling: machine learning vs conceptual approach.” Natural Hazards 105: 2987-3011.
- Alizadeh, A., A. Rajabi, S. Shabanlou, B. Yaghoubi, and F. Yosefvand. 2021. “Modeling long-term rainfall-runoff time series through wavelet-weighted regularization extreme learning machine.” Earth Science Informatics 14: 1047-1063.
- Asati, S.R., and S.S. Rathore. 2012. “Comparative study of stream flow prediction models.” International Journal of Life science and Pharma Research 1 (2): 139-151.
- Ba, H., S. Guo, Y. Wang, X. Hong, Y. Zhong, and Z. Lio. 2017. “Improving ANN model in runoff forecasting by adding soil moisture input and using data preprocessing techniques.” Hydrology Research 49: 744–760.
- Bajirao, T.S., P. Kumar, M. Kumar, A. Elbeltagi, and A. Kuriqi. 2021. “Potential of hybrid wavelet-coupled data-driven-based algorithms for daily runoff prediction in complex river basins.” Theoretical and Applied Climatology 145: 1207-1231.
- Bhattacharya, B., and D.P. Solomatine. 2005. “Neural networks and M5 model trees in modeling water level–discharge relationship.” Neurocomputing 63: 381–396.
- Curceac, S., A. Milne, P.M. Atkinson, L. Wu, and P. Harris. 2021. “Elucidating the performance of hybrid models for predicting extreme water flow events through variography and wavelet analyses.” Journal of Hydrology 598: 126442.
- Dastorani, M.T., A. Moghadamnia, J. Piri, and M.R.R. Jamalizadeh. 2009. “Application of ANN and ANFIS models for reconstructing missing flow data, Electronic supplementary material.” Environmental Monitoring and Assessment 10: 1007-1012.
- Faghih, H., J. Behmanesh, H. Rezaie, and K. Khalili. 2022. “Application of artificial intelligence in agrometeorology: A case study in Urmia Lake basin, Iran.” Theoretical and Applied Climatology, Preprint, submitted May 2021. https://doi.org/10.21203/rs.3.rs-565358/v1
- Fayaz, S.A., M. Zaman, and M.A. Butt. 2022. “Numerical and experimental investigation of meteorological data using adaptive linear M5 model tree for the prediction of rainfall.” Review of Computer Engineering Research 9: 1.
- Feng, Z.K., W.J. Niu, Z.Y. Tang, Z.Q. Jiang, Y. Xu, Y. Liu, and H.R. Zhang. 2020. “Monthly runoff time series prediction by variational mode decomposition and support vector machine based on quantum-behaved particle swarm optimization.” Journal of Hydrology 583: 124627.
- Gao, S., Y. Huang, S. Zhang, J. Han, G. Wang, M. Zhang, and Q. Lin. 2020. “Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation.” Journal of Hydrology 589: 125188.
- Gholami, V., M.J. Booij, E. Nikzad Tehrani, and M.A. Hadian. 2018. “Spatial soil erosion estimation using an artificial neural network (ANN) and field plot data.” CATENA 163: 210-218.
- Khan, N., D.A. Sachindra, S. Shahid, K. Ahmed, M.S. Shiru, and N. Nawaz. 2020. “Prediction of droughts over Pakistan using machine learning algorithms.” Advances in Water Resources 139: 103562.
- Kisi, O. 2021. “Machine Learning with metaheuristic algorithms for sustainable water resources management.” Sustainability 13 (15): 8596.
- Kisi, O., S. Heddam, B. Keshtegar, J. Piri, and R.M. Adnan. 2022. “Predicting daily streamflow in a cold climate using a novel data mining technique: Radial M5 model tree.” Water 14 (9): 1449.
- Lallahem, S., and J. Mania. 2003. “A nonlinear Rainfall-Runoff Model using neural network technique: example in fractured porous media.” Mathematical and Computer Modelling 37 (9-10): 1047-1061.
- Lee, S., K.K. Lee, and H. Yoon. 2019. “Using artificial neural network models for groundwater level forecasting and assessment of the relative impacts of influencing factors.” Hydrogeology Journal 27: 567–579.
- Liu, Z., Q. Li, J. Zhou, W. Jiao, and X. Wang. 2021. “Runoff prediction using a novel hybrid ANFIS model based on variable screening.” Water Resources Management 35: 2921-2940.
- Londhe, S., and S. Charhate. 2010. “Comparison of data-driven modelling techniques for river flow forecasting.” Hydrological Sciences Journal 55 (7): 1163-1174.
- Mohammadi, F., A. Fakheri Fard, and M.A. Ghorbani. 2019. “Application of cross-wavelet–linear programming–Kalman filter and GIUH methods in rainfall–runoff modeling.” Environmental Earth Sciences 78: 168.
- Nourani, V., A.H. Baghanam, J. Adamowski, and O. Kisi. 2014. “Applications of hybrid wavelet–Artificial Intelligence models in hydrology: A review.” Journal of Hydrology 514: 358–377.
- Nourani, V., A.D. Tajbakhsh, A. Molajou, and H. Gokcekus. 2019. “Hybrid wavelet-M5 model tree for rainfall-runoff modeling.” Journal of Hydrologic Engineering 24 (5): 04019012.
- Okkan, U., Z. Beril Ersoy, A.A. Kumanlioglu, and O. Fistikoglu. 2021. “Embedding machine learning techniques into a conceptual model to improve monthly runoff simulation: A nested hybrid rainfall-runoff modeling.” Journal of Hydrology 598: 126433.
- Ouma, Y.O., R. Cheruyot, and A.N. Wachera. 2021. “Rainfall and runoff time-series trend analysis using LSTM recurrent neural network and wavelet neural network with satellite-based meteorological data: Case study of Nzoia hydrologic basin.” Complex and Intelligent Systems 8: 213-236.
- Prasad, R., R. Deo, Y. Li, and T.N. Maraseni. 2017. “Input selection and performance optimization of ANN-based streamflow forecasts in a drought-prone Murray Darling Basin using IIS and MODWT algorithm.” Atmospheric Research 197: 42-63.
- Rezaeianzadeh, M., L. Kalin, and C.J. Anderson. 2015. “Wetland water-level prediction using ANN in conjunction with base-flow recession analysis.” Journal of Hydrologic Engineering D4015003: 1-12.
- Sharghi, E., V. Nourani, H. Najafi, and A. Molajou. 2018. “Emotional ANN (EANN) and wavelet-ANN (WANN) approaches for Markovian and seasonal based modeling of rainfall-runoff process.” Water Resources Management 32 (10): 3441–3456.
- Sharghi, E., V. Nourani, A. Molajou, and H. Najafi. 2019. “Conjunction of emotional ANN (EANN) and wavelet transform for rainfall-runoff modeling.” Journal of Hydroinformatics 21 (1): 136–152.
- Shoaib, M., A.Y. Shamseldin, S. Khan, M. Muneer Khan, Z.W. Mahmood Khan, and B. Melville. 2018. “A wavelet-based approach for combining the outputs of different rainfall–runoff models.” Stochastic Environmental Research and Risk Assessment 32: 155-168.
- Solomatine, D.P., and K.N. Dulal. 2003. “Model trees as an alternative to neural networks in rainfall-runoff modeling.” Hydrological Sciences Journal 48 (3): 399–411.
- Solomatine, D.P., and Y. Xue. 2004. “M5 model trees and neural networks: Application to flood forecasting in the upper reach of the Huai River in China.” Journal of Hydrologic Engineering 9 (6): 491–501.
- Tiwari, D.K., H.L. Tiwari, and R. Nateriya. 2022. “Runoff modeling in Kolar river basin using hybrid approach of wavelet with artificial neural network.” Journal of Water and Climate Change 13 (2): 963–974.
- Wei, S., H. Yang, J. Song, K. Abbaspour, and Z. Xu. 2013. “A wavelet-neural network hybrid modelling approach for estimating and predicting river monthly flows.” Hydrological Sciences Journal 58 (2): 374-389.
- Zamrane, Z., G. Mahé, and N.E. Laftouhi. 2021. “Wavelet analysis of rainfall and runoff multidecadal time series on large river basins in Western North Africa.” Water 13: 3243.
- Zhang, Z., Q. Zhang, V.P. Singh, and P. Shi. 2018. “River flow modelling: comparison of performance and evaluation of uncertainty using data-driven models and conceptual hydrological models.” Stochastic Environmental Research and Risk Assessment 32 (9): 2667-2682.