Surface Water Quality Assessment, Prediction, and Modeling of the River Daya in Odisha
Abstract
A decision tree-based approach is proposed to predict surface water quality and to support the assessment of water intended for drinking. Modeling surface water quality using artificial intelligence-based models is essential in planning suitable mitigation measures; however, it remains a challenge and requires further research to enhance the modeling accuracy. Because of the serious effects of poor water quality, a faster and less expensive solution is required. With this motivation, this research explores a series of supervised machine learning algorithms to estimate water quality. The objective of this study is to assess the surface water quality of the Daya River and to determine a suitable procedure for evaluating the quality of drinking water. Samples were collected from designated locations in different seasons (winter, summer, rainy) over a period of five years (2016, 2017, 2018, 2019, and 2020). Total dissolved solids, pH, alkalinity, chloride, nitrate, total hardness, calcium, magnesium, iron, and fluoride were tested, as well as total coliform, fecal coliform, and E. coli. With the decision tree regression model, the reported prediction accuracy is 97.33% for total dissolved solids, 98.57% for pH, and 93.77% for iron. This is a significant result, indicating that the decision tree-based approach has the potential to be a useful tool for surface water quality prediction. However, there may be limitations and uncertainties in the model, and further research and validation may be required to improve the accuracy and dependability of the forecasts. The study's findings can help to improve knowledge of water quality in the Daya River and support decision-making processes to ensure safe drinking water.
1 Introduction
Water is the prime component responsible for life on earth. The six billion people on the planet use over a third of the world's total available renewable water; however, billions of people lack access to basic water facilities. Among other countries, India is one of the few that invests moderately in well-used land as a water resource. India is a country with immense geographic, biological, and climatic diversity. The average annual precipitation is approximately 4000 billion cubic meters (BCM). Annual water resources in the various river basins are estimated to be 1869 BCM, of which 1086 BCM is usable, with 690 BCM of surface water and 396 BCM of groundwater. The remaining water is lost in a variety of ways, including through wastage and evaporation.
In India, surface water flows through 14 major river basins. Additionally, there are 44 medium and 55 minor river basins. These rivers are fast flowing and principally monsoon fed. With the increase in population and improved living standards, as well as the spatial and temporal variations in precipitation, demand for water resources is growing, especially in sensitive areas. Consequently, the per capita availability of water is decreasing daily. However, when compared to groundwater resources, the country's surface water resources have a far larger volume. Climate change influences precipitation, which has an impact on the amount of water available. Increased pollution from both point and nonpoint sources, on the other hand, has an impact on the quality of surface water. Since the majority of the country's rivers are not perennial, groundwater is what sustains the population during the dry months. In these river systems, the amount and quality of groundwater discharge differ greatly from one place to the next. With a few exceptions, all medium and minor river basins begin in the highlands and contain fast-flowing, monsoon-fed streams within the hilly regions. Even by the time they reach the plains, they remain mostly seasonal streams. Because of the strongly seasonal flow of such rivers, treated or untreated effluents from various sources find their way into them. During monsoon season, when rain falls into a watercourse, the discharge of pollutants, as well as the rate and depth of flow, fluctuate according to the tides. Because stormwater flows downstream, the time it takes for pollutants to be flushed out is significantly reduced. Several of the major river basins are also dry during summer, leaving too little water to dilute the added wastewater.
2 Literature review
An overview of the methods used to identify and categorize water quality is provided in this section. Deep neural networks, recurrent neural networks, neuro-fuzzy inference, and support vector regression are some of the methods used. Several studies have investigated the capacity to anticipate water quality parameters such as dissolved oxygen (DO), chlorophyll a, and the water quality index (WQI).
For example, in tests by Barzegar et al. (2020), hybrid Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) models outperformed standalone CNN and LSTM models, as well as other machine-learning models such as Support Vector Regression (SVR) and decision trees. The model was used to predict dissolved oxygen and chlorophyll a. Oladipo et al. (2021) investigated fuzzy logic inference (FLI) and water quality index (WQI) methods to measure water quality in Nigeria and found that the FLI approach was more effective than the WQI method. Asadollah et al. (2021), on the other hand, used an ensemble machine-learning technique called Extra Tree Regression (ETR) to predict WQI scores for Hong Kong. Li et al. (2018) proposed a hybrid model combining a sparse autoencoder and an LSTM to estimate dissolved oxygen in aquaculture. To predict the WQI in Malaysia, Hameed et al. (2017) developed two neural network techniques. Radial basis function neural network (RBFNN) and Binarized Neural Network (BNN) models take longer to train and longer to predict. A hybrid machine learning approach called random tree and bagging (BA-RT) was developed by Bui et al. (2020), who used cross-validation to provide predictions with a high level of accuracy.
In a thorough examination of 51 studies published between 2000 and 2016, Rajaee et al. (2020) found that artificial neural networks and wavelet neural networks are commonly used to forecast water quality. Samsudin et al. (2019) developed an artificial neural network based on major water quality variables identified using spatial discriminant analysis. Yilma et al. (2018) predicted the WQI of the Akaki River in Ethiopia with greater than 90% accuracy using an artificial neural network with multiple hidden layers. Imani et al. (2021) predicted water quality resilience in Sao Paulo, Brazil using an artificial neural network with a single hidden layer. However, these studies required a significant amount of water quality data for good prediction, which can be costly and time-consuming to obtain.
With different degrees of success, other methods, including decision trees, gradient boosting, polynomial regression, and support vector regression, have also been employed to predict the WQI. Wang et al. (2017) achieved over 90% accuracy with support vector regression, but the model was computationally intensive due to its extensive use of water quality data. For time-series water quality data, Li et al. (2019) proposed a hybrid model combining recurrent neural networks and the Dempster-Shafer theory. However, fitting and testing the model required special data processing. Overall, the existing literature demonstrates the use of various machine learning and statistical methods for water quality prediction, with different levels of accuracy and computational requirements. Further research may be needed to develop more accurate and computationally efficient models for predicting water quality factors and the WQI, considering data availability and cost. The core motivation of this research is the urgent need to confront key water resource challenges in India, including fluctuations in surface water quality and quantity. There is also a need for efficient water resource management, given the shifting climate and the increasing demands of a growing population. The study therefore evaluates machine learning and statistical methods for predicting water quality parameters, with a focus on achieving high accuracy and computational efficiency while addressing the issues tied to data availability.
3 Study area
The Daya River (Figure 1) originates at Badahati in Odisha, India, and runs for 37 km before draining into Chilika Lake in the state's northeastern region. This branch of the Kuakhai River flows through the districts of Khurda and Puri and is joined by the Malguni stream below Golabai. The historically significant Dhauli Hills, considered to be the site of the Kalinga War, are located on the banks of the Daya River, about 8 km south of Bhubaneswar. The hills are surrounded by a large open space and have important Ashoka edicts etched into a rock mass near the upper road.
Figure 1 Daya River map taken from Google Earth.
Despite its importance, the Daya River is heavily polluted by waste from numerous businesses and human waste discharged through the Gangua Canal. This pollution affects the more than 1.2 million people who rely on the river for their water needs. A planned study project is focused on the basin, and its findings may be used to address pollution and other environmental issues in the area.
4 Methodology
The water was tested at different times of the year over a five-year period to establish the water quality of the river basin. The data were categorized into summer, rainy, and winter seasons. March, April, May, and June make up the summer season; July, August, and September make up the rainy season; and October, November, December, January, and February make up the winter season.
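The seasonal grouping described above can be sketched as a simple lookup from calendar month to season. This is a minimal illustration only; the names `SEASON_BY_MONTH` and `season_of` are hypothetical and do not come from the study's code.

```python
# Month-to-season mapping as stated in the text (1 = January ... 12 = December).
# The dictionary and function names are illustrative assumptions.
SEASON_BY_MONTH = {
    3: "summer", 4: "summer", 5: "summer", 6: "summer",
    7: "rainy", 8: "rainy", 9: "rainy",
    10: "winter", 11: "winter", 12: "winter", 1: "winter", 2: "winter",
}


def season_of(month: int) -> str:
    """Return the season label for a calendar month number (1-12)."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be in 1..12, got {month}")
    return SEASON_BY_MONTH[month]


# Label a few example sampling months.
for m in (1, 4, 8, 11):
    print(m, season_of(m))
```

A lookup table keeps the winter season, which spans the year boundary, as explicit as the other two.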
Of the fourteen physicochemical and biological water quality parameters listed in Table 1, only three (pH, TDS, and iron) were used for modeling and prediction with the decision tree approach. All samples were collected personally by the author at the four sampling stations listed in Table 2.
Table 1 Descriptive data for water quality parameters.
Water Parameters | Means of Summer | Means of Winter | Means of Rainy Season |
TDS (mg/l) | 91.27±5.37 | 73.04±3.97 | 113.87±7.67 |
pH | 6.67±0.01 | 6.68±0.01 | 7.08±0.01 |
Total Alkalinity (mg/l) | 14±4.2 | 14±4 | 17±6.1 |
Total Hardness (mg/l) | 4.2±11 | 4±10 | 6.1±10 |
Calcium as Ca (mg/l) | 11±3.9 | 10±2 | 10±4.2 |
Magnesium as Mg (mg/l) | 5.65±0.71 | 6.08±1.11 | 7.16±0.54 |
Chloride as Cl (mg/l) | 13.04±1.43 | 12.36±1.38 | 12.42±1.35 |
Sulphate as SO4 (mg/l) | 6.3±1.14 | 3.42±0.83 | 6.28±1.11 |
Nitrate as NO3 (mg/l) | 1.08±0.11 | 1.1±0.13 | 1.17±0.23 |
Total Iron as Fe (mg/l) | 0.71±0.16 | 0.58±0.11 | 0.57±0.09 |
Fluoride as F (mg/l) | 0.1±0 | 0.1±0 | 0.1±0 |
Total Coliform (MPN/100ml) | 25.22±3.04 | 21.99±2.77 | 23.87±3.25 |
Fecal Coliform (MPN/100ml) | 9.54±1.99 | 8.46±1.08 | 6.22±1.49 |
Table 2 Geocoordinates of the Daya River sampling sites.
Sl. No. | Sampling sites | Name of the locality | Latitude | Longitude |
1 | L-1 | Daya Bridge, Lingipur | 20°12'32.8"N | 85°51'05.0"E |
2 | L-2 | Krushnapur | 20°12'37.5"N | 85°50'42.7"E |
3 | L-3 | Itipur | 20°12'11.1"N | 85°50'06.6"E |
4 | L-4 | Dolabedi, Palaspursasan | 20°11'23.1"N | 85°49'56.9"E |
5 Decision Tree
A decision tree classifier is a simple and effective method for classifying incoming data by decomposing a large decision into a series of smaller decisions. It can efficiently classify data and reduce complexity while automatically selecting features. The structure of a decision tree provides valuable information about the classifier's ability to predict or generalize. In addition, a decision tree classifier is useful for implementing periodic or stratified classifiers. In a periodic classifier, the classification decision is based on data from a specific time period, such as a season, a month, or a day. In a stratified classifier, the classification decision is based on data from different strata or categories, e.g., different geographic locations or demographic groups.
Overall, a decision tree classifier is a powerful tool for data classification and prediction and can be used in a variety of applications, including water quality assessment.
6 Model
The Decision-tree regression algorithm was chosen as the prediction model for the given dataset for two main reasons:
- Relatively small dataset: After pre-processing and feature selection, the dataset used in this study contains a relatively small number of samples (fewer than 5,000). Machine learning models can be challenging to train on small datasets and may not provide accurate predictions. However, decision tree-based algorithms, such as decision-tree regression, can handle small datasets effectively and provide accurate predictions.
- Numerical data analysis: The data set includes numerical parameters such as pH, total iron content, and total dissolved solids (TDS). These parameters may have complex dependencies on each other, and the target parameter (water quality assessment) may depend on multiple attributes. Decision-tree regression is capable of capturing non-linear relationships among variables, making it suitable for analyzing numerical data with many-to-one relationships.
Additionally, decision trees are interpretable and easy to visualize, which can aid in understanding the relationships between variables and interpreting the results of the prediction model. Decision trees are also computationally efficient, making them a suitable choice for small datasets.
Overall, decision tree regression is an excellent choice for developing a predictive model for the data set provided because of its ability to handle small data sets, capture nonlinear correlations, and provide interpretable results. However, it is critical to properly examine the performance of the decision tree regression model using appropriate evaluation metrics and confirm the results to ensure the accuracy and reliability of the predictions.
7 Software used for Decision Tree
Python was used for model creation and prediction analysis. The decision tree regression algorithm, with a maximum depth of three, was imported from the scikit-learn package. The dataset was split into training and test data with a ratio of 7:3 using the train_test_split function of the scikit-learn package. For feature analysis and selection, the seaborn package and an Extra Trees Regressor model were used. For graphical visualization, the matplotlib package was used. For the Exploratory Data Analysis (EDA) of the dataset, Pandas and NumPy were used.
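The setup described above (a decision tree regressor of maximum depth three and a 7:3 train/test split) can be sketched with scikit-learn as follows. The data here are synthetic placeholders, not the study's measurements, and the `random_state` values are assumptions added only to make the sketch reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the Daya River measurements):
# two predictor columns and one continuous target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 50.0 + 40.0 * X[:, 0] + 10.0 * (X[:, 1] > 0.5) + rng.normal(0.0, 1.0, size=200)

# 7:3 split between training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Decision tree regression with a maximum depth of three.
model = DecisionTreeRegressor(max_depth=3, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("training samples:", len(X_train))
print("test samples:", len(X_test))
print("fitted tree depth:", model.get_depth())
```

Limiting the depth to three keeps the fitted tree small enough to inspect by eye, which is consistent with the interpretability argument made for the model.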
8 Dataset
The dataset used in this study consists of Daya River samples gathered during a five-year period, from 2015 to 2019. The dataset includes measurements of several characteristics that are considered essential for determining how suitable the river's water is for use. These parameters include:
- Total Dissolved Solids (TDS): This parameter measures the concentration of dissolved solids in the water, which can affect the water quality and suitability for various uses.
- Total coliform: This parameter measures the presence of coliform bacteria in the water, which can indicate the overall microbial contamination and safety of the water for consumption.
- Fecal coliform: This parameter measures the presence of fecal coliform bacteria in the water, which can specifically indicate fecal contamination and potential health risks.
- Presence of E. coli: This parameter indicates the presence or absence of E. coli bacteria in the water, which can be an important indicator of fecal contamination and potential health hazards.
- Fluoride content: This parameter measures the concentration of fluoride in the water, which can have an impact on dental health and overall water quality.
- Hardness: This parameter assesses the concentration of hardness-forming minerals in water, which can impact the usability of water for a variety of applications such as drinking water, industrial operations, and irrigation.
- Alkalinity: This parameter measures the buffering capacity of the water, which can affect the pH stability and overall water quality.
The dataset contains a total of 14 parameters, of which 13 are continuous, meaning they are measured as numerical values, while the presence of E. coli is recorded as a categorical value of "Present" or "Absent". These parameters are important in assessing the overall water quality and the suitability of the Daya River water for use, and can provide valuable information for environmental and public health assessments.
9 Decision Tree algorithm
The decision tree method is used to identify informative characteristics in a dataset and to predict future values as a continuous output. Regression is used because the parameters are continuous and strongly interdependent. The prediction output from the model can be evaluated using R2, MSE (Mean Squared Error), and MAPE (Mean Absolute Percentage Error).
10 Results
10.1 Exploratory Data analysis
Pre-processing
The dataset samples were sorted according to the timeline, and then regrouped by averaging, based on their collection.
Checking for Null and outliers
Null/NaN values were checked and removed in each parameter column. Outliers were identified using boxplot graphs. Outliers that were too far from the margin were removed; the others were replaced by the mean of the parameter samples.
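A minimal pandas sketch of this cleaning step is given below. It assumes the common boxplot convention of flagging values beyond 1.5 times the interquartile range, and it replaces flagged values with the mean of the non-flagged samples; the text does not state the exact rule used, so both choices are assumptions. The column name `tds` and the values are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in values (NOT the study's data), with one missing
# entry and one extreme value.
df = pd.DataFrame({"tds": [90.0, 92.0, np.nan, 88.0, 91.0, 300.0, 89.0, 93.0]})

# 1) Remove rows with null/NaN values.
clean = df.dropna().copy()

# 2) Flag outliers with the boxplot whisker rule (assumed: 1.5 * IQR).
q1, q3 = clean["tds"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["tds"] < q1 - 1.5 * iqr) | (clean["tds"] > q3 + 1.5 * iqr)

# 3) Replace flagged values with the mean of the remaining samples
#    (assumed variant of "the mean of the parameter samples").
clean.loc[is_outlier, "tds"] = clean.loc[~is_outlier, "tds"].mean()

print(clean["tds"].tolist())
```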
Trend evaluation
The trend of the target parameters was evaluated before and after pre-processing based on the year of collection.
Feature analysis and selection
Cross-correlation: the dependency and relationships among the parameters were checked using a cross-correlation plot.
- If correlation = 1, the attribute can’t be used as it would bias the algorithm.
- If correlation <= 0, the attribute has no impact or a negative impact on the model prediction process.
- If correlation > 0.50, the target can be dependent on the attribute.
- If correlation >= 0.75, this attribute can be used for the model prediction with high accuracy.
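The thresholds listed above can be expressed as a small labelling function applied to the correlation of each attribute with the target. This is a sketch on synthetic data; the function name `correlation_label`, the column names, and the "weak" label for values between 0 and 0.50 (a range the list does not cover) are assumptions.

```python
import numpy as np
import pandas as pd


def correlation_label(r: float) -> str:
    """Map a correlation with the target to the rule of thumb listed above."""
    if np.isclose(r, 1.0):
        return "exclude (would bias the algorithm)"
    if r <= 0.0:
        return "no or negative impact"
    if r >= 0.75:
        return "use (high expected accuracy)"
    if r > 0.50:
        return "candidate (target may depend on it)"
    return "weak (not covered by the listed thresholds)"


# Synthetic stand-in data (NOT the study's measurements).
rng = np.random.default_rng(1)
n = 300
target = rng.normal(100.0, 10.0, size=n)
df = pd.DataFrame({
    "target": target,
    "strong": 0.5 * target + rng.normal(0.0, 1.0, size=n),
    "moderate": 0.5 * target + rng.normal(0.0, 6.0, size=n),
    "unrelated": rng.normal(0.0, 1.0, size=n),
})

# Pearson correlation of every attribute with the target column.
corr_with_target = df.corr()["target"].drop("target")
for name, r in corr_with_target.items():
    print(f"{name}: r = {r:.2f} -> {correlation_label(r)}")
```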
ExtraTreesRegressor model
The model uses similar regression techniques to determine the relationship of each parameter with the chosen target parameter, and then scores the parameters on a scale of 0-100 according to how strongly they influence the target parameter.
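One way to obtain such scores is sketched below with scikit-learn's `ExtraTreesRegressor`. It assumes that the 0-100 scale corresponds to the fitted model's `feature_importances_` multiplied by 100; the text does not confirm the exact scaling. The data and feature names are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in data (NOT the study's measurements): the target depends
# strongly on the first column, weakly on the second, and not on the third.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=300)
names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

forest = ExtraTreesRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ sums to 1; multiplying by 100 gives a 0-100 score
# (assumed interpretation of the scaling described in the text).
scores = 100.0 * forest.feature_importances_

for name, score in sorted(zip(names, scores), key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.1f}")
```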
Model accuracy
R2 (coefficient of determination): The statistical metric indicates how closely the regression line may resemble the actual data.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - y_i')^2}{\sum_{i=1}^{n} (y_i - Y)^2} \tag{1}$$
Where:
$y_i$ | = | actual ith sample; |
$y_i'$ | = | predicted ith sample; and |
$Y$ | = | target parameter’s mean. |
R2 has a range from 0 to 1. The better a model can forecast for the target parameter, the higher its R2.
MSE (Mean Squared Error): MSE is the mean of the squared prediction errors (residuals). It indicates how far the predictions scatter from the actual data points.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_i')^2 \tag{2}$$
Where:
$n$ | = | total number of samples. |
The lower the MSE for a model, the more accurate the prediction and the smaller the residuals.
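MAPE (Mean Absolute Percentage Error): This measure is used for evaluating prediction model accuracy.
For a single sample, the absolute percentage error is abs((Actual - Forecast)/(Actual)); MAPE is the mean of this quantity over all samples, expressed as a percentage.
Mathematical formula:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_i'}{y_i} \right| \tag{3}$$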
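The three metrics described above can be computed directly from paired lists of actual and predicted values. The sketch below uses plain Python and made-up numbers; the function names `r_squared`, `mse`, and `mape` are illustrative and the values are not results from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mse(actual, predicted):
    """Mean of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up example values, not measurements from the study.
actual = [100.0, 110.0, 90.0, 120.0]
predicted = [98.0, 112.0, 93.0, 117.0]

print("R2  :", round(r_squared(actual, predicted), 3))
print("MSE :", round(mse(actual, predicted), 3))
print("MAPE:", round(mape(actual, predicted), 3), "%")
```

Note that MSE is expressed in the squared units of the target, so its magnitude is only comparable between models of the same parameter, whereas R2 and MAPE are unitless.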
10.2 Prediction with Decision Tree for TDS
The graphs for TDS pre-processing and post-processing are shown in Figures 2 and 3, respectively.
Figure 2 TDS plot – before pre-processing.
Figure 3 TDS plot – after pre-processing.
Figure 4 depicts the comparison of predicted and actual TDS values using the decision tree model. The model's performance is measured using three metrics: R2, MSE, and accuracy.
Figure 4 Comparison of predicted and actual values for TDS.
R2, or the coefficient of determination, is a statistical measure of how well the model agrees with the observed data. An R2 value of 91.19% suggests that the decision tree model can explain 91.19% of the variance in the TDS data.
Table 3 Model accuracy values for water quality parameters.
Model Accuracy Values | |||
Parameter | R2 | MSE | MAPE |
TDS | 91.19% | 12.88 | 97.33% |
pH | 91.43% | 0.02 | 98.57% |
Iron | 87.64% | 0.004 | 93.77% |
MSE, or mean squared error, is a measure of the average squared differences between the predicted and actual values. A lower MSE value indicates better accuracy of the model in predicting the TDS values. Here, the MSE value of 12.88 is relatively low, which indicates that the decision tree model has a high level of accuracy.
Model accuracy, reported in the MAPE column of Table 3, is another crucial indicator; for a regression model it expresses how close the predictions are to the observed values in percentage terms. The accuracy of the model in this instance is 97.33%, indicating a high degree of confidence in the model's ability to predict TDS values.
Overall, the assessment metrics show that the decision tree model is effective at forecasting TDS levels and may be used to assess the quality of water in the area surrounding the river.
The accuracy scores for TDS, pH, and iron, as derived from the data in Table 3, are as follows: 97.33%, 98.57%, and 93.77%, respectively.
10.3 Prediction with Decision Tree for pH
Figures 5 and 6 show the pH trend before and after pre-processing, respectively.
Figure 5 Plot for pH before pre-processing.
Figure 6 Plot for pH after pre-processing.
The comparison of the predicted and actual pH values using the decision tree model is shown in Figure 7. R2, MSE, and accuracy are the three measures used to assess the model's performance.
Figure 7 Comparison of predicted and actual pH values.
R2, or the coefficient of determination, is a statistical measure that indicates how well the model fits the observed data. An R2 value of 91.43% suggests that the decision tree model can explain 91.43% of the variance in the pH data.
MSE, or mean squared error, is a measure of the average squared differences between the predicted and actual values. A lower MSE value indicates better accuracy of the model in predicting the pH values. Here, the MSE value of 0.02 is very low, which indicates that the decision tree model has a high level of accuracy in predicting pH values.
Model accuracy, reported in the MAPE column of Table 3, is another important metric; for a regression model it expresses how close the predictions are to the observed values in percentage terms. In this case, the accuracy of the model is 98.57%, which shows that the model can predict pH values with high reliability.
Overall, the evaluation metrics indicate that the decision tree model is an excellent fit for predicting pH values, and it can be useful for analyzing the river basin's water quality.
10.4 Prediction with Decision Tree for iron
Figures 8 and 9 show the graphs for total iron before and after pre-processing, respectively.
Figure 8 Plot for Total Iron before pre-processing.
Figure 9 Plot for Total Iron after pre-processing.
Figure 10 depicts a comparison between predicted and actual iron values using the decision tree model. R2, MSE, and accuracy are the three measures used to assess the model's performance.
Figure 10 Comparison of predicted and actual values for Iron.
R2, also known as the coefficient of determination, is a statistical metric that shows how well the model fits the observed data. An R2 value of 87.64% suggests that the decision tree model can explain 87.64% of the variance in the iron data.
MSE, or mean squared error, is a measure of the average squared differences between the predicted and actual values. A lower MSE value denotes a model that predicts iron values more accurately. Here, the MSE value of 0.004 is very low, which indicates that the decision tree model can predict iron values with a high degree of accuracy.
Model accuracy, reported in the MAPE column of Table 3, is another important metric; for a regression model it expresses how close the predictions are to the observed values in percentage terms. The accuracy of the model in this instance is 93.77%, indicating a rather high degree of confidence in its ability to forecast iron levels.
Overall, the evaluation metrics indicate that the decision tree model is a good fit for predicting iron values and can be useful for assessing the water quality of the river basin. However, it should be noted that the accuracy score is lower compared to the models for pH and TDS, indicating that the model may have some limitations in accurately predicting iron values.
11 Discussion
In this section, the prediction results for the interrelations among the water quality components are discussed. To develop an optimal model, the approach introduced by Parsaie et al. (2020) was considered. They stated that, when developing an ANN, steps to reduce the trial-and-error process should be taken. Most published work on modeling surface water quality parameters has employed traditional, standalone ANN, GEP, SVM, DT, RF, and regression-based models. The use of traditional AI algorithms to model and predict water quality metrics did not yield the intended results. For an effective and precise modeling output, it is therefore critical to use modeling approaches with optimization algorithms. Only a few studies have combined modeling with a search for optimal inputs, which the current work has integrated.
Al-Mukhtar and Al-Yaseen (2019) reported that, when assessed against prior and contemporary research incorporating modeling and optimization techniques, regression models and ANN outperformed comparable models in predicting EC and TDS. Furthermore, Azad et al. (2019) reported improved model results with PSO optimization for predicting various water quality parameters. Gorgan-Mohammadi et al. (2022) used data mining techniques to study and predict the levels of soluble phosphorus and dissolved oxygen in Lake Erie.
The current study's findings revealed that, by leveraging the input optimization process, improved modeling accuracy, an optimal model structure, reduced computational time, and reduced model complexity could be attained. Furthermore, the integrated optimization algorithms are more effective than standalone ANN, SVM, GEP, and RF, as well as other multiple regressions, in providing robust models with improved output.
12 Conclusion
In conclusion, this study employed regression models for the prediction of water quality parameters, including TDS, pH, and total iron. The models performed consistently regardless of the river's water quality variations, indicating their robustness and reliability.
The evaluation of the developed models relied on standard statistical measures of modeling accuracy (R2) and error (MSE), which provided insight into the models' overall performance. Through input optimization, the study reduced modeling complexity, leading to a streamlined process and decreased data collection and processing overhead.
The decision tree model proposed in this research presents several advantages. It offers a clear assessment of water pollutant levels, which is crucial in understanding and managing water quality issues. Additionally, it eliminates the need for the time-consuming calculations often associated with traditional WQI (Water Quality Index) methods.
The accuracy scores for TDS, pH, and iron obtained from Table 3 further support the model's effectiveness. With scores of 97.33%, 98.57%, and 93.77%, respectively, the model demonstrates high accuracy in predicting these parameters. These accuracy metrics provide valuable information for users, facilitating better comprehension of the model's performance in practical applications.
Overall, the advancements presented in this study have the potential to drive positive change in environmental management practices. The developed regression model and parameter prediction approach, along with the decision tree analysis, offer a reliable framework for water quality assessment, environmental decision-making, and water resource management. The insights gained from this study can guide the design of future studies that specifically explore instances of false positives or false negatives, thereby contributing to a more comprehensive understanding of water quality assessment and refining its accuracy. It is our hope that this study will be a benchmark for further research and foster continued efforts to protect and preserve the precious water resources of the River Daya and beyond. Researchers, policymakers, and decision-makers can use the model's predictions to guide their actions and formulate more informed strategies for water resource management and pollution control efforts.
Declaration of Competing Interest
The authors, whose names are listed, certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Data availability statement
All the data generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
- Al-Mukhtar, M., and F. Al-Yaseen. 2019. “Modeling water quality parameters using data driven models, a case study Abu-Ziriq marsh in south of Iraq.” Hydrology 6 (1): 24.
- Asadollah, S., A. Sharafati, D. Motta, and Z. Yaseen. 2021. "River water quality index prediction and uncertainty analysis: A comparative study of machine learning models." Journal of Environmental Chemical Engineering 9 (1): 104599.
- Azad, A., H. Karami, S. Farzin, S-F. Mousavi, and O. Kisi. 2019. “Modeling river water quality parameters using modified adaptive neuro fuzzy inference system.” Water Science and Engineering 12 (1): 45-54. https://doi.org/10.1016/j.wse.2018.11.001.
- Barzegar, R., M.T. Aalami, and J. Adamowski. 2020. "Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model." Stochastic Environmental Research and Risk Assessment 34, 415-433.
- Bui, D.T., K. Khosravi, J. Tiefenbacher, H. Nguyen, and N. Kazakis. 2020. "Improving prediction of water quality indices using novel hybrid machine-learning algorithms." Science of the Total Environment 721, 137612.
- Gorgan-Mohammadi, F., T. Rajaee, and M. Zounemat-Kermani. 2022. “Decision tree models in predicting water quality parameters of dissolved oxygen and phosphorus in lake water.” Sustainable Water Resources Management 9, 1. https://doi.org/10.1007/s40899-022-00776-0
- Hameed, M., S.S. Sharqi, Z.M. Yaseen, H.A. Afan, A. Hussain, and A. Elshafie. 2017. “Application of artificial intelligence (AI) techniques in water quality index prediction: a case study in tropical region, Malaysia.” Neural Computing and Applications 28, 893–905. https://doi.org/10.1007/s00521-016-2404-7
- Imani, M., M.M. Hasan, L.F. Bittencourt, K. McClymont, and Z. Kapelan. 2021. "A novel machine learning application: Water quality resilience prediction model." Science of the Total Environment 768, 144459.
- Li, L., P. Jiang, H. Xu, G. Lin, D. Guo, and H. Wu. 2019. "Water quality prediction based on recurrent neural network and improved evidence theory: a case study of Qiantang River, China." Environmental Science and Pollution Research 26 (19): 19879-19896.
- Li, Z., F. Peng, B. Niu, G. Li, J. Wu, and Z. Miao. 2018. "Water quality prediction model combining sparse auto-encoder and LSTM network." IFAC-PapersOnLine 51 (17): 831-836.
- Oladipo, J.O., A.S. Akinwumiju, O.S. Aboyeji, and A.A. Adelodun. 2021. "Comparison between fuzzy logic and water quality index methods: A case of water quality assessment in Ikare community, Southwestern Nigeria." Environmental Challenges 3, 100038.
- Parsaie, A., H.M. Azamathulla, and A.H. Haghiabi. 2020. “Physical and numerical modeling of performance of detention dams.” Journal of Hydrology 581: 121757. https://doi.org/10.1016/j.jhydrol.2017.01.018.
- Rajaee, T., S. Khani, and M. Ravansalar. 2020. "Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: A review." Chemometrics and Intelligent Laboratory Systems 200: 103978.
- Samsudin, M.S., A. Azid, S.I. Khalit, M.S.A. Sani, and F. Lananan. 2019. "Comparison of prediction model using spatial discriminant analysis for marine water quality index in mangrove estuarine zones." Marine Pollution Bulletin 141, 472-481.
- Wang, X., F. Zhang, and J. Ding. 2017. "Evaluation of water quality based on a machine learning algorithm and water quality index for the Ebinur Lake Watershed, China." Scientific Reports 7 (1): 12858.
- Yilma, M., Z. Kiflie, A. Windsperger, and N. Gessese. 2018. "Application of artificial neural network in water quality index prediction: a case study in Little Akaki River, Addis Ababa, Ethiopia." Modeling Earth Systems and Environment 4, 175-187.