Regionalization of Low Flow Analysis in Data Scarce Region: The Case of the Lake Abaya-Chamo Sub-basin, Rift Valley Lakes Basin, Ethiopia
Abstract
Prediction of low flows in ungauged catchments is needed for planning and managing water resources development and for sustaining the environment. The main objective of this study was to regionalize low flow indexes (the baseflow index BFI, Q80, Q90, and Q95) in the Lake Abaya–Chamo sub-basin using multiple linear regression models. To develop the regional equations, nine baseflow separation methods (two digital graphical methods and seven recursive digital filters) were applied to eight gauged catchments and compared. The methods were evaluated using the coefficient of determination (R2) and the root mean square error (RMSE) as performance measures. Flow duration analyses were conducted to compute the flow exceedance quantiles Q80, Q90, and Q95. Regionalizing these indexes required the identification of homogeneous regions, which was accomplished through cluster analysis based on physiographic and climatic data. Three significantly different homogeneous regions were identified using k-means clustering, and a multiple linear regression model was developed for every low flow index in each homogeneous region. The R2 values of the models developed for BFI, Q80, Q90, and Q95 range from 0.75 to 0.98 across the regions. To check model performance, the regional models were verified by computing the relative error for four gauged catchments treated as ungauged. All regional models performed well, with relative errors <10% in the regions showing high performance. The developed regional models could therefore support low flow estimation in most ungauged catchments of the sub-basin, and current and future water resources development endeavors may use such estimation methods for planning, design, and management.
1 Introduction
Low flow has a direct impact on the quality of social life, on ecological stability, and on the functioning of aquatic ecosystems, and it is therefore an important issue in hydrology (Stevkova et al. 2012). Low flow reduction has caused serious damage to socioeconomic development, the physical and ecological environment, and human lives (WMO 1974). Worldwide, wherever perennial rivers are abstracted during the dry season for irrigation, water supply, and domestic use by individuals or institutions without any water storage, the resulting low flows have negative impacts.
Reliable hydrological estimates are therefore required. However, low flow data are often unavailable in ungauged catchments, or inadequate in both quantity and quality in partly gauged catchments (Arai et al. 2012). In such data-scarce areas, low flow regionalization is a widely used method to compensate for the lack of hydrological information (Malekinezhad et al. 2011). Following Smakhtin (2001), low flows can be quantified by Q95, Q90, and Q80 (the discharges exceeded on 95%, 90%, and 80% of all days of the measurement period) and by the baseflow index (BFI).
BFI is the long-term ratio of total baseflow volume to total river flow volume (Institute of Hydrology 1980). Because it is linked to physiography, climate, land cover, geomorphology, geology, soil type, and related parameters, many previous studies have used BFI as a typical spatiotemporal descriptor of basin characteristics, and it is among the most important low flow indexes (Haberlandt et al. 2001; Mwakalila et al. 2002).
When BFI is used as a low flow indicator, separating baseflow from measured streamflow (discharge) is the most crucial step. Methods for separating baseflow from streamflow can be grouped into the following four categories:
- Recursive digital filters (Nathan and McMahon 1990; Arnold and Allen 1999);
- Conceptual models (Furey and Gupta 2001; Eckhardt 2005; Huyck et al. 2005);
- Recession analysis (Tallaksen and Lanen 2004); and
- Graphical separation (Sloto and Crouse 1996).
There are also digitally implemented graphical methods, such as the digital graphical method (DGM) available in the BFI module of the HydroOffice software package developed by Gregor (2010; 2012). The various categories of separation approaches described above have been compared and tested in several previous studies worldwide, with differing outcomes (Chapman 1999; Smakhtin 2001; Schwartz 2007; Eckhardt 2008; Nathan and McMahon 1990).
Demuth and Young (2004) and Smakhtin (2001) presented a comprehensive list of approaches and techniques for estimating low flows in ungauged catchments, including spatial interpolation, regional curve construction, regional regression, and time series simulation; of these, regional regression performed best. Kupczyk et al. (1994) analyzed the low flow regimes of Polish rivers by identifying regional recession curves based on low summer flows. Kobold and Brilly (1994) used the average annual 10 d minimum flow as the main variable to determine the relationship between different low flow durations on a regional scale. Among the alternative methods for estimating low flows in ungauged catchments, the most frequently used strategy is the regional regression approach, which relates low flow indexes to catchment characteristics (Kottegoda and Rosso 2004).
In a regression model, the morphological and climatic descriptors of river basins should be directly linked to the hydrological processes taking place in the drainage basins. Ideally, these descriptors represent the average water balance of the basin, combining morphological indexes related to the hydrological response with climatic indexes related to the water input (Engeland and Hisdal 2009). If the study domain is large or heterogeneous with respect to low flow processes, several authors suggest dividing it into subregions within which low flow behavior is assumed to be homogeneous (Engeland and Hisdal 2009; Schreiber and Demuth 1997).
Given such morphological and climatic descriptors, cluster analysis is the most widely used technique for grouping catchments with similar characteristics, especially for determining hydrologically homogeneous regions. Hierarchical clustering can be used to decide the number of clusters for k-means clustering. Its structure is more informative than that of other methods for choosing the optimal number of clusters: it produces an agglomeration schedule, the number of clusters can be read from a dendrogram, and it is easy to implement (Cupak 2017; Cupak et al. 2017).
The Lake Abaya–Chamo sub-basin supports millions of people who depend on it for their livelihoods. Its water resources serve various competing needs: irrigated agriculture, domestic water supply, livestock, industry, and the environment. Many rivers in the sub-basin are ungauged, and about 41% of the catchments have no gauging station. Planning water resources development in these ungauged catchments depends on low flow estimation. Many studies have been conducted in the sub-basin, focusing mainly on climate change and its impact on streamflow (e.g. Tadelech 2015; Biruk 2017), the influence of land use and land cover change on streamflow (e.g. Assefa 2018; Shimels 2015), and rainfall–runoff modeling (e.g. Belay 2008; Behafta 2019). Despite this large body of work, low flow regionalization in the sub-basin has not yet been explored. The main objective of this study is therefore to regionalize low flows in the Lake Abaya–Chamo sub-basin of the Rift Valley Lakes Basin, Ethiopia; the specific objectives are (1) to identify appropriate baseflow separation methods, (2) to identify hydrologically homogeneous regions in the study area, and (3) to regionalize low flow indexes.
2 Study area and data sources
2.1 Study area
The study was carried out in the Abaya–Chamo sub-basin (ACSB), which comprises the southern portion of the Main Ethiopian Rift and the neighboring Southern Ethiopian Highlands. The sub-basin lies between 5°N–8°N and 37°E–38.5°E and covers ~18 600 km², including freshwater bodies (Figure 1). It shows remarkable topographic variation over short distances, with elevations ranging from 1082 m above sea level (m.a.s.l.) near the lake shore on the rift floor to >3500 m.a.s.l. in the highlands of the Gughe mountain range.
Figure 1 Location map of the study area; numbers in the map indicate catchment name: 1 Badessa, 2 Bilate, 3 Elgo, 4 Hare, 5 Gidabo, 6 Uraye, 7 Kola, 8 Kulfo, 9 Shafe, 10 Sile, 11 Upper Gelana, 12 Weira.
The area is characterized by steep slopes. Elevation decreases from 3565 m in the north-west to 982 m in the north-east, and is 1136 m at the outlet into Lake Chamo.
Climate and hydrology
ACSB has a hot, semi-arid tropical climate, with temperatures ranging from 8.8 °C to 31.2 °C and mean annual rainfall of 665–1240 mm (Teklemariam and Wenclawiak 2004). The climate is also characterized by high evaporation (annual average 2300 mm), a typical bimodal rainfall pattern with peaks in April–May and August–September, and warm temperatures year-round.
Many rivers feed the lake system: the Gelana, Bilate, Gidabo, Hare, Baso, and Hamessa drain into Lake Abaya, together with a number of small streams and ephemeral rivers. The Sile, Argoba, Wezeka, Sego, and Kulfo rivers drain into Lake Chamo. Lakes Abaya and Chamo (Figure 1) are hydrologically connected: overflow from Lake Abaya enters the Kulfo River, which in turn discharges into Lake Chamo. The lakes have not been used for irrigation because of their salinity (Ababu and Bernd 2004).
Land use and soil
Natural grassland, shrubland, forest, heterogeneous agricultural land, and small urban areas are the main land use types in ACSB. In the Badessa, Gidabo, Kola, Weira, and Upper Gelana catchments, most of the land is classified as intensively cultivated, whereas the Kulfo, Hare, Uraye, Elgo, Sile, Bilate, and Shafe catchments are classified as moderately cultivated, with forest dominant. Based on the FAO soil classification, soils of hydrologic soil groups A, B, and D dominate the sub-basin.
2.2 Data sources
Hydrological data
Historical streamflow data, topographic data, land use–land cover data, and soil map data for twelve river catchments (Kulfo, Miessa, Hare, Bilate at Alaba Hamessa, Gidabo, Gidabo at Humbo, Kola, Bilate Tena, Guder Bilate tributary, Upper Gelana, Badessa, and Weira) were collected by the Ministry of Water, Irrigation and Electricity Ethiopia (MoWIE). Figure 2 shows samples of historical stream flow of three catchments (Hare, Kola, and Gidabo).
Figure 2 Historical stream flow data (1990-2000).
Meteorological data (daily rainfall and daily maximum and minimum temperatures) from 28 stations were obtained from the National Meteorological Agency (NMA) of Ethiopia. The reliability of the raw hydrometeorological data strongly affects the accuracy of the model input and, consequently, of the model simulations. Filling in missing data and checking the consistency and homogeneity of the records is therefore essential. In this study, missing streamflow data were filled using the linear regression method, and missing rainfall data using the widely used inverse distance weighting method.
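The two gap-filling steps can be sketched as follows. This is a minimal illustration, not the procedure actually run in the study: the function names are hypothetical, and the IDW exponent of 2, planar station coordinates, and a single reference gauge for the streamflow regression are all assumptions.

```python
import numpy as np

# Hypothetical helpers; a sketch of IDW and regression gap filling, not the study's code.


def idw_fill(target_xy, station_xy, station_values, power=2.0):
    """Estimate a missing daily rainfall value at target_xy by inverse distance
    weighting of the values observed on the same day at neighbouring stations."""
    target = np.asarray(target_xy, dtype=float)
    xy = np.asarray(station_xy, dtype=float)
    vals = np.asarray(station_values, dtype=float)
    ok = ~np.isnan(vals)                        # use only stations that reported that day
    d = np.hypot(*(xy[ok] - target).T)          # planar distances to the target station
    if np.any(d == 0):                          # co-located station: take its value directly
        return float(vals[ok][d == 0][0])
    w = 1.0 / d ** power                        # closer stations get larger weights (power is assumed)
    return float(np.sum(w * vals[ok]) / np.sum(w))


def regression_fill(target, reference):
    """Fill gaps in a daily streamflow series from a correlated reference gauge
    using a least-squares line Q_target = a + b * Q_reference."""
    y = np.asarray(target, dtype=float).copy()
    x = np.asarray(reference, dtype=float)
    both = ~np.isnan(y) & ~np.isnan(x)          # days observed at both gauges
    b, a = np.polyfit(x[both], y[both], 1)      # slope b, intercept a
    gaps = np.isnan(y) & ~np.isnan(x)           # fill only where the reference has data
    y[gaps] = np.maximum(a + b * x[gaps], 0.0)  # discharge cannot be negative
    return y
```

For example, a target station 1 km and 2 km from two reporting stations with 10 mm and 20 mm receives (10 × 1 + 20 × 0.25) / 1.25 = 12 mm with this weighting.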
3 Methods
3.1 Baseflow separation methods
This study used practical methods based on recursive digital filters (RDF) and digital graphical separation methods (DGM) to separate baseflow (Indarto et al. 2016). These approaches are realistic and easy to adopt in developing countries and other catchments (Gregor 2010; 2012). The parameters of each separation method were calibrated manually using the BFI 3+ tool (www.hydrooffice.org) developed by Gregor (2010; 2012), which makes baseflow separation more consistent, less tedious, and more accurate. With this approach, the annual baseflow volume of rivers and streams can be estimated, and the annual baseflow index (BFI), the ratio of baseflow volume to total flow volume for a given year, can be calculated for several years of data at one or more sites. The method incorporates a recession-slope test with a local minimum analysis (Eckhardt 2008).
Recursive digital filter and digital graphical method
A recursive digital filter (RDF) applies the principles of signal (frequency) analysis to hydrograph separation: the filter separates the quickflow component, which behaves like a high-frequency signal, from the baseflow component, which behaves like a low-frequency signal. The filter is applied recursively over the entire record (Indarto et al. 2016). In this study, seven RDFs (one- and two-parameter Boughton, IHACRES, Chapman, Lyne–Hollick, Eckhardt, and EWMA) were used. The parameter BFImax of the Eckhardt filter was set to 0.8 (Eckhardt 2005) because most rivers in the Abaya–Chamo sub-basin are perennial. Of the digital graphical methods, the two most widely used (fixed interval and local minimum) were selected, because the literature shows that the sliding interval method performs less well than these two. For each time interval, the local minimum method searches for the minimum flow.
For the local minimum method, the search interval is first obtained as 0.5(2N* − 1) d on either side of each day, where 2N* is the odd integer between 3 and 11 nearest to 2N. N, the number of days after which surface runoff ceases, is calculated with the empirical relation of Linsley (1982), N = A^0.2, where A is the catchment area (mi²). Second, the minimum flows found in this way are connected by straight lines to define the baseflow portion of the hydrograph. The fixed interval method searches for the minimum flow in each consecutive interval of 2N* d and assigns it to every day of that interval. This approach can be visualized as a bar of width 2N* that is raised from the bottom of the graph until it touches the hydrograph at its lowest point within the interval; the bar is then moved to the next interval to determine its baseflow.
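A minimal sketch of the two graphical methods, assuming the catchment area is supplied in km² and converted to mi² for Linsley's relation. The function names are hypothetical, and the sketch ignores details such as missing data that BFI 3+ may handle differently.

```python
import numpy as np

# Hypothetical helpers sketching the fixed interval and local minimum methods.
KM2_PER_MI2 = 2.589988  # 1 square mile in km^2


def interval_2n_star(area_km2):
    """Odd integer between 3 and 11 nearest to 2N, with N = A**0.2 (A in mi^2)."""
    n_days = (area_km2 / KM2_PER_MI2) ** 0.2
    return min((3, 5, 7, 9, 11), key=lambda k: abs(k - 2 * n_days))


def fixed_interval(q, area_km2):
    """Give every day of each consecutive 2N*-day block the minimum flow of that block."""
    q = np.asarray(q, dtype=float)
    width = interval_2n_star(area_km2)
    b = np.empty_like(q)
    for start in range(0, len(q), width):
        b[start:start + width] = q[start:start + width].min()
    return b


def local_minimum(q, area_km2):
    """Flag days that are the lowest flow within 0.5*(2N* - 1) days on either side,
    join the flagged minima by straight lines, and cap the result at observed flow."""
    q = np.asarray(q, dtype=float)
    half = (interval_2n_star(area_km2) - 1) // 2
    idx = [i for i in range(len(q))
           if q[i] == q[max(0, i - half): i + half + 1].min()]
    b = np.interp(np.arange(len(q)), idx, q[idx])  # straight lines between local minima
    return np.minimum(b, q)
```

For a 100 km² catchment, 2N ≈ 4.15, so 2N* = 5: the fixed interval method works on 5-day blocks, and the local minimum method checks 2 days on each side of every day.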
Baseflow separation using the BFI 3+ tool
Daily discharge data from all catchments were prepared in Excel and exported as text files, which were then imported into the BFI 3+ tool. Seven recursive digital filters (one- and two-parameter Boughton, Chapman, IHACRES, Lyne–Hollick, Eckhardt, and EWMA) and two digital graphical methods (local minimum and fixed interval) were used to separate baseflow from the measured (observed) daily streamflow.
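Outside BFI 3+, three of the filters named above can be sketched as simple recursions. The formulations follow commonly published forms (a single forward pass of Lyne–Hollick, the Eckhardt 2005 filter, and an EWMA filter); the default parameter values (alpha = 0.925, a = 0.98, EWMA alpha = 0.01) and the initial conditions are assumptions for illustration, not the values calibrated in this study, and BFI 3+ may implement the filters differently (for example with multiple passes). Only BFImax = 0.8 comes from the text.

```python
import numpy as np

# Hypothetical sketches of three recursive digital filters; parameters are assumed.


def lyne_hollick(q, alpha=0.925):
    """One forward pass of the Lyne-Hollick filter.
    Quickflow f_t = alpha*f_(t-1) + (1 + alpha)/2 * (q_t - q_(t-1)); baseflow = q - f."""
    q = np.asarray(q, dtype=float)
    f = np.zeros_like(q)
    for t in range(1, len(q)):
        f[t] = alpha * f[t - 1] + 0.5 * (1 + alpha) * (q[t] - q[t - 1])
        f[t] = min(max(f[t], 0.0), q[t])        # keep 0 <= quickflow <= streamflow
    return q - f


def eckhardt(q, a=0.98, bfi_max=0.8):
    """Eckhardt (2005) two-parameter filter; bfi_max = 0.8 as chosen in the text,
    the recession constant a is an assumed value."""
    q = np.asarray(q, dtype=float)
    b = np.empty_like(q)
    b[0] = bfi_max * q[0]                       # assumed initial condition
    for t in range(1, len(q)):
        b[t] = ((1 - bfi_max) * a * b[t - 1] + (1 - a) * bfi_max * q[t]) / (1 - a * bfi_max)
        b[t] = min(b[t], q[t])                  # baseflow cannot exceed streamflow
    return b


def ewma(q, alpha=0.01):
    """Exponentially weighted moving average filter: b_t = alpha*q_t + (1 - alpha)*b_(t-1)."""
    q = np.asarray(q, dtype=float)
    b = np.empty_like(q)
    b[0] = q[0]                                 # assumed initial condition
    for t in range(1, len(q)):
        b[t] = min(alpha * q[t] + (1 - alpha) * b[t - 1], q[t])
    return b
```

With constant streamflow, the Eckhardt filter settles at BFImax times the flow, which illustrates why BFImax acts as an upper bound on the long-term BFI.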
Calibration and performance measures
For each method, parameter values were entered on an annual basis and adjusted by trial and error. The process was stopped when the calculated baseflow (red curve) closely matched the observed discharge (blue area) during the dry period. The dry period from December to February was used to test the calibration because there is typically little or no rainfall in the area at this time. Calibration was carried out separately for each catchment. The calibration results were then evaluated statistically by comparing the estimated baseflow with the measured total flow for the dry period (December–February), when there is no rainfall and no runoff in any catchment. For this period, direct runoff (DRO, or quickflow) can be assumed to be close to 0, so a scatter plot of separated baseflow against observed dry-season flow should give a coefficient of determination R2 close to 1. In addition, the root mean square error (RMSE) between the estimated and measured baseflow was computed to test the goodness of fit (Equation 1).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{c,i} - Q_{o,i}\right)^{2}} \tag{1}$$
where:
Qc | = | calculated baseflow (m³/s), |
Qo | = | measured total flow in the river (m³/s), and |
n | = | the number of samples. |
Low RMSE values indicate close agreement between measured and estimated baseflow. The coefficient of determination was also obtained from a scatter plot; the RMSE and scatter plot results are discussed in Section 4.1.
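The dry-season goodness-of-fit check can be sketched as below, assuming one calendar month number per daily value. R² is taken as the squared Pearson correlation of the scatter plot, which equals the R² of a simple linear regression with intercept. The function name is hypothetical.

```python
import numpy as np

# Hypothetical helper: RMSE (Equation 1) and R^2 over the December-February dry season.


def dry_season_fit(month, q_obs, baseflow, dry_months=(12, 1, 2)):
    """RMSE and R^2 between separated baseflow and observed flow, restricted to
    dry-season days, when direct runoff is assumed to be ~0.
    `month` holds the calendar month (1-12) of each daily value."""
    mask = np.isin(np.asarray(month), dry_months)
    qo = np.asarray(q_obs, dtype=float)[mask]
    qc = np.asarray(baseflow, dtype=float)[mask]
    rmse = float(np.sqrt(np.mean((qc - qo) ** 2)))
    r = np.corrcoef(qc, qo)[0, 1]               # Pearson correlation of the scatter plot
    return rmse, float(r ** 2)
```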
BFI
The baseflow index is the most widely used low flow index in water policy making (WMO 2008). In this study, BFI is computed from the baseflow series of the best-performing separation method:
$$\mathrm{BFI} = \frac{\sum_{i=1}^{N} Bf_{i}}{\sum_{i=1}^{N} Q_{i}} \tag{2}$$
where:
ΣBfi | = | sum of daily baseflow obtained by the selected method over N d, and |
ΣQi | = | sum of daily observed discharge (baseflow plus direct runoff) over N d. |
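Equation 2 translates directly into code. This sketch (hypothetical function names) also computes an annual BFI, assuming one year label per daily value and skipping days with missing data.

```python
import numpy as np

# Hypothetical helpers implementing Equation 2.


def baseflow_index(baseflow, q_obs):
    """BFI = sum of daily baseflow / sum of daily total flow (Equation 2)."""
    b = np.asarray(baseflow, dtype=float)
    q = np.asarray(q_obs, dtype=float)
    ok = ~np.isnan(b) & ~np.isnan(q)            # skip days with missing values
    return float(b[ok].sum() / q[ok].sum())


def annual_bfi(years, baseflow, q_obs):
    """BFI computed separately for each year (one year label per daily value)."""
    years = np.asarray(years)
    b = np.asarray(baseflow, dtype=float)
    q = np.asarray(q_obs, dtype=float)
    return {int(y): baseflow_index(b[years == y], q[years == y]) for y in np.unique(years)}
```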
3.2 Percentiles of flow (Q80, Q90, and Q95)
Q80, Q90, and Q95 are the most commonly used operational low flow indexes and are defined as the flows exceeded 80%, 90%, and 95% of the time (Castellarin et al. 2007). They are read from flow duration curves (FDC), which are constructed by ranking all daily discharges in descending order and finding the discharge exceeded 80%, 90%, or 95% of the time; optionally, the values can be extracted for individual months, groups of months, or other defined periods.
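A minimal sketch of reading Q80, Q90, and Q95 from an FDC. The Weibull plotting position m/(n + 1) and linear interpolation between ranked flows are assumptions; other plotting positions give slightly different quantiles. Function names are hypothetical, and a seasonal FDC can be obtained by passing only the flows of the selected months.

```python
import numpy as np

# Hypothetical helpers: flow duration curve and exceedance quantiles.


def flow_duration_curve(q):
    """Daily flows ranked in descending order with Weibull exceedance
    probabilities p = 100 * m / (n + 1), in percent."""
    q = np.asarray(q, dtype=float)
    q = np.sort(q[~np.isnan(q)])[::-1]          # drop missing days, largest flow first
    m = np.arange(1, q.size + 1)                # rank 1 = largest flow
    return 100.0 * m / (q.size + 1), q


def exceedance_flow(q, percent):
    """Flow equalled or exceeded `percent` % of the time, interpolated on the FDC
    (percent = 80, 90, 95 gives Q80, Q90, Q95)."""
    p, q_sorted = flow_duration_curve(q)
    return float(np.interp(percent, p, q_sorted))
```

For 99 daily flows of 1, 2, ..., 99 m³/s, this gives Q80 = 20, Q90 = 10, and Q95 = 5 m³/s.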
3.3 Delineation of homogeneous regions
A key practical objective of delineating hydrologically homogeneous regions is to infer the response behavior of ungauged sites from their characteristics. A cluster analysis aimed at identifying homogeneous groups must therefore be able to distinguish catchments on the basis of parameters other than streamflow signatures, namely a set of climatic and physical catchment characteristics. In this study, cluster analysis was used to group hydrologically similar catchments, using the following physiographic and meteorological parameters.
Catchment characteristics
The geomorphological, climatic, soil, and land use characteristics collected for delineating homogeneous regions are listed in Table 1.
Table 1 Catchment characteristics and their relations.
Catchment characteristics | Description | Formula/source | References |
A | Area | Arc-HYDRO | Mehaiguene et al. 2012; Eng and Milly 2007 |
MRL | Main River Length | Arc-HYDRO | Vezza et al. 2010; Mehaiguene et al. 2012 |
Hmin | Minimum Elevation | Arc-SWAT | Mehaiguene et al. 2012; Eng and Milly 2007 |
Hmax | Maximum Elevation | Arc-SWAT | Mehaiguene et al. 2012; Eng and Milly 2007 |
Hmean | Mean Elevation | Arc-SWAT | Vezza et al. 2010; Mehaiguene et al. 2012 |
Hmedian | Median Elevation | Arc-SWAT | Vezza et al. 2010; Mehaiguene et al. 2012 |
Sm | Mean slope | Arc-HYDRO | Vezza et al. 2010; Mehaiguene et al. 2012 |
MRS | Main River Slope | Arc-HYDRO | Vezza et al. 2010; Mehaiguene et al. 2012 |
∆H | Diff between Hmean and Hmin | Hmean–Hmin | Mehaiguene et al. 2012; Eng and Milly 2007 |
Ns | Number of Streams | Arc-HYDRO | Vezza et al. 2010; Mehaiguene et al. 2012 |
Ho | Elevation of Outlet | Arc-HYDRO | Mehaiguene et al. 2012; Eng and Milly 2007 |
MAR | Mean Annual Rainfall | Thiessen polygon | Vezza et al. 2010; Mehaiguene et al. 2012 |
Tmx | Maximum Temperature | NMA | Mehaiguene et al. 2012; Eng and Milly 2007 |
Tmn | Minimum Temperature | NMA | Vezza et al. 2010; Mehaiguene et al. 2012 |
Soil_Z1 | Soil Thickness | Soil map | Mehaiguene et al. 2012; Eng and Milly 2007 |
HYDS-A% | % of hydrological soil group A | Soil map | Vezza et al. 2010; Mehaiguene et al. 2012 |
HYDS-B% | % of hydrological soil group B | Soil map | Vezza et al. 2010; Mehaiguene et al. 2012 |
HYDS-C% | % of hydrological soil group C | Soil map | Mehaiguene et al. 2012; Eng and Milly 2007 |
HYDS-D% | % of hydrological soil group D | Soil map | Vezza et al. 2010; Mehaiguene et al. 2012 |
Agl% | % of grass land area | LU map | Mehaiguene et al. 2012; Eng and Milly 2007 |
Ab% | % of built up area | LU map | Vezza et al. 2010; Mehaiguene et al. 2012 |
Afl% | % of forest land | LU map | Mehaiguene et al. 2012; Eng and Milly 2007 |
Aa% | % of agriculture land | LU map | Mehaiguene et al. 2012; Eng and Milly 2007 |
Cluster analysis
Cluster analysis is an effective statistical tool for identifying hydrologically homogeneous catchments from observed catchment characteristics. This study used a two-step procedure: hierarchical clustering followed by k-means clustering.
Hierarchical clustering
In the hierarchical step, Ward's minimum-variance method (based on the within-cluster sum of squared distances) was used to obtain the optimal number of clusters for k-means clustering; some authors consider it superior to other algorithms in separating the data into relatively dense clusters with small within-group variance (Koplin et al. 2012; Hannah et al. 2005). The results of hierarchical clustering are usually represented in a tree-like diagram called a dendrogram, in which the lengths of the branches reflect the proximity of points. The number of clusters can then be chosen by cutting the dendrogram at the elbow of a plot of merging distance versus number of clusters. A hierarchical cluster analysis was performed to determine the appropriate number of clusters; a single linkage (nearest neighbour) method based on Euclidean distances (Equation 5) was used. Because SPSS does not provide a scree plot for cluster analysis, the scree plot was constructed in Excel by plotting the distances (coefficients column of the agglomeration schedule) against the number of clusters.
In Ward's method, the distance between clusters is calculated using an analysis of variance approach (Equation 3): at each step, the two clusters whose merger gives the smallest increase in the total sum of squared deviations are joined. The method tends to produce compact clusters of roughly equal size and to avoid very small clusters, and it is considered very efficient. It is widely used for regionalization in hydrology and meteorology (Stanisz 2007).
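$$W = \sum_{k=1}^{P}\sum_{i=1}^{n_{k}}\sum_{j=1}^{M}\left(X_{kij} - \bar{X}_{kj}\right)^{2} \tag{3}$$
$$\bar{X}_{kj} = \frac{1}{n_{k}}\sum_{i=1}^{n_{k}} X_{kij} \tag{4}$$
where:
W | = | total within-group error sum of squares, |
Xkij | = | value of variable j for observation i in cluster k, |
X̄kj | = | mean value of variable j in cluster k, |
P | = | number of clusters, |
M | = | total number of variables included, |
nk | = | number of observations in cluster k, and |
N | = | total number of observations (N = Σ nk). |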
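The Euclidean distance is used as the (dis)similarity measure for clustering:
$$d_{a,b} = \sqrt{\sum_{j=1}^{M}\left(x_{a,j} - x_{b,j}\right)^{2}} \tag{5}$$
where:
da,b | = | Euclidean distance between catchments a and b, |
xa,j | = | value of characteristic j for catchment a, |
xb,j | = | value of characteristic j for catchment b, and |
M | = | number of catchment characteristics. |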
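The Ward agglomeration (Equations 3 to 5) and the quantity plotted on a scree diagram can be sketched as follows. This naive implementation is an illustration for a dozen catchments, not the SPSS procedure: the function names are hypothetical, z-score standardization of each characteristic is an assumption (the text does not say how variables were scaled), and the recorded W values are not guaranteed to match the coefficients column of the SPSS agglomeration schedule.

```python
import numpy as np

# Hypothetical sketch of Ward's method and a scree plot; not the SPSS implementation.


def standardize(x):
    """z-score each catchment characteristic (column) so that variables with large
    numeric ranges, such as area, do not dominate the Euclidean distance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def ward_agglomeration(x):
    """Naive Ward clustering of the rows of x (catchments).
    Returns one (members, W) pair per merge, where W is the total within-cluster
    sum of squares of Equation 3 after that merge."""
    x = np.asarray(x, dtype=float)
    clusters = [[i] for i in range(len(x))]
    history, w_total = [], 0.0
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = x[clusters[a]], x[clusters[b]]
                na, nb = len(ca), len(cb)
                # increase in W caused by merging a and b:
                # na*nb/(na+nb) times the squared Euclidean distance between centroids
                cost = na * nb / (na + nb) * np.sum((ca.mean(axis=0) - cb.mean(axis=0)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        w_total += cost
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        history.append((sorted(merged), float(w_total)))
    return history


def scree_points(history, n_catchments):
    """(number of clusters left, W) pairs for a scree plot; the elbow is where W jumps."""
    return [(n_catchments - (t + 1), w) for t, (_, w) in enumerate(history)]
```

For four points forming two well-separated pairs, W stays at 1.0 down to two clusters and jumps to 101.0 at one cluster, so the elbow marks two clusters; in practice the function would be applied to `standardize(characteristics)`.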
K-means clustering
In this research, k-means cluster analysis was used to classify the catchments into clusters once hierarchical clustering had determined the number of clusters. k-means is a partitioning method and one of the most commonly used clustering algorithms, widely adopted because of its simplicity, ease of execution, effectiveness, and empirical success (Lengyel 2003). The aim of k-means clustering is to minimize the total squared error over all clusters. The dataset is partitioned into a user-defined number of clusters (k); the main idea is to define k centroids, one for each cluster.
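$$J = \sum_{j=1}^{k}\sum_{i=1}^{n_{j}}\left\lVert X_{i}^{(j)} - C_{j}\right\rVert^{2} \tag{6}$$
where:
‖Xi(j) − Cj‖² | = | chosen distance measure between a data point and the cluster centre, |
Xi(j) | = | data point of cluster j for case i, |
Cj | = | centroid of cluster j, |
nj | = | number of cases in cluster j, and |
J | = | objective function. |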
Figure 3 shows the flow diagram of the k-means algorithm. The procedure consists of the following iterative steps:
- Initialization: to form k clusters, randomly select k distinct points (patterns) as the initial group centroids. Because the centroids shift at each iteration until the clusters stabilize, little effort is needed in choosing them.
- Assign each object to the cluster with the nearest centroid.
- When all objects have been assigned, recalculate the positions of the k centroids.
- Repeat steps 2 and 3 until the centroids no longer move. This separates the objects into groups for which the metric to be minimized can be calculated (Figure 3).
Figure 3 Flow diagram of k-means algorithm.
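The iterative steps above correspond to Lloyd's algorithm. In this sketch the initial centroids are passed in explicitly, for example the cluster means of the Ward solution, rather than drawn at random; that choice and the function name are assumptions, not the SPSS settings used in the study.

```python
import numpy as np

# Hypothetical sketch of k-means (Lloyd's algorithm) with given starting centroids.


def kmeans(x, init_centroids, max_iter=100):
    """Assign each catchment to the nearest centroid (squared Euclidean distance),
    recompute the centroids, and repeat until the assignments stop changing.
    Returns (labels, centroids, J), with J the objective of Equation 6."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)  # n x k distances
        new_labels = d2.argmin(axis=1)                            # step 2: nearest centroid
        if labels is not None and np.array_equal(new_labels, labels):
            break                                                 # assignments are stable
        labels = new_labels
        for j in range(len(c)):                                   # step 3: update centroids
            if np.any(labels == j):                               # keep old centroid if a cluster empties
                c[j] = x[labels == j].mean(axis=0)
    j_obj = float(((x - c[labels]) ** 2).sum())
    return labels, c, j_obj
```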
3.4 Regional regression model
After the catchments had been grouped by cluster analysis, the next step was to develop regression models. Regional regression is constructed as a multiple linear regression (Equations 7 and 8) relating low flow (the dependent variable) to morphoclimatic parameters (the independent variables), in order to identify the parameters that are strongly related to low flow. The coefficient of determination R2, at a significance level of 0.05, was used to assess the precision of the regression equations. The best results were obtained using stepwise regression:
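$$\mathrm{BFI} = \beta_{0} + \sum_{i=1}^{p}\beta_{i}x_{i} \tag{7}$$
$$Q_{s} = \alpha_{0} + \sum_{i=1}^{p}\alpha_{i}x_{i}, \qquad s \in \{80, 90, 95\} \tag{8}$$
where:
xi | = | morphoclimatic parameters of a catchment, |
Qs | = | discharge exceeded 80%, 90%, or 95% of the total record (m³/s), |
βi, αi | = | regression coefficients (i = 0 for the intercept), and |
p | = | total number of independent variables. |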
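Fitting Equations 7 and 8 for one region reduces to ordinary least squares with an intercept. The sketch below (hypothetical function name) fits a fixed set of predictors and reports R²; it does not reproduce the stepwise selection used in the study.

```python
import numpy as np

# Hypothetical helper: least-squares fit of a regional low flow model.


def fit_regional_model(x, y):
    """Fit y = b0 + sum(b_i * x_i) (Equations 7 and 8).
    x: (n_catchments, p) matrix of catchment characteristics; y: low flow index.
    Returns (coefficients [b0, b1, ..., bp], R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])    # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)
```

As with any least-squares fit, R² is only informative when the number of catchments exceeds the number of fitted coefficients (p + 1).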
3.5 Verification of the regional regression model
Finally, the regional regression models were verified using the low flow indexes of four validation catchments that were excluded from model development (Bilate at Alaba, Guder Bilate tributary, Gidabo near Miesso, and Hamessa near Wajifo). These catchments were treated as ungauged, and their catchment characteristics were used as inputs to the regional regression models. The predictive performance for each percentile flow was quantified by the relative error (E):
$$E_{b} = \frac{P_{b} - O_{b}}{O_{b}} \times 100\% \tag{9}$$
where:
Ob, Pb | = | the observed and predicted percentile flow for basin b. |
Absolute values of E were summed for each percentile flow.
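Equation 9 and the summed absolute error can be computed as follows. The sign convention E = (P − O)/O × 100% is an assumption (positive values mean over-prediction), and the function name is hypothetical.

```python
# Hypothetical helper for the relative error of Equation 9.


def relative_errors(observed, predicted):
    """Relative error (%) for each validation catchment, plus the sum of the
    absolute errors used to summarise one percentile flow."""
    errors = [100.0 * (p - o) / o for o, p in zip(observed, predicted)]
    return errors, sum(abs(e) for e in errors)
```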
4 Results and discussion
4.1 Comparison of baseflow separation and BFI computation
Statistical analysis
In the Bilate Tena catchment, the EWMA filter captured almost all of the dry-season flow, and in the remaining catchments of the study area the selected separation methods also reproduced the observed dry-season flow.
The digital graphical separation methods (fixed interval and local minimum) performed poorly, with RMSE values of 0.07–0.082 across all catchments. Their R2 values were also low (0.22–0.45) in catchments such as Weira, Gidabo, Kola, and Bilate Tena, although in Badessa, Upper Gelana, Kulfo, and Hare they performed better, with R2 values of 0.53–0.82. Therefore, neither digital graphical method was selected as an appropriate separation method for the Lake Abaya–Chamo sub-basin; instead, recursive digital filters (Lyne–Hollick and EWMA) were selected, with R2 values of 0.80–0.98 and RMSE of 0.0028–0.029. Table 2 shows that the EWMA and Lyne–Hollick filters estimated dry-period baseflow more accurately than the other RDF filters.
Table 2 Low flow indexes (BFI, Q80, Q90, and Q95) of the gauged catchments and the selected baseflow separation method with its calibration performance (R2, RMSE).
Catchment | BFI | Q80 (m³/s) | Q90 (m³/s) | Q95 (m³/s) | Selected separation method | R2 | RMSE (m³/s) |
Weira | 0.3565 | 0.814 | 0.615 | 0.462 | RDF EWMA | 0.97 | 0.003 |
Kulfo | 0.6175 | 3.355 | 1.778 | 0.959 | RDF Lyne–Hollick | 0.94 | 0.029 |
Hare | 0.6251 | 0.787 | 0.535 | 0.324 | RDF Lyne–Hollick | 0.80 | 0.008 |
Gidabo | 0.4140 | 0.625 | 0.443 | 0.319 | RDF Lyne–Hollick | 0.93 | 0.0028 |
Upper Gelana | 0.4549 | 1.079 | 0.593 | 0.404 | RDF EWMA | 0.97 | 0.004 |
Bilate Tena | 0.3962 | 2.989 | 1.733 | 1.059 | RDF EWMA | 0.98 | 0.006 |
Kola | 0.5537 | 0.625 | 0.443 | 0.319 | RDF Lyne–Hollick | 0.91 | 0.004 |
Badessa | 0.4891 | 0.384 | 0.201 | 0.128 | RDF Lyne–Hollick | 0.96 | 0.003 |
In contrast to this study, Indarto et al. (2016) found that the digital graphical method (local minimum) performed better than the recursive digital filters in East Java watersheds, where R2 values of 0.545–0.875 and RMSE values of 0.004–0.126 were reported.
Flow duration curve
Flow duration curves (FDC) were used to determine the low flow indexes (Q80, Q90, and Q95). Based on the BFI values in Table 2, baseflow was dominant in the Kulfo, Hare, and Kola catchments, making up 61.75%, 62.51%, and 55.37% of the total discharge, respectively. In Weira, Gidabo, Bilate Tena, Badessa, and Upper Gelana, direct runoff was dominant, making up 64.35%, 58.60%, 60.38%, 51.09%, and 54.51% of the total discharge, respectively. The likely reason for this disparity is the topographic and climatic differences between catchments.
4.2 Delineation of homogeneous regions
Identification of the number of clusters
Figure 5 presents the scree plot for the cluster analysis. A distinct break indicates that the next merger of two objects or clusters occurs at a substantially greater distance; distinct breaks in the graph are visible at distances of 18.977 (9 clusters), 39.997 (7 clusters), 137.213 (3 clusters), and 264 (1 cluster).
Figure 5 Scree plot for identification of the number of clusters.
The elbow (a marked change in slope of the scree plot) occurs at 3 clusters, which indicates the optimal number of clusters for delineating the homogeneous regions; the 3-cluster solution was also the easiest to interpret. These results agree with those of Rahmat and Sen (2016), who performed cluster analyses to assess regional competitiveness, and Kumar and Swamy (2015), who compared clustering approaches for pavement design. Figure 6 presents the dendrogram from Ward's method, showing the cluster distances for each number of clusters identified by the branch intersections.
Figure 6 Dendrogram using Ward’s method.
The comparatively large distance value for two clusters, relative to three clusters, also supports selecting three clusters as the optimal number with Ward's method. In addition, choosing six clusters would leave two clusters with only one member each, and a single catchment is not a good representative for predicting other catchments (Jehangir et al. 2015). Therefore, based on the scree plot and Ward's method, 3 clusters were used for k-means.
Final clustering with the k-means clustering technique
A k-means cluster analysis was then conducted to classify the catchments into the 3 clusters obtained from hierarchical clustering. The SPSS k-means procedure provides an ANOVA table showing which variables contribute to the separation of the clusters. At the 0.05 significance level, the critical F value for the relevant degrees of freedom is 3.885, and all computed F values fell in the rejection region (>3.885). The null hypothesis of equal means across the three clusters was therefore rejected for all 24 catchment characteristics, which shows that the three clusters differ significantly in terms of their catchment features.
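The F statistic reported for each variable is the one-way ANOVA F across the k-means clusters. A minimal numpy sketch follows (hypothetical function name); the critical value must be taken from an F table for (k − 1, n − k) degrees of freedom, and a variable is flagged as separating the clusters when its F exceeds that value.

```python
import numpy as np

# Hypothetical helper: one-way ANOVA F per catchment characteristic across clusters.


def cluster_f_statistics(x, labels):
    """F = [SSB / (k - 1)] / [SSW / (n - k)] for every column of x."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = len(x), len(groups)
    grand_mean = x.mean(axis=0)
    ssb = np.zeros(x.shape[1])                  # between-cluster sum of squares
    ssw = np.zeros(x.shape[1])                  # within-cluster sum of squares
    for g in groups:
        members = x[labels == g]
        centre = members.mean(axis=0)
        ssb += len(members) * (centre - grand_mean) ** 2
        ssw += ((members - centre) ** 2).sum(axis=0)
    return (ssb / (k - 1)) / (ssw / (n - k))
```

Usage: `cluster_f_statistics(characteristics, labels) > f_critical` returns one Boolean per characteristic.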
Figure 7 shows the study area classified into three homogeneous groups: the Kulfo, Sile, Shafe, and Uraye catchments form region 1; Kola, Badessa, Gidabo, and Upper Gelana form region 2; and Bilate Tena and Weira form region 3, which includes most of the ungauged downstream parts (crosshatched in Figure 7) in the eastern part of the study area.
Figure 7 Delineated homogeneous region.
4.3 Development of regression equation
Regression models were developed for each homogeneous region, as shown in Table 3. The R2 values between the low flow indexes (BFI, Q80, Q90, and Q95) and the physiographic or hydroclimatic catchment characteristics show strong relationships when each characteristic is used as a single predictor. However, when individual catchment properties are combined into multiplicative models, the correlations are poor. As shown in Table 3, the highest R2 values were obtained by combining main river length (MRL), soil depth (Z1), and area (A) in region 3, and main river length (MRL), soil depth (Z1), area (A), and main river slope (MRS) in region 1. In region 2, the low flow indexes were highly correlated with main river slope (MRS), Gravelius index (Kg), and soil depth (Z1).
Table 3 Regression equations for BFI, Q80, Q90, and Q95 within each homogeneous region.
Region | Regression equation | R2 |
Region 1 | BFI = 0.0025A + 0.603Kg − 0.0115MRL + 0.122MRS + 0.00054MAR − 1.9675 | 0.87 |
Region 1 | Q80 = 0.0136A + 0.42Kg + 0.083MRL + 0.319MRS − 0.00233MAR − 2.344 | 0.75 |
Region 1 | Q90 = 0.0065A + 0.099Kg + 0.033MRL + 0.117MRS − 0.0009MAR − 0.776 | 0.89 |
Region 1 | Q95 = 0.004A + 0.471Kg + 0.0223MRL + 0.12MRS − 0.00046MAR − 1.56 | 0.82 |
Region 2 | BFI = 0.183Kg − 0.019MRS + 0.0005MAR − 0.4787 | 0.97 |
Region 2 | Q80 = −0.334Kg − 0.1053MRS + 0.0008MAR + 0.5672 | 0.93 |
Region 2 | Q90 = −0.1667Kg − 0.076MRS + 0.0003MAR + 0.522 | 0.84 |
Region 2 | Q95 = 0.0534Kg − 0.0593MRS + 0.374 | 0.91 |
Region 3 | BFI = 0.0001A + 0.02MRL + 0.0317Z1 − 3.24 | 0.95 |
Region 3 | Q80 = −0.0028A + 0.0578MRL − 0.9558Z1 + 95.5 | 0.98 |
Region 3 | Q90 = −0.0018A + 0.045MRL − 0.6Z1 + 60.07 | 0.94 |
Region 3 | Q95 = −0.0024A + 0.043MRL − 0.733Z1 + 73.15 | 0.97 |
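To estimate low flows at an ungauged site, its characteristics are substituted into the equations of the homogeneous region it falls in. The sketch below encodes the Region 2 equations from Table 3 exactly as printed. The class and function names are hypothetical, the input values in the test are made up for illustration, and the units of MRS and MAR are assumed to be the ones used when the equations were fitted, which this section does not restate.

```python
from dataclasses import dataclass


@dataclass
class Region2Inputs:
    kg: float   # Gravelius index (dimensionless)
    mrs: float  # main river slope, in the units used for the original fit (assumption)
    mar: float  # mean annual rainfall, in the units used for the original fit (assumption)


def region2_low_flows(x: Region2Inputs) -> dict:
    """Apply the Region 2 regression equations of Table 3 to one catchment."""
    return {
        "BFI": 0.183 * x.kg - 0.019 * x.mrs + 0.0005 * x.mar - 0.4787,
        "Q80": -0.334 * x.kg - 0.1053 * x.mrs + 0.0008 * x.mar + 0.5672,
        "Q90": -0.1667 * x.kg - 0.076 * x.mrs + 0.0003 * x.mar + 0.522,
        "Q95": 0.0534 * x.kg - 0.0593 * x.mrs + 0.374,
    }
```

As with any regression model, these equations are only reliable within the range of catchment characteristics used to fit them, so inputs far outside that range should be treated with caution.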
The need to delineate homogeneous regions is evident from the contrast between the coefficients of determination of the global model, calibrated on the entire area and dataset, and those of the regional models, calibrated separately for regions 1, 2, and 3. The regional models explain between 87% (region 1) and 97% (region 2) of the variance in BFI. For Q80, Q90, and Q95, the explained variance ranges are 75%–98%, 84%–94%, and 82%–97%, respectively.
The significant parameters are mean annual rainfall (MAR), Gravelius index (Kg), main river length (MRL), main river slope (MRS), area (A), and soil depth (Z1). The positive relationship between rainfall and low flows is expected: mean annual precipitation supplies water that is stored in the catchment in various forms (e.g., snow, soil moisture, groundwater, and reservoirs) and released over different timescales. The effect of MRS is related to reduced infiltration, since steeper slopes favor rapid runoff. Because groundwater is usually the main source of dry-season flows, soil thickness or depth, which controls subsurface storage, is expected to affect low flows directly. The main river length (MRL) is a measure of catchment size, and larger catchments have larger BFI, Q95, Q90, and Q80 values. Engeland and Hisdal (2009) obtained a similar result, which they attributed to positive interactions between aquifers and rivers (gaining streams in the valleys), and found catchment area to be a good explanatory variable for winter low flows in Norway. The Gravelius index (Kg) is also positively correlated with Q95. Kg compares the catchment perimeter with the circumference of a circle of equal area, so it increases as the catchment becomes more elongated; here, high Kg values (more elongated catchments) are associated with a higher potential for infiltration and groundwater recharge, resulting in higher Q95. In contrast to this research, Aschwanden and Kan (1999), in a study of low flow regionalization for Switzerland, found that land use plays an important role in predicting low flow indexes, especially the proportions of pre-Alpine farming structures and agricultural areas.
Unlike the regression models obtained above for BFI, Mehaiguene et al. (2012) showed that BFI is highly correlated with the aridity index, drainage density, percentage of vegetation cover, average slope, and hydrogeological classification. The differences between the independent variables used here to develop the regional regression models for BFI, Q95, Q90, and Q80 in the Abaya–Chamo lake basin and those used in other low flow studies worldwide (e.g., Engeland and Hisdal 2009; Mehaiguene et al. 2012; Aschwanden and Kan 1999) are due to differences in physiography, climate, and other heterogeneities.
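The Gravelius index discussed above has a simple closed form, Kg = P / (2√(πA)): the ratio of the catchment perimeter P to the circumference of a circle with the same area A. It equals 1 for a circular catchment and grows as the shape becomes more elongated, and it is related to Miller's circularity ratio Rc = 4πA/P² by Rc = 1/Kg². A short sketch follows, assuming perimeter in km and area in km² (any consistent length units work, since both indexes are dimensionless); the function names are hypothetical.

```python
import math


def gravelius_index(perimeter_km: float, area_km2: float) -> float:
    """Kg = P / (2 * sqrt(pi * A)); 1 for a circle, larger for elongated shapes."""
    return perimeter_km / (2.0 * math.sqrt(math.pi * area_km2))


def circularity_ratio(perimeter_km: float, area_km2: float) -> float:
    """Miller's Rc = 4 * pi * A / P**2, which equals 1 / Kg**2."""
    return 4.0 * math.pi * area_km2 / perimeter_km ** 2
```

For example, a unit square (P = 4, A = 1) gives Kg = 2/√π ≈ 1.128, slightly above the value of 1 for a circle.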
4.4 Verification of the regional models
Table 4 shows the observed low flow indexes and the estimates obtained from the regional regression models. The estimated BFI is lower than the observed value in regions 1 and 3, but higher in region 2. Similarly, Q80 and Q95 are overestimated by the regional models in region 1 and underestimated in regions 2 and 3, although the differences are numerically small.
Table 4 Observed low flow indexes and outputs from the regional regression models.
Low flow index | Value | Gidabo at Miessa | Bilate at Alaba | Hamessa at Humbo | Guder Bilate tributary |
BFI | Observed | 0.453 | 0.464 | 0.357 | 0.428 |
BFI | From regional model | 0.451 | 0.512 | 0.363 | 0.419 |
Q80 | Observed | 0.505 | 2.134 | 0.719 | 0.671 |
Q80 | From regional model | 0.513 | 2.03 | 0.687 | 0.653 |
Q90 | Observed | 0.323 | 1.028 | 0.409 | 0.374 |
Q90 | From regional model | 0.221 | 1.251 | 0.375 | 0.382 |
Q95 | Observed | 0.267 | 0.792 | 0.243 | 0.178 |
Q95 | From regional model | 0.269 | 0.721 | 0.354 | 0.281 |
Table 5 shows the relative errors of the regional models. The relative errors of BFI, Q80, Q90, and Q95 are, respectively, 0.001, 0.005, 0.077, and 0.002 for Gidabo at Miessa; 0.033, 0.033, 0.099, and 0.040 for Bilate at Alaba; 0.004, 0.019, 0.024, and 0.089 for Hamessa at Humbo; and 0.006, 0.011, 0.006, and 0.087 for the Guder Bilate tributary. All regional models therefore have relative errors of at most 8.7%, except Q90 in region 3 and Q95 in region 1, whose relative errors of 9.9% and 8.9%, respectively, are still <10%. This result is similar to the work of Laaha and Blöschl (2005) on estimating Q95, in which catchments with a relative error of <10% in the regression model were considered to perform better. It indicates that the regression models perform well in estimating low flows.
Table 5 Relative errors of the regional regression models.
Verification catchments | BFI | Q80 | Q90 | Q95 |
Gidabo at Miessa | 0.001 | 0.005 | 0.077 | 0.002 |
Bilate at Alaba | 0.033 | 0.033 | 0.099 | 0.040 |
Hamessa at Humbo | 0.004 | 0.019 | 0.024 | 0.089 |
Guder Bilate Tributary | 0.006 | 0.011 | 0.006 | 0.087 |
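This section does not state the formula behind the relative errors in Table 5. The sketch below assumes the common definition |Qobs − Qest| / Qobs and checks a set of errors against the 10% acceptance level used in the text. The function names and the values in the test are hypothetical.

```python
def relative_error(observed: float, estimated: float) -> float:
    """|observed - estimated| / observed (assumed definition)."""
    if observed == 0:
        raise ValueError("relative error is undefined for a zero observed value")
    return abs(observed - estimated) / observed


def all_within(errors: dict, threshold: float = 0.10) -> bool:
    """True if every index (e.g. 'BFI', 'Q95') has a relative error below threshold."""
    return all(e < threshold for e in errors.values())
```

Applied per verification catchment, this yields one row of a table like Table 5; a model set is accepted when `all_within` holds for every row.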
5 Conclusions
This study quantified low flow characteristics (BFI, Q95, Q90, and Q80) and applied linear regression models to predict low flows at ungauged sites in the Lake Abaya and Chamo sub-basins, which have a sparse streamflow gauging network. Among the many low flow statistics, the study focused on the most important low flow features, namely the baseflow index (BFI), Q95, Q90, and Q80. To quantify BFI, baseflow separation methods were compared: two digital graphical methods and seven recursive digital filters, showing that the EWMA method performed best for the Gelana catchment. Thus, the EWMA and Lyne–Hollick methods can be used for baseflow separation in the Lake Abaya–Chamo basin. BFI was calculated as the ratio of the total baseflow volume to the total streamflow volume. Because the region covers a wide area, it shows great variability in climate and topography, geology and land cover, and hydrological regimes. The computed mean, maximum, and minimum BFI values are about 48.8%, 62.5%, and 35.7%, respectively, indicating that, on average, direct runoff contributes slightly more than half of the streamflow in the region. Q95, Q90, and Q80 were quantified from flow duration curves.
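The BFI and flow duration steps summarized above can be illustrated with a short sketch. It uses a single forward pass of the Lyne–Hollick recursive filter with the commonly used α = 0.925 (in practice the filter is often run in several forward and backward passes, and the study also compared other filters such as EWMA, so this is not the study's exact setup). It computes BFI as total baseflow over total streamflow, and reads Qp from the flow duration curve using Weibull plotting positions with linear interpolation, which is one of several conventions. All function names are hypothetical.

```python
def lyne_hollick_baseflow(flow, alpha=0.925):
    """Single forward pass of the Lyne-Hollick recursive digital filter.

    Quickflow: qf[t] = alpha * qf[t-1] + (1 + alpha) / 2 * (Q[t] - Q[t-1]),
    constrained to 0 <= qf[t] <= Q[t]; baseflow b[t] = Q[t] - qf[t].
    """
    quick_prev = 0.0  # assumption: the first time step is treated as pure baseflow
    baseflow = [flow[0]]
    for t in range(1, len(flow)):
        quick = alpha * quick_prev + (1 + alpha) / 2 * (flow[t] - flow[t - 1])
        quick = min(max(quick, 0.0), flow[t])  # keep baseflow between 0 and Q[t]
        baseflow.append(flow[t] - quick)
        quick_prev = quick
    return baseflow


def baseflow_index(flow, baseflow):
    """BFI = total baseflow volume / total streamflow volume."""
    return sum(baseflow) / sum(flow)


def exceedance_flow(flow, percent):
    """Flow equalled or exceeded `percent` % of the time (e.g. 95 -> Q95).

    Weibull plotting positions i / (n + 1) on the descending-sorted record,
    with linear interpolation between neighbouring ranks.
    """
    ordered = sorted(flow, reverse=True)
    n = len(ordered)
    probs = [100.0 * (i + 1) / (n + 1) for i in range(n)]
    if percent <= probs[0]:
        return ordered[0]
    if percent >= probs[-1]:
        return ordered[-1]
    for i in range(1, n):
        if probs[i] >= percent:
            frac = (percent - probs[i - 1]) / (probs[i] - probs[i - 1])
            return ordered[i - 1] + frac * (ordered[i] - ordered[i - 1])
```

For a daily discharge series `q`, `baseflow_index(q, lyne_hollick_baseflow(q))` gives the BFI and `exceedance_flow(q, 95)` gives Q95; a constant flow yields BFI = 1, since the filter attributes no quickflow to an unchanging hydrograph.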
Delineating homogeneous regions, i.e., groups of catchments with nearly uniform low flow behavior, is a necessary preliminary step in a study area of large extent. Catchments were grouped by k-means cluster analysis, with hierarchical clustering used to decide the number of clusters, based on 25 independent variables related to geomorphology, climate, soil, and land use. K-means clustering produced three regions with sufficient statistical reliability for the application of regression models. In each homogeneous region, multiple linear regression models were developed to estimate the low flow indexes (BFI, Q95, Q90, and Q80). Finally, the regional models were verified using the relative error between observed and estimated low flows; all regional models performed well, with a maximum relative error of 9.9%. The regional models can therefore be used for low flow regionalization in the sub-basin, and current and future water resources development projects can use the resulting discharge estimates for planning and design.
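The text does not say which linkage or stopping rule the hierarchical step used to fix the number of clusters. As an illustration only, the sketch below assumes Ward's linkage (at each step, merge the pair of clusters whose union least increases the within-cluster sum of squares) and picks k just before the largest jump in merge cost, a common but informal elbow rule. It is a naive loop suitable for a handful of catchments, and inputs are assumed to be standardized characteristics. The function names are hypothetical.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca = [sum(col) / len(a) for col in zip(*a)]
    cb = [sum(col) / len(b) for col in zip(*b)]
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def merge_heights(points):
    """Agglomerative clustering with Ward's criterion; returns each merge cost in order."""
    clusters = [[tuple(p)] for p in points]
    heights = []
    while len(clusters) > 1:
        cost, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        heights.append(cost)
    return heights


def suggest_k(points):
    """Stop merging just before the largest jump in Ward merge cost."""
    h = merge_heights(points)
    jumps = [h[m] - h[m - 1] for m in range(1, len(h))]
    m_best = 1 + max(range(len(jumps)), key=lambda i: jumps[i])
    return len(points) - m_best  # clusters remaining after m_best merges
```

The suggested k would then seed the k-means step, mirroring the two-stage procedure described above.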
Finally, we believe that the best method for quantifying low flows in ungauged catchments is site dependent and cannot be applied a priori to every catchment, but only within homogeneous regions. The findings suggest that regionalization supported by cluster analysis is helpful not only for quantifying low flows in ungauged catchments but also for reducing relative errors at the study locations. For future low flow regionalization studies, we therefore recommend applying additional approaches so that the findings can be critically cross-checked.
References
- Ababu, T., and W. Bernd. 2004. “Water quality monitoring within the Abaya Chamo drainage basin.” Research Symposium, 109-117.
- Arnold, J. G., and P. M. Allen. 1999. “Automated methods for estimating baseflow and groundwater recharge from streamflow records.” Journal of the American Water Resources Association 35: 411–24. https://doi.org/10.1111/j.1752-1688.1999.tb03599.x
- Arai, F. K., S. B. Pereira, and G. G. Goncalves. 2012. “Characterization of water availability in a hydrographic basin.” Engenharia Agrícola, Jaboticabal 32 (3): 591–601. https://doi.org/10.1590/S0100-69162012000300018
- Aschwanden, H., and C. Kan. 1999. Le débit d’étiage Q347-Etat de la question. (The discharge Q347: State of the question.) Communications hydrologiques 27, Swiss National Hydrological and Geological Survey.
- Assefa, G. 2018. "The impact of land use–land cover change on water availability." M.Sc. thesis, Arba Minch University, Ethiopia.
- Behafta, G. 2019. "Flood inundation mapping under climate change using LISFLOOD-FP model." M.Sc. thesis, Arba Minch University, Ethiopia.
- Belay, B. 2008. "Investigation of flood hazards and mitigation measures in Lante area of Gamo-Gofa zone." M.Sc. thesis, Arba Minch University, Ethiopia.
- Biruk, Z. 2017. "Evaluation of water availability under changing climate in the Bilate catchment." M.Sc. thesis, Arba Minch University, Ethiopia.
- Castellarin, A., G. Camorani and A. Brath. 2007. “Predicting annual and long-term flow-duration curves in ungauged basins.” Advances in Water Resources 30: 937–53.
- Chapman, T. G. 1999. “Comment on evaluation of automated techniques for baseflow and recession analyses.” Water Resources Research 27: 1783–4.
- Cupak, A. 2017. “Initial results of nonhierarchical cluster methods use for low flow grouping.” Journal of Ecological Engineering 18 (2): 44–50.
- Cupak, A., A. Wałęga, and B. Michalec. 2017. “Cluster analysis in determination of hydrologically homogeneous regions with low flow.” Acta Scientiarum Polonorum. Formatio Circumiectus 16 (1): 53–63.
- Demuth, S., and A. R. Young. 2004. "Regionalization procedures." In Hydrological Drought. Processes and Estimation Methods for Streamflow and Groundwater, edited by L. M. Tallaksen and V. H. A. J. Lanen, 307–43. Developments in Water Science, 48, Amsterdam: Elsevier.
- Eckhardt, K. 2005. “How to construct recursive digital filters for baseflow separation.” Hydrological Processes 19: 507–15. https://doi.org/10.1002/hyp.5675
- Eckhardt, K. 2008. “A comparison of baseflow indexes, which were calculated with seven different baseflow separation methods.” Journal of Hydrology 352: 168–173. https://doi.org/10.1016/j.jhydrol.2008.01.005
- Eng, K., and P. C. D. Milly. 2007. “Relating low-flow characteristics to the base flow recession time constant at partial record stream gauges.” Water Resources Research 43: W01201. https://doi.org/10.1029/2006WR005293
- Engeland, K., and H. Hisdal. 2009. “A comparison of low flow estimates in ungauged catchments using regional regression and HBV-Model.” Water Resources Management 23 (12): 2567–86. https://doi.org/10.1007/s11269-008-9397-7
- Furey, P. R., and V. K. Gupta. 2001. “A physically based filter for separating baseflow from streamflow time series.” Water Resources Research 37 (11): 2709–22.
- Gregor, M. 2010. HydroOffice User Manual version 2010. http://hydrooffice.org
- Gregor, M. 2012. HydroOffice User Manual version 2012. http://hydrooffice.org
- Haberlandt, U., B. Klocking, V. Krysanova, and A. Becker. 2001. “Regionalization of the baseflow index from dynamically simulated flow components: A case study in the Elbe River Basin.” Journal of Hydrology 248: 35–53.
- Hannah, D. M., S. R. Kansakar, A. J. Gerrard, and G. Rees. 2005. “Flow regimes of Himalayan rivers of Nepal: Nature and spatial patterns.” Journal of Hydrology 308: 18–32.
- Huyck, A. A. O., V. R. N. Pauwels, and N. E. C. Verhoest. 2005. “A base flow separation algorithm based on the linearized Boussinesq equation for complex hillslopes.” Water Resources Research 41.
- Indarto, I., E. Novita, and S. Wahyuningsih. 2016. “Preliminary study on baseflow separation at watersheds in East Java regions.” Agriculture and Agricultural Science Procedia 9: 538–50. https://doi.org/10.4172/2157-7587.1000300
- Institute of Hydrology. 1980. "Low Flow Studies Report no.1 Research Report." Institute of Hydrology, 50pp.
- Jehangir, A. A., D-H Bae, and K-J. Kim. 2015. “Identification and trend analysis of homogeneous rainfall zones over the East Asia monsoon region.” International Journal of Climatology 1422–33. https://doi.org/10.1002/joc.4066
- Kobold, M., and M. Brilly. 1994. “Low flow discharge analysis in Slovenia.” In FRIEND: Flow regimes from international experimental and network data, edited by P. Seuna, A. Gustard, N. W. Arnell, and G. A. Cole; Proceedings of an international conference held at the Technical University of Braunschweig, Germany from 11 to 15 October 1993, 119–32. International Association of Hydrological Sciences. IAHS Publ. no. 221.
- Koplin, N., B. Schädler, D. Viviroli, and R. Weingartner. 2012. “Relating climate change signals and physiographic catchment properties to clustered hydrological response types.” Hydrology and Earth System Sciences 16 (7): 2267–83.
- Kottegoda, N., and R. Rosso. 2004. Statistics, Probability, and Reliability for Civil and Environmental Engineers, International ed. New York: McGraw-Hill.
- Kumar, A., and A. K. Swamy. 2015. “Comparison of clustering approaches on temperature zones for pavement design.” Bituminous Mixtures & Pavements VI: 2011–9.
- Kupczyk, E., A. Kasprzyk, L. Radczuk, and W. Czamara. 1994. “A study of the low flow regimes of Polish rivers.” In FRIEND: Flow regimes from international experimental and network data, edited by P. Seuna, A. Gustard, N. W. Arnell, and G. A. Cole; Proceedings of an international conference held at the Technical University of Braunschweig, Germany from 11 to 15 October 1993, 133–40. International Association of Hydrological Sciences. IAHS Publ. no. 221.
- Laaha, G., and G. Blöschl. 2005. “Low flow estimates from short stream flow records—a comparison of methods.” Journal of Hydrology 306 (1–4): 264–86.
- Lengyel, I. 2003. Verseny és területi fejlődés: térségek versenyképessége Magyarországon. (Competition and regional development: Competitiveness of regions in Hungary.) JATE Press.
- Linsley, R. K. 1982. “Rainfall–runoff models: An overview.” In Proceedings of the International Symposium on Rainfall-Runoff Modelling, 3–22. Littleton, Colorado, USA: Water Resources Publications.
- Malekinezhad, H., H. P. Nachtnebel, and A. Klik. 2011. “Comparing the index flood and multiple regression methods using L-moments.” Physics and Chemistry of the Earth 36: 54–60.
- Mehaiguene, M., M. Meddi, A. Longobardi, and S. Toumi. 2012. “Low flows quantification and regionalization in North-West Algeria.” Journal of Arid Environments 67–76. https://doi.org/10.1016/j.jaridenv.2012.07.014
- Mwakalila, S., J. Feyen, and G. Wyseure. 2002. “The influence of physical catchment properties on baseflow in semi-arid environments.” Journal of Arid Environments 52: 245–58.
- Nathan, R. J., and T. A. McMahon. 1990. “Evaluation of automated techniques for baseflow and recession analysis.” Water Resources Research 26 (7): 1465–73.
- Rahmat, S. and J. Sen. 2016. "Cluster Analysis Based Approach to Delineate Homogeneous Regions for the Assessment of Regional Competitiveness: A Case of Districts of India." Journal of Multidisciplinary Engineering Science Studies (JMESS), 2:1.
- Schreiber, P., and S. Demuth. 1997. “Regionalization of low flows in southwest Germany.” Hydrological Sciences Journal 42 (6): 845–58.
- Schwartz, S. S. 2007. “Automated algorithms for heuristic base-flow separation.” Journal of American Water Resources Association 43: 1583–94. https://doi.org/10.1111/j.1752-1688.2007.00130.x
- Shimels, A. 2015. "The effect of land use–land cover change on hydrology of Bilate watershed." M.Sc. thesis, Arba Minch University, Ethiopia.
- Sloto, R. A., and M. Y. Crouse. 1996. HYSEP: A computer program for stream flow hydrograph separation and analysis. New Cumberland, PA: U.S. Geological Survey, Water-Resources Investigations. Report 96-4040. https://doi.org/10.3133/wri964040
- Smakhtin, V. U. 2001. “Low flow hydrology: A review.” Journal of Hydrology 240: 147–86.
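- Stanisz, A. 2007. Przystępny kurs statystyki z zastosowaniem STATISTICA PL na przykładach z medycyny. T. 3. Analizy wielowymiarowe. (An accessible course in statistics using STATISTICA PL, with examples from medicine. Vol. 3. Multivariate analyses.) Krakow: StatSoft.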
- Stevkova, A., M. Sabo, and S. Kohnova. 2012. “Pooling of low flow regimes using cluster and principal component analysis.” Slovak Journal of Civil Engineering 20: 19.
- Tadelech, A. 2015. "Evaluation of Impact of Climate Change on Water Resource Availability on Abaya-Chamo Basin." M.Sc. thesis, Arba Minch University, Ethiopia.
- Tallaksen, L. M., and V. H. A. J. Lanen. 2004. "Hydrological drought. Processes and estimation methods for stream flow and groundwater." Developments in Water Science, 48. Amsterdam, Elsevier Science.
- Teklemariam, A., and B. Wenclawiak. 2004. "Water quality monitoring within Lake Abaya-Chamo drainage basin." In Lake Abaya Research Symposium 2004—Catchment and Lake Research, University of Siegen, Siegen, Germany. pp. 109–118.
- Vezza, P., C. Comoglio, M. Rosso, and A. Viglione. 2010. “Low flows regionalization in North-Western Italy.” Water Resources Management 24 (14): 4049–74. https://doi.org/10.1007/s11269-010-9647-3
- WMO (World Meteorological Organization). 1974. "International glossary of hydrology." Geneva: WMO.
- WMO (World Meteorological Organization). 2008. Technical conference on meteorological and environmental instruments and methods of observation (TECO-2008), 27-29, St. Petersburg, Russian Federation.