# Uncertainty Characterization of Rainfall Inputs used in the Design of Storm Sewer Infrastructure

## Abstract

We assess relationships between the period of historical rainfall record and the uncertainties of predictions. Based on twenty rain gauges in Ontario, at least 49 y, 62 y and 73 y of records are needed to achieve a 95% confidence interval as small as 10% of the predictions, for 5 y, 10 y and 25 y recurrence events respectively. These findings indicate the need to consider uncertainties when using design rainfall rates based on available records. Further, an example of nonlinear regression between the intensity–duration–frequency estimates and rainfall durations shows the possibility of underestimating design rainfall intensities.

## 1 Summary

Intensity–duration–frequency (IDF) curves, from which design rainfall magnitudes are developed, are constructed using rainfall predictions associated with different return periods and durations. However, the degree of confidence for estimates of rainfall rates used as input for the design of stormwater infrastructure is influenced by the length and character of historical precipitation records. Relationships are assessed between the period of historical record of rainfalls and the uncertainties of predictions for different confidence levels, using both analytical and resampling methods. The uncertainty in IDF curve regression is also analysed.

The correlation between the log-transformed length of the record and the ratio of the standard deviations of predictions is determined to be high, allowing a linear regression model to be developed to characterize the 95% confidence intervals. Based on 20 rain gauges in Ontario, it is estimated that to achieve a 95% confidence interval as small as 10% of the predictions, at least 49 y, 62 y, and 73 y of records are needed for 5 y, 10 y, and 25 y recurrence events respectively. Considering that record lengths of rain gauges are mostly <50 y, accurately estimating the event magnitudes is a substantial challenge, indicating the need to consider uncertainties when using design rainfall rates based on available records.

Further, an example of nonlinear regression of the IDF estimates and their confidence intervals against rainfall duration shows that design rainfall intensities may be underestimated. This raises awareness of uncertainties when selecting design rainfall rates.

## 2 Introduction

Rainfall is one of the most important inputs in stormwater infrastructure design, to determine the conveyance capacity needed for the stormwater system. The rainfall input may be a rainfall event obtained from the historical record characterized from several decades of historical events, or a storm generated from the IDF curves, combined with a selected rainfall distribution (e.g. a triangular distribution).

Uncertainty of a design rainfall arises since the rainfall intensities are not known with certainty; there is a range of values. This range is typically represented as a confidence interval. Thus it is important to consider the confidence intervals as well as the design rainfall intensities. Use of the upper confidence limit is more conservative in stormwater design, although this approach may increase the cost due to the extra capacity needed for conveying the predicted stormwater flows. When dealing with uncertainties, the selected design rainfall intensity should be a balance between cost and risk of being more frequently exceeded; a smaller range of uncertainty is helpful to ensure the design performs as intended.

For Ontario, IDF curves are available from two sources: the IDF database from the Ontario Ministry of Transportation (MTO) and the IDF data file from Environment Canada (EC). MTO’s IDF database provides estimated rainfall depths and intensities, and the regression coefficient values for the equations. MTO’s IDF curves are remarkably localized: they are available at locations that do not have rainfall gauges, which implies that spatial interpolation has been employed. However, MTO’s IDF curves are not provided with information on uncertainties.

The EC IDF data files contain more information in comparison with the MTO’s IDF database, although EC IDFs are only at locations at which there are rain gauges. EC IDF data files include the quantile–quantile plot for distribution fit, and event estimate graphs, based on historical records at a rainfall gauge. The annual maximal rainfall intensities over various durations are also provided, together with the estimates of events for return periods from 2 y through 100 y. The confidence intervals for the event estimates are also supplied, at the confidence level of 95%, two-sided. The regression coefficients for each of these IDF curves are also available. With the data provided, users are immediately advised about the magnitudes of the uncertainties, and capable of fitting rainfall records with alternative probability distributions and quantifying the uncertainties.

An issue is raised when examining the confidence intervals related to storm durations. EC’s IDF curves provide the confidence intervals for durations of 5 min, 10 min, 15 min and 30 min, and of 1 h, 2 h, 6 h, 12 h and 24 h. However, the design rainfall for stormwater system design could be over any duration ≤2 h, depending upon the hydrologic characteristics (e.g. time of concentration) of the catchment. The confidence limits are needed to compare with expected values, to be confident of the design rainfall selected. Therefore interpolation or regression of the confidence intervals is necessary, in addition to the regression of the expected values of design rainfall intensities.

The intensities and confidence intervals used for IDF regression are usually estimated by parametric methods, which fit a probability distribution to an extreme rainfall data series and then estimate the extreme event intensities for selected exceedance frequencies. Both the modelling error and the sampling error involved in parametric methods introduce uncertainties into the estimated intensities, and these uncertainties propagate into the IDF curves and the design rainfalls.

The uncertainties in design rainfalls can be very large. As an example, Figure 1 illustrates the IDF curves for Waterloo, Ontario. The cautionary note in the upper right corner indicates the large range of the confidence intervals. The 10 y return period event of 1 h duration rainfall is estimated to be 45.1 mm/h as an expected value, with a 95% confidence interval of ±9.9 mm/h, which is about ±22% of the expected value. Consequently, designers are looking at a design rainfall intensity which could be anywhere from 35.2 mm/h to 55.0 mm/h, reflecting the 95% confidence interval.

Figure 1 IDF curves for Waterloo, Ontario (publicly available from Environment Canada).

In response, one possibility is to have a longer record, which would reduce the uncertainties in design rainfall estimates by reducing sampling error. The length of the rainfall record needed to achieve a specified degree of certainty in design rainfall estimate should be determined as well. However, a lengthy record may present temporal trends or step changes. It is not always appropriate to use the longest record, as the current climate storms are suspected to be different from those decades or centuries ago.

Of interest is to develop the relationships between the confidence intervals for design rainfall versus the length of the rainfall record, to quantify the length of record needed to achieve specific uncertainty or to determine the magnitude of the uncertainties given the available historical period of record. Further, this paper develops the relationships between the design rainfall confidence limits and the rainfall durations of the same return period, to provide the confidence limits for selection of the design rainfall for a given design storm duration.

## 3 Literature Review

The uncertainties in extreme rainfall event estimation have been analyzed in research to investigate the impact of climate change (Fowler and Kilsby 2003; García-Ruiz et al. 2000; Coles et al. 2003) and to estimate the impact of uncertain input to stormwater system design (Aronica et al. 2005; Semadeni-Davies et al. 2008). However, the relationship between uncertainties in the estimation of extreme event intensities and record length has not been comprehensively investigated. Rauch and De Toffol (2006) investigated six rainfall series in Austria to assess the length of the rainfall series required to estimate the extreme rainfall and associated uncertainties of 1 y return period events with 15 min duration. The rainfall intensities are estimated based on segments of the historical record, with the lengths of 1 y, 10 y and 20 y. Rauch and De Toffol (2006) observed a correlation between the magnitude of uncertainties (expressed as the ratio between the width of the 90% confidence interval and the expected value using the entire record) and the length of segments of the historical record. It is suggested to use a ≥10 y record to estimate a 1 y event. Use of the longest available record is not always recommended, as increasing record length might be influenced by temporal trends. Rauch and De Toffol (2006) only tested the 1 y return period rainfall over 15 min duration, which cannot demonstrate the characteristics of all extreme rainfall events; the relationship between uncertainties in the estimation of extreme event intensities and the record length needs to be thoroughly investigated.

## 4 Methodologies

The uncertainty of design rainfall in this research is represented as the confidence interval at the confidence level of 95%. There are two methods to describe the relationships between the length of the rainfall record and the confidence intervals: the analytical method and the resampling method. In both methods, the sample sets are fitted with the Gumbel distribution using L–moments (Hosking and Wallis 1997, chap. 2). In the analytical method, the variance of the rainfall estimate is calculated by the asymptotic method, using Equation 1 (after Stedinger et al. 1993).

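$$\operatorname{Var}(\hat{x}_p) = \frac{\alpha^{2}\,\left(1.1128 + 0.4574\,y + 0.8046\,y^{2}\right)}{n} \tag{1}$$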

where:

- *p* = non-exceedance probability (see Equation 3),
- *y* = −ln[−ln(1 − 1/*T*)], the reduced variate for the Gumbel distribution,
- *T* = return period in years,
- *α* = scale parameter for the Gumbel distribution, and
- *n* = sample size.

The confidence interval is:

$$\hat{x}_p \pm z\,\sqrt{\operatorname{Var}(\hat{x}_p)} \tag{2}$$

where:

*z* = normal score for a given significance level.

The rainfall intensity is estimated by the inverse of the cumulative distribution function of the Gumbel distribution, Equation 3, with scale (*α*) and location (*ξ*) parameters, for non-exceedance probability *p* = 1 − 1/*T*.

$$x_p = \xi - \alpha \ln\left[-\ln(p)\right] \tag{3}$$

In this research, the uncertainty is characterized as a percentage of the extent of the confidence interval compared to the expected value.

$$r = \frac{z\,\sqrt{\operatorname{Var}(\hat{x}_p)}}{\hat{x}_p} \times 100\% \tag{4}$$

Note that, for given distribution parameters, the percentage is a function of the sample size (*n*) and the reduced variate (*y*), which is itself a function of the return period (*T*). Therefore, for a specified return period, the magnitude of uncertainty is determined only by the sample size, which equals the length of the record when using annual maxima. With the analytical method, the relationship between the record length and the uncertainties can be computed using Equation 4.
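The analytical relationship can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the variance coefficients (1.1128, 0.4574, 0.8046) are the asymptotic values commonly quoted for Gumbel quantiles estimated by L-moments and should be checked against Stedinger et al. (1993), the percentage is taken as the half-width of the interval divided by the estimate, and the Gumbel parameters `XI` and `ALPHA` are hypothetical.

```python
import math

# Assumed coefficients of the asymptotic variance of a Gumbel quantile
# estimated by L-moments (recalled from the literature; verify before use).
C0, C1, C2 = 1.1128, 0.4574, 0.8046


def reduced_variate(T):
    """Gumbel reduced variate y for a return period of T years."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def ci_percentage(n, T, xi, alpha, z=1.96):
    """Half-width of the confidence interval as a percentage of the estimate."""
    y = reduced_variate(T)
    x_p = xi + alpha * y                               # quantile estimate
    var = alpha ** 2 * (C0 + C1 * y + C2 * y * y) / n  # asymptotic variance
    return 100.0 * z * math.sqrt(var) / x_p


# Hypothetical Gumbel parameters (mm/h), chosen only for illustration.
XI, ALPHA = 20.0, 5.0
for n in (10, 20, 40, 60, 80, 100):
    print(n, round(ci_percentage(n, 10, XI, ALPHA), 1))
```

Because the variance is inversely proportional to *n*, quadrupling the record length halves the percentage in this sketch, which is why long records are needed to reach small intervals.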

With the resampling method, the relationship is characterized 100 times using similar procedures to those used in the analytical method based on resampled datasets. First, 5 000 values are synthesized using the Gumbel distribution to form a large dataset (*A*). Second, 100 values are randomly selected from this dataset *A*, to construct an annual maximum rainfall record (*B*) over 100 y. Third, the first 10 values in this record *B* are assumed as a sample set (*C*) (a sample size <10 is expected to have a large sampling error, defined as the variance of the sample). Fourth, the sample set *C* is fitted with the Gumbel distribution to calculate the percentage of the 95% confidence interval compared with the event estimates for return periods of 2 y, 5 y, 10 y and 25 y. The third step is repeated by including the next value from the record *B* into the sample set *C* until the sample size reaches 100. Subsequently, the percentages are plotted against the log-transformed size of the sample set *C* taken from the record *B*. Finally, steps two through four are repeated 100 times, with results as plotted in Figure 2, and compared with the curve developed by the analytical method.
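The resampling steps above can be sketched as follows. This is a hedged illustration rather than the authors' code: the population parameters (`xi=20.0`, `alpha=5.0`) are hypothetical, the record *B* is drawn without replacement (the text does not say which), the L-moment fit uses the standard sample probability-weighted moments, and the variance coefficients are the assumed asymptotic values for L-moment Gumbel quantiles.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
# Assumed asymptotic-variance coefficients for L-moment Gumbel quantiles.
C0, C1, C2 = 1.1128, 0.4574, 0.8046


def gumbel_quantile(p, xi, alpha):
    """Inverse Gumbel CDF for non-exceedance probability p."""
    return xi - alpha * math.log(-math.log(p))


def fit_gumbel_lmom(sample):
    """Estimate Gumbel location and scale from sample L-moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * v for i, v in enumerate(x)) / (n * (n - 1))
    alpha = (2.0 * b1 - b0) / math.log(2.0)  # lambda_2 / ln 2
    xi = b0 - EULER_GAMMA * alpha            # lambda_1 - gamma * alpha
    return xi, alpha


def ci_percentage(n, T, xi, alpha, z=1.96):
    """Half-width of the confidence interval as a percentage of the estimate."""
    y = -math.log(-math.log(1.0 - 1.0 / T))
    var = alpha ** 2 * (C0 + C1 * y + C2 * y * y) / n
    return 100.0 * z * math.sqrt(var) / (xi + alpha * y)


def synthesize_pool(rng, xi, alpha, size=5000):
    """Step 1: synthesize the large dataset A from a Gumbel distribution."""
    pool = []
    for _ in range(size):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep logs finite
        pool.append(gumbel_quantile(u, xi, alpha))
    return pool


def one_trial(rng, pool, T, n_min=10, n_max=100):
    """Steps 2 to 4: draw record B, then grow sample set C one year at a time."""
    record = rng.sample(pool, n_max)  # assumption: drawn without replacement
    curve = []
    for n in range(n_min, n_max + 1):
        xi_hat, alpha_hat = fit_gumbel_lmom(record[:n])
        curve.append((n, ci_percentage(n, T, xi_hat, alpha_hat)))
    return curve


rng = random.Random(42)
pool = synthesize_pool(rng, xi=20.0, alpha=5.0)
curves = [one_trial(rng, pool, T=10) for _ in range(100)]
```

Each entry of `curves` is one resampled curve of (record length, percentage) pairs; plotting the percentages against the logarithm of the record length would give a figure of the same kind as Figure 2.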

The relationship between the log-transformed record length and the percentages of the confidence intervals is depicted in Figure 2 (the solid black curve is the analytical method of Equation 4). The figure demonstrates that the relationship is approximately linear, especially for the segment between 20 y and 100 y of record. The 100 trials of the resampling method are also drawn in Figure 2 (the dashed grey curves), and compare favorably with the analytical method.

Figure 2 Analytical and resampling methods for the relationships between the record length and the uncertainties for events of 10 y return period.

In Figure 2, the resampled curves (the grey curves) scatter over a large range when the record length is <20 y, and converge on the analytical curve as the record length increases. In addition, the resampled curves are distributed evenly around the analytical curve. Therefore, these resampled curves are good estimates of the relationships between uncertainties and record lengths. With 100 repetitions, the resampled curves are already dense close to the analytical curve and the relationship is evident, so more repetitions are not necessary.

The analytical method is not applicable to historical rainfall records since the population parameters are unknown. Therefore steps three and four in the resampling method are used to draw the relationships between record length and uncertainties in historical record (used as dataset *B*). A least squares linear regression is applied to model this relationship, and extrapolated to get the minimum record length needed, to achieve a specific percentage of uncertainty.

To verify the minimum record length, a bootstrap method was applied to the historical record. In the bootstrap method, the historical record is assigned as the population (dataset *A*), and the dataset *B* is randomly selected (with replacement for 50 or 100 repetitions) with length calculated from the linear regression model. The dataset *B* is directly used to fit to the Gumbel distribution and to estimate the confidence intervals, and the percentage of the confidence interval against the expected value is calculated as well. The mean of the percentages in each repetition is compared to the desired percentage to check the minimum record length estimate.

Nonlinear regressions of the IDF curves and their confidence intervals are developed, and compared with the linear regression available from EC’s IDF files. The nonlinear equation used for the IDF curves is:

$$I = \frac{a}{(t + b)^{c}} \tag{5}$$

where:

- *I* = design rainfall intensity in mm/h,
- *a*, *b*, *c* = coefficients to be optimized by the least-squares method, and
- *t* = rainfall duration in hours.

Further, the regression functions for the upper and lower confidence limits are developed using Equation 5, by substituting the intensity with the upper or lower limits, $I_{\mathrm{Upper}}$ and $I_{\mathrm{Lower}}$.

## 5 Results and Discussion

The 1 h duration historical record at Kingston is analyzed as an example. As shown in Figure 3, the relationship for the 25 y event fluctuates when the record length increases from 10 y to 20 y, but the percentages are gradually reduced to 13% when using the entire record of 45 y.

Figure 3 Relationship at Kingston between the percentage of the 95% confidence interval and the record length.

The linear regression for the percentage (*r*) of 95% confidence intervals and the length of the record (*l*) is expressed in Equation 6, based on the segment of curve from 20 y to 45 y.

$$r = 45.66 - 8.64\,\ln(l) \tag{6}$$

The slope and the correlation coefficient are both significant at a significance level of <0.001.

Using Equation 6, the minimal length of the record for Kingston needed to achieve a 95% confidence interval as small as ±10% of the prediction is exp((45.66 − 10)/8.64), or 62 y.
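Written out step by step, and assuming the regression has the form implied by the expression above (intercept 45.66, slope −8.64 on the natural logarithm of the record length, with *r* in percent), the calculation is:

$$
\begin{aligned}
r &= 45.66 - 8.64\,\ln(l) = 10 \\
\ln(l) &= \frac{45.66 - 10}{8.64} \approx 4.127 \\
l &= e^{4.127} \approx 62\ \text{y}
\end{aligned}
$$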

The bootstrap method is used to check this estimate of record length. Fifty and 100 sets of 62 values are randomly selected from the historical record, with replacement, and fitted to the Gumbel distribution. The 25 y event intensity is estimated using Equation 3 for each set of values. The mean, variance, and 95% confidence limits are estimated based on these 50 and 100 estimates, assuming the normal distribution. The percentage of the confidence interval relative to the mean is obtained from Equation 4. This bootstrap method gives 10.7% for 50 sets of values, and 9.4% for 100 sets of values. Thus good agreement is observed between the analytical method and the resampling method, indicating that the assessment of the required record length as 62 y is valid.
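A minimal sketch of this bootstrap check is given below. The Kingston record is not reproduced in this paper, so the `record` used here is a hypothetical 45-value series synthesized from a Gumbel distribution; the function names are illustrative, and the percentage is taken as the normal-theory half-width (1.96 standard deviations) divided by the mean of the bootstrap estimates.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_lmom(sample):
    """Estimate Gumbel location and scale from sample L-moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * v for i, v in enumerate(x)) / (n * (n - 1))
    alpha = (2.0 * b1 - b0) / math.log(2.0)
    xi = b0 - EULER_GAMMA * alpha
    return xi, alpha


def bootstrap_percentage(record, m, T, reps, rng, z=1.96):
    """Percentage of the CI half-width relative to the mean T-year estimate."""
    y = -math.log(-math.log(1.0 - 1.0 / T))
    estimates = []
    for _ in range(reps):
        resample = rng.choices(record, k=m)  # m values, with replacement
        xi_hat, alpha_hat = fit_gumbel_lmom(resample)
        estimates.append(xi_hat + alpha_hat * y)  # T-year event estimate
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return 100.0 * z * sd / mean


rng = random.Random(1)
# Hypothetical 45 y annual-maximum record (mm/h); NOT the Kingston data.
record = [20.0 - 5.0 * math.log(-math.log(rng.uniform(0.01, 0.99)))
          for _ in range(45)]
for reps in (50, 100):
    print(reps, round(bootstrap_percentage(record, 62, 25, reps, rng), 1))
```

With only 50 or 100 repetitions the resulting percentage is itself noisy, which is consistent with the two bootstrap results reported above falling on either side of 10%.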

Table 1 Record lengths and percentages of uncertainty of rainfall gauges in Ontario

| Rain gauge | Record length (y) | Record length (y) for 95% CI as 10% of prediction, 5 y event | 10 y event | 25 y event | 95% CI of entire record (%), 5 y event | 10 y event | 25 y event |
|---|---|---|---|---|---|---|---|
| Toronto | 65 | 53 | 67 | 81 | 9.00 | 10.36 | 11.92 |
| St Thomas | 43 | 59 | 81 | 99 | 10.50 | 12.20 | 14.14 |
| Windsor Airport | 43 | 44 | 51 | 59 | 10.53 | 12.22 | 14.16 |
| Belleville OWRC | 41 | 41 | 57 | 74 | 10.10 | 11.83 | 13.83 |
| Ottawa | 42 | 41 | 53 | 65 | 9.67 | 11.39 | 13.36 |
| London Airport | 46 | 48 | 61 | 74 | 10.31 | 11.94 | 13.81 |
| Fergus Shand Dam | 42 | 59 | 69 | 78 | 13.23 | 14.83 | 16.65 |
| Kingston Pumping Station | 45 | 40 | 51 | 62 | 9.45 | 11.11 | 13.01 |
| Toronto International Airport | 46 | 50 | 65 | 79 | 10.18 | 11.82 | 13.69 |
| Hamilton RBG | 39 | 27 | 42 | 59 | 8.72 | 10.46 | 12.48 |
| Sault Ste Marie Airport | 37 | 53 | 66 | 79 | 11.52 | 13.35 | 15.43 |
| Delhi | 39 | 43 | 50 | 57 | 11.06 | 12.84 | 14.88 |
| Chatham Waterworks | 35 | 82 | 116 | 136 | 11.68 | 13.57 | 15.72 |
| Owen Sound MoE | 30 | 48 | 54 | 61 | 15.45 | 17.37 | 19.56 |
| Sioux Lookout Airport* | 33 | 136 | 148 | 151 | 12.94 | 14.86 | 17.05 |
| North Bay Airport | 34 | 48 | 60 | 71 | 11.56 | 13.48 | 15.66 |
| Stratford MoE | 33 | 59 | 68 | 77 | 13.84 | 15.72 | 17.87 |
| Brockville PCC | 34 | 59 | 70 | 81 | 12.73 | 14.63 | 16.78 |
| Kenora Airport | 33 | 44 | 50 | 57 | 13.62 | 15.51 | 17.67 |
| Port Colborne | 31 | 38 | 45 | 53 | 11.65 | 13.66 | 15.96 |
| Trenton Airport | 32 | 43 | 49 | 56 | 12.88 | 14.84 | 17.07 |

Table 1 lists information about the 21 rain gauges, including the record length available and the percentage of the 95% confidence interval compared to predictions based on the entire record (1 h duration). The record lengths needed to achieve a 95% confidence interval as low as 10% of the predictions for 5 y, 10 y and 25 y events are also listed. Excluding the gauge at Sioux Lookout Airport (at which the slope of the linear regression function is not significantly different from zero), it is calculated that the average length of record needed to achieve a 95% confidence interval as low as 10% of the prediction is 49 y, 62 y and 73 y for return periods of 5 y, 10 y and 25 y respectively. Considering that the average record length is 40 y for the remaining 20 gauges, it is strongly recommended to consider the uncertainties of rainfall intensity estimates when selecting design rainfall from these IDF curves, because otherwise the design rainfall is at risk of being significantly underestimated.

The 5 y event IDF curve at Waterloo is employed as an example to explain the importance of using confidence intervals as well as expected values. The expected values and 95% confidence intervals are obtained from the EC IDF files, and listed below in Table 2. The EC regression equation is shown in Equation 7 as a benchmark, which is a linear regression between the intensities (*I*) and the log of durations (*t*) in hours.

(7) |

The nonlinear regression using Equation 5 becomes

(8) |

Both of these equations are plotted in Figure 4, and demonstrate the expected values and 95% confidence intervals for events over nine durations of storms.

Figure 4 IDF curve for 5 y event at Waterloo.

In Figure 4, the EC regression equation (Equation 7) is close to the lower confidence limit of rainfall events at durations of 30 min, 1 h and 2 h. Thus if a hydrologic model uses design rainfall >90 min duration, it is in fact using an intensity that is close to the lower confidence limit. Further, there is a 95% chance that this event will be exceeded more frequently than once every 5 y on average.

Table 2 5 y event estimates and 95% confidence intervals at Waterloo.

| Duration | Intensity (mm/h) | 95% Confidence Interval (mm/h) |
|---|---|---|
| 5 min | 153.3 | ±24.1 |
| 10 min | 110.4 | ±17.7 |
| 15 min | 91.9 | ±15 |
| 30 min | 66.4 | ±12.5 |
| 1 h | 45.1 | ±9.9 |
| 2 h | 26.7 | ±5.8 |
| 6 h | 10.8 | ±2.1 |
| 12 h | 5.9 | ±1 |
| 24 h | 3.2 | ±0.5 |
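The nonlinear regression of the expected values and confidence limits can be illustrated with the Table 2 values. The sketch below assumes that Equation 5 has the common three-parameter form *I* = *a*/(*t* + *b*)^*c*, and it substitutes a simple grid search over *b* and *c* (with *a* solved in closed form for each pair) for whatever least-squares optimizer the authors used. The grid ranges are arbitrary choices, and the fitted coefficients are not expected to reproduce those of Equation 8.

```python
# Table 2 values: durations (h), intensities (mm/h) and CI half-widths (mm/h).
DURATIONS = [5 / 60, 10 / 60, 15 / 60, 0.5, 1.0, 2.0, 6.0, 12.0, 24.0]
INTENSITIES = [153.3, 110.4, 91.9, 66.4, 45.1, 26.7, 10.8, 5.9, 3.2]
HALF_WIDTHS = [24.1, 17.7, 15.0, 12.5, 9.9, 5.8, 2.1, 1.0, 0.5]


def fit_idf(durations, values, b_grid, c_grid):
    """Least-squares fit of I = a / (t + b)**c by grid search over b and c.

    For fixed b and c the model is linear in a, so the best a is
    sum(I * g) / sum(g * g) with g = (t + b)**(-c).
    Returns (sse, a, b, c) for the grid point with the smallest error.
    """
    best = None
    for b in b_grid:
        for c in c_grid:
            g = [(t + b) ** (-c) for t in durations]
            a = sum(v * gi for v, gi in zip(values, g)) / sum(gi * gi for gi in g)
            sse = sum((v - a * gi) ** 2 for v, gi in zip(values, g))
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
    return best


# Arbitrary search ranges (assumptions): b in [0, 1] h, c in [0.3, 1.5].
B_GRID = [0.01 * k for k in range(101)]
C_GRID = [0.30 + 0.01 * k for k in range(121)]

upper = [i + h for i, h in zip(INTENSITIES, HALF_WIDTHS)]
lower = [i - h for i, h in zip(INTENSITIES, HALF_WIDTHS)]

for label, series in (("expected", INTENSITIES), ("upper", upper), ("lower", lower)):
    sse, a, b, c = fit_idf(DURATIONS, series, B_GRID, C_GRID)
    print(label, round(a, 2), round(b, 2), round(c, 2), round(sse, 2))
```

Fitting the upper and lower limits with the same functional form gives confidence limits at any intermediate duration, which can then be compared with the interpolated expected value.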

## 6 Conclusion

A linear relationship is observed and modeled between the uncertainties (in the form of the percentage of the 95% confidence intervals, compared with the expected values) and the log-transformed record length. Using this linear relationship, it is possible to quantify the record length needed to achieve a specified uncertainty, for example a 95% confidence interval that is <10% of the expected value. With the record lengths quantified, modelers are better aware of uncertainties in rainfall intensities estimated from records with limited durations.

The uncertainties of extreme event predictions are constrained by the limitation of the historical record. It is difficult to provide a confident estimate of the 100 y event based on a record of 40 y or 50 y, and this is a very common circumstance in Ontario. Stormwater infrastructure design is, in fact, dealing with extreme rainfall events with a very large degree of uncertainty. Using the expected value alone does not incorporate the uncertainty in the estimation of the rainfall intensities which are used for the design of stormwater infrastructure.

The design rainfall intensities obtained from the IDF curve regression equations may be exceeded more frequently than the design return period. Modelers should compare these intensities with the corresponding confidence intervals to decide which of the intensities (the upper confidence limit or the interpolated expected value) should be used in modeling.

## References

- Aronica, G., G. Freni and E. Oliveri. 2005. “Uncertainty Analysis of the Influence of Rainfall Time Resolution in the Modeling of Urban Drainage Systems.” *Hydrological Processes* 19 (5): 1055–71.
- Coles, S., L. R. Pericchi and S. Sisson. 2003. “A Fully Probabilistic Approach to Extreme Rainfall Modeling.” *Journal of Hydrology* 273: 35–50.
- Fowler, H. J. and C. G. Kilsby. 2003. “A Regional Frequency Analysis of United Kingdom Extreme Rainfall from 1961 to 2000.” *International Journal of Climatology* 23 (11): 1313–34.
- García-Ruiz, J. M., J. Arnaéz, S. M. White, A. Lorente and S. Beguería. 2000. “Uncertainty Assessment in the Prediction of Extreme Rainfall Events: An Example from the Central Spanish Pyrenees.” *Hydrological Processes* 14 (5): 887–98.
- Hosking, J. R. M. and J. R. Wallis. 1997. *Regional Frequency Analysis: An Approach Based on L–Moments*. Cambridge: Cambridge University Press.
- Rauch, W. and S. de Toffol. 2006. “On the Issue of Trend and Noise in the Estimation of Extreme Rainfall Properties.” *Water Science and Technology* 54 (6/7): 17–24.
- Semadeni-Davies, A., C. Hernebring, G. Svensson and L.-G. Gustafsson. 2008. “The Impacts of Climate Change and Urbanisation on Drainage in Helsingborg, Sweden: Combined Sewer System.” *Journal of Hydrology* 350 (1–2): 100–13.
- Stedinger, J. R., R. M. Vogel, E. Foufoula-Georgiou and D. R. Maidment. 1993. *Frequency Analysis of Extreme Events*. New York: McGraw-Hill.