JOURNAL OF WATER MANAGEMENT MODELING (JWMM)

Reservoir Operation Optimization by Reinforcement Learning

Masoud Mahootchi, Hamid R. Tizhoosh and K. (Ponnu) Ponnambalam (2007)
University of Waterloo
DOI: https://doi.org/10.14796/JWMM.R227-08


Abstract

Planning reservoir management and the optimal operation of surface water resources have always been critical, strategic concerns for governments. Today, substantial equipment, facilities, and budgets are devoted to the optimal scheduling of water and energy resources over long and short horizons. Many researchers work in these areas to improve the performance of such systems, typically applying new mathematical and heuristic techniques to tackle the wide variety of complexities in real-world, and especially large-scale, problems. Stochasticity, nonlinearity/nonconvexity, and dimensionality are the main sources of complexity. Many techniques circumvent these complexities through approximations in uncertain environments where the relations between system parameters are complex and unknown. However, optimizing the operation of large-scale problems on the basis of such unrealistic estimations makes the final solution imprecise and usually far from the true optimum. Moreover, hardware and software limitations impose important physical constraints that prevent many relations between variables and parameters from being considered. Even if all possible relations between the parameters of a problem were known and definable, considering all of them simultaneously might make the problem very difficult to solve.

An optimization model of a real-world reservoir operation problem usually involves several objective functions and numerous linear and nonlinear constraints. If the number of variables and parameters makes the problem too large and intractable, existing software and hardware may be unable to find an optimal solution with conventional optimization methods in a reasonable time. For example, stochastic dynamic programming (SDP), a well-known technique in reservoir management, suffers seriously from the curses of dimensionality and of modeling in multi-reservoir systems. To overcome this challenge, several ideas have been developed and implemented in past decades: dynamic programming successive approximations (DPSA) (Larson, 1968), incremental dynamic programming (IDP) (Hall et al., 1969), multilevel incremental dynamic programming (MIDP) (Nopmongcol and Askew, 1976), and various aggregation-decomposition methods (Turgeon, 1981; Ponnambalam, 1987; Ponnambalam and Adams, 1996).
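
As a rough illustration of this curse of dimensionality (the notation here is illustrative, not taken from the paper): if each of $n$ reservoirs is discretized into $m$ storage levels, the SDP state space contains

$|S| = m^{n}, \qquad \text{e.g. } m = 20,\ n = 5 \ \Rightarrow\ |S| = 20^{5} = 3.2 \times 10^{6}$

states, and the backward Bellman recursion must evaluate every admissible release at every one of these states in every stage, so the cost grows exponentially with the number of reservoirs.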

Using simulation alongside optimization techniques could be a promising alternative in water resources management. Labadie (2004) argues that a direct linkage between simulation and the implementation of reservoir optimization algorithms could be an important key to future success in reservoir management.

Different reinforcement learning (RL) techniques, as simulation-based optimization methods, may be suitable approaches to overcome the curse of dimensionality, or at least to reduce this difficulty, in real-world applications. Learning in these approaches is based on interacting with an environment and receiving immediate or delayed feedback from the actions taken (Watkins and Dayan, 1992; Sutton and Barto, 1998). These techniques can start learning without prior knowledge of the stochastic behavior of the system and are therefore called model-free methods: they need to know nothing about the behavior of the system at the start of the learning process. The agent, or decision maker, begins from an arbitrary situation and interacts with the environment; during these interactions it experiences new situations, saves the results, and uses them in future decision-making. Early in learning the agent mostly encounters situations it has never observed before, so its actions are not based on prior knowledge. After enough interactions with the environment, however, the agent gradually comes to understand the behavior of the system and then exploits this knowledge for more accurate decision-making, while still seeking new information by occasionally taking random, exploratory actions.
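
For concreteness, the one-step Q-learning update cited above (Watkins and Dayan, 1992) takes the standard textbook form

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],$

where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r_{t+1}$ the feedback (reward) received after taking action $a_t$ in state $s_t$, and $s_{t+1}$ the resulting state.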

In fact, RL techniques are able to learn continually; they can be applied to on-line (real-time) or off-line (simulation-based) learning. Using RL on-line from scratch (without any prior knowledge) can be very expensive and troublesome, so RL can first be used off-line to build a basic understanding of the environment, which then serves as a useful starting point for on-line learning.

In most real-world applications the dynamics of the system change continuously. RL techniques are able to adapt themselves to these changes and to generate adequate responses and reactions to them. Furthermore, in some optimization techniques such as SDP, the final policy must cover all possible system states, even though many of them are practically impossible or unimportant. In RL, because learning proceeds through simulation or on-line interaction with the environment, the focus falls on the significant states, that is, those that are practically reachable.

In this study, Q-learning, one of the best-known and most popular RL techniques, is used to find an optimal closed-loop operational policy for a single-reservoir problem with linear objective functions, considering the stochastic nature of the inflows to the reservoir. This could be a starting point for tackling the difficulty of finding optimal solutions for multi-reservoir applications in the future. As in the SDP method, the release from the reservoir is the decision variable to be determined for every storage level, where the system state is the water stored in the reservoir. Inflow to the reservoir is assumed to be a normally distributed random variable. Two schemes for constructing admissible actions, optimistic and pessimistic, are investigated. Based on preliminary simulation results, the performance of the Q-learning method is measured and compared with the results of the SDP technique.
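
To make the setting concrete, the following minimal Python sketch shows tabular Q-learning on a single reservoir of the kind described above: the state is the discretized storage level, the action is the release, and inflow is drawn from a normal distribution. All numerical values, the linear reward function, and the epsilon-greedy exploration rule are illustrative assumptions, not the paper's actual formulation or its optimistic/pessimistic action schemes.

import numpy as np

# Illustrative discretization (assumed values, not from the paper)
N_STORAGE, N_RELEASE = 21, 11           # storage levels (states), release levels (actions)
S_MAX, R_MAX = 100.0, 40.0              # reservoir capacity, maximum release (volume units)
MU_IN, SIGMA_IN = 20.0, 5.0             # normally distributed inflow parameters
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount factor, exploration rate

storage_grid = np.linspace(0.0, S_MAX, N_STORAGE)
release_grid = np.linspace(0.0, R_MAX, N_RELEASE)
rng = np.random.default_rng(0)
Q = np.zeros((N_STORAGE, N_RELEASE))    # tabular action-value function

def reward(release, spill):
    # Hypothetical linear objective: value released water, penalize spill
    return 1.0 * release - 2.0 * spill

def step(s_idx, a_idx):
    # Mass balance: next storage = storage + inflow - release, clipped to capacity
    inflow = max(0.0, rng.normal(MU_IN, SIGMA_IN))
    release = min(release_grid[a_idx], storage_grid[s_idx] + inflow)  # admissible release
    s_next = storage_grid[s_idx] + inflow - release
    spill = max(0.0, s_next - S_MAX)
    s_next = min(s_next, S_MAX)
    s_next_idx = int(np.argmin(np.abs(storage_grid - s_next)))  # nearest storage level
    return s_next_idx, reward(release, spill)

s = N_STORAGE // 2                       # start from an arbitrary mid-level storage
for t in range(200_000):                 # simulated interaction with the environment
    if rng.random() < EPSILON:           # explore: take a random action
        a = int(rng.integers(N_RELEASE))
    else:                                # exploit current knowledge
        a = int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # One-step Q-learning update (Watkins and Dayan, 1992)
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
    s = s_next

policy = release_grid[np.argmax(Q, axis=1)]  # closed-loop policy: release per storage level
print(np.round(policy, 1))

The learned policy maps each storage level to a release, i.e. a closed-loop rule of the same form an SDP solution would produce, which is what makes the two methods directly comparable in the study.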

This paper is only available in PDF format:

  View full text PDF


PAPER INFO

Identification

CHI ref #: R227-08
Volume: 15
DOI: https://doi.org/10.14796/JWMM.R227-08
Cite as: CHI JWMM 2007;R227-08

Publication History

Received: N/A
Accepted: N/A
Published: February 15, 2007

Status

# reviewers: 2
Version: Final published

Copyright

© 2007 CHI. Some rights reserved.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

The Journal of Water Management Modeling is an open-access (OA) publication. Open access means that articles and papers are available without barriers to all who could benefit from them. Practically speaking, all published works will be available to a worldwide audience, free, immediately on publication. As such, JWMM can be considered a Diamond, Gratis OA journal.

All papers published in the JWMM are licensed under a Creative Commons Attribution 4.0 International License (CC BY).

JWMM content can be downloaded, printed, copied, distributed, and linked to, provided full attribution is given to both the author(s) and JWMM.


AUTHORS

Masoud Mahootchi

University of Waterloo, Waterloo, ON, Canada

Hamid R. Tizhoosh

University of Waterloo, Waterloo, ON, Canada

K. (Ponnu) Ponnambalam

University of Waterloo, Waterloo, ON, Canada





Journal of Water Management Modeling
ISSN: 2292-6062