Colloquium announcement

Faculty of Engineering Technology

Department Computer Architecture for Embedded Systems - EEMCS
Master programme Sustainable Energy Technology

As part of his master assignment

Hernandez Urena, P.J. (Patrick)

will give a presentation entitled:

Leveraging deep reinforcement learning to enhance battery energy storage system allocation in a multi-energy market cost optimization

Date: 26-09-2025
Time: 14:00
Room: LA1208

Summary

The decarbonization of the Dutch energy sector has led to the emergence of challenging electricity markets, each characterized by price volatility that is intensified by renewable energy generation. Minimizing costs across multiple markets through a centralized optimization program may be impractical, as this requires valuing electricity in each market, which is generally harder to predict in advance for markets that close at a later time. Alternatively, cost minimization can be disaggregated, so that costs are minimized within each market shortly before it closes, when prices are more predictable. This calls for an overarching system capable of coordinating the individual optimizations, as a decision in one market may influence opportunities in another.

This thesis investigates how reinforcement learning (RL) can serve as an underlying model to provide such coordination. To this end, an RL model is developed that is designed to find policies that minimize opportunity costs across markets. The proposed model is based on a deep Q-learning method that employs a smooth Q-value approximation function consisting of radial basis functions. It is developed on a simulated environment involving the exploitation of electricity spot markets using a battery energy storage system (BESS). The environment uses a feudal hierarchical structure, in which an RL agent operates at the highest level of abstraction, allocating the capacity of the BESS among lower-level agents that each minimize costs in a specific market.

This work demonstrates that a deep Q-network capable of capturing temporal dependencies and nonlinearities can effectively approximate a policy. The implementation of double critic networks combined with soft target updates proved effective in reducing overoptimism, thereby stabilizing the training process of the RL agent. In testing, a trained RL agent outperforms an alternative approach involving a fixed BESS capacity reserve, as validated by simulation on a test data set.

Furthermore, it is shown that the proposed RL model can potentially learn more complex policies, as it allows for the creation of a multi-dimensional value space. Each dimension of this space represents the approximate value of one objective, and a solution can be found by exploring the value space. This is demonstrated in a case study on sharing BESS capacity among multiple stakeholders with conflicting preferences, in which metaheuristics were employed to search the value space and approximate a Pareto-dominant policy.
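The summary leaves the concrete architecture open; purely as an illustration, the following minimal sketch (all class names, signatures, and parameter values hypothetical, not the thesis implementation) shows a smooth Q-value approximation built from Gaussian radial basis functions, together with soft (Polyak) target updates and a clipped double-critic target of the general kind the thesis credits with reducing overoptimism.

import numpy as np

class RBFQNetwork:
    """Smooth Q-value approximation: Q(s, a) = w_a . phi(s), where phi is a
    vector of Gaussian radial basis function activations of the state."""
    def __init__(self, centers, width, n_actions, lr=1e-2):
        self.centers = np.asarray(centers)   # (n_rbf, state_dim) RBF centers
        self.width = width                   # shared RBF width (assumed choice)
        self.w = np.zeros((n_actions, len(self.centers)))
        self.lr = lr

    def features(self, state):
        d2 = ((self.centers - state) ** 2).sum(axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def q_values(self, state):
        return self.w @ self.features(state)

    def td_update(self, state, action, target):
        phi = self.features(state)
        self.w[action] += self.lr * (target - self.w[action] @ phi) * phi

def soft_update(target, online, tau=0.005):
    """Soft target update: the target critic slowly tracks the online one."""
    target.w = (1.0 - tau) * target.w + tau * online.w

def double_critic_target(reward, next_state, gamma, online, targets):
    """Clipped double-Q style target: act greedily with the online critic,
    evaluate with the minimum over two target critics to curb overoptimism."""
    a_star = int(np.argmax(online.q_values(next_state)))
    q_min = min(t.q_values(next_state)[a_star] for t in targets)
    return reward + gamma * q_min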
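The feudal structure can likewise be pictured as a manager that selects a discrete split of BESS capacity and hands each share to a market-specific worker. Everything below (the simplex-grid action set, the worker's optimize interface, the negative-total-cost reward) is an assumed toy setup consistent with the summary, not the thesis environment; manager.q_values refers to a critic such as the RBFQNetwork sketched above.

import itertools
import numpy as np

def capacity_splits(n_markets, steps=4):
    """Discrete manager actions: all ways to split total BESS capacity
    over the markets on a simplex grid (hypothetical resolution)."""
    grid = itertools.product(range(steps + 1), repeat=n_markets)
    return [np.array(g) / steps for g in grid if sum(g) == steps]

def manager_step(manager, state, workers, total_capacity, actions, epsilon=0.1):
    """One feudal step: the manager allocates capacity shares, each worker
    minimizes cost in its own market with its share, and the manager's
    reward is the negative total cost realized across all markets."""
    if np.random.rand() < epsilon:                 # epsilon-greedy exploration
        a = np.random.randint(len(actions))
    else:
        a = int(np.argmax(manager.q_values(state)))
    shares = actions[a]
    cost = sum(w.optimize(s * total_capacity)      # worker returns its market cost
               for w, s in zip(workers, shares))
    return a, -cost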
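Finally, the multi-dimensional value space can be read as one value estimate per objective, with a metaheuristic searching over scalarization weights and keeping the non-dominated outcomes. The random-search loop below is a deliberately simple stand-in for whichever metaheuristic the thesis employs; q_heads (one critic per objective) and evaluate (returning one score per stakeholder, higher is better) are assumed interfaces.

import numpy as np

def dominates(a, b):
    """True if outcome a Pareto-dominates outcome b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def search_value_space(q_heads, evaluate, n_iter=200, seed=0):
    """Toy metaheuristic: sample scalarization weights over the per-objective
    value heads, evaluate the induced greedy policy, keep the Pareto front."""
    rng = np.random.default_rng(seed)
    front = []   # list of (weights, scores) pairs, mutually non-dominated
    for _ in range(n_iter):
        w = rng.dirichlet(np.ones(len(q_heads)))
        policy = lambda s, w=w: int(np.argmax(
            sum(wi * h.q_values(s) for wi, h in zip(w, q_heads))))
        scores = evaluate(policy)                  # one score per stakeholder
        if not any(dominates(f[1], scores) for f in front):
            front = [f for f in front if not dominates(scores, f[1])]
            front.append((w, scores))
    return front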