Competitive Reinforcement Learning for Real-Time Pricing and Scheduling Control in Coupled EV Charging Stations and Power Networks

dc.contributor.author: Surani, Adrian-Petru
dc.contributor.author: Wu, Tong
dc.contributor.author: Scaglione, Anna
dc.date.accessioned: 2023-12-26T18:39:36Z
dc.date.available: 2023-12-26T18:39:36Z
dc.date.issued: 2024-01-03
dc.identifier.doi: 10.24251/HICSS.2024.366
dc.identifier.isbn: 978-0-9981331-7-1
dc.identifier.other: f57a00cd-6566-4cd1-9f7d-49b7d2125438
dc.identifier.uri: https://hdl.handle.net/10125/106749
dc.language.iso: eng
dc.relation.ispartof: Proceedings of the 57th Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Monitoring, Control, and Protection
dc.subject: EV charging pricing
dc.subject: robust reinforcement learning
dc.subject: twin delayed DDPG (TD3)
dc.title: Competitive Reinforcement Learning for Real-Time Pricing and Scheduling Control in Coupled EV Charging Stations and Power Networks
dc.type: Conference Paper
dc.type.dcmi: Text
dcterms.abstract: This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) approach to optimize the charging schedules and prices offered by EV charging stations competing to maximize profit, i.e., the difference between the payments collected by a charging station and the electricity costs set by the distribution system operator. It is assumed that, to prevent energy congestion on the distribution grid, each charging station pays the locational marginal price (LMP) for the electricity it serves to its customers, obtained as the dual variable of the optimal power flow (OPF) problem. Our proposed RL algorithm trains multiple agents to make optimal charging and pricing decisions at each time step, based solely on past observations. The algorithm also accounts for the randomness in user behavior, such as travel times, wait times, and user flexibility. We observe that profit-maximizing agents compete intensely for higher profits, and this competition often drives agents toward inefficient policies, mainly because of the disruptions caused by competitors' actions. To address this issue, we incorporate constant-sum game theory into the RL policy training: a minimax policy gradient maximizes the reward of a robust agent under the worst-case scenarios created by its competitors. Simulation results validate that robust agents generate greater profits than competing agents that do not undergo minimax training, and that their presence stabilizes training.
dcterms.extent: 10 pages
prism.startingpage: 3030
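
The abstract's premise that each station pays the LMP, obtained as the dual variable of the OPF problem, can be illustrated concretely. Below is a minimal sketch, not taken from the paper, of a toy two-bus DC-OPF solved with cvxpy; all costs, loads, and the line limit are assumed values, chosen so the line congests and the two buses see different LMPs.

```python
# Toy two-bus DC-OPF: the dual of each nodal balance constraint is the
# LMP at that bus. All numbers are illustrative assumptions, not data
# from the paper.
import cvxpy as cp

g1 = cp.Variable(nonneg=True)   # cheap generator at bus 1 (MW)
g2 = cp.Variable(nonneg=True)   # expensive generator at bus 2 (MW)
f = cp.Variable()               # flow on the line from bus 1 to bus 2 (MW)

load1, load2 = 50.0, 100.0      # bus loads (MW), assumed
line_limit = 60.0               # line thermal limit (MW), assumed

# Supply must cover load at each bus; because generation is costly the
# constraint binds at the optimum, so its dual equals the marginal cost
# of serving one more MW there -- i.e., the LMP.
balance1 = g1 - f >= load1
balance2 = g2 + f >= load2

prob = cp.Problem(
    cp.Minimize(10.0 * g1 + 30.0 * g2),   # linear generation costs ($/MWh)
    [balance1, balance2, cp.abs(f) <= line_limit],
)
prob.solve()

print("LMP bus 1: %.2f $/MWh" % balance1.dual_value)
print("LMP bus 2: %.2f $/MWh" % balance2.dual_value)
```

With these numbers the line congests (f = 60 MW), so bus 1 clears at $10/MWh while bus 2 clears at $30/MWh; this nodal price gap is what couples a charging station's profit to grid conditions in the setting the abstract describes.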

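The minimax training described in the abstract can likewise be sketched. The following PyTorch fragment is an illustrative assumption about how the alternating updates might look, not the authors' implementation: a robust actor ascends a shared critic while an adversarial competitor descends it, approximating the constant-sum game. The TD3 machinery the paper builds on (twin critics, target networks, delayed updates) and the critic's own training are omitted for brevity, and all network sizes and hyper-parameters are arbitrary.

```python
# Minimal minimax policy-gradient sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 8, 2

def make_actor():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, ACT_DIM), nn.Tanh())

# Critic scores a (state, robust action, adversary action) triple.
critic = nn.Sequential(nn.Linear(STATE_DIM + 2 * ACT_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
critic.requires_grad_(False)  # treated as fixed here; trained separately in TD3

robust_actor, adv_actor = make_actor(), make_actor()
opt_robust = torch.optim.Adam(robust_actor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adv_actor.parameters(), lr=1e-3)

def minimax_step(states):
    """One alternating minimax update on a batch of states."""
    # Adversary step: choose actions that MINIMIZE the robust agent's
    # value, generating the worst case the robust agent must withstand.
    q_adv = critic(torch.cat([states, robust_actor(states).detach(),
                              adv_actor(states)], dim=-1)).mean()
    opt_adv.zero_grad(); q_adv.backward(); opt_adv.step()

    # Robust step: MAXIMIZE value under the adversary's current policy
    # (gradient ascent, hence the minus sign on the loss).
    loss = -critic(torch.cat([states, robust_actor(states),
                              adv_actor(states).detach()], dim=-1)).mean()
    opt_robust.zero_grad(); loss.backward(); opt_robust.step()

# Example: one update on random states standing in for replay-buffer data.
minimax_step(torch.randn(32, STATE_DIM))
```

The `.detach()` calls keep each update from pushing gradients into the other agent's policy, so the two optimizers implement the alternating min and max steps of the constant-sum objective.
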
Files

Original bundle
Name: 0297.pdf
Size: 2.84 MB
Format: Adobe Portable Document Format