Document Type
Article
Publication Date
1-1-2019
Publication Title
Computer Modeling in Engineering and Sciences
Publisher
Tech Science Press
Volume
118
Issue
2
First page number:
253
Last page number:
274
Abstract
In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, whose configuration and/or parameters change constantly. In this paper, the problem is approached by first mapping a reinforcement learning scheme to a directed graph, in the context of which the set of all states already explored continues to be exploited. We have proved that the two tasks of exploration and exploitation eventually converge in the decision-making process, so there is no need to face the exploration vs. exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by the modified Floyd-Warshall algorithm proposed in this paper. The experimental results confirm that the proposed graph-based reinforcement learning algorithm significantly outperforms both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it the algorithm of choice in applications involving dynamic environments.
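The abstract frames the explored state set as a directed graph on which shortest paths are computed. The paper's specific modifications for dynamic environments are not reproduced here; as a point of reference, a minimal sketch of the standard Floyd-Warshall all-pairs shortest-path algorithm that the method builds on might look like the following (vertex count, edge representation, and weights are illustrative assumptions):

```python
def floyd_warshall(n, edges):
    """Compute all-pairs shortest-path distances.

    n: number of vertices (states), labeled 0..n-1.
    edges: dict mapping (u, v) -> nonnegative weight of the directed edge u -> v.
    Returns an n x n matrix of shortest distances (float('inf') if unreachable).
    """
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0  # zero-cost path from each state to itself
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge
    # Relax every pair (i, j) through each intermediate vertex k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

A dynamic environment would correspond to edge weights changing between episodes, after which the distance matrix is updated; the paper's modified variant addresses exactly that recomputation.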
Keywords
Reinforcement learning; Graph; Exploration and exploitation; Maze
Disciplines
Computer Engineering
File Format
File Size
1,105 KB
Language
English
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Repository Citation
Li, H., Chen, T., Teng, H., & Jiang, Y. (2019). A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation. Computer Modeling in Engineering and Sciences, 118(2), 253-274. Tech Science Press. http://dx.doi.org/10.31614/cmes.2019.05807