An Off-Policy Natural Policy Gradient Method for a Partial Observable Markov Decision Process

dc.contributor.author Nakamura Yutaka
dc.contributor.author Mori Takeshi
dc.contributor.author Ishii Shin
dc.date.accessioned 2017-11-09T19:36:47Z
dc.date.available 2017-11-09T19:36:47Z
dc.date.issued 2005
dc.identifier.uri http://hdl.handle.net/123456789/2753
dc.description.abstract The field of reinforcement learning has long faced the "exploration-exploitation problem": an agent must decide whether to explore for a better action, which may not exist, or to exploit rewards by taking the current best action. In this article, we propose an off-policy reinforcement learning method based on natural policy gradient learning as a solution to the exploration-exploitation problem. In our method, the policy gradient is estimated from a sequence of state-action pairs sampled by executing an arbitrary "behavior policy"; this allows us to deal with the exploration-exploitation problem by controlling how behavior policies are generated. By applying our method to an autonomous control problem of a three-dimensional cart-pole, we show that it can realize optimal control efficiently in a partially observable domain.
dc.format application/pdf
dc.subject
dc.title An Off-Policy Natural Policy Gradient Method for a Partial Observable Markov Decision Process
dc.type generic
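
For readers unfamiliar with off-policy policy gradient estimation, the following is a minimal, hypothetical Python/NumPy sketch of the general idea summarized in the abstract: the gradient of a target policy is estimated from trajectories generated by a separate "behavior policy", with per-step importance weights correcting the mismatch, and then preconditioned by an estimated inverse Fisher information matrix to obtain a "natural" gradient direction. This is not the authors' exact algorithm; the linear softmax policy, Monte Carlo returns, and all names such as off_policy_gradient are assumptions made for illustration only.

import numpy as np

def softmax_policy(theta, state_features):
    """Action probabilities of the target policy pi_theta for one state.

    state_features: array of shape (num_actions, dim); theta: array of shape (dim,).
    """
    logits = state_features @ theta
    logits -= logits.max()                         # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def off_policy_gradient(theta, trajectory, gamma=0.99):
    """Importance-weighted (off-policy) natural policy gradient estimate.

    trajectory: list of (state_features, action, reward, behavior_prob) tuples,
                where behavior_prob is the probability the behavior policy
                assigned to the action it actually took.
    """
    # Discounted return-to-go, computed backwards over the episode.
    returns, ret = [], 0.0
    for (_, _, r, _) in reversed(trajectory):
        ret = r + gamma * ret
        returns.append(ret)
    returns.reverse()

    grad = np.zeros_like(theta)
    fisher = np.zeros((theta.size, theta.size))
    for (phi, a, _, b_prob), G in zip(trajectory, returns):
        probs = softmax_policy(theta, phi)
        score = phi[a] - probs @ phi               # grad log pi_theta(a|s) for linear softmax
        w = probs[a] / b_prob                      # importance weight pi_theta / behavior
        grad += w * score * G
        fisher += w * np.outer(score, score)       # importance-weighted Fisher estimate

    n = len(trajectory)
    grad /= n
    fisher = fisher / n + 1e-3 * np.eye(theta.size)  # small ridge term for invertibility
    # "Natural" gradient: precondition the vanilla gradient by the inverse Fisher matrix.
    return np.linalg.solve(fisher, grad)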

