arxiv:2004.05599

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Published on Apr 12, 2020
Authors: Omar D. Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

Abstract

Kernel-UCBVI, a model-based reinforcement learning algorithm, efficiently balances exploration and exploitation in finite-horizon problems using kernel estimators and achieves a novel regret bound.

AI-generated summary

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with K episodes and horizon H, we provide a regret bound of $O\left(H^3 K^{\frac{2d}{2d+1}}\right)$, where d is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.
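
To make the mechanism concrete, below is a minimal, hypothetical sketch of the kind of kernel-smoothed, optimistic estimate the abstract describes. It is not the authors' implementation: the Gaussian kernel, the bandwidth sigma, and the bonus scale beta are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.5):
    # Smoothing kernel on the metric state-action space.
    # The Gaussian form and bandwidth sigma are illustrative assumptions.
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-np.linalg.norm(diff) ** 2 / (2 * sigma ** 2))

def optimistic_kernel_reward(query_sa, visited_sa, observed_rewards,
                             sigma=0.5, beta=1.0):
    # Kernel-weighted average of the rewards observed near the queried
    # state-action pair, plus an exploration bonus that shrinks as the
    # "generalized visit count" (sum of kernel weights) grows.
    if len(visited_sa) == 0:
        return beta  # no data yet: be fully optimistic
    weights = np.array([gaussian_kernel(query_sa, sa, sigma) for sa in visited_sa])
    count = weights.sum() + 1e-8          # generalized number of visits
    estimate = weights @ np.asarray(observed_rewards) / count
    bonus = beta / np.sqrt(count)         # optimism in the face of uncertainty
    return estimate + bonus

# Example: query a state-action pair close to previously visited ones.
history_sa = [np.array([0.10, 0.20]), np.array([0.15, 0.25]), np.array([0.90, 0.80])]
history_r = [1.0, 0.8, 0.0]
print(optimistic_kernel_reward(np.array([0.12, 0.22]), history_sa, history_r))
```

The sketch only shows the reward side; in the model-based setting described by the abstract, the same kernel-smoothing idea is also applied to the transition estimates and the resulting optimistic values are propagated by backward induction over the horizon H.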

