Xunyu Zhou

Exploration versus exploitation in reinforcement learning: A stochastic control approach

We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best tradeoff between exploration of a black-box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and we motivate and devise an exploratory formulation of the feature dynamics that captures repetitive learning under exploration. The resulting optimization problem is a revitalization of classical relaxed stochastic control. We carry out a complete analysis of the problem in the linear-quadratic (LQ) case and deduce that the optimal control distribution for balancing exploitation and exploration is Gaussian. This in turn interprets and justifies the widely adopted Gaussian exploration in RL, beyond its simplicity for sampling. Moreover, exploitation and exploration are reflected by the mean and the variance of the Gaussian distribution, respectively.
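A toy one-step calculation shows why entropy regularization and Gaussian exploration go together. Suppose the quantity being maximized over actions u is a concave quadratic H(u) = -(a/2)u² + bu with a > 0. Maximizing E_π[H] plus λ times the differential entropy of π over densities π gives the Gibbs density π(u) ∝ exp(H(u)/λ). That density is Gaussian with mean b/a, the greedy (exploiting) action, and variance λ/a, which scales with the exploration weight λ. The Python sketch below checks this numerically. The function `gibbs_moments` and the parameters `a`, `b`, `lam` are illustrative assumptions, not the authors' notation or code. The continuous-time LQ result itself rests on the full HJB analysis described above.

```python
import numpy as np

def gibbs_moments(a: float, b: float, lam: float, n: int = 200_001):
    """Moments of the Gibbs density p(u) ∝ exp(H(u) / lam) for the toy
    concave quadratic H(u) = -(a/2) u**2 + b*u (hypothetical example; a, lam > 0).

    Returns (mean, variance, loss), where loss = max_u H(u) - E_p[H(u)].
    """
    if a <= 0 or lam <= 0:
        raise ValueError("need a > 0 and lam > 0")
    u_star = b / a                       # maximizer of H: the 'exploitation' action
    sd_guess = np.sqrt(lam / a)          # only used to size the integration grid
    u = np.linspace(u_star - 20 * sd_guess, u_star + 20 * sd_guess, n)
    h = -0.5 * a * u**2 + b * u
    w = np.exp((h - h.max()) / lam)      # shift by max(h) to avoid overflow
    p = w / w.sum()                      # normalized weights on a uniform grid
    mean = np.sum(p * u)
    var = np.sum(p * (u - mean) ** 2)
    h_star = b * b / (2 * a)             # H(u_star), the best deterministic value
    loss = h_star - np.sum(p * h)        # expected shortfall from randomizing
    return mean, var, loss

if __name__ == "__main__":
    a, b = 2.0, 1.0
    for lam in (0.1, 0.5, 1.0):
        mean, var, loss = gibbs_moments(a, b, lam)
        # Compare with the closed forms b/a, lam/a and lam/2.
        print(f"lam={lam}: mean={mean:.4f} vs {b/a:.4f}, "
              f"var={var:.4f} vs {lam/a:.4f}, loss={loss:.4f} vs {lam/2:.4f}")
```

In this toy setting the mean does not depend on λ, while the variance grows linearly in λ. The expected shortfall relative to always playing b/a is (a/2)(λ/a) = λ/2, whatever the curvature a.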

We also find that a more random environment contains more learning opportunities, in the sense that less exploration is needed, other things being equal.

As the weight of exploration decays to zero, we prove that the solution of the entropy-regularized LQ problem converges to that of the classical LQ problem. Finally, we characterize the cost of exploration, which in the LQ case is shown to be proportional to the entropy regularization weight and inversely proportional to the discount rate. This is joint work with Haoran Wang and Thaleia Zariphopoulou.
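A back-of-the-envelope heuristic, not the formal LQ proof, suggests where this proportionality comes from. Assume that at each time t the optimal Gaussian has variance λ/a(t) around the greedy action, where a(t) > 0 is the curvature of the relevant Hamiltonian in the action. Here λ denotes the entropy regularization weight and ρ the discount rate; this notation is ours. By the quadratic shortfall computation, the instantaneous expected loss from randomizing is (a(t)/2)(λ/a(t)) = λ/2, independent of the curvature. Discounting over an infinite horizon then gives:

```latex
\[
\int_0^\infty e^{-\rho t}\,\frac{a(t)}{2}\cdot\frac{\lambda}{a(t)}\,dt
  \;=\; \frac{\lambda}{2}\int_0^\infty e^{-\rho t}\,dt
  \;=\; \frac{\lambda}{2\rho}.
\]
```

The result is linear in λ and inversely proportional to ρ, consistent with the statement above. The heuristic ignores how exploration feeds back into the state trajectory, which a rigorous analysis of the exploratory dynamics must account for.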

Contact email: mpelger@stanford.edu
