Department of Electrical Engineering
Control Robotics and Machine Learning Lab

Advanced optimization methods for DQN

Background

 

Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy.

Training a deep neural network requires solving a non-convex optimization problem. It has been found that the choice of optimization technique can have a major impact on the final performance.

 

As an example, DQN uses two neural networks (an online network and a slowly updated target network) in a two-timescale update scheme that helps convergence. Does a better solution exist?
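The two-network scheme above can be sketched in a few lines. This is an illustrative toy, not DQN itself: parameters are stored as plain Python dictionaries standing in for network weights, and the function names (`hard_update`, `soft_update`) are our own. The target network either periodically copies the online network (as in the original DQN) or tracks it by Polyak averaging.

```python
# Sketch of DQN's two-timescale update: the online network is updated
# every training step, while the target network changes slowly, either
# by periodic hard copies or by Polyak (soft) averaging.

def hard_update(target, online):
    """Copy the online parameters into the target network."""
    for k in online:
        target[k] = online[k]

def soft_update(target, online, tau=0.01):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    for k in online:
        target[k] = (1.0 - tau) * target[k] + tau * online[k]

# Toy parameter dictionaries standing in for network weights.
online = {"w": 1.0, "b": 0.5}
target = {"w": 0.0, "b": 0.0}

soft_update(target, online, tau=0.5)  # target moves halfway toward online
hard_update(target, online)           # target now equals online
```

Keeping the Q-values used in the bootstrap target nearly fixed between updates is what stabilizes the regression problem the online network solves.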

 

Most researchers tend to use adaptive gradient methods such as ADAM and RMSProp, though recent work, such as that of Wilson et al., has shown that these methods tend to generalize worse than simple stochastic gradient descent (SGD).
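One of the "tricks" this project considers is switching optimizers during training, e.g. a fast adaptive warm-up followed by plain SGD. The sketch below illustrates the idea on a one-dimensional toy objective; the learning rates and phase lengths are arbitrary choices for the example, not recommended settings.

```python
import math

# Toy objective f(x) = (x - 3)^2, with gradient 2 * (x - 3).
def grad(x):
    return 2.0 * (x - 3.0)

def sgd_step(x, g, lr=0.1):
    """Plain stochastic gradient descent step."""
    return x - lr * g

def adam_step(x, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM step with bias-corrected first and second moments."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)

x, state = 0.0, {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(50):        # warm-up phase: adaptive optimizer
    x = adam_step(x, grad(x), state)
for _ in range(200):       # fine-tuning phase: plain SGD
    x = sgd_step(x, grad(x))
```

In a DQN setting the same switch would be applied to the network's parameter updates, with the open question being when (and whether) to hand over from the adaptive method to SGD.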

 

In this work we aim to explore the effects and combinations of different optimization techniques, such as the stochastic variance-reduced gradient method (SVRG), a more robust approach to variance reduction; Boosted FQI; and several optimization tricks, such as using different optimizers (SGD, ADAM, RMSProp) and combining them during the learning phase.
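To make the SVRG idea concrete, here is a minimal sketch on a toy least-squares problem (the problem, data, and hyperparameters are invented for illustration). SVRG periodically takes a snapshot of the parameters, computes the full gradient there, and then uses variance-reduced stochastic gradients in an inner loop; the correction term vanishes as the iterate approaches the snapshot, which is what reduces the gradient noise.

```python
import random

# Toy least-squares problem: minimize (1/n) * sum_i (w * a_i - b_i)^2,
# with per-example gradient 2 * a_i * (w * a_i - b_i).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # exact solution: w = 2

def grad_i(w, i):
    a, b = data[i]
    return 2.0 * a * (w * a - b)

def full_grad(w):
    return sum(grad_i(w, i) for i in range(len(data))) / len(data)

def svrg(w, lr=0.02, epochs=30, inner=20, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        w_snap = w                  # snapshot of the current parameters
        mu = full_grad(w_snap)      # full gradient at the snapshot
        for _ in range(inner):
            i = rng.randrange(len(data))
            # Variance-reduced stochastic gradient: unbiased, and its
            # variance shrinks as w approaches w_snap.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= lr * g
    return w

w = svrg(0.0)
```

Applying this inside DQN is non-trivial, since the regression targets themselves move during training; exploring that interaction is part of the project.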

Project Goal

Develop and implement various optimization techniques such as SVRG in the DQN framework.

Project steps

 

  • Understand the DQN framework.

  • Get a basic understanding of various optimization techniques.

  • Implement and assess the effects of various optimization techniques.

  • Suggest and test your own ideas!

 

Required knowledge

 

  • Strong programming skills.

  • Any knowledge of deep learning (DL) and reinforcement learning (RL) is an advantage.

  • Any knowledge of optimization is an advantage.

 

Environment

  • Torch / TensorFlow / PyTorch

 

Comments and links

 

  • See the DQN paper by Google.

  • See this paper regarding SGD vs Adaptive methods.

  • See this paper regarding the YellowFin optimizer.

  • See this paper regarding the SVRG optimization method.

  • See this paper regarding shallow updates for DRL.

  • See this paper regarding Boosted FQI.
