Alchemy: A structured task distribution for meta-reinforcement learning

Abstract

There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing, and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.

Authors' Notes

When humans are faced with a new task, we are typically able to tackle it with admirable speed, requiring very little experience to get going. This kind of efficiency and flexibility is something we would also like to see in artificial agents. However, although there has recently been dramatic progress in building deep reinforcement learning (RL) agents that can perform complex tasks after extensive training, getting deep RL agents to rapidly master new tasks remains an open problem.

One promising approach is meta-learning, or ‘learning to learn.’ The idea here is that the learner gains repurposable knowledge across a large set of experiences, and as this knowledge accumulates, it allows the learner to adapt more and more quickly to each new task it encounters. There has been rapidly growing interest in developing methods for meta-learning within deep RL. Although there has been substantive progress toward such ‘meta-reinforcement learning,’ research in this area has been held back by a shortage of benchmark tasks. In the present work, we aim to ease this problem by introducing (and open-sourcing) Alchemy, a useful new benchmark environment for meta-RL, along with a suite of analysis tools.
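To make the core idea concrete, the sketch below illustrates the kind of task distribution meta-RL targets. This is a hypothetical toy example, not the actual Alchemy environment or API: a latent "task" (here, which of several actions is rewarding) is resampled at the start of every episode, so an agent cannot memorize a single solution and must instead adapt within each episode from the feedback it observes.

```python
import random

NUM_ACTIONS = 4  # size of the toy action space (illustrative choice)


def sample_task(rng):
    """Resample the latent task structure: which action pays off this episode."""
    return rng.randrange(NUM_ACTIONS)


def run_episode(rng, num_steps=20):
    """One episode with a freshly sampled latent task.

    The 'agent' here is a trivial elimination strategy standing in for
    within-episode adaptation: it acts, observes reward, and updates its
    per-action beliefs so later steps in the same episode improve.
    """
    rewarding_action = sample_task(rng)
    beliefs = [1.0] * NUM_ACTIONS  # running estimate of each action's value
    total_reward = 0
    for _ in range(num_steps):
        # Act greedily with respect to current beliefs, breaking ties randomly.
        best = max(beliefs)
        action = rng.choice([a for a, b in enumerate(beliefs) if b == best])
        reward = 1 if action == rewarding_action else 0
        total_reward += reward
        # Within-episode adaptation: rule out failed actions, lock in successes.
        beliefs[action] = 2.0 if reward else 0.0
    return total_reward


rng = random.Random(0)
returns = [run_episode(rng) for _ in range(100)]
print(sum(returns) / len(returns))
```

Because each episode's wrong actions are eliminated after a single try, the agent forgoes at most `NUM_ACTIONS - 1` reward per episode despite the task being resampled every time. A meta-RL agent is expected to acquire this sort of adaptive strategy from experience rather than having it hand-coded; Alchemy poses the same challenge with a far richer latent causal structure.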