Institute for Interdisciplinary Information Sciences, Tsinghua University

Curriculum, Evolution and Emergent Complexity with Multi-Agent Reinforcement Learning

Speaker: Yi Wu, OpenAI Inc.
Time: 2019-10-24 15:00-16:00
Venue: FIT 1-222
Abstract:

The theory of evolution, introduced in Darwin’s 1859 book On the Origin of Species, states that organisms on Earth evolve through natural selection acting on a diverse population, according to their adaptability to environmental changes. This evolutionary process eventually produced humans, who possess intelligence.

We observe similar evolutionary phenomena in artificial agents trained by deep reinforcement learning in simulated environments, where agents compete or collaborate, co-evolve, and eventually learn complex skills and strategies that resemble human behavior. We believe this observation offers a significant indication of the future of Artificial General Intelligence (AGI).

In particular, this talk covers two recent advances. In the first part, we will show that, under simple hide-and-seek rules, agents naturally evolve in an open-ended physical world via self-play. This evolutionary process acts as an auto-curriculum and leads to six emergent phases of learned strategies involving tool use. In the second part, we will introduce a learning paradigm, Evolutionary Population Curriculum (EPC), which scales multi-agent reinforcement learning through an evolutionary process. EPC starts from just a few agents and progressively evolves them toward harder environments with exponentially growing populations.

Biography:

Yi Wu is currently a researcher on the multi-agent team at OpenAI Inc. and will join the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, as an assistant professor in summer 2020. He recently earned his PhD from UC Berkeley under the supervision of Prof. Stuart Russell. His research focuses on improving the generalization ability of artificial agents. He is broadly interested in a variety of topics in AI, including deep reinforcement learning, natural language processing and probabilistic programming. His work, Value Iteration Networks, won the best paper award at NIPS 2016.