Curiosity-driven Exploration by
Self-supervised Prediction
Deepak Pathak
Pulkit Agrawal
Alexei A. Efros
Trevor Darrell
University of California, Berkeley
ICML 2017
[Download Paper]
[Github Code]

In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch.

Demo Video

Source Code and Demo

We have released a TensorFlow-based implementation on the GitHub page. It builds upon OpenAI Gym with factorized RL environment wrappers that are generally useful. Try our code!

Intrinsic Curiosity Module (ICM)

We propose an intrinsic curiosity formulation to aid agent exploration. Curiosity drives the agent to explore its environment when extrinsic rewards are sparse or absent altogether. Our proposed Intrinsic Curiosity Module (ICM) is learned jointly with the agent's policy, even without any rewards from the environment. A glimpse of our model is shown in the figure below. For more details, refer to the paper.
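The core idea can be sketched in a few lines: encode states into a learned feature space, predict the next state's features from the current features and the action, and use the prediction error as the intrinsic reward. The sketch below is a minimal, illustrative version in numpy, not the paper's implementation: the linear "networks", dimensions, and the scaling constant `ETA` are hypothetical stand-ins for the trained CNN encoder and forward model.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, FEAT_DIM, N_ACTIONS = 16, 8, 4
ETA = 0.01  # intrinsic-reward scaling (hypothetical value)

# Stand-ins for learned networks: random linear maps with tanh.
W_phi = rng.normal(size=(FEAT_DIM, STATE_DIM))             # feature encoder
W_fwd = rng.normal(size=(FEAT_DIM, FEAT_DIM + N_ACTIONS))  # forward model

def phi(state):
    """Encode a raw state into the learned feature space."""
    return np.tanh(W_phi @ state)

def forward_model(feat, action):
    """Predict next-state features from current features and action."""
    a_onehot = np.eye(N_ACTIONS)[action]
    return np.tanh(W_fwd @ np.concatenate([feat, a_onehot]))

def intrinsic_reward(state, action, next_state):
    """Curiosity reward: scaled squared prediction error in feature space."""
    err = forward_model(phi(state), action) - phi(next_state)
    return ETA / 2.0 * float(err @ err)

s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
print(intrinsic_reward(s, 1, s_next))  # non-negative: it is a squared norm
```

In the full method, the encoder is shaped by an inverse-dynamics objective (predicting the action from consecutive states), so the features capture only the parts of the environment the agent can affect; this sketch omits that training loop.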


[Paper 2MB]  [arXiv]

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros and Trevor Darrell. Curiosity-driven Exploration by Self-supervised Prediction.
In ICML 2017.

@inproceedings{pathakICMl17curiosity,
    Author = {Pathak, Deepak and Agrawal, Pulkit and
              Efros, Alexei A. and Darrell, Trevor},
    Title = {Curiosity-driven Exploration by Self-supervised Prediction},
    Booktitle = {ICML},
    Year = {2017}
}

In the Media

Tutorials and discussions on the web

Quanta Magazine
The Wall Street Journal
Reddit Discussion

Selected media articles

New Scientist
MIT Technology Review
California Magazine
Digital Trends
Tech Xplore
NY Post
India Times
Cognition X
Caixin (Chinese)
Real AI


We would like to thank Sergey Levine, Evan Shelhamer, Georgia Gkioxari, Saurabh Gupta, Phillip Isola and other members of the BAIR lab for fruitful discussions and comments. We thank Jacob Huh for help with Figure-2 and Alexey Dosovitskiy for VizDoom maps. This work was supported in part by NSF IIS-1212798, IIS-1427425, IIS-1536003, IIS-1633310, ONR MURI N00014-14-1-0671, Berkeley DeepDrive, equipment grant from Nvidia, NVIDIA Graduate Fellowship to DP, and the Valrhona Reinforcement Learning Fellowship.