Curiosity-driven Exploration by Self-supervised Prediction
Deepak Pathak
Pulkit Agrawal
Alexei A. Efros
Trevor Darrell
University of California, Berkeley
ICML 2017
[Download Paper]
[Github Code]



In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch.





Source Code and Demo

We are releasing a demo on the GitHub page. It is built on TensorFlow and OpenAI Gym.
[GitHub]


Intrinsic Curiosity Module (ICM)

We propose an intrinsic curiosity formulation to aid agent exploration. Curiosity helps the agent explore its environment when extrinsic rewards are sparse or absent altogether. The proposed Intrinsic Curiosity Module (ICM) is learned jointly with the agent's policy, even without any rewards from the environment. A glimpse of our model is shown in the figure below. For more details, refer to the paper.
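
As a rough illustration of the idea, the sketch below computes a curiosity reward as the scaled squared error between predicted and actual next-state features. It is not the released implementation: the linear maps stand in for the learned networks, the weights are random and untrained, the inverse dynamics model that trains the encoder is omitted, and every name here (encode, predict_next_features, intrinsic_reward, ETA) is hypothetical.

```python
import random

ETA = 0.5  # hypothetical scaling factor for the intrinsic reward


def make_matrix(rows, cols, rng):
    """Random weight matrix; an untrained stand-in for a learned network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(encoder, state):
    """Feature encoder phi(s); assumed here to be a single linear layer."""
    return matvec(encoder, state)


def predict_next_features(forward_model, features, action_one_hot):
    """Forward model: predict phi(s_next) from phi(s) and the action taken."""
    return matvec(forward_model, features + action_one_hot)


def intrinsic_reward(encoder, forward_model, state, action_one_hot,
                     next_state, eta=ETA):
    """Curiosity reward: scaled squared prediction error in feature space."""
    predicted = predict_next_features(
        forward_model, encode(encoder, state), action_one_hot)
    actual = encode(encoder, next_state)
    return 0.5 * eta * sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Toy dimensions, chosen only for illustration.
rng = random.Random(0)
STATE_DIM, FEATURE_DIM, NUM_ACTIONS = 8, 4, 3
encoder = make_matrix(FEATURE_DIM, STATE_DIM, rng)
forward_model = make_matrix(FEATURE_DIM, FEATURE_DIM + NUM_ACTIONS, rng)

state = [rng.random() for _ in range(STATE_DIM)]
next_state = [rng.random() for _ in range(STATE_DIM)]
action = [0.0, 1.0, 0.0]  # one-hot encoding of the second action

reward = intrinsic_reward(encoder, forward_model, state, action, next_state)
print(f"intrinsic reward: {reward:.4f}")
```

The reward is zero when the forward model predicts the next features exactly, and it grows as the prediction gets worse, so poorly predicted transitions are the ones the agent is rewarded for visiting.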





Paper

[Paper 2MB]  [arXiv]

Citation
 
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros and Trevor Darrell. Curiosity-driven Exploration by Self-supervised Prediction.
In ICML 2017.

[Bibtex]
@inproceedings{pathakICMl17curiosity,
    Author = {Pathak, Deepak and
    Agrawal, Pulkit and
    Efros, Alexei A. and
    Darrell, Trevor},
    Title = {Curiosity-driven Exploration
    by Self-supervised Prediction},
    Booktitle = {ICML},
    Year = {2017}
}



Acknowledgements

We would like to thank Sergey Levine, Evan Shelhamer, Georgia Gkioxari, Saurabh Gupta, Phillip Isola and other members of the BAIR lab for fruitful discussions and comments. We thank Jacob Huh for help with Figure-2 and Alexey Dosovitskiy for VizDoom maps. This work was supported in part by NSF IIS-1212798, IIS-1427425, IIS-1536003, IIS-1633310, ONR MURI N00014-14-1-0671, Berkeley DeepDrive, an equipment grant from NVIDIA, an NVIDIA Graduate Fellowship to DP, and the Valrhona Reinforcement Learning Fellowship.