Learning Instance Segmentation by Interaction

Deepak Pathak*
Yide Shentu*
Dian Chen*
Pulkit Agrawal*
Trevor Darrell
Sergey Levine
Jitendra Malik
University of California, Berkeley
* equal contribution
[Download Paper]
[Github Code]

We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer which pixels constitute objects, and refines that model by interacting with those pixels. The model, learned from over 50K interactions, generalizes to novel objects and backgrounds. To deal with the noisy training signal obtained from self-supervised interactions, we propose a robust set loss. A dataset of the robot's interactions, along with a few human-labeled examples, is provided as a benchmark for future research. We test the utility of the learned segmentation model on a downstream vision-based control task: rearranging multiple objects into target configurations from visual inputs alone.

Source Code and Robot Interaction Dataset

We have released the implementation of the robust set loss and a link to the dataset on the GitHub page. Try it out!

Self-Supervised Data Collection

Our robotic agent collected segmentation data in a self-supervised fashion, running around the clock for several days.
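This page does not spell out how a training mask is obtained from a single interaction. As a minimal sketch, assuming the signal comes from comparing the camera image before and after the robot touches the scene, the snippet below marks pixels whose intensity changed. The function name `change_mask`, the grayscale list-of-lists image format, and the threshold value are all illustrative assumptions, not taken from the released code.

```python
def change_mask(before, after, threshold=30):
    """Return a binary mask of pixels that changed by more than `threshold`.

    `before` and `after` are grayscale images given as lists of rows of ints.
    This is an illustrative assumption about how a self-supervised mask could
    be derived, not the authors' implementation.
    """
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    return [
        [1 if abs(a - b) > threshold else 0 for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


# A bright two-pixel "object" is pushed one pixel to the right.
before = [[10, 10, 10, 10],
          [10, 200, 200, 10],
          [10, 10, 10, 10]]
after = [[10, 10, 10, 10],
         [10, 10, 200, 200],
         [10, 10, 10, 10]]

mask = change_mask(before, after)
# The pixel at row 1, column 2 is covered by the object in both frames,
# so differencing misses it: the resulting mask is a noisy label.
```

In this toy case the mask omits the pixel the object covers in both frames, which illustrates the kind of label noise that would motivate a loss tolerant to imperfect masks.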

Segmentation by Interaction

A brief summary of our approach is shown below.
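In code, the loop described in the abstract (predict object pixels with the current model, interact with them, then refine the model on the outcome) could be sketched roughly as follows. Every name here (`interaction_loop`, `observe`, `predict_mask`, `interact`, `update_model`) is a hypothetical placeholder for illustration; the released code may be organized quite differently.

```python
def interaction_loop(observe, predict_mask, interact, update_model, num_interactions):
    """Skeleton of a learn-by-interaction loop (illustrative only).

    observe()                  -> current image
    predict_mask(image)        -> object hypothesis from the current model
    interact(image, hypothesis)-> noisy mask obtained by acting on the hypothesis
    update_model(dataset)      -> refine the model on all data collected so far
    All four callables are hypothetical stand-ins supplied by the caller.
    """
    dataset = []
    for _ in range(num_interactions):
        image = observe()
        hypothesis = predict_mask(image)          # where does the model think an object is?
        noisy_mask = interact(image, hypothesis)  # act on it and record the outcome
        dataset.append((image, noisy_mask))
        update_model(dataset)                     # refine the model with the new sample
    return dataset


# Toy stand-ins so the skeleton can be exercised without a robot.
calls = {"updates": 0}

def fake_observe():
    return [[0, 1], [1, 0]]

def fake_predict(image):
    return [[1 if px else 0 for px in row] for row in image]

def fake_interact(image, hypothesis):
    return hypothesis  # pretend the interaction confirms the hypothesis

def fake_update(dataset):
    calls["updates"] += 1

data = interaction_loop(fake_observe, fake_predict, fake_interact, fake_update, 3)
```

Passing the four steps in as callables keeps the sketch independent of any particular robot, camera, or network.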

Paper and Bibtex

[Paper]  [ArXiv]

Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, and Jitendra Malik. Learning Instance Segmentation by Interaction.
In CVPR Workshop on Benchmarks for Deep Learning in Robotic Vision, 2018.

      Author = {Pathak, Deepak and
      Shentu, Yide and Chen, Dian and
      Agrawal, Pulkit and Darrell, Trevor and
      Levine, Sergey and Malik, Jitendra},
      Title = {Learning Instance Segmentation
        by Interaction},
      Booktitle = {CVPR Workshop on Benchmarks for
        Deep Learning in Robotic Vision},
      Year = {2018}


We would like to thank members of the BAIR community for fruitful discussions. This work was supported in part by ONR MURI N00014-14-1-0671; DARPA; NSF Award IIS-1212798; Berkeley DeepDrive; the Valrhona Reinforcement Learning Fellowship; and an equipment grant from NVIDIA. DP is supported by the Facebook graduate fellowship.