Learning Instance Segmentation by Interaction

Deepak Pathak*
Yide Shentu*
Dian Chen*
Pulkit Agrawal*
Trevor Darrell
Sergey Levine
Jitendra Malik
University of California, Berkeley
* equal contribution
[Download Paper]
[Github Code]


We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions generalizes to novel objects and backgrounds. To deal with the noisy training signal for object segmentation obtained from self-supervised interactions, we propose a robust set loss. A dataset of the robot's interactions, along with a few human-labeled examples, is provided as a benchmark for future research. We test the utility of the learned segmentation model on a downstream vision-based control task: rearranging multiple objects into target configurations from visual inputs alone.
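As a rough illustration of the interaction loop described above (not the released implementation), the Python sketch below pokes pixels that the current model believes lie on objects and turns the resulting motion into noisy training masks. The robot and model interfaces and the simple frame-difference change detector are hypothetical placeholders.

# Illustrative sketch of the self-supervised interaction loop: poke pixels the
# current model labels as objects, and treat the pixels that move as a noisy
# object mask for retraining. All interfaces here are hypothetical.

import numpy as np


def motion_mask(before, after, thresh=30):
    """Pixels that changed between the pre- and post-poke frames.

    A plain frame difference stands in for whatever change detection the
    real system uses; the moved pixels are treated as one object mask.
    """
    diff = np.abs(after.astype(np.int16) - before.astype(np.int16))
    return diff.max(axis=-1) > thresh


def collect_interactions(robot, model, num_interactions):
    """Gather (image, noisy mask) pairs by poking hypothesized object pixels.

    `robot` (capture_image, poke, random_pixel) and `model`
    (predict_object_pixels) are stand-ins for the real hardware and network.
    """
    dataset = []
    for _ in range(num_interactions):
        before = robot.capture_image()
        candidates = model.predict_object_pixels(before)  # list of (row, col)
        if len(candidates):
            pixel = candidates[np.random.randint(len(candidates))]
        else:
            pixel = robot.random_pixel()
        robot.poke(pixel)                                  # physical interaction
        after = robot.capture_image()
        mask = motion_mask(before, after)
        if mask.any():                                     # something actually moved
            dataset.append((before, mask))
    return dataset

Retraining on these noisy masks then yields better object hypotheses, and hence more informative pokes, in the next round of interaction.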



Source Code and Robot Interaction Dataset

We have released the implementation of the robust set loss and a link to the dataset on the GitHub page. Try it out!
[GitHub]
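For intuition only, here is a heavily simplified Python stand-in for the robust set loss; the exact formulation is in the paper and the released code. The idea sketched here is that the prediction is not penalized per pixel once it already overlaps the noisy, interaction-derived mask above a set-level IoU margin. The thresholding rule, the iou_margin value, and the PyTorch framing are all our assumptions.

# Simplified stand-in for the robust set loss (not the released implementation):
# a prediction that already agrees with the noisy mask at the set level is not
# forced to match it pixel by pixel.

import torch
import torch.nn.functional as F


def robust_mask_loss(logits, noisy_mask, iou_margin=0.7):
    """logits: (H, W) raw scores; noisy_mask: (H, W) {0, 1} pseudo-label.

    If the binarized prediction reaches IoU >= iou_margin with the noisy
    mask, the loss is dropped to zero; otherwise fall back to ordinary
    per-pixel binary cross-entropy. This hard threshold is a simplification,
    not the paper's exact formulation.
    """
    pred = (torch.sigmoid(logits) > 0.5).float()
    target = noisy_mask.float()
    inter = (pred * target).sum()
    union = ((pred + target) > 0).float().sum().clamp(min=1.0)
    if inter / union >= iou_margin:
        return logits.sum() * 0.0  # zero loss, but keep the graph intact
    return F.binary_cross_entropy_with_logits(logits, target)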


Self-Supervised Data Collection

Our robotic agent collected data for segmentation in an unsupervised fashion, running 24x7 for several days.



Segmentation by Interaction

A brief summary of our approach is shown below.





Paper and Bibtex

[Paper]  [ArXiv]

Citation
 
Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, and Jitendra Malik. Learning Instance Segmentation by Interaction.
In CVPR Workshop on Benchmarks for Deep Learning in Robotic Vision, 2018.

[Bibtex]
  @inproceedings{pathakCVPRW18segByInt,
      Author = {Pathak, Deepak and Shentu, Yide and Chen, Dian and
        Agrawal, Pulkit and Darrell, Trevor and Levine, Sergey and
        Malik, Jitendra},
      Title = {Learning Instance Segmentation by Interaction},
      Booktitle = {CVPR Workshop on Benchmarks for Deep Learning in Robotic Vision},
      Year = {2018}
  }


Acknowledgements

We would like to thank members of the BAIR community for fruitful discussions. This work was supported in part by ONR MURI N00014-14-1-0671; DARPA; NSF Award IIS-1212798; Berkeley DeepDrive; the Valrhona Reinforcement Learning Fellowship; and an equipment grant from NVIDIA. DP is supported by the Facebook Graduate Fellowship.