
SCIENTIA SINICA Informationis, Volume 46, Issue 7: 811-818 (2016). https://doi.org/10.1360/N112015-00285

The inherent ambiguity in scene depth learning from single images

  • Received: Jan 6, 2016
  • Accepted: Mar 22, 2016

Abstract


Funded by

Strategic Priority Research Program of the Chinese Academy of Sciences (XDB02070002)

National Natural Science Foundation of China (61333015)

National Natural Science Foundation of China (61421004)

Beijing Natural Science Foundation (7142152)

