dc.contributor.advisor   Antonio Torralba.   en_US
dc.contributor.author   Xiao, Jianxiong   en_US
dc.contributor.other   Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.   en_US
dc.date.accessioned   2014-02-10T17:00:47Z
dc.date.available   2014-02-10T17:00:47Z
dc.date.issued   2013   en_US
dc.identifier.uri   http://hdl.handle.net/1721.1/84901
dc.description   Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.   en_US
dc.description   Cataloged from PDF version of thesis.   en_US
dc.description   Includes bibliographical references (pages 213-227).   en_US
dc.description.abstract   On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes - a kitchen, an elevator, your office - and you effortlessly recognize them and perceive their 3D structure. But this one-minute scene-understanding problem has been an open challenge in computer vision since the field was first established 50 years ago. In this dissertation, we aim to rethink the path researchers took over these years, challenge the standard practices and implicit assumptions of current research, and redefine several basic principles in computational scene understanding. The key idea of this dissertation is that learning from rich data in natural settings is crucial for finding the right representation for scene understanding. First, to overcome the limitations of object-centric datasets, we built the Scene Understanding (SUN) Database, a large collection of real-world images that exhaustively spans all scene categories. This scene-centric dataset provides a more natural sample of the human visual world and establishes a realistic benchmark for standard 2D recognition tasks. However, while an image is a 2D array, the world is 3D and our eyes see it from a viewpoint; this is not traditionally modeled. To obtain a high-level 3D understanding, we reintroduce geometric figures using modern machinery. To model scene viewpoint, we propose a panoramic place representation that goes beyond aperture computer vision and uses data close to the natural input of the human visual system. This paradigm shift toward rich representation also opens up new challenges that require a new kind of big data - data with extra descriptions, namely rich data. Specifically, we focus on a highly valuable kind of rich data - multiple viewpoints in 3D - and we build the SUN3D database to obtain an integrated place-centric representation of scenes. We argue for the great importance of modeling the computer's role as an agent in a 3D scene, and demonstrate the power of place-centric scene representation.   en_US
dc.description.statementofresponsibility   by Jianxiong Xiao.   en_US
dc.format.extent   227 pages   en_US
dc.language.iso   eng   en_US
dc.publisher   Massachusetts Institute of Technology   en_US
dc.rights   M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.   en_US
dc.rights.uri   http://dspace.mit.edu/handle/1721.1/7582   en_US
dc.subject   Electrical Engineering and Computer Science.   en_US
dc.title   A 2D + 3D rich data approach to scene understanding   en_US
dc.title.alternative   Two-dimensional plus three-dimensional rich data approach to scene understanding   en_US
dc.type   Thesis   en_US
dc.description.degree   Ph.D.   en_US
dc.contributor.department   Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc   868829591   en_US