Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Pu Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias

Research output: Contribution to journal › Conference article › peer-review


Abstract

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as “pick up a cup on a kitchen table” or “navigate to a sofa on which someone is sitting”. In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet [1] dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments. The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG.
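
The abstract's example queries hint at the core mechanism: a target entity plus relational context, matched against a scene graph. The following is a minimal, illustrative Python sketch of that idea, not the OVSG implementation; the names (SceneNode, ground_query) and the substring-based label matching are hypothetical stand-ins for the paper's open-vocabulary embedding matching.

    # Toy sketch of context-aware grounding over a scene graph.
    # NOT the OVSG code: a real open-vocabulary system would compare
    # text/visual embeddings instead of checking substrings.
    from dataclasses import dataclass, field

    @dataclass
    class SceneNode:
        """An entity in the scene graph: object instance, agent, or region."""
        node_id: int
        label: str                                      # free-form text label
        relations: list = field(default_factory=list)   # (relation, neighbor_id)

    def ground_query(nodes, target_label, context):
        """Return nodes matching target_label whose neighborhood satisfies
        every (relation, neighbor_label) constraint in `context`."""
        by_id = {n.node_id: n for n in nodes}
        matches = []
        for node in nodes:
            if target_label not in node.label:
                continue
            # Each contextual relation must be present among the node's edges.
            ok = all(
                any(rel == r and want in by_id[nid].label
                    for r, nid in node.relations)
                for rel, want in context
            )
            if ok:
                matches.append(node)
        return matches

    # Toy scene: two cups, only one of which sits on a kitchen table.
    table = SceneNode(0, "kitchen table")
    cup_a = SceneNode(1, "cup", relations=[("on", 0)])
    cup_b = SceneNode(2, "cup")
    scene = [table, cup_a, cup_b]

    # "pick up a cup on a kitchen table" ->
    # target "cup", context [("on", "kitchen table")]
    print(ground_query(scene, "cup", [("on", "kitchen table")]))  # -> [cup_a]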

Original language: English (US)
Journal: Proceedings of Machine Learning Research
Volume: 229
State: Published - 2023
Event: 7th Conference on Robot Learning, CoRL 2023 - Atlanta, United States
Duration: Nov 6, 2023 – Nov 9, 2023

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Keywords

  • Object Grounding
  • Open-Vocabulary Semantics
  • Scene Graph
