ThinkMatter: Panoramic-Aware Instructional Semantics for Monocular Vision-and-Language Navigation

Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires an embodied robot to navigate to a target destination by following a natural language instruction. Most existing methods use panoramic RGB-D cameras for 360° observation of the environment... ...