Fine-Grained Alignment Supervision Matters in Vision-and-Language Navigation

The Vision-and-Language Navigation (VLN) task requires an agent to navigate 3D indoor environments by following natural-language instructions. Achieving cross-modal alignment is one of the most critical challenges in VLN, as the predicted trajectory needs to p... ...