StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition
Yanqing Shen, Sanping Zhou, Jingwen Fu et al.
Visual place recognition is a challenging task for autonomous driving and robotics, which is usually considered as an image retrieval problem. A commonly used two-stage strategy involves global retrieval followed by re-ranking using patch-l...
Spatiotemporal Observer Design for Predictive Learning of High-Dimensional Data
Tongyi Liang, Han-Xiong Li
Although deep learning-based methods have shown great success in spatiotemporal predictive learning, the frameworks of those models are mainly designed by intuition. How to make spatiotemporal forecasting with theoretical guarantees is stil...
Xiang Chen, Jinshan Pan, Jiangxin Dong et al.
Recent years have witnessed significant advances in image deraining due to the progress of effective image priors and deep learning models. As each deraining approach has individual settings (e.g., training and test datasets, evaluation cri...
Tianlu Zhang, Qiang Zhang, Kurt Debattista et al.
Contemporary multi-modal trackers achieve strong performance by leveraging complex backbones and fusion strategies, but this comes at the cost of computational efficiency, limiting their deployment in resource-constrained settings. On the o...
Dong Liang, Yuanhang Gao, Ling Li et al.
Evaluating the performance of low-light image enhancement (LLE) is highly subjective, thus making integrating human preferences into LLE a necessity. Existing methods fail to consider this and present a series of potentially valid heuristic...
HandRT: Simultaneous Hand Shape and Appearance Reconstruction with Pose Tracking from Monocular RGB-D Video
Pratik Kalshetti, Parag Chaudhuri
We propose a method to reconstruct a personalized hand avatar, representing the user's hand shape and appearance, from a monocular RGB-D video of a hand performing unknown hand poses under unknown illumination. Our method, HandRT, jointly o...
Tao Huang, Shan You, Fei Wang et al.
The paper introduces DIST, an innovative knowledge distillation method that excels in learning from a superior teacher model. DIST differentiates itself from conventional techniques by adeptly handling the often significant prediction discr...
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin, Yunshuang Nie, Ziming Wei et al.
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promisin...
Dongdong Ren, Wenbin Li, Tianyu Ding et al.
Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To add...
Scalable High-Fidelity 3D Hand Shape Reconstruction Via Graph-Image Frequency Mapping and Graph Frequency Decomposition
Tianyu Luan, Yuanhao Zhai, Jingjing Meng et al.
Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity ha...