Nonlinear Transformed Low-Rank Quaternion Tensor Total Variation for Multidimensional Color Image Completion
Liqiao Yang, Yexun Hu, Tai-Xiang Jiang et al.
Completing multidimensional color images is a fundamental challenge in image processing and computer vision. However, tensor-based methods often treat RGB channels as independent modes, thereby neglecting their intrinsic correlations. ...
Collaborated with Hallucination: Enhancing Egocentric Grounded Question Answering via Error Demonstrations
Shenshen Li, Xing Xu, Fumin Shen et al.
Grounded question answering in egocentric videos (Ego-GQA) aims to identify the relevant temporal window and generate a corresponding natural-language response to a textual question. Compared with third-person videos, egocentric vi...
Ding Qi, Jian Li, Shuguang Dou et al.
Dataset distillation improves neural network training efficiency by compressing large real datasets into compact synthetic datasets. Existing methods typically optimize matching objectives, such as aligning gradients, features, and trajecto...
Dizhan Xue, Shengsheng Qian, Changsheng Xu
Recently, backdoor attacks on Deep Neural Networks (DNNs) have become an urgent security threat: they can manipulate the behavior of an attacked model by embedding a backdoor trigger into the input. Since triggers can be designed to be ste...
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
Fangfu Liu, Wenqiang Sun, Hanyang Wang et al.
Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a d...
Zero-Pose-Prior NeRF: Recursive Radiance Field Reconstruction from Unposed and Unordered Images
Xinxin Liu, Qi Zhang, Xue Wang et al.
The dependence of neural radiance fields (NeRF) on accurate camera poses has emerged as a critical obstacle to their widespread real-world application. While recent advances have demonstrated the potential for simultaneously addressing cam...
A Multi-level Self-Distillation-Based Unified Tracker for Efficient RGB-T Tracking
Mohamed Awad, Ahmed Elliethy, M Omair Ahmad et al.
RGB-Thermal (RGB-T) tracking enhances visual tracking robustness by combining RGB and thermal infrared (TIR) modalities, addressing limitations of RGB-only trackers under challenging conditions such as low light and appearance variations. H...
DAMind: Zero-shot Visual Cross-Domain Alignment and Representation for EEG Decoding
Haodong Jing, Yongqiang Ma, Panqi Yang et al.
To efficiently assist humans in various tasks, it is crucial to accurately decode and understand the rich information embedded in the brain's visual cognition. Existing brain-driven research often fails to overcome the challenge of small target...
Exploring Hierarchical Cross-Modal Correlation Consistency for Partial Mismatching
Xiaoqing Liu, Zhiwen Yu, Jun Yu et al.
Cross-modal retrieval facilitates more flexible information access and improves semantic understanding across different modalities. However, traditional cross-modal retrieval models rely on well-aligned datasets, which are often labor-inten...
Video Frame Interpolation via Appearance-based Intermediate Flow Estimation
Keyi Chen, Jingwei Xin, Nannan Wang et al.
Intermediate flow estimation is an important part of video frame interpolation (VFI). Most previous works use interpolation to derive the intermediate flow assuming localized linear motion. However, this method is not effective when dealing...