Loic Jezequel, Jean Beaudet, Aymeric Histace et al.
Deep anomaly detection aims to provide robust and efficient classifiers for zero-shot (unsupervised, UNS) and few-shot (imbalanced supervised, IMS) settings. However, current models still struggle on edge-case normal samples and are often u...
SMTrack: State-Aware Mamba for Efficient Temporal Modeling in Visual Tracking
Yinchao Ma, Dengqing Yang, Zhangyu He et al.
Visual tracking aims to automatically estimate the state of a target object in a video sequence, which is challenging especially in dynamic scenarios. Thus, numerous methods have been proposed to introduce temporal cues to enhance tracking robust...
Yao Xiao, Pengxu Wei, Guangrun Wang et al.
A few recent works attempt to train an adversarially robust Unsupervised Domain Adaptation (UDA) model, transferring the robustness from a robust source model or other robust pre-trained models to an unlabeled target domain. However, it is ...
Partially Supervised Compositional Zero-Shot Learning by Class-Balanced Distribution Alignment
Aditya Panda, Dipti Prasad Mukherjee
Partially supervised Compositional Zero-Shot Learning (pCZSL) recognizes new compositions of states and objects, where for every image in the training set either the state or the object annotation is available. In pCZSL, features of a s...
Temporal Visual Semantics-Induced Human Motion Understanding with Large Language Models
Zheng Xing, Weibing Zhao
Unsupervised human motion segmentation (HMS) can be effectively achieved using subspace clustering techniques. However, traditional methods overlook the role of temporal semantic exploration in HMS. This paper explores the use of temporal v...
Scale-invariant Feature Matching Network for V-D-T Few-Shot Semantic Segmentation
Xiaofei Zhou, Jia Lin, Dongmei Chen et al.
Multi-modal few-shot semantic segmentation (FSS) aims to perform dense prediction from multiple modality images including visible image, depth image, and thermal image with a few annotated samples. However, some efforts treat the three moda...
Unsupervised Domain Adaptive Object Detection via Semantic Consistency and Compactness Learning
Yajing Liu, Zhen Zhang, Yiming Su et al.
Unsupervised domain adaptive object detection methods enhance model robustness in the target domain without requiring target-domain annotations. Despite notable progress, existing methods face two major challenges: 1) insufficient and ineff...
Zhihao Chen, Yongqi Chen, Changsheng Chen et al.
Text removal is an important task in processing both scene and document images. However, existing scene text removal (STR) methods focus primarily on scene text images. The STR models (trained on scene text images) perform poorly on doc...
TranIU-Net: Indicative Electrical Tomography Imaging Based on Implicit Unrolling Transformer
Binchun Lu, Lidan Fu, Juntao Ren et al.
Deep unrolling networks have rapidly gained popularity in image reconstruction by integrating data-driven networks with iterative model-driven reconstruction algorithms. Technically, existing unrolling networks could easily break down and p...
HoloQA: Full Reference Video Quality Assessor of Rendered Human Avatars in Virtual Reality
Avinab Saha, Yu-Chih Chen, Christian Hane et al.
We present HoloQA, a new state-of-the-art Full Reference Video Quality Assessment (VQA) model that was designed using principles of visual neuroscience, information theory, and self-supervised deep learning to accurately predict the quality...