Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection
Yi Yu, Xue Yang, Yansheng Li et al.
Accurately estimating the orientation of visual objects with compact rotated bounding boxes (RBoxes) has become a prominent demand, which challenges existing object detection paradigms that only use horizontal bounding boxes (HBoxes). To eq...
Minjing Dong, Yanxi Li, Yunhe Wang et al.
Deep Neural Networks (DNNs) are vulnerable to adversarial attacks. Existing methods are devoted to developing various robust training strategies or regularizations to update the weights of the neural network. But beyond the weights, the ove...
Jiamian Wang, Kunpeng Li, Yulun Zhang et al.
Snapshot compressive imaging (SCI) has emerged as a novel way of capturing hyperspectral images. It uses an optical encoder to compress the 3D data into a 2D measurement and adopts a software decoder for the signal reconstruction. Recently, ...
Chunming He, Yuqi Shen, Chengyu Fang et al.
Deep generative models have gained considerable attention in low-level vision tasks due to their powerful generative capabilities. Among these, diffusion model-based approaches, which employ a forward diffusion process to degrade an image a...
Yi Huang, Jiancheng Huang, Yifan Liu et al.
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning t...
Yunxin Li, Shenyuan Jiang, Baotian Hu et al.
Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) archi...
Julia Hornauer, Amir El-Ghoussani, Vasileios Belagiannis
Monocular depth estimation, similar to other image-based tasks, is prone to erroneous predictions due to ambiguities in the image, for example, caused by dynamic objects or shadows. For this reason, pixel-wise uncertainty assessment is requ...
CCDPlus: Towards Accurate Character to Character Distillation for Text Recognition
Tongkun Guan, Wei Shen, Xiaokang Yang
Existing scene text recognition methods leverage large-scale labeled synthetic data (LSD) to reduce reliance on labor-intensive annotation tasks and improve recognition capability in real-world scenarios. However, the emergence of a synth-t...
PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement
Jiyao Wang, Hao Lu, Ange Wang et al.
Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) has attracted increasing attention in rPPG. However, when rPPG is ...
Learning Emotion Category Representation to Detect Emotion Relations across Languages
Xiangyu Wang, Chengqing Zong
Understanding human emotions is crucial for a myriad of applications, from psychological research to advancements in Natural Language Processing (NLP). Traditionally, emotions are categorized into distinct basic groups, which has led to the...