UVaT: Uncertainty Incorporated View-Aware Transformer for Robust Multi-View Classification
Yapeng Li, Yong Luo, Bo Du
Existing multi-view classification algorithms usually assume that all examples have observations on all views, and the data in different views are clean. However, in real-world applications, we are often provided with data that have missing...
Rethinking Self-Training for Semi-Supervised Landmark Detection: A Selection-Free Approach
Haibo Jin, Haoxuan Che, Hao Chen
Self-training is a simple yet effective method for semi-supervised learning, in which pseudo-label selection plays an important role in handling confirmation bias. Despite its popularity, applying self-training to landmark detection fa...
Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration
Tianshui Chen, Weihang Wang, Tao Pu et al.
Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confid...
IdeNet: Making Neural Network Identify Camouflaged Objects Like Creatures
Juwei Guan, Xiaolin Fang, Tongxin Zhu et al.
Camouflaged objects often blend in with their surroundings, making the perception of a camouflaged object a more complex procedure. However, most neural-network-based methods that simulate the visual information processing pathway of creatu...
Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation
Jisheng Dang, Huicheng Zheng, Xiaohao Xu et al.
Current video object segmentation approaches primarily rely on frame-wise appearance information to perform matching. Despite significant progress, reliable matching becomes challenging due to rapid changes of the object's appearance over t...
RoMo: Robust Unsupervised Multimodal Learning With Noisy Pseudo Labels
Yongxiang Li, Yang Qin, Yuan Sun et al.
The rise of the metaverse and the increasing volume of heterogeneous 2D and 3D data have created a growing demand for cross-modal retrieval, enabling users to query semantically relevant data across different modalities. Existing methods he...
Relation Knowledge Distillation by Auxiliary Learning for Object Detection
Hao Wang, Tong Jia, Qilong Wang et al.
Balancing accuracy against speed, that is, achieving higher performance without sacrificing inference time, is a challenging problem in object detection. Knowledge distillation, which serves as a kind of model compression...
Facial Action Unit Representation Based on Self-Supervised Learning With Ensembled Priori Constraints
Haifeng Chen, Peng Zhang, Chujia Guo et al.
Facial action units (AUs) focus on a comprehensive set of atomic facial muscle movements for human expression understanding. Based on supervised learning, discriminative AU representation can be achieved from local patches where the AUs are...
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation
Haisheng Fu, Feng Liang, Jie Liang et al.
Deep learning-based image compression has made great progress recently. However, some leading schemes use a serial context-adaptive entropy model to improve rate-distortion (R-D) performance, which is very slow. In addition, the complex...
Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding
Philipp Merkle, Martin Winken, Jonathan Pfaff et al.
This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims at improving the prediction signal of inter blocks with a residual CNN that incorporates spati...