Federica Arrigoni, Kathlen Kohn, Andrea Fusiello et al.
The concept of viewing graph solvability has gained significant interest in the context of structure-from-motion. A viewing graph is a mathematical structure where nodes are associated with cameras and edges represent the epipolar geometry ...
Allies Teach Better than Enemies: Inverse Adversaries for Robust Knowledge Distillation [0.03%]
Junhao Dong, Raoof Zare Moayedi, Yew-Soon Ong et al.
Adversarially robust knowledge distillation aims to compress a large-scale robust teacher model into a lightweight student counterpart while preserving adversarial robustness and natural performance. Previous methods primarily focused on al...
Top-$k$ Feature Selection in Sparse Learning via Accelerated Coordinate Descent Method [0.03%]
Han Zhang, Yannian Gu, Feiping Nie et al.
Top-$k$ feature selection in sparse learning is a fundamental problem in machine learning. It is difficult to conquer due to the rigid $\ell_{2,0}$-norm constraint. Existing literature mostly relaxes the constraint and seeks the approximat...
Yong Li, Yuanzhi Wang, Yi Ding et al.
Human multimodal emotion recognition (MER) seeks to infer human emotions by integrating information from language, visual, and acoustic modalities. Although existing MER approaches have achieved promising results, they still struggle with i...
Evaluating and Mitigating Relationship Hallucinations in Large Vision-Language Models [0.03%]
Mingrui Wu, Jiale Li, Jiayi Ji et al.
The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detect...
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations [0.03%]
Zeming Wei, Yifei Wang, Ang Li et al.
Large Language Models (LLMs) have demonstrated remarkable success across diverse applications, yet their susceptibility to malicious exploitation remains a critical challenge. Notably, LLMs are known to be vulnerable to jailbreaking attacks...
EvTexture++: Event-Driven Texture Enhancement for Video Super-Resolution [0.03%]
Dachun Kai, Jiayao Lu, Yueyi Zhang et al.
Event-based vision has drawn increasing attention owing to its distinctive properties, including ultra-high temporal resolution and extreme dynamic range. Recent works have introduced it to video super-resolution (VSR) to enhance flow estim...
DiffusionLight-Turbo: Accelerated Light Probes for Free Via Single-Pass Chrome Ball Inpainting [0.03%]
Worameth Chinchuthakun, Pakkapon Phongthawee, Amit Raj et al.
We introduce a simple yet effective technique for estimating lighting from a single low-dynamic-range (LDR) image by reframing the task as a chrome ball inpainting problem. This approach leverages a pre-trained diffusion model, Stable Diffu...
Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai et al.
Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks ...
NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models [0.03%]
Jiaming Zhang, Xin Wang, Xingjun Ma et al.
Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capabilities in understanding relationships between visual and textual data through joint embedding spaces. Despite their effectiveness, these models remain vulnerable ...