Screen Detection from Egocentric Image Streams Leveraging Multi-View Vision Language Model
Xueshen Li, Sen Shen, Xinlong Hou et al.
Accurately monitoring the screen exposure of young children is important for research related to screen use, such as childhood obesity, physical activity, and social interaction. Most existing studies rely upon self-report or manual measure...
Jiangpeng He, Xiaoyan Zhang, Luotao Lin et al.
Deep learning-based food recognition has made significant progress in predicting food types from eating occasion images. However, two key challenges hinder real-world deployment: (1) continuously learning new food classes without forgetting...
Support Vector Regression-based Reduced-Reference Perceptual Quality Model for Compressed Point Clouds
Honglei Su, Qi Liu, Hui Yuan et al.
Video-based point cloud compression (V-PCC) is a state-of-the-art Moving Picture Experts Group (MPEG) standard for point cloud compression. V-PCC can be used to compress both static and dynamic point clouds in a lossless, near-lossless, or ...
Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
Ali Vosoughi, Shijian Deng, Songyang Zhang et al.
To increase the generalization capability of VQA systems, many recent studies have tried to de-bias spurious language or vision associations that shortcut the question or image to the answer. Despite these efforts, the literature fails to a...
Indoor Camera Pose Estimation from Room Layouts and Image Outer Corners
Xiaowei Chen, Guoliang Fan
To support indoor scene understanding, room layouts have been recently introduced that define a few typical space configurations according to junctions and boundary lines. In this paper, we study camera pose estimation from eight common roo...
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
Sangwook Park, David K Han, Mounya Elhilali
Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording. With advances in deep neural networ...
Real-Time and Accurate UAV Pedestrian Detection for Social Distancing Monitoring in COVID-19 Pandemic
Zhenfeng Shao, Gui Cheng, Jiayi Ma et al.
Coronavirus Disease 2019 (COVID-19) is a highly infectious disease that has created a health crisis for people all over the world. Social distancing has proved to be an effective non-pharmaceutical measure to slow down the spread of COVID-19....
Head Motion Modeling for Human Behavior Analysis in Dyadic Interaction
Bo Xiao, Panayiotis Georgiou, Brian Baucom et al.
This paper presents a computational study of head motion in human interaction, notably of its role in conveying interlocutors' behavioral characteristics. Head motion is physically complex and carries rich information; current modeling appr...