CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications

Vision Transformers (ViTs) mark a revolutionary advance in neural networks, owing to the powerful global context capability of their token mixers. However, pairwise token affinity and complex matrix operations limit their deployment in resource-constrained scenarios a... ...