Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
As the cutting-edge architecture for text-to-image generation, the latest Multimodal Diffusion Transformer (MMDiT) largely mitigates many of the generation issues found in previous models. However, we discover that it still suffers from subject neglect or mixing... ...