Transformer-based visual question answering model comparison
Published in Journal of Physics: Conference Series, 2023
This paper compares two Transformer-based models, LXMERT and UNITER, on visual question answering (VQA) tasks. It examines the reasons for their performance differences on specific downstream tasks and offers insight into how models for multi-modal tasks might be further optimized.
Recommended citation: Zhicheng He, Yuanzhi Li, Dingming Zhang. (2023). "Transformer-based visual question answering model comparison." Journal of Physics: Conference Series, 2646(1), 012031.
