Transformer-based visual question answering model comparison

Published in Journal of Physics: Conference Series, 2023

This paper compares two transformer-based vision-and-language models, LXMERT and UNITER, and evaluates their performance on visual question answering (VQA) tasks. The study explains why their performance differs on specific downstream tasks and offers insight into how such models might be further optimized for multi-modal tasks.

Recommended citation: Zhicheng He, Yuanzhi Li, Dingming Zhang. (2023). "Transformer-based visual question answering model comparison." Journal of Physics: Conference Series, 2646(1), 012031.
