Multimodal Content Alignment with LLM for Visual Presentation of Papers
Published in International Conference on Document Analysis and Recognition (ICDAR), 2025
We propose Paper2PPT, a novel framework that generates visual presentations of scientific papers through systematic cross-modal alignment, integrating visual elements with their explanatory contexts. The framework addresses the key challenge of aligning visual elements with their associated text.
Recommended citation: Huiying Hu, Zhicheng He, Yixiao Zhou, Tongwei Zhang, Xiaoqing Lyu. (2025). "Multimodal Content Alignment with LLM for Visual Presentation of Papers." International Conference on Document Analysis and Recognition (ICDAR), pp. 238-256.
Download Paper
