Computer Science and Engineering College, Dalian Minzu University, Dalian, Liaoning 116650, P. R. China
YU Yuhai, Email: yuyh@dlnu.edu.cn

Medical visual question answering (MVQA) plays a crucial role in computer-aided diagnosis and telemedicine. Because MVQA datasets are small and their annotation quality is uneven, most existing methods rely on additional datasets for pre-training and treat the task discriminatively, predicting answers from a predefined set of labels. This approach makes models prone to overfitting in low-resource domains. To address these problems, we propose an image-aware generative MVQA method based on image caption prompts. First, we combine a dual visual feature extractor with a progressive bilinear attention interaction module to extract multi-level image features. Second, we propose an image caption prompt method that guides the model toward a better understanding of the image content. Finally, an image-aware generative model generates the answers. Experimental results show that the proposed method outperforms existing models on the MVQA task, achieving efficient visual feature extraction and flexible, accurate answer generation at low computational cost in low-resource domains. This work is of significance for achieving personalized precision medicine, reducing the medical burden, and improving the efficiency of medical diagnosis.
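To make the idea of an image caption prompt concrete, the following is a minimal sketch of how a caption and a question might be joined into a single text input for a generative model. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_caption_prompt` and the template wording are hypothetical, and the abstract does not specify the actual prompt format.

```python
def build_caption_prompt(caption: str, question: str) -> str:
    """Join an image caption and a question into one text prompt.

    Hypothetical helper: the template wording below is an assumption
    made for illustration and is not taken from the paper.
    """
    # Collapse runs of whitespace so the prompt is a single clean line.
    caption = " ".join(caption.split())
    question = " ".join(question.split())

    if not question:
        raise ValueError("question must not be empty")

    if not caption:
        # Without a caption, fall back to a question-only prompt.
        return f"question: {question} answer:"

    # The caption comes first so the model reads the image description
    # before the question it is asked to answer.
    return f"image caption: {caption} question: {question} answer:"


if __name__ == "__main__":
    print(build_caption_prompt(
        "frontal chest x-ray",
        "Is there a pleural effusion?",
    ))
```

In a generative setting, the text returned here would be tokenized and passed to the decoder together with the visual features, so that the answer is generated as free text rather than selected from a fixed label set.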

Citation: WANG Rui, MENG Jiana, YU Yuhai, HAN Siwei, LI Xinghao. Image-aware generative medical visual question answering based on image caption prompts. Journal of Biomedical Engineering, 2025, 42(3): 560-566, 574. doi: 10.7507/1001-5515.202412040

Copyright © the editorial department of Journal of Biomedical Engineering of West China Medical Publisher. All rights reserved
