1. Kovaleva O, Shivade C, Kashyap S, et al. Towards visual dialog for radiology//Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, Online: ACL, 2020: 60-69.
2. Dai W, Hou L, Shang L, et al. Enabling multimodal generation on CLIP via vision-language knowledge distillation. arXiv preprint, 2022, arXiv: 2203.06386.
3. Yang Z, He X, Gao J, et al. Stacked attention networks for image question answering//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas: IEEE, 2016: 21-29.
4. Joshi V, Mitra P, Bose S. Multi-modal multi-head self-attention for medical VQA. Multimedia Tools and Applications, 2024, 83(14): 42585-42608.
5. Liu B, Zhan L M, Wu X M. Contrastive pre-training and representation distillation for medical visual question answering based on radiology images//24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Cham: Springer International Publishing, 2021: 210-220.
6. Chen Z, Du Y, Hu J, et al. Multi-modal masked autoencoders for medical vision-and-language pre-training//International Conference on Medical Image Computing and Computer-Assisted Intervention, Cham: Springer Nature Switzerland, 2022: 679-689.
7. Ossowski T, Hu J. Multimodal prompt retrieval for generative visual question answering. arXiv preprint, 2023, arXiv: 2306.17675.
8. Chen J, Yang D, Jiang Y, et al. MISS: a generative pre-training and fine-tuning approach for Med-VQA//International Conference on Artificial Neural Networks, Cham: Springer Nature Switzerland, 2024: 299-313.
9. Marino K, Chen X, Parikh D, et al. KRISP: integrating implicit and symbolic knowledge for open-domain knowledge-based VQA//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online: IEEE, 2021: 14111-14121.
10. Lin B, Chen Z, Li M, et al. Towards medical artificial general intelligence via knowledge-enhanced multimodal pretraining. arXiv preprint, 2023, arXiv: 2304.14204.
11. Zhan J, Dai J, Ye J, et al. AnyGPT: unified multimodal LLM with discrete sequence modeling. arXiv preprint, 2024, arXiv: 2402.12226.
12. Du Z, Qian Y, Liu X, et al. GLM: general language model pretraining with autoregressive blank infilling. arXiv preprint, 2021, arXiv: 2103.10360.
13. Gu T, Yang K, Liu D, et al. LaPA: latent prompt assist model for medical visual question answering//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle: IEEE, 2024: 4971-4980.
14. Liu J, Hu T, Zhang Y, et al. Parameter-efficient transfer learning for medical visual question answering. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(4): 2816-2826.
15. Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
16. Kim J H, Jun J, Zhang B T. Bilinear attention networks. Advances in Neural Information Processing Systems, 2018, 31: 1-11.
17. Li J, Li D, Xiong C, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation//International Conference on Machine Learning, Baltimore: PMLR, 2022: 12888-12900.
18. Guo J, Li J, Li D, et al. From images to textual prompts: zero-shot visual question answering with frozen large language models//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver: IEEE, 2023: 10867-10877.
19. Lau J J, Gayen S, Ben Abacha A, et al. A dataset of clinically generated visual questions and answers about radiology images. Scientific Data, 2018, 5: 180251.
20. Liu B, Zhan L M, Xu L, et al. SLAKE: a semantically-labeled knowledge-enhanced dataset for medical visual question answering//2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Nice: IEEE, 2021: 1650-1654.
21. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint, 2020, arXiv: 2010.11929.
22. Eslami S, de Melo G, Meinel C. Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain? arXiv preprint, 2021, arXiv: 2112.13906.
23. Nguyen B D, Do T T, Nguyen B X, et al. Overcoming data limitation in medical visual question answering//22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), Cham: Springer International Publishing, 2019: 522-530.
24. Do T, Nguyen B X, Tjiputra E, et al. Multiple meta-model quantifying for medical visual question answering//24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021), Cham: Springer International Publishing, 2021: 64-74.
25. Gong H, Chen G, Mao M, et al. VQAMix: conditional triplet mixup for medical visual question answering. IEEE Transactions on Medical Imaging, 2022, 41(11): 3332-3343.
26. Pan H, He S, Zhang K, et al. AMAM: an attention-based multimodal alignment model for medical visual question answering. Knowledge-Based Systems, 2022, 255: 109763.
27. Lin W, Zhao Z, Zhang X, et al. PMC-CLIP: contrastive language-image pre-training using biomedical documents//International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023), Cham: Springer Nature Switzerland, 2023: 525-536.
28. Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization//Proceedings of the IEEE International Conference on Computer Vision, Venice: IEEE, 2017: 618-626.