Nabil Walid Rafi
Md. Sazzadul Islam Prottasha
Sharmeen Jahan Seema
Prithviraj Chowdhury
Keywords:
Large Language Model (LLM); Machine Learning; Artificial Intelligence; Disease Diagnosis; Image Processing; MedGemma.
Abstract:
The integration of artificial intelligence (AI) has revolutionized medical image diagnostics. While Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have contributed significantly to image classification, their clinical utility is restricted by a lack of medical specialization and natural language processing capabilities. Multimodal Large Language Models (MLLMs) address these gaps with agentic capabilities in specialties such as Radiology, Dermatology, Pathology, and Ophthalmology. This systematic review presents a novel taxonomy of MLLM architectures, categorized by data fusion paradigm: Early Fusion, Late Fusion, and Intermediate (Cognitive) Integration. The taxonomy highlights the efficiency of the Intermediate (Cognitive) Integration approach, which is essential for aligning features from separate modalities to support complex tasks such as Radiology Report Generation (RRG) and Whole Slide Imaging (WSI) analysis. The paper further examines key barriers to clinical deployment, specifically data heterogeneity and hallucination risks. To transition MLLMs into trustworthy clinical assistants, the review proposes a roadmap for future research. This roadmap recommends core, high-impact tasks: developing frameworks for verifying results, implementing efficient architectural scaling, and addressing patient data security through privacy-preserving architectures.

International Journal of Recent Research and Review
ISSN: 2277-8322
Vol. XVIII, Issue 3
September 2025
SECTION
Articles