JOIG 2025 Vol.13(4):406-418
doi: 10.18178/joig.13.4.406-418

Analysis Evolution of Image Caption Techniques: Combining Conventional and Modern Methods for Improvement

Nuha M. Khassaf 1,* and Nada Hussein M. Ali 2
1. Informatics Institute for Postgraduate Studies, Information Technology & Communications University, Baghdad, Iraq
2. Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq
Email: phd202230705@iips.edu.iq (N.M.K.); nada.husn@sc.uobaghdad.edu.iq (N.H.M.A.)
*Corresponding author

Manuscript received March 1, 2025; revised March 25, 2025; accepted April 30, 2025; published August 7, 2025.

Abstract—This study explores the challenges that Artificial Intelligence (AI) systems face in generating image captions, a task that requires effective integration of computer vision and natural language processing techniques. It presents a comparative analysis between traditional approaches (such as retrieval-based methods and linguistic templates) and modern approaches based on deep learning (such as encoder-decoder models, attention mechanisms, and transformers). Theoretical results show that modern models perform better in accuracy and in the ability to generate more complex descriptions, while traditional methods are superior in speed and simplicity. The paper proposes a hybrid framework that combines the advantages of both approaches: conventional methods produce an initial description, which is then contextually refined using modern models. Preliminary estimates indicate that this approach could reduce the initial computational cost by up to 20% compared to relying entirely on deep models while maintaining high accuracy. The study recommends further research to develop effective coordination mechanisms between traditional and modern methods, and to move to the experimental validation phase of the hybrid model in preparation for its application in environments that require a balance between speed and accuracy, such as real-time computer vision applications.
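
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` type, the `template_caption` and `hybrid_caption` functions, the caption template, and the `min_score` threshold are all hypothetical names and values chosen for illustration, and the `refine` callable merely stands in for whatever deep model (for example an encoder-decoder or transformer) would perform the refinement.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Detection:
    """Hypothetical detector output: an object label and a confidence score."""
    label: str
    score: float


def template_caption(detections: Sequence[Detection], min_score: float = 0.5) -> str:
    """Conventional stage: fill a fixed linguistic template with detected labels."""
    # Keep only labels whose confidence reaches the (assumed) threshold.
    labels = [d.label for d in detections if d.score >= min_score]
    if not labels:
        return "An image."
    if len(labels) == 1:
        return f"An image of a {labels[0]}."
    return f"An image of a {', a '.join(labels[:-1])} and a {labels[-1]}."


def hybrid_caption(
    detections: Sequence[Detection],
    refine: Callable[[str], str],
    min_score: float = 0.5,
) -> str:
    """Hybrid pipeline: cheap template draft first, then model-based refinement."""
    draft = template_caption(detections, min_score)
    # `refine` is a placeholder for a deep captioning or rewriting model.
    return refine(draft)


if __name__ == "__main__":
    found = [Detection("dog", 0.9), Detection("ball", 0.8), Detection("cat", 0.2)]
    # An identity function is used here instead of a real model.
    print(hybrid_caption(found, refine=lambda text: text))
```

Passing the refinement step in as a callable keeps the conventional stage independent of any particular deep learning library, so the same draft can be handed to different models when comparing them.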

Keywords—Convolutional Neural Networks (CNN), image caption, conventional methods, modern methods, hybrid approach

Cite: Nuha M. Khassaf and Nada Hussein M. Ali, "Analysis Evolution of Image Caption Techniques: Combining Conventional and Modern Methods for Improvement," Journal of Image and Graphics, Vol. 13, No. 4, pp. 406-418, 2025.

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC-BY-4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.
