Image caption generation is the task of producing an appropriate textual description of an image by combining visual and textual information. Here, a deep learning pipeline with an encoder–decoder architecture is discussed: a convolutional neural network encoder (for instance, ResNet50) extracts feature representations from the image, and a sequence-learning decoder based on Long Short-Term Memory (LSTM) generates the textual description. Spatial attention is incorporated into the decoder to focus the model on salient image regions, helping it produce more relevant and detailed captions. The pipeline is evaluated with standard metrics such as BLEU, METEOR, and CIDEr, which score how closely the generated captions match human-written annotations. Experiments on the standard Flickr8k dataset show that this approach produces fluent, accurate, and informative descriptions; potential applications, including accessibility, automated tagging, and human–computer interaction, are also discussed.
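To make the architecture concrete, the following is a minimal PyTorch sketch of the described pipeline: a ResNet50 encoder that yields a grid of spatial features, and an LSTM decoder that applies additive spatial attention over those regions at each decoding step. The hyperparameters (embedding size, hidden size, vocabulary size) and the additive attention formulation are illustrative assumptions, not the exact configuration used here.

```python
# Sketch of the encoder-decoder captioner with spatial attention.
# Dimensions and the additive attention form are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """ResNet50 backbone; returns a grid of spatial features (B, 49, 2048)."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc

    def forward(self, images):                      # images: (B, 3, 224, 224)
        feats = self.backbone(images)               # (B, 2048, 7, 7)
        return feats.flatten(2).transpose(1, 2)     # (B, 49, 2048)


class SpatialAttention(nn.Module):
    """Additive attention over the 49 image regions, conditioned on the LSTM state."""
    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):               # feats: (B, 49, D), hidden: (B, H)
        energy = torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(energy).squeeze(-1), dim=1)   # (B, 49) region weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)             # (B, D) attended features
        return context, alpha


class Decoder(nn.Module):
    """LSTM decoder that attends to image regions at every time step."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SpatialAttention(feat_dim, hidden_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):             # captions: (B, T) token ids
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            context, _ = self.attention(feats, h)                     # attend using current state
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.fc(h))
        return torch.stack(logits, dim=1)            # (B, T, vocab_size)


# Usage sketch (teacher forcing with cross-entropy against shifted captions):
# encoder, decoder = Encoder(), Decoder(vocab_size=5000)
# feats = encoder(images)
# logits = decoder(feats, captions[:, :-1])
# loss = nn.functional.cross_entropy(logits.reshape(-1, 5000), captions[:, 1:].reshape(-1))
```

At inference time, the decoder would instead be unrolled token by token (greedy or beam search) starting from a start-of-sequence symbol, reusing the same attention step at each position.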
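For the evaluation step, a hedged sketch of corpus-level BLEU using NLTK is shown below; the token lists are illustrative only, and METEOR and CIDEr would typically be computed with dedicated caption-evaluation tooling rather than reimplemented here.

```python
# Illustrative BLEU-4 computation for generated captions against human references.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One generated caption and its tokenized human reference captions (made-up example data).
hypotheses = [["a", "dog", "runs", "through", "the", "grass"]]
references = [[["a", "dog", "is", "running", "in", "the", "grass"],
               ["a", "brown", "dog", "runs", "across", "a", "field"]]]

smooth = SmoothingFunction().method1
bleu4 = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=smooth)
print(f"BLEU-4: {bleu4:.3f}")
```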