Dense Captioning of Images

Shreyas More*, Amrutesh Taral**, Bhavya Shah***, Sneh Joshi****, Kirankumari Sinha*****
* Department of Computer Science, Indian Institute of Technology, Kharagpur, West Bengal, India.
**-***** Department of Information Technology, K J Somaiya College of Engineering, Mumbai, Maharashtra, India.
Periodicity: March-May 2020
DOI: https://doi.org/10.26634/jit.9.2.17308

Abstract

The dense image captioning task describes the objects within an image by identifying them and their surroundings and establishing relationships between them. The architecture comprises a Convolutional Neural Network (CNN) that extracts region features and a Recurrent Neural Network (RNN) language model that generates the captions. The task requires a system that uses computer vision both to locate regions and to describe them in natural language. An image is passed through the convolutional network to extract region features; these features then form the input to the recurrent network, which generates a caption for each region, capturing the relationships between the objects it contains.
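To make the pipeline concrete, the sketch below pairs a small convolutional region encoder with an LSTM caption decoder. This is a minimal illustration assuming PyTorch; the layer sizes, vocabulary size, and the choice of an LSTM are illustrative assumptions, not the configuration used in this work (which, like DenseCap, would also include a region-localization stage).

```python
# Minimal sketch of the CNN -> RNN dense-captioning pipeline described above.
# Assumes PyTorch; all dimensions and the LSTM decoder are illustrative choices.
import torch
import torch.nn as nn

class RegionEncoder(nn.Module):
    """Toy convolutional encoder mapping an image region crop to a feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global pooling over the region
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, region):                  # region: (B, 3, H, W)
        x = self.conv(region).flatten(1)        # (B, 64)
        return self.fc(x)                       # (B, feat_dim)

class CaptionDecoder(nn.Module):
    """LSTM language model conditioned on a region feature."""
    def __init__(self, vocab_size=1000, feat_dim=256, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feat, tokens):             # tokens: (B, T) word indices
        # Prepend the region feature as the first "word" of the input sequence.
        inputs = torch.cat([feat.unsqueeze(1), self.embed(tokens)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                   # (B, T+1, vocab_size) logits

# Usage: one 64x64 region crop and a 5-token partial caption.
encoder, decoder = RegionEncoder(), CaptionDecoder()
feat = encoder(torch.randn(1, 3, 64, 64))
logits = decoder(feat, torch.randint(0, 1000, (1, 5)))
print(logits.shape)  # torch.Size([1, 6, 1000])
```

At inference time the decoder would be run autoregressively, feeding each sampled word back in until an end-of-sentence token is produced; the teacher-forced form above is the standard training setup.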

Keywords

Natural Language Processing, Deep Learning, Machine Learning, Computer Vision.

How to Cite this Article?

More, S., Taral, A., Shah, B., Joshi, S., and Sinha, K. (2020). Dense Captioning of Images. i-manager's Journal on Information Technology, 9(2), 20-26. https://doi.org/10.26634/jit.9.2.17308
