References
[1]. Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D. M., & Jordan, M. I. (2003). Matching words and pictures. Journal of Machine Learning Research, 3(Feb), 1107-1135.
[2]. Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C.
(2003). A neural probabilistic language model. Journal of
Machine Learning Research, 3(Feb), 1137-1155.
[3]. Chen, X., & Zitnick, C. L. (2015). Mind's eye: A recurrent visual representation for image caption generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2422-2431).
[4]. Chen, X., Fang, H., Lin, T. Y., Vedantam, R., Gupta, S., Dollár, P., & Zitnick, C. L. (2015). Microsoft COCO captions: Data collection and evaluation server. arXiv preprint. Retrieved from https://arxiv.org/abs/1504.00325
[5]. Faust, O., Hagiwara, Y., Hong, T. J., Lih, O. S., & Acharya, U. R. (2018). Deep learning for healthcare applications based on physiological signals: A review. Computer Methods and Programs in Biomedicine, 161, 1-13. https://doi.org/10.1016/j.cmpb.2018.04.005
[6]. Jianglin, Y., Zhigang, G., & Gang, C. (2019,
February). Recurrent convolution attention model
(RCAM) for text generation based on title. In Journal of
Physics: Conference Series (Vol. 1168, No. 5, p. 052049).
IOP Publishing.
[7]. Johnson, J., Karpathy, A., & Fei-Fei, L. (2016).
DenseCap: Fully convolutional localization networks for
dense captioning. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (pp. 4565-
4574).
[8]. Li, X., Jiang, S., & Han, J. (2019, July). Learning object
context for dense captioning. In Proceedings of the AAAI
Conference on Artificial Intelligence (Vol. 33, pp. 8650-
8657). https://doi.org/10.1609/aaai.v33i01.33018650
[9]. Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015).
Show and tell: A neural image caption generator. In
Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (pp. 3156-3164).
[10]. Ya, J., Liu, T., Li, Q., Lv, P., Shi, J., & Guo, L. (2018,
October). Fast and accurate typosquatting domains
evaluation with Siamese networks. In Proceedings of IEEE
Military Communications Conference (MILCOM 2018)
(pp. 58-63). IEEE. https://doi.org/10.1109/MILCOM.2018.8599686
[11]. Zitnick, C. L., & Dollár, P. (2014, September). Edge
boxes: Locating object proposals from edges. In
European Conference on Computer Vision (pp. 391-
405). Cham: Springer. https://doi.org/10.1007/978-3-319-10602-1_26