Synthetic Audio and Video Generation for Language Translation using GANs

Aynaan Quraishi*, Jaydeep Jethwa**, Shiwani Gupta***
*-***Department of Computer Engineering, Thakur College of Engineering & Technology, Mumbai, India.
Periodicity: January - June 2023
DOI: https://doi.org/10.26634/javr.1.1.19412

Abstract

Language barriers create a digital divide that prevents people from benefiting from the vast amount of content produced worldwide. Content creators likewise face challenges in producing content in multiple languages to reach a wider audience. To address this problem, this study surveys existing work and proposes a solution that combines Generative Adversarial Networks (GANs), Natural Language Processing (NLP), and Computer Vision. A GAN is a Machine Learning (ML) model in which two neural networks compete with each other, each using deep learning methods to produce increasingly accurate outputs. The solution described in this study can generate synthesized videos that are close to reality, ultimately bridging the language barrier and widening access to content.
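To make the adversarial setup concrete, the following is a minimal sketch of the two-network competition described above, written in PyTorch. The layer sizes, data shapes, and hyperparameters are illustrative assumptions for exposition, not the architecture used in the paper.

```python
# Minimal sketch of the generator-vs-discriminator competition the abstract
# describes. Assumes PyTorch; all sizes and hyperparameters are placeholders.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g., flattened 28x28 video frames

# Generator: maps random noise to a synthetic sample.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())

# Discriminator: scores how "real" a sample looks (1 = real, 0 = fake).
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    b = real_batch.size(0)
    real_labels, fake_labels = torch.ones(b, 1), torch.zeros(b, 1)

    # 1) Train D to separate real data from G's current fakes.
    fake_batch = G(torch.randn(b, latent_dim)).detach()
    loss_D = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Train G to fool D: this is the "competition" between the networks.
    fake_batch = G(torch.randn(b, latent_dim))
    loss_G = bce(D(fake_batch), real_labels)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

In a full audio-video translation pipeline, the generator would be conditioned on translated speech features rather than pure noise, but the alternating optimization shown here is the core GAN mechanism.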

Keywords

Generative Adversarial Networks (GANs), Machine Learning (ML), Natural Language Processing, Language Barrier, Computer Vision.

How to Cite this Article?

Quraishi, A., Jethwa, J., and Gupta, S. (2023). Synthetic Audio and Video Generation for Language Translation using GANs. i-manager's Journal on Augmented & Virtual Reality, 1(1), 1-8. https://doi.org/10.26634/javr.1.1.19412
