Enhancing the Quality of Speech using RNN and CNN

P. Vamsikrishna Mangaraya Chowdary *, G. Appala Naidu **
* Department of Systems and Signal Processing, JNTU-K University College of Engineering, Vizianagaram, Andhra Pradesh, India.
** Department of Electronics and Communication Engineering, JNTU-K University College of Engineering, Vizianagaram, Andhra Pradesh, India.
Periodicity: October - December 2019
DOI: https://doi.org/10.26634/jdp.7.4.17682

Abstract

Most of the present literature on speech enhancement focuses solely on the presence of noise in corrupted speech, which is far from real-world conditions, where reverberation is also present. In this work, we enhance speech corrupted by noise and reverberation using an RNN and a CNN. Separate networks were trained for both the RNN and the CNN on noisy data, reverberant data, and data corrupted by a combination of the two. A simple way to enhance speech quality is to raise the quality of existing recordings before voice training by applying neural-network-based enhancement methods such as noise suppression and dereverberation. The quality of voices trained on lower-quality data that had been enhanced with these networks was significantly higher. The RNN and CNN are compared, and the experimental results were obtained using MATLAB.
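As a concrete illustration of the approach outlined above, the following MATLAB sketch shows one common formulation of neural speech enhancement: an LSTM (standing in for the RNN branch of the comparison) is trained to map noisy log-magnitude STFT frames to clean ones, and the enhanced waveform is resynthesized using the noisy phase. This is a minimal, hypothetical setup, not the authors' actual configuration: the variable names cleanSpeech and noisySpeech, the layer sizes, and the training options are all illustrative assumptions, and the Deep Learning and Signal Processing Toolboxes are assumed to be available.

% Minimal sketch (illustrative only): LSTM-based denoiser mapping noisy
% log-magnitude STFT frames to clean ones. cleanSpeech and noisySpeech
% are hypothetical vectors holding one time-aligned training pair.
fs  = 16e3;
win = hamming(512, 'periodic');
hop = 256;

% STFT features: one column per frame
Snoisy = stft(noisySpeech, fs, 'Window', win, 'OverlapLength', numel(win) - hop);
Sclean = stft(cleanSpeech, fs, 'Window', win, 'OverlapLength', numel(win) - hop);
X = log(abs(Snoisy) + eps);   % network input  (frequency x frames)
T = log(abs(Sclean) + eps);   % regression target

layers = [
    sequenceInputLayer(size(X, 1))            % one feature vector per frame
    lstmLayer(256, 'OutputMode', 'sequence')  % recurrent enhancement layer
    fullyConnectedLayer(size(T, 1))
    regressionLayer];

opts = trainingOptions('adam', 'MaxEpochs', 20, ...
    'MiniBatchSize', 1, 'Verbose', false);

net = trainNetwork({X}, {T}, layers, opts);   % single-utterance toy example

% Enhancement: predict clean log-magnitudes, reuse the noisy phase
Y = predict(net, {X});
Senh = exp(Y{1}) .* exp(1i * angle(Snoisy));
enhanced = real(istft(Senh, fs, 'Window', win, 'OverlapLength', numel(win) - hop));

A CNN counterpart of the comparison would replace the lstmLayer with stacks of convolution2dLayer and reluLayer operating on fixed-size spectrogram patches; the rest of the feature-extraction and resynthesis pipeline stays the same.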

Keywords

Neural Networks, Recurrent Neural Network (RNN), Convolutional Neural Network (CNN).

How to Cite this Article?

Chowdary, P. V. M., & Naidu, G. A. (2019). Enhancing the quality of speech using RNN and CNN. i-manager's Journal on Digital Signal Processing, 7(4), 22-29. https://doi.org/10.26634/jdp.7.4.17682
