Enhancing the Quality of Speech using RNN and CNN

P. Vamsikrishna Mangaraya Chowdary *, G. Appala Naidu **
* Department of Systems and Signal Processing, JNTU-K University College of Engineering, Vizianagaram, Andhra Pradesh, India.
** Department of Electronics and Communication Engineering, JNTU-K University College of Engineering, Vizianagaram, Andhra Pradesh, India.
Periodicity: October - December 2019
DOI: https://doi.org/10.26634/jdp.7.4.17682

Abstract

Most of the existing literature on speech enhancement focuses solely on the presence of noise in corrupted speech, which is far from real-world environments. In this work, we enhance speech signals degraded by both noise and reverberation using an RNN and a CNN. Separate networks were trained for the RNN and the CNN on noisy data, reverberant data, and data combining reverberation and noise. A simple way to improve speech quality is to raise the quality of previously recorded material by applying speech-enhancement methods such as noise suppression and dereverberation with neural networks. Voices trained on lower-quality data and enhanced with these networks showed significantly higher quality. A comparison of the RNN and the CNN is presented, and the experiments were carried out in MATLAB.
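
As an illustration of the training setup described above, the following is a minimal MATLAB sketch (Deep Learning Toolbox) of a sequence-to-sequence enhancement network. It is not the authors' exact architecture: the feature choice (log-magnitude STFT frames), the layer sizes (numFreq, numHidden), and the placeholder training data (noisySeq, cleanSeq) are all assumptions for illustration.

    % Minimal sketch of RNN-based speech enhancement (assumed setup, not
    % the paper's exact network): map noisy/reverberant STFT magnitude
    % frames to clean frames with an LSTM trained by regression.
    numFreq   = 257;   % STFT bins per frame (assumed)
    numHidden = 256;   % LSTM units (assumed)

    % Placeholder paired sequences; in practice these would be magnitude
    % spectrograms (e.g., via stft) of degraded and clean recordings.
    noisySeq = arrayfun(@(k) rand(numFreq, 100), (1:8)', 'UniformOutput', false);
    cleanSeq = arrayfun(@(k) rand(numFreq, 100), (1:8)', 'UniformOutput', false);

    layers = [
        sequenceInputLayer(numFreq)
        lstmLayer(numHidden, 'OutputMode', 'sequence')  % temporal modelling
        fullyConnectedLayer(numFreq)                    % per-frame clean estimate
        regressionLayer];                               % MSE against clean frames

    options = trainingOptions('adam', ...
        'MaxEpochs', 5, ...
        'MiniBatchSize', 4, ...
        'Verbose', false);

    net = trainNetwork(noisySeq, cleanSeq, layers, options);

    % Enhance a held-out noisy sequence frame by frame.
    enhanced = predict(net, noisySeq{1});

A CNN counterpart can be trained on the same features by replacing the lstmLayer with convolution layers; the RNN/CNN comparison in the paper contrasts these two modelling choices, though the exact layers used are not given in the abstract.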

Keywords

Neural Networks, Recurrent Neural Network (RNN), Convolutional Neural Network (CNN).

How to Cite this Article?

Chowdary, P. V. M., and Naidu, G. A. (2019). Enhancing the Quality of Speech using RNN and CNN. i-manager's Journal on Digital Signal Processing, 7(4), 22-29. https://doi.org/10.26634/jdp.7.4.17682
