Noise Robust Speech Recognition under Noisy Environments

P. Sunitha*, V. Sailaja**, B. Vasantha Lakshmi***
*-*** Department of Electronics and Communications Engineering, Pragati Engineering College, Andhra Pradesh, India.
Periodicity: July - December 2020
DOI: https://doi.org/10.26634/jpr.7.2.18094

Abstract

This paper presents a new method for improving the recognition accuracy of a speech recognition system in noisy environments by applying a robust speech enhancement technique aided by a noise estimation algorithm. The robustness of a speech recognition system can be improved at the signal level by means of noise suppression algorithms, at the feature extraction level, or at the modelling phase. The proposed method applies speech enhancement as a pre-processing step to improve recognition accuracy in the presence of noise. Evaluated in terms of recognition accuracy, it yields better results than the baseline speech recognition system in the presence of eight different types of non-stationary noise at different SNR levels.
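The abstract does not state which enhancement or noise estimation algorithm is used, so the following is only an illustrative sketch of enhancement as a pre-processing step. It assumes basic magnitude spectral subtraction (listed in the keywords) with the noise spectrum estimated by averaging a few leading frames that are assumed to be speech-free. The function name `spectral_subtract`, its parameters, and the synthetic test signal are hypothetical and not taken from the paper.

```python
import numpy as np


def spectral_subtract(noisy, frame_len=256, noise_frames=6, floor=0.05):
    """Basic magnitude spectral subtraction (illustrative, not the paper's method).

    Assumption: the first `noise_frames` frames contain noise only.
    """
    hop = frame_len // 2
    # Periodic Hann window: overlapping copies sum to 1 at 50% overlap.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(noisy) - frame_len) // hop

    # Split into overlapping windowed frames and transform each one.
    frames = np.stack(
        [noisy[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)

    # Noise estimate: mean magnitude spectrum of the leading frames.
    noise_mag = magnitude[:noise_frames].mean(axis=0)

    # Subtract the estimate, keeping a small spectral floor instead of
    # letting magnitudes go negative.
    clean_mag = np.maximum(magnitude - noise_mag, floor * magnitude)

    # Resynthesize with the noisy phase and overlap-add the frames.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += clean_frames[i]
    return out


# Synthetic stand-in for noisy speech: 0.25 s of silence, then a 440 Hz tone,
# with additive white noise (sample rate and levels are arbitrary choices).
rate = 8000
n = np.arange(rate)
clean = np.where(n >= 2000, np.sin(2 * np.pi * 440 * n / rate), 0.0)
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(rate)

enhanced = spectral_subtract(noisy)

# Compare noise power in a region that contains no tone.
silent = slice(128, 1700)
print("noise power before:", np.mean(noisy[silent] ** 2))
print("noise power after: ", np.mean(enhanced[silent] ** 2))
```

In a recognition pipeline, the output of such a function would be passed to feature extraction in place of the raw noisy signal. An estimate fixed from the leading frames suits roughly stationary noise only; the non-stationary noises the abstract mentions would call for a noise estimator that updates over time.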

Keywords

Speech Recognition, Speech Enhancement, Noise Estimation, Spectral Subtraction.

How to Cite this Article?

Sunitha, P., Sailaja, V., and Lakshmi, B. V. (2020). Noise Robust Speech Recognition under Noisy Environments. i-manager's Journal on Pattern Recognition, 7(2), 23-28. https://doi.org/10.26634/jpr.7.2.18094
