i-manager Publications

Review on Acoustic Modeling for Continuous Speech Recognition

R. Mohan*, M. Kalamani**

* M.E Scholar, Applied Electronics, Bannari Amman Institute of Technology, Sathyamangalam, India.

** Assistant Professor (Sr.G), Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam, India.

Periodicity: October - December 2014
DOI: https://doi.org/10.26634/jdp.2.4.3145

Abstract

Speech recognition is an important research area concerned with the recognition of speech signals by computer. To improve the recognition rate for continuous speech, we adopt front-end processing consisting of speech segmentation, feature extraction using Mel Frequency Cepstral Coefficients (MFCC), and clustering. Fuzzy c-means (FCM) clustering groups the extracted features by similarity and forms an optimum number of clusters. In speech recognition, acoustic models play a major role in testing against the trained data. This paper discusses acoustic models for continuous speech recognition: the Hidden Markov Model (HMM), the Gaussian Mixture Model (GMM), and the GMM-UBM (Universal Background Model). These are the most suitable acoustic models for training on the speech signal and recognizing the corresponding text.
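
As a rough illustration of the fuzzy c-means step named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes Euclidean distance, a fuzzifier of m = 2, random initial memberships, and a fixed number of clusters supplied by the caller (the abstract does not say how the optimum number is chosen). The function name `fuzzy_c_means`, its parameters, and the toy 2-D vectors standing in for MFCC frames are all hypothetical.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length feature vectors.

    Hypothetical helper: m is the fuzzifier (> 1), tol is the stopping
    threshold on the largest membership change between iterations.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from the relative distances to every center.
        shift = 0.0
        for i in range(n):
            # Clamp distances so a point sitting on a center does not divide by zero.
            dist = [max(math.dist(points[i], c), 1e-12) for c in centers]
            new_row = []
            for j in range(n_clusters):
                denom = sum(
                    (dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                    for k in range(n_clusters)
                )
                new_row.append(1.0 / denom)
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Toy 2-D vectors used in place of real MFCC frames (an assumption).
    frames = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    centers, memberships = fuzzy_c_means(frames, n_clusters=2)
    for frame, row in zip(frames, memberships):
        print(frame, [round(v, 3) for v in row])
```

Unlike hard k-means, each frame receives a degree of membership in every cluster, and the memberships of a frame sum to 1. A frame can be assigned to a single cluster afterwards by taking the cluster with the largest membership.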

Keywords

Hidden Markov Model (HMM), Gaussian Mixture Model, GMM-UBM, Mel Frequency Cepstral Coefficients (MFCC), Fuzzy c means (FCM) Clustering

How to Cite this Article?

Mohan, R., and Kalamani, M. (2014). Review on Acoustic Modeling for Continuous Speech Recognition. i-manager’s Journal on Digital Signal Processing, 2(4), 30-33. https://doi.org/10.26634/jdp.2.4.3145

