Language Identification Using MFCC Features Derived During Oration

Manogna Maddali*, Shoba Bindu C**
* P.G Student, Department of Computer Science and Engineering, JNTUA College of Engineering, Anantapuramu, A.P, India.
** Associate Professor, Department of Computer Science and Engineering, JNTUA College of Engineering, Anantapuramu, A.P, India.
Periodicity: March - May 2015
DOI: https://doi.org/10.26634/jpr.2.1.3373

Abstract

Automatic Language Identification (LID) is the task of identifying the spoken language from a given speech utterance. Many communication systems make use of LID. Many experiments rely on acoustic properties because they are easy to differentiate. Instead of these features, prosodic properties can be used to identify the language. The main idea is to explore the duration of neighboring syllable-like units as a language-discriminative feature. This paper proposes an LID system that uses the rhythmic properties of spoken speech. Prosodic features are extracted using Mel Frequency Cepstral Coefficients (MFCC). Based on the energy levels in the signal, phoneme recognition is performed to identify the syllable-like units. An Artificial Neural Network (ANN) is used to train the system and generate results. The main focus of this paper is to improve recognition accuracy. The error rate is reduced compared with other systems.
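The duration feature mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no frame size, threshold, or segmentation rule, so the function names, the 10 ms non-overlapping frames, the fixed energy threshold, and the synthetic test signal below are all assumptions chosen for illustration. The sketch marks runs of high-energy frames as syllable-like units and pairs each unit's duration with that of its right-hand neighbor. MFCC extraction and ANN training are not shown.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Short-time energy (sum of squared samples) of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def segment_units(energies, threshold):
    """Return (start_frame, end_frame) for each run of frames above threshold.

    end_frame is exclusive. Each run stands in for one syllable-like unit.
    """
    units = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # a unit begins
        elif energy <= threshold and start is not None:
            units.append((start, i))       # the unit ends
            start = None
    if start is not None:                  # unit still open at end of signal
        units.append((start, len(energies)))
    return units


def neighbor_duration_features(units, hop, sample_rate):
    """Pair each unit's duration in seconds with its right neighbor's."""
    durations = [(end - start) * hop / sample_rate for start, end in units]
    return list(zip(durations, durations[1:]))


def tone(n_samples, freq, sample_rate):
    """Synthetic stand-in for a voiced segment (assumption, not real speech)."""
    return [math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(n_samples)]


# Hypothetical test signal: three tone bursts separated by silence.
SAMPLE_RATE = 8000
FRAME_LEN = HOP = 80                       # 10 ms frames, no overlap (assumed)
signal = (tone(800, 200, SAMPLE_RATE) + [0.0] * 400 +
          tone(1600, 200, SAMPLE_RATE) + [0.0] * 400 +
          tone(1200, 200, SAMPLE_RATE))

energies = frame_energies(signal, FRAME_LEN, HOP)
units = segment_units(energies, threshold=1.0)   # threshold is an assumption
features = neighbor_duration_features(units, HOP, SAMPLE_RATE)

print(units)
print(features)
```

In a fuller system, duration pairs of this kind would be one input to the classifier. On real speech, a fixed threshold would likely need to be replaced by one adapted to the recording level.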

Keywords

Automatic Language Identification (LID), Speech Rhythm, Mel Frequency Cepstral Coefficients, Phoneme Recognition, Language Model

How to Cite this Article?

Maddali, M., and Bindu, C. S. (2015). Language Identification Using MFCC Features Derived During Oration. i-manager’s Journal on Pattern Recognition, 2(1), 23-28. https://doi.org/10.26634/jpr.2.1.3373
