Multilingual Speaker Identification System through Multiple Features Analysis of Speech Signal in Multilingual Environment

Vinay Kumar Jain*
Department of Electronics & Telecommunication Engineering, Shri Shankaracharya Technical Campus, Shri Shankaracharya Group of Institutions, Bhilai, Chhattisgarh, India.
Periodicity: January - June 2020
DOI: https://doi.org/10.26634/jdp.8.1.17838

Abstract

Multilingual speech processing is the field of speech technology in which a speaker's speech signals in multiple languages are analyzed to observe the effect of language on speech features. Based on these observations, a multilingual speaker identification system can be designed to identify speakers in multilingual environments. For the present study, a multilingual speech database was recorded from different speakers in three Indian languages: Hindi, Marathi, and Rajasthani. The sentences contain the consonants “Cha”, “Sha”, and “Jha”. A total of 30 speakers, both male and female, were involved. The basic features of the speech signal, namely pitch and the first three formants (F1, F2, and F3), were calculated with PRAAT software, whereas the cepstral features, namely Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC), were extracted using MATLAB. A model is proposed that identifies a speaker from that speaker's multilingual speech signals, using MFCC, GFCC, and combined features as acoustic features. Training and testing were performed with two neural networks, one using the Resilient Back Propagation Algorithm (BPA) and one using Radial Basis Functions (RBF), and the results are compared. In this experiment, the accuracy of multilingual speaker identification is 94.77% using the BPA network and 96.52% using the RBF neural network.
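As an illustration of the cepstral feature extraction mentioned in the abstract, the following is a minimal NumPy sketch of a conventional MFCC computation: pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, logarithm, and DCT-II. It is not the author's MATLAB implementation. The function names and every parameter value (16 kHz sampling rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, pre-emphasis factor 0.97) are assumptions chosen for the example; the abstract does not state the settings used in the study.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several variants in use).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edge points are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_matrix(n_out, n_in):
    # Unnormalized DCT-II basis: entry (k, n) = cos(pi * k * (2n + 1) / (2 * n_in)).
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # All default values are illustrative assumptions, not the paper's settings.
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames (requires len(signal) >= frame_len).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame, zero-padded to n_fft samples.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, log compression, then DCT to decorrelate.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    return log_energies @ dct_matrix(n_ceps, n_filters).T

# Usage on a synthetic one-second 220 Hz tone standing in for a recorded utterance.
t = np.arange(16000) / 16000.0
tone = np.sin(2.0 * np.pi * 220.0 * t)
features = mfcc(tone)
print(features.shape)  # (number of frames, number of cepstral coefficients)
```

Each row of the returned matrix is the feature vector for one frame. A GFCC front end would follow the same outline with a gammatone filterbank in place of the mel filterbank, and the resulting vectors are the kind of input a BPA or RBF network classifier would be trained on.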

Keywords

Pitch, Formant, MFCC, GFCC, Multilingual.

How to Cite this Article?

Jain, V. K. (2020). Multilingual Speaker Identification System through Multiple Features Analysis of Speech Signal in Multilingual Environment. i-manager's Journal on Digital Signal Processing, 8(1), 27-33. https://doi.org/10.26634/jdp.8.1.17838
