JELE_V7_N1_RP2
Voice Conversion using GMM with Minimum Distance Spectral Mapping Plus Amplitude Scaling
Neha Yadav
Vinay Kumar Jain
Journal on Electronics Engineering
2249-0760
7
1
9
15
Gaussian Mixture Model, Minimum Distance Spectral Mapping, Dynamic Time Warping, Amplitude Scaling, Speech-to-Speech, Text-to-Speech
Voice conversion (VC) is a branch of speech processing that transforms the characteristics of a voice, whether spoken, sung, or taken from audio samples, in a simple and flexible way so that a listener identifies the speech as uttered by the target speaker. Speech processing has been widely researched over the last two decades, with growing commercial interest in VC applications such as Speech-to-Speech Translation (SST) and Text-To-Speech (TTS). Several traditional methods are available for voice conversion, but they do not yield high-quality converted speech; for example, Gaussian Mixture Model (GMM) based VC produces over-smoothed output. In this research, the authors therefore propose a new voice conversion algorithm, Minimum Distance Spectral Mapping (MDSM), based on the idea of Dynamic Time Warping (DTW), in which point-to-point mapping is used in the warping function. In addition, an amplitude scaling function adjusts the mean source log amplitude spectrum to the mean target log amplitude spectrum in order to reduce over-smoothing. Because the mean log spectral envelopes used in amplitude scaling are already smooth, no smoothing factors need to be estimated. The proposed method, MDSM with amplitude scaling (MDSMAS), preserves spectral details, improves speech quality, captures the similarity between source and target data, and gives improved results in objective tests.
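The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' MDSMAS implementation: the abstract does not give the exact warping function or scaling formula, so the code assumes a textbook DTW with Euclidean frame distance for the point-to-point alignment, and assumes amplitude scaling is an additive per-bin shift in the log domain that moves the mean source log spectrum onto the mean target log spectrum. The function names `dtw_path`, `mean_log_spectrum`, and `amplitude_scale` are hypothetical.

```python
import math


def dtw_path(src, tgt):
    """Textbook DTW (assumed, not the paper's exact warping function).

    src, tgt: lists of equal-dimension feature vectors (one per frame).
    Returns a list of (source_index, target_index) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")

    def dist(a, b):
        # Euclidean distance between two frames (an assumption).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # cost[i][j] = minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to recover the point-to-point mapping.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def mean_log_spectrum(frames):
    """Per-bin mean over frames of log amplitude spectra."""
    count = len(frames)
    return [sum(f[k] for f in frames) / count for k in range(len(frames[0]))]


def amplitude_scale(src_log_frames, tgt_log_frames):
    """Shift source log spectra so their mean matches the target mean.

    An additive shift in the log domain corresponds to a multiplicative
    scaling of the linear amplitude (assumed form of the scaling).
    """
    src_mean = mean_log_spectrum(src_log_frames)
    tgt_mean = mean_log_spectrum(tgt_log_frames)
    offset = [t - s for s, t in zip(src_mean, tgt_mean)]
    return [[v + o for v, o in zip(frame, offset)] for frame in src_log_frames]


# Toy data: two-bin log spectra, values chosen only for illustration.
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 1.0], [3.0, 5.0]]
scaled = amplitude_scale(source, target)
path = dtw_path([[0.0], [1.0], [2.0]], [[0.0], [2.0]])
```

Because the shift is applied per frequency bin and is constant over time, it changes the average spectral level without altering the frame-to-frame detail of the source spectra, which is consistent with the abstract's remark that no smoothing factors need to be estimated.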
September – November 2016
Copyright © 2016 i-manager publications. All rights reserved.
i-manager Publications
http://www.imanagerpublications.com/Article.aspx?ArticleId=8273