Speaker Verification System using Wavelet Transform and Neural Network for short utterances

  • Krishna Sarma
  • Fidalizia Pyrtuh
  • Debarun Chakraborty
Keywords: Speaker verification system, short utterances, Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), Feedforward Neural Network (FFNN), Hilbert Transform.

Abstract

 In this paper, a wavelet transform technique and a neural network are used to develop a Speaker Verification System for short utterances. The sampled data undergo four-level wavelet decomposition. The Discrete Cosine Transform (DCT) is then applied to the decomposed data to improve the feature extraction process. The Hilbert Transform is also explored to analyze the contribution of magnitude and phase to speaker classification, and their performance is reported. The resulting features are fed to a feed-forward back-propagation neural network for classification. The proposed technique is evaluated on the fixed-phrase portion of the RedDots dataset and on a self-recorded numerical dataset, and it achieves a recognition rate of up to 95%.
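
The sketch below is a rough illustration of the pipeline described above: four-level DWT, DCT on the sub-band coefficients, Hilbert magnitude/phase features, and a feed-forward classifier. The mother wavelet (db4), the number of coefficients kept per band, the exact ordering of the DCT and Hilbert steps, the synthetic data, and the use of scikit-learn's MLPClassifier in place of the paper's back-propagation network are all assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the described feature-extraction and classification pipeline.
# Assumptions: 'db4' wavelet, 20 coefficients kept per sub-band, MLPClassifier
# standing in for the feed-forward back-propagation network, synthetic signals
# standing in for the RedDots / recorded-digit utterances.
import numpy as np
import pywt                              # discrete wavelet transform
from scipy.fft import dct                # discrete cosine transform
from scipy.signal import hilbert         # analytic signal (Hilbert transform)
from sklearn.neural_network import MLPClassifier

def extract_features(signal, wavelet="db4", levels=4, n_keep=20):
    """Four-level DWT, then DCT and Hilbert magnitude/phase per sub-band."""
    coeffs = pywt.wavedec(signal, wavelet, level=levels)      # [cA4, cD4, cD3, cD2, cD1]
    feats = []
    for band in coeffs:
        feats.append(dct(band, norm="ortho")[:n_keep])         # energy-compacting DCT
        analytic = hilbert(band)                               # analytic signal of the sub-band
        feats.append(np.abs(analytic)[:n_keep])                # Hilbert magnitude
        feats.append(np.unwrap(np.angle(analytic))[:n_keep])   # Hilbert (unwrapped) phase
    return np.concatenate(feats)

# Illustrative training on synthetic "utterances" from two hypothetical speakers.
rng = np.random.default_rng(0)
X = np.stack([extract_features(rng.standard_normal(16000)) for _ in range(40)])
y = np.repeat([0, 1], 20)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```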

Published
2020-04-15
How to Cite
Sarma, K., Pyrtuh, F., & Chakraborty, D. (2020). Speaker Verification System using Wavelet Transform and Neural Network for short utterances. Asian Journal For Convergence In Technology (AJCT) ISSN -2350-1146, 6(1), 30-35. https://doi.org/10.33130/AJCT.2020v06i01.006
