Novel Turbo Cancellation ISI/ICI DSP Concepts for FBMC-OQAM Based MIMO 5G Digital Communication Multi-Carrier Systems
Abstract
This article surveys voice activity detection (VAD), a critical component of speech processing systems that detects the presence of speech in an audio signal. The survey covers the fundamentals of VAD, including its importance, applications, and inherent implementation challenges. It begins by establishing the basics of VAD, describing features and techniques in detail. It then addresses the key issues and challenges that arise when implementing the technology efficiently. Furthermore, it reviews the evaluation metrics commonly used to assess performance and provides an overview of readily accessible VAD databases. Overall, the survey presents a clear account of VAD, the challenges encountered, and the techniques designed to overcome them, serving as a resource for researchers and professionals in the speech processing field.
Licensed under the Creative Commons Attribution 4.0 International License.
