Stages of Development of Speech Signal Processing, Problems and Algorithms
DOI:
https://doi.org/10.48149/jciees.2024.4.1.2

Keywords:
Hidden Markov models, convolutional neural networks, Dynamic Time Warping, artificial intelligence algorithms, informative properties

Abstract
This article develops mathematical methods for learning and decision-making in speech recognition: informative features are extracted from speech signals, and intelligent algorithms are trained on the resulting feature set. Effective ways to isolate features from speech signals are explained step by step. The extracted feature sequences feed several recognition approaches (HMM, CNN and DTW), whose comparative analysis is presented in tabular form. In the experiments, speech sounds, syllables and words were evaluated with these well-known intelligent algorithms, and the best result was achieved with the CNN model.
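To illustrate one of the three approaches compared in the article, the sketch below implements the classic Dynamic Time Warping (DTW) template-matching scheme in plain Python. The feature sequences, template names, and the absolute-difference local distance are illustrative assumptions, not values from the article's experiments:

```python
def dtw_distance(a, b):
    """Minimal cumulative alignment cost between two 1-D feature sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j]: minimal cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A DTW recognizer labels a query with the template of minimal cost
# (template contents here are made up for the example):
templates = {"word_a": [1.0, 2.0, 3.0, 2.0], "word_b": [3.0, 3.0, 1.0]}
query = [1.0, 2.0, 2.9, 2.1]
best = min(templates, key=lambda w: dtw_distance(query, templates[w]))
```

In practice the 1-D values would be replaced by frame-level feature vectors (e.g. MFCCs) and the local distance by a vector norm, but the dynamic-programming recursion is unchanged.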
References
Abdullaeva, M., I. Khujayorov and M. Ochilov (2021). Formant set as a main parameter for recognizing vowels of the Uzbek language, International Conference on Information Science and Communications Technologies (ICISCT), Tashkent, Uzbekistan, 2021, pp. 1-5.
M. M. Mahmudovich, A. M. Ilkhamovna and T. B. Shukhrat Ogli, (2022). Image Approach to Uzbek Speech Recognition, IEEE 22nd International Conference on Communication Technology (ICCT), Nanjing, China, 2022, pp. 1201-1206.
Musaev M., Abdullaeva M., Ochilov M. (2022). Advanced Feature Extraction Method for Speaker Identification Using a Classification Algorithm, AIP Conference Proceedings, 2022, pp. 256-265.
W. Chan (2018). Speech recognition with attention-based recurrent neural networks, US Patent 9,990,918, 2018.
W. Chan (2016). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 4960-4964.
W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig (2016). Achieving human parity in conversational speech recognition, Technical Report MSR-TR-2016-71.
O. Abdel-Hamid (2014). Convolutional neural networks for speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, 2014, pp. 1533-1545.
Lama, P. (2010) Speech recognition with dynamic time warping using MATLAB, Project Report, CS 525, Spring 2010.
B. W. Gawali (2010). Marathi isolated word recognition system using MFCC and DTW features, Proc. of Int. Conf. on Advances in Computer Science, Vol. 1, Citeseer, 2010, pp. 21-24.
Y. LeCun, Y. Bengio (1995). Convolutional networks for images, speech, and time series, The handbook of brain theory and neural networks Vol. 3361, no. 10, 1995.
P. Ahmadi and M. Joneidi (2014). A new method for voice activity detection based on sparse representation, 2014 7th International Congress on Image and Signal Processing, Dalian, 2014, pp. 878-882.
V. A. Volchenkov and V. V. Vityazev (2016) Development and testing of the voice activity detector based on use of special pilot signal, 5th Mediterranean Conference on Embedded Computing (MECO), Bar, 2016, pp. 108-111.
Z. Fan, Z. Bai, X. Zhang, S. Rahardja and J. Chen (2019). AUC Optimization for Deep Learning Based Voice Activity Detection, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019.
Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, and Yoshua Bengio, (2018) Light gated recurrent units for speech recognition, IEEE Transactions on Emerging Topics in Computational Intelligence, 2018, pp. 92–102.
D. Palaz, M. Magimai-Doss, and R. Collobert (2015). Convolutional neural networks-based continuous speech recognition using raw speech signal, in Proc. of ICASSP, April 2015.
S. M. Mon and H. M. Tun (2015). Speech-to-text conversion (STT) system using hidden Markov model (HMM), International Journal of Scientific & Technology Research, vol. 4, no. 6, Jun. 2015 pp.349-352.
Savchenko V. V. (2017). Study of the stationarity of random time series using the principle of the information-divergence minimum, Radiophysics and Quantum Electronics, vol. 60, no. 1, 2017, pp. 81-87.
Pahini A. Trivedi (2014). Introduction to Various Algorithms of Speech Recognition: Hidden Markov Model, Dynamic Time Warping and Artificial Neural Networks, International Journal of Engineering Development and Research, Volume 2, Issue 4, 2014.
G. Harsha Vardhan, G. Hari Charan (2014). Artificial Intelligence & its Applications for Speech Recognition, International Journal of Science and Research (IJSR) Volume 3 Issue 8, August 2014.
Weng C., Yu D., Seltzer M. L., Droppo J. (2014). Single-channel mixed speech recognition using deep neural networks, IEEE ICASSP. 2014, pp. 5632-5636.
Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition, IEEE Signal Processing Magazine, 2012.
Shaw Talebi (2020). The Fast Fourier Transform (FFT), The Startup, Dec. 4, 2020.
License
Copyright (c) 2024 The Journal of CIEES

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.