Vowel Recognition for Rehabilitation Assessment of Speech Disorder Patients via Multi-source Frequency Spectrum Images
DOI:
https://doi.org/10.21123/bsj.2024.9202
Keywords:
Convolutional Neural Network (CNN), Deep Learning, Mel-Frequency Cepstral Coefficient (MFCC), Rehabilitation, Spectrogram, Vowel Recognition
Abstract
Communication impairments have a broad spectrum of medical causes, such as speech disorders, hearing loss, brain injury, stroke, and physical impairments. These disorders can affect social development and interpersonal relationships. Patients with speech impairments can benefit from early speech therapy; however, most rehabilitation facilities worldwide still carry out this process manually. A wide range of studies has been conducted on speech processing for various human languages, and machine learning and deep learning have been applied in medicine and healthcare to enhance rehabilitation. This study compares the classification accuracy of a newly designed network against pre-trained models (VGG-Net, AlexNet, and Inception) in a comprehensive comparative analysis. Audio recordings are converted into images so that the neural network can process them visually, a newly proposed concept named image-profiled data. The image-profiled datasets built from spectrograms and Mel-frequency cepstral coefficients (MFCC) produced the best accuracy in this study. This project aims to develop a new neural network that can distinguish vowels spoken by healthy speakers, by patients with speech disorders, and by a mix of the two groups, using six and twelve classes of Malay vowels. The designed network model, trained with a batch size of 6, 20 epochs, and Adam as the optimizer, achieved the highest accuracy for both class configurations on image-profiled audio data in the analyses conducted.
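To make the idea of image-profiled data concrete, the following is a minimal sketch of how an audio signal might be turned into a grayscale spectrogram image. It is an illustration under stated assumptions, not the authors' pipeline: the function name `spectrogram_image`, the frame length, the hop size, and the synthetic 1 kHz tone standing in for a vowel recording are all hypothetical choices that do not come from the study.

```python
import numpy as np

def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram scaled to 8-bit grayscale.

    Rows are frequency bins (low to high), columns are time frames.
    frame_len and hop are illustrative values, not taken from the study.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of each windowed frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small offset avoids log(0).
    log_mag = 20.0 * np.log10(magnitude + 1e-10)
    # Min-max normalise to 0..255 so the array can be treated as an image.
    scaled = (log_mag - log_mag.min()) / (log_mag.max() - log_mag.min())
    return (scaled * 255).astype(np.uint8).T

# Hypothetical stand-in for a vowel recording:
# a 1 kHz tone sampled at 8 kHz for one second.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

image = spectrogram_image(tone)

# The brightest row should sit at the tone's frequency.
peak_bin = int(image.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256
```

With these settings the result is a 129 by 61 array of 8-bit pixels whose brightest row corresponds to 1000 Hz. An array like this could be saved as an image file or passed to a CNN as a single-channel input; an MFCC-based image would add a mel filterbank and a discrete cosine transform on top of the same framing step.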
Received 08/06/2023
Revised 02/02/2024
Accepted 04/02/2024
Published Online First 20/08/2024
License
Copyright (c) 2024 Nur Syahmina Ahmad Azhar, Nik Mohd Zarifie Hashim, Masrullizam Mat Ibrahim, Mahmud Dwi Sulistiyo
This work is licensed under a Creative Commons Attribution 4.0 International License.