Vowel Recognition for Rehabilitation Assessment of Speech Disorder Patients via Multi-source Frequency Spectrum Images

Nur Syahmina Ahmad Azhar; Nik Mohd Zarifie Hashim; Masrullizam Mat Ibrahim; Mahmud Dwi Sulistiyo

doi:10.21123/bsj.2024.9202

Authors

Nur Syahmina Ahmad Azhar, Faculty of Electronics and Computer Technology and Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
Nik Mohd Zarifie Hashim, Faculty of Electronics and Computer Technology and Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
Masrullizam Mat Ibrahim, Faculty of Electronics and Computer Technology and Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
Mahmud Dwi Sulistiyo, School of Computing, Telkom University, West Java, Indonesia.

DOI:

https://doi.org/10.21123/bsj.2024.9202

Keywords:

Convolutional neural network (CNN), Deep learning, Mel-frequency cepstral coefficient (MFCC), Rehabilitation, Spectrogram, Vowel recognition

Abstract

Communication impairments have a wide range of medical causes, including speech disorders, hearing loss, brain injury, stroke, and physical disability. A lifelong communication disorder can therefore affect social development and personal relationships. Speech disorders can benefit from early speech therapy; however, most rehabilitation facilities worldwide still carry out this process manually. Globally, speech processing has been studied extensively for many human languages. As computer vision has influenced this field, machine learning and deep learning have been applied in medicine and healthcare to improve rehabilitation through new technology. This study analyzed the classification accuracy of a purpose-designed network and ran a full comparative analysis against several pre-trained models (VGG-Net, AlexNet, and Inception). To carry out the classification task, the audio is converted into an image so that the neural network can process it visually, through a newly proposed concept called image profile data. The image datasets built from spectrograms and Mel-frequency cepstral coefficients (MFCC) produced the best results and accuracy in this study. The project aims to develop a new neural network that can distinguish between vowels spoken by healthy speakers, by patients with speech disorders, and by a mixture of the two groups, using six and twelve classes of Malay vowels. In the experiments conducted, the designed network model, trained with a batch size of 6, 20 epochs, and Adam as the optimizer, achieved the highest accuracy for both class settings of the image-based voice data in all analyses.
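
The audio-to-image step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the sampling rate, frame length, hop size, the synthetic two-tone test signal, and the function names log_spectrogram and to_uint8_image are all assumptions chosen for illustration. It only shows one common way to turn a waveform into a grayscale spectrogram image that a CNN could take as input.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x frequency bins) log-magnitude spectrogram in dB."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, Hann-windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)

def to_uint8_image(spec_db):
    """Min-max scale to 0..255; rows are frequency (low at the bottom), columns are time."""
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / max(hi - lo, 1e-12)
    return np.round(255 * scaled).astype(np.uint8).T[::-1]

# Hypothetical stand-in for a recorded vowel: two sinusoids, 0.5 s at 8 kHz.
sr = 8000
t = np.arange(int(0.5 * sr)) / sr
signal = np.sin(2 * np.pi * 750 * t) + 0.5 * np.sin(2 * np.pi * 1250 * t)

spec_db = log_spectrogram(signal)
image = to_uint8_image(spec_db)
print(image.shape, image.dtype)
```

An MFCC image would follow the same pattern, with a mel filterbank and a discrete cosine transform applied to the frame spectra before scaling. The resulting arrays could then be saved as image files or stacked into batches for training.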

Received 08/06/2023

Revised 02/02/2024

Accepted 04/02/2024

Published Online First 20/08/2024

Journal Info
Journal: Baghdad Science Journal
Publisher: College of Science for Women/ University of Baghdad
Baghdad Sci. J. is peer-reviewed and open access
Print ISSN: 2078-8665
Electronic ISSN: 2411-7986
Publishing Frequency: Quarterly (from 2004 - 2021) Bi-monthly (from 2022) Monthly (from 2024)
Launched Date: 2004
Abbreviation: Baghdad Sci.J.
Each published paper in Baghdad Sci. J. has a digital object identifier (DOI) number