Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network

Asroni Asroni; Ku Ruhana Ku-Mahamud; Cahya Damarjati; Hasan Basri Slamat

doi:10.21123/bsj.2021.18.2(Suppl.).0925


Published: Jun 20, 2021


Keywords:

Arabic alphabet, deep learning, speech classification, COVID-19, spectrogram

Asroni Asroni

Universitas Muhammadiyah Yogyakarta, Indonesia

Ku Ruhana Ku-Mahamud

Universiti Utara Malaysia

Cahya Damarjati

Universitas Muhammadiyah Yogyakarta, Indonesia

Hasan Basri Slamat

Universitas Muhammadiyah Yogyakarta, Indonesia

Abstract

Deep learning convolutional neural networks (CNNs) have been widely used to recognize or classify voice. Various techniques have been used together with CNNs to prepare voice data before the training process in developing the classification model. However, not all models can produce good classification accuracy, as there are many types of voice or speech. Pronunciation of the Arabic alphabet is one such type, and accurate pronunciation is required when learning to read the Qur’an. Thus, processing the pronunciation and training on the processed data require a specific approach. To address this issue, a method based on padding and a deep learning convolutional neural network is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data recorded from six school children are used to test the performance of the proposed method. The padding technique is used to augment the voice data before feeding the data to the CNN structure to develop the classification model. In addition, three other feature extraction techniques are introduced for comparison with the proposed method, which employs the padding technique. The performance of the proposed method with the padding technique is on par with the spectrogram but better than the mel-spectrogram and mel-frequency cepstral coefficients. Results also show that the proposed method was able to distinguish the Arabic letters that are difficult to pronounce. The proposed method with the padding technique may be extended to assess pronunciation ability beyond the Arabic alphabet.
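The abstract does not state how the padding is applied, so the following is only a minimal sketch under stated assumptions: recordings of different durations are zero-padded at the end to a common length, and a plain magnitude spectrogram is computed so that every clip yields an input of the same shape for a CNN. The function names, frame sizes, sampling rate, and the synthetic tones standing in for speech are all hypothetical and are not taken from the paper.

```python
import numpy as np

def pad_to_length(signal, target_len):
    """Zero-pad (or truncate) a 1-D signal at the end to target_len samples."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)), mode="constant")

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Rows are frequency bins, columns are time frames.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Assumed setup: three clips of different durations (synthetic tones, not speech).
sample_rate = 8000
clips = [
    np.sin(2 * np.pi * 440 * np.arange(n) / sample_rate)
    for n in (3000, 4500, 6000)
]

# Pad every clip to the longest duration so all spectrograms share one shape.
target_len = max(len(c) for c in clips)
batch = np.stack([spectrogram(pad_to_length(c, target_len)) for c in clips])
print(batch.shape)
```

Padding to the longest clip is one possible choice; a fixed target length chosen in advance would work the same way and keeps the input shape stable across datasets.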

Received 28/3/2021, Accepted 12/4/2021, Published 6/6/2021

How to Cite

Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network. Baghdad Sci.J [Internet]. 2021 Jun. 20 [cited 2025 Feb. 22];18(2(Suppl.)):0925. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/6213

Issue

Vol. 18 No. 2(Suppl.) (2021): Supplement Issue 2

Section

article

This work is licensed under a Creative Commons Attribution 4.0 International License.




Journal Info
Journal: Baghdad Science Journal
Publisher: College of Science for Women/ University of Baghdad
Baghdad Sci. J. is peer-reviewed and open access
Print ISSN: 2078-8665
Electronic ISSN: 2411-7986
Publishing Frequency: Quarterly (2004-2021), Bi-monthly (from 2022), Monthly (from 2024)
Launched Date: 2004
Abbreviation: Baghdad Sci.J.
Each published paper in Baghdad Sci. J. has a digital object identifier (DOI) number


© 2022 The Author(s). Published by College of Science for Women, University of Baghdad. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
