Deep learning convolutional neural networks (CNNs) have been widely used to recognize or classify voice. Various techniques have been used together with CNNs to prepare voice data before training the classification model. However, not all models produce good classification accuracy, because there are many types of voice and speech. Arabic alphabet pronunciation is one such type, and accurate pronunciation is required when learning to read the Qur’an. Processing the pronunciation and training on the processed data therefore require a specific approach. To address this, a method based on padding and a deep learning CNN is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data recorded from six school children are used to test the performance of the proposed method. The padding technique is used to augment the voice data before the data are fed to the CNN to develop the classification model. In addition, three other feature extraction techniques are introduced for comparison with the proposed padding-based method. The performance of the proposed method is on par with the spectrogram and better than the mel-spectrogram and mel-frequency cepstral coefficients. Results also show that the proposed method is able to distinguish the Arabic letters that are difficult to pronounce. The proposed method may be extended to assess pronunciation in areas other than the Arabic alphabet.
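The abstract does not specify how the padding is applied, so the following is only a minimal sketch of one common reading: each recording is zero-padded at the end so that all recordings share the length of the longest one before being passed to a CNN. The function names `pad_to_length` and `pad_batch`, the right-side zero-padding, and the toy signals are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def pad_to_length(signal, target_len):
    """Right-pad a 1-D signal with zeros, or truncate it, to target_len samples.

    Assumption: zero-padding at the end; the abstract does not state the scheme.
    """
    signal = np.asarray(signal, dtype=np.float32)
    if signal.shape[0] >= target_len:
        return signal[:target_len]
    # Pad 0 samples before the signal and the missing samples after it.
    return np.pad(signal, (0, target_len - signal.shape[0]), mode="constant")


def pad_batch(signals):
    """Pad every signal to the length of the longest one and stack into a 2-D array."""
    target_len = max(len(s) for s in signals)
    return np.stack([pad_to_length(s, target_len) for s in signals])


# Hypothetical toy "recordings" of different lengths (not real voice data).
recordings = [np.ones(3), np.ones(5), np.ones(2)]
batch = pad_batch(recordings)
print(batch.shape)
```

With equal-length rows, the batch can be reshaped to whatever input layout a given CNN expects; that layout depends on the network and is not described in the abstract.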
This work is licensed under a Creative Commons Attribution 4.0 International License.