Abstract

Elucidating DNA sequencing data is of great importance for understanding organisms' genomic functions. However, uncovering the structural information concealed within Deoxyribonucleic Acid (DNA) sequences remains an outstanding challenge. Recently, machine and deep learning have become the techniques of choice for various genomics modeling tasks, including predicting the influence of genetic variation on gene regulation mechanisms such as DNA receptivity and splicing. This paper therefore presents an information system for elucidating DNA sequencing data using diverse machine and deep learning techniques. In this system, two encoding methods are used to transform the DNA sequences into a form suitable as training input. Random Forest (RF) and Support Vector Machine (SVM) are then implemented as machine learning techniques, and Bidirectional Long Short-Term Memory (Bi-LSTM) is implemented as a deep learning technique, to classify the encoded DNA sequences. Among these techniques, the proposed Bi-LSTM is the most significant, since it can efficiently capture the complex temporal dependencies contained in DNA sequences and enhance classification performance. The classification techniques are trained on the Human DNA sequence dataset (DNA-SD) and assessed using various classification metrics. The experimental results show that the proposed Bi-LSTM technique outperforms the other implemented techniques, achieving precision, sensitivity, accuracy, and F1-score of 98%, 95%, 96%, and 96%, respectively, on the Human DNA-SD testing data. Additionally, the classification techniques were tested on the Dog and Chimpanzee DNA-SD datasets, and the results indicate that the proposed technique provides the best overall performance.
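
The abstract does not name the two encoding methods it uses. Purely as an illustration, the sketch below shows two encodings that are common for DNA classification: one-hot encoding, which suits sequence models such as a Bi-LSTM, and k-mer counting, which suits classifiers such as RF and SVM. The function names, the ACGT alphabet, and the default k = 3 are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this sketch


def one_hot(seq):
    """Encode each base as a 4-element indicator vector (assumed encoding)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:  # bases outside ACGT (e.g. N) stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_counts(seq, k=3):
    """Count overlapping k-mers and return a fixed-length count vector.

    The vector has 4**k entries, ordered lexicographically over ACGT;
    k-mers containing other characters are not counted.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[kmer] for kmer in vocab]
```

Under these assumptions, the one-hot matrix keeps the order of the bases, which is what a recurrent model reads step by step, while the k-mer vector has a fixed length regardless of sequence length, which is convenient for RF and SVM.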

Keywords

Bidirectional-Long Short Term Memory (Bi-LSTM), DNA sequencing, Information system, Machine and deep learning, Random Forest (RF), Support Vector Machine (SVM)

Subject Area

Computer Science

Article Type

Article

First Page

4241

Last Page

4255

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.
