Fast Processing RNA-Seq on Multicore Processor

Lee Jia Bin
Nor Asilah Wati Abdul Hamid
Zurita Ismail
Mohamed Faris Laham

Abstract

RNA sequencing (RNA-Seq) is the sequencing and analysis of transcriptomes. The main purpose of RNA-Seq analysis is to determine the presence and quantity of RNA in an experimental sample under a specific condition. Raw RNA sequence data is massive and can reach hundreds of gigabytes (GB). Processing data of this size is slow and can take several days. A multicore processor can speed up a program by splitting it into tasks and running those tasks concurrently, which makes it a suitable choice for this problem. This study therefore uses an Intel multicore processor to speed up RNA-Seq processing and analyzes the performance of RNA-Seq analysis on a multicore processor. The study covers only the RNA-Seq steps from quality control analysis through sorting the BAM (Binary Alignment/Map) file content. Three paired-end RNA datasets of different sizes were used for the comparison. The experimental results showed that running RNA-Seq on an Intel multicore processor reduced the total processing time. For the largest raw RNA sequence dataset (66.3 megabytes), the total processing time decreased from 317.638 seconds to 211.916 seconds, a reduction of about 105.7 seconds (close to 2 minutes). For the smallest raw RNA sequence dataset, the total processing time decreased from 212.380 seconds to 163.961 seconds, a reduction of about 48.4 seconds.
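The timings reported in the abstract can be turned into time saved, speedup ratio, and percentage reduction with a few lines of arithmetic. The sketch below is illustrative only: the four timings are copied from the abstract, while the function name `summarize`, the dictionary labels, and the reading of the first figure as the baseline run and the second as the multicore run are assumptions made for this example.

```python
# Timings in seconds, copied from the abstract.
# Assumption: the first value is the baseline run, the second the multicore run.
timings = {
    "largest dataset (66.3 MB)": (317.638, 211.916),
    "smallest dataset": (212.380, 163.961),
}


def summarize(before_s, after_s):
    """Return (seconds saved, speedup ratio, percent reduction)."""
    saved = before_s - after_s
    speedup = before_s / after_s
    reduction_pct = 100.0 * saved / before_s
    return saved, speedup, reduction_pct


for label, (before_s, after_s) in timings.items():
    saved, speedup, reduction_pct = summarize(before_s, after_s)
    print(
        f"{label}: saved {saved:.3f} s, "
        f"speedup {speedup:.2f}x, {reduction_pct:.1f}% less time"
    )
```

Expressing the results as a ratio rather than as raw seconds makes the two dataset sizes directly comparable, since the absolute savings depend on how long each run took to begin with.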

Article Details

How to Cite
Bin LJ, Abdul Hamid NAW, Ismail Z, Laham MF. Fast Processing RNA-Seq on Multicore Processor. Baghdad Sci J [Internet]. 2021 Dec 20 [cited 2022 Jan 20];18(4(Suppl.)):1413. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/6642
