A Column Encryption-Based Privacy-Preserving Framework for Hadoop Big Data Sets
Abstract
The exponential growth of the Internet, the Internet of Things, and cloud computing in recent times has led to a significant rise in the volume of data across business and industry sectors. Big data has attracted the attention of academics, corporate leaders, and government officials worldwide, and Hadoop is a commonly adopted framework for processing it. This data expansion has the potential to provide substantial benefits, and some early technical success has been achieved in handling such large quantities of data. Along with these benefits, big data also brings a number of challenges. These include, but are not limited to, data storage, exchange, curation, transit, analysis, visualization, security, and privacy. This research investigates the privacy implications of big data analytics. Several publications propose methods for securing big data, and each technique has advantages and disadvantages. Regardless of privacy laws, application developers must protect sensitive data. Therefore, there is a need for innovative methods to guarantee the protection of individuals' privacy in the context of big data. This paper presents a framework for preserving the privacy of data at rest within the Hadoop architecture. The framework employs columnar data storage, data masking, and encryption techniques to address these challenges efficiently.
Received 28/12/2023
Revised 03/05/2024
Accepted 05/05/2024
Published 25/05/2024
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.