Prioritized Text Detergent: Comparing Two Judgment Scales of Analytic Hierarchy Process on Prioritizing Pre-Processing Techniques on Social Media Sentiment Analysis
Abstract
Most companies use social media data for business. Sentiment analysis automatically gathers, analyses, and summarizes this type of data. Managing unstructured social media data is difficult, and noisy data is a challenge for sentiment analysis. Since over 50% of the sentiment analysis process is data pre-processing, processing big social media data is challenging too. If pre-processing is carried out correctly, data accuracy may improve. In addition, the stages of the sentiment analysis workflow are highly interdependent. Because no pre-processing technique works well in all situations or with all data sources, choosing the most important techniques is crucial, and prioritization is an effective way of making that choice. Among the many Multi-Criteria Decision Making (MCDM) methods, the Analytic Hierarchy Process (AHP) is preferred for handling complicated decision-making problems that involve several criteria. Consistency Ratio (CR) scores of the pair-wise comparisons were used to evaluate the AHP. This study used two judgment scales to obtain the most consistent judgments: first the Saaty judgment scale (SS), then the Generalized Balanced Scale (GBS). It investigated whether two different AHP judgment scales would affect decision-making. The main criteria for prioritizing pre-processing techniques in sentiment analysis are Punctuation, Spelling, Number, and Context, and each of these four criteria contains sub-criteria. The GBS pair-wise comparisons are closer to the CR value than those of SS, reducing the alternatives’ weight ratios. This paper explains how AHP aids logical decision-making. Prioritizing pre-processing techniques with AHP can serve as a paradigm for other sentiment analysis stages. In short, this paper adds another contribution to the Big Data Analytics domain.
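To make the comparison of the two judgment scales concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a pair-wise comparison matrix for the four main criteria, derives priority weights with power iteration, and computes the CR under both scales. Several things here are assumptions and do not come from the abstract: the judgment intensities are invented for illustration, the random index values are Saaty's commonly quoted ones, the GBS formula c = (9 + (n - 1)x) / (9 + n - x) is the form usually given for the generalized balanced scale, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Saaty's commonly quoted random indices by matrix size (assumption: the
# study may use different reference values).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

Scale = Callable[[float, int], float]


def saaty(x: float, n: int) -> float:
    """Saaty scale: the intensity 1..9 is used directly as the ratio."""
    return float(x)


def gbs(x: float, n: int) -> float:
    """Generalized balanced scale for n criteria (assumed formula)."""
    return (9 + (n - 1) * x) / (9 + n - x)


def build_matrix(judgments: Dict[Tuple[int, int], int], n: int,
                 scale: Scale) -> List[List[float]]:
    """Build a reciprocal matrix from signed intensities.

    judgments[(i, j)] > 0 means item i is preferred over item j;
    a negative value means item j is preferred over item i.
    """
    m = [[1.0] * n for _ in range(n)]
    for (i, j), x in judgments.items():
        ratio = scale(abs(x), n)
        if x < 0:
            ratio = 1.0 / ratio
        m[i][j] = ratio
        m[j][i] = 1.0 / ratio
    return m


def priorities(m: List[List[float]],
               iterations: int = 200) -> Tuple[List[float], float]:
    """Approximate the principal eigenvector and eigenvalue by power iteration."""
    n = len(m)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        aw = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(aw)  # w sums to 1, so sum(A w) estimates lambda_max
        w = [v / lam for v in aw]
    return w, lam


def consistency_ratio(lam: float, n: int) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); supports n = 3..9."""
    return ((lam - n) / (n - 1)) / RANDOM_INDEX[n]


def demo() -> None:
    criteria = ["Punctuation", "Spelling", "Number", "Context"]
    # Hypothetical judgments, for illustration only.
    judgments = {(0, 1): 3, (0, 2): 5, (0, 3): -2,
                 (1, 2): 3, (1, 3): -3, (2, 3): -5}
    n = len(criteria)
    for name, scale in (("SS", saaty), ("GBS", gbs)):
        w, lam = priorities(build_matrix(judgments, n, scale))
        weights = ", ".join(f"{c}={v:.3f}" for c, v in zip(criteria, w))
        print(f"{name}: CR={consistency_ratio(lam, n):.4f}  {weights}")


if __name__ == "__main__":
    demo()
```

Running the script prints the CR and the criterion weights for each scale, so the effect of swapping SS for GBS on the same set of judgments can be inspected side by side. A commonly used rule of thumb treats a CR of 0.1 or below as acceptably consistent.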
Received 29/09/2023
Revised 10/02/2024
Accepted 12/02/2024
Published 25/02/2024
This work is licensed under a Creative Commons Attribution 4.0 International License.