Prioritized Text Detergent: Comparing Two Judgment Scales of Analytic Hierarchy Process on Prioritizing Pre-Processing Techniques on Social Media Sentiment Analysis

Ummu Hani’ Hair Zaki
https://orcid.org/0000-0002-3747-8505
Roliana Ibrahim
https://orcid.org/0000-0001-7580-1804
Shahliza Abd Halim
https://orcid.org/0000-0002-5533-2171
Izyan Izzati Kamsani
https://orcid.org/0000-0001-7788-5407

Abstract

Most companies use social media data for business purposes. Sentiment analysis automatically gathers, analyses, and summarizes this type of data. Managing unstructured social media data is difficult, and noisy data is a particular challenge for sentiment analysis. Since data pre-processing accounts for over 50% of the sentiment analysis process, processing big social media data is challenging too. If pre-processing is carried out correctly, data accuracy may improve. Also, the sentiment analysis workflow is highly dependent. Because no pre-processing technique works well in all situations or with all data sources, choosing the most important ones is crucial, and prioritization is a suitable way to make that choice. Among the many Multi-Criteria Decision Making (MCDM) methods, the Analytic Hierarchy Process (AHP) is preferred for handling complicated decision-making problems that involve several criteria. Consistency Ratio (CR) scores were used to examine the pair-wise comparisons and so evaluate the AHP. To obtain the most consistent judgment, this study used two judgment scales: first the Saaty judgment scale (SS), then the Generalized Balanced Scale (GBS). It investigated whether the two different AHP judgment scales would affect decision-making. The main criteria for prioritizing pre-processing techniques in sentiment analysis are Punctuation, Spelling, Number, and Context, and each of these four criteria contains sub-criteria. GBS pair-wise comparisons are closer to the CR value than SS, reducing the alternatives’ weight ratios. This paper explains how AHP aids logical decision-making. Prioritizing pre-processing techniques with AHP can be a paradigm for other sentiment analysis stages. In short, this paper contributes to the Big Data Analytics domain.
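As a rough illustration of the comparison described in the abstract, the sketch below builds a pair-wise comparison matrix for the four main criteria under each judgment scale and computes the priority weights and CR. It is a minimal sketch under stated assumptions, not the authors' implementation: the judgment intensities are invented for illustration, the GBS mapping is taken in the form c = (9 + (n - 1)x) / (9 + n - x) as commonly given in the AHP literature (worth verifying against the original source), and CR is computed with Saaty's standard random index table, which may differ from the consistency measure used in the study.

```python
import numpy as np

# Saaty's commonly cited random index (RI) values, keyed by matrix order n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def gbs_value(x, n):
    """Map an intensity x in 1..9 to the generalized balanced scale.

    Assumed form: c = (9 + (n - 1) * x) / (9 + n - x), n = number of criteria.
    """
    return (9 + (n - 1) * x) / (9 + n - x)


def build_matrix(judgments, n, scale="saaty"):
    """Build a reciprocal matrix from {(i, j): x}, meaning i is preferred
    over j with intensity x on the 1..9 verbal scale."""
    a = np.ones((n, n))
    for (i, j), x in judgments.items():
        c = float(x) if scale == "saaty" else gbs_value(x, n)
        a[i, j] = c
        a[j, i] = 1.0 / c
    return a


def priorities_and_cr(a):
    """Return (weights, CR) using the principal eigenvector method."""
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = np.argmax(eigvals.real)           # index of the principal eigenvalue
    lam = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # normalize weights to sum to 1
    if n < 3:
        return w, 0.0                     # 1x1 and 2x2 matrices are always consistent
    ci = (lam - n) / (n - 1)              # consistency index
    return w, ci / RANDOM_INDEX[n]


# Hypothetical judgments for illustration only (not data from the paper).
criteria = ["Punctuation", "Spelling", "Number", "Context"]
judgments = {(0, 1): 2, (0, 2): 3, (3, 0): 2,
             (1, 2): 2, (3, 1): 3, (3, 2): 5}

for scale in ("saaty", "gbs"):
    w, cr = priorities_and_cr(build_matrix(judgments, len(criteria), scale))
    print(scale, dict(zip(criteria, w.round(3))), "CR =", round(cr, 3))
```

Running both scales on the same verbal judgments makes it possible to see how the choice of scale changes the weight ratios and the CR, which is the kind of comparison the study reports.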

Article Details

How to Cite
1. Prioritized Text Detergent: Comparing Two Judgment Scales of Analytic Hierarchy Process on Prioritizing Pre-Processing Techniques on Social Media Sentiment Analysis. Baghdad Sci.J [Internet]. 2024 Feb. 25 [cited 2025 Jan. 20];21(2(SI)):0662. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/9750
