Unmasking Online Hostility: Analysing and Mitigating Hate Speech in Social Media

Jawaid Ahmed Siddiqui; Siti Sophiayati Yuhaniz; Zulfiqar Ali Memon

doi:10.21123/bsj.2024.10743

Authors

Jawaid Ahmed Siddiqui, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia. https://orcid.org/0009-0005-7882-4776
Siti Sophiayati Yuhaniz, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia.
Zulfiqar Ali Memon, FAST School of Computing, National University of Computer and Emerging Sciences, Karachi, Pakistan.

DOI:

https://doi.org/10.21123/bsj.2024.10743

Keywords:

Hate speech detection, machine learning, natural language processing, social media, text classification

Abstract

Social media platforms generate an enormous amount of data every second. On Twitter alone, users post more than six hundred tweets per second. Because users publish their opinions and expressions freely, it is very difficult to contain hate speech directed at an individual, a religion, or an ethnic group, and the people targeted by such hateful content are left distressed. Various approaches have addressed this serious problem, but they have not always achieved satisfactory results. We therefore propose several machine learning models that classify the given data into two classes, offensive or non-offensive. The experiments were conducted on Twitter data that we collected ourselves using the Twitter API and the Tweepy library in Python. The results were evaluated with several measures: accuracy, precision, recall, F1-measure, and McNemar's test. Among the machine learning algorithms compared, the random forest ensemble classifier outperformed the others. The novelty and contributions of this paper are: the development of a Twitter dataset consisting of several tweets with 11 object variables and four different class variables indicating different levels of offensiveness; the application of machine learning algorithms to hate speech detection; and a comparative analysis of different machine learning algorithms against different evaluation measures, including McNemar's test. The significance of the proposed technique is demonstrated on the Twitter datasets created through the Twitter API and the Tweepy library in Python.
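The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `binary_metrics` and `mcnemar_test`, the label convention (1 = offensive, 0 = non-offensive), and the toy predictions are all assumptions made for illustration. McNemar's test is shown in its chi-square form with continuity correction; with very few disagreements between two classifiers, an exact binomial version is generally preferred.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's chi-square test (with continuity correction).

    Compares two classifiers on the same test items using only the
    items where exactly one of them is correct.
    Returns (statistic, p_value).
    """
    # b: classifier A correct, classifier B wrong
    b = sum(1 for t, a, c in zip(y_true, pred_a, pred_b) if a == t and c != t)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for t, a, c in zip(y_true, pred_a, pred_b) if a != t and c == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy labels and predictions (hypothetical, not from the paper's dataset)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # e.g. a random forest
pred_b = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]  # e.g. some other classifier

metrics_a = binary_metrics(y_true, pred_a)
statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(metrics_a)
print(f"McNemar statistic={statistic:.3f}, p={p_value:.3f}")
```

A small p-value would suggest that the two classifiers differ in error rate on the same test items; a large one means the observed difference could plausibly be due to chance.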

Received 21/01/2024

Revised 13/07/2024

Accepted 15/07/2024

Published Online First 20/11/2024

Journal Info
Journal: Baghdad Science Journal
Publisher: College of Science for Women/ University of Baghdad
Baghdad Sci. J. is peer-reviewed and open access
Print ISSN: 2078-8665
Electronic ISSN: 2411-7986
Publishing Frequency: Quarterly (2004 to 2021); Bi-monthly (from 2022); Monthly (from 2024)
Launched Date: 2004
Abbreviation: Baghdad Sci.J.
Each published paper in Baghdad Sci. J. has a digital object identifier (DOI) number