Unmasking Online Hostility: Analysing and Mitigating Hate Speech in Social Media

Jawaid Ahmed Siddiqui; Siti Sophiayati Yuhaniz; Zulfiqar Ali Memon

doi:10.21123/bsj.2024.10743

Authors

Jawaid Ahmed Siddiqui, Razak Faculty of Technology and Informatics, University Technology Malaysia, Kuala Lumpur, Malaysia. https://orcid.org/0009-0005-7882-4776
Siti Sophiayati Yuhaniz, Razak Faculty of Technology and Informatics, University Technology Malaysia, Kuala Lumpur, Malaysia.
Zulfiqar Ali Memon, Fast School of Computing, National University of Computer and Emerging Sciences, Karachi, Pakistan.

DOI:

https://doi.org/10.21123/bsj.2024.10743

Keywords:

Hate speech detection, Machine Learning, Natural Language Processing, Social media, Text Classification

Abstract

Social media platforms generate an enormous amount of data every second; Twitter users alone produce more than six hundred tweets per second. Because users can freely post opinions and expressions, it is very difficult to contain hate speech directed at an individual, a religion or an ethnic group, and the people targeted by such hateful content are left frustrated. Various approaches have been proposed to address this serious problem, but they sometimes fail to achieve satisfactory results. We therefore propose different Machine Learning models to classify the given data into two categories, offensive or non-offensive. The experiments were conducted on Twitter data that we collected ourselves using the Twitter API and the Tweepy library for Python. The results were evaluated using several metrics: accuracy, precision, recall, F1-measure and the McNemar test. Among the machine learning algorithms compared, the random forest ensemble classifier outperformed the others. The novelty and contribution of this paper are: (i) the development of a Twitter dataset consisting of several tweets containing 11 object variables with four different class variables showing the different offensive levels; (ii) the application of Machine Learning algorithms to detect hate speech; and (iii) a comparative analysis of different Machine Learning algorithms against different evaluation metrics, including the McNemar test. The significance of the proposed technique is demonstrated on the Twitter datasets generated through the Twitter API and the Tweepy library for Python.
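
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy labels, the function names `binary_metrics` and `mcnemar_test`, and the choice of the continuity-corrected chi-square form of the McNemar test are all assumptions made for illustration, and the paper may have used a different variant (for example an exact binomial test).

```python
from math import erfc, sqrt


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    `positive` marks the offensive class (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test comparing two classifiers.

    Only the discordant examples matter:
      b = A is right and B is wrong, c = A is wrong and B is right.
    Returns (statistic, p_value), with the p-value taken from a
    chi-square distribution with one degree of freedom.
    """
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, o in rows if a == t and o != t)
    c = sum(1 for t, a, o in rows if a != t and o == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical toy data: 1 = offensive, 0 = non-offensive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # e.g. a random forest
pred_b = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]  # e.g. a baseline model

if __name__ == "__main__":
    print(binary_metrics(y_true, pred_a))
    print(binary_metrics(y_true, pred_b))
    print(mcnemar_test(y_true, pred_a, pred_b))
```

The McNemar test complements the per-model scores: rather than comparing two accuracy values directly, it asks whether the cases on which the two classifiers disagree are split unevenly enough to indicate a real difference between them.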

Received 21/01/2024

Revised 13/07/2024

Accepted 15/07/2024

Published Online First 20/11/2024

Journal Info
Journal: Baghdad Science Journal
Publisher: College of Science for Women/ University of Baghdad
Baghdad Sci. J. is peer-reviewed and open access
Print ISSN: 2078-8665
Electronic ISSN: 2411-7986
Publishing Frequency: Quarterly (2004 - 2021), Bi-monthly (from 2022), Monthly (from 2024)
Launched Date: 2004
Abbreviation: Baghdad Sci.J.
Each published paper in Baghdad Sci. J. has a digital object identifier (DOI) number