A Novel FFUEAT Technique Enhancing the Performance of Multiple URL Classification and Cybersecurity using RNN and Transformer-based Models

Zafar Ali, Faculty of Artificial Intelligence, University Teknologi Malaysia, Kuala Lumpur, Malaysia
Siti Sophiayati Yuhaniz, Faculty of Artificial Intelligence, University Teknologi Malaysia, Kuala Lumpur, Malaysia
Noureen Noureen, Faculty of Artificial Intelligence, University Teknologi Malaysia, Kuala Lumpur, Malaysia
Ghulam Mujtaba, Department of Computer Science, Sukkur IBA University, Sukkur 65200, Pakistan
Husham M. Ahmed, College of Engineering, University of Technology Bahrain, Kingdom of Bahrain

Abstract

Thousands of new websites are published every day, posing significant challenges for web classification and cybersecurity. URL classification datasets, both general and cybersecurity-specific, suffer from class imbalance, noise, and ambiguous data, all of which can significantly degrade model performance. This study proposes a novel Fine-Tuned FastText Unsupervised Embedding Augmentation Technique (FFUEAT). The technique was evaluated on two datasets (DMOZ and Phishing), achieving F1-scores of 0.8639 with DistilBERT and 0.8891 with BERT. Performance on sparse minority classes also improved, with accuracy increases of 20.88% for 'home', 58.04% for 'news', and 11.18% for 'kids' in the publicly available DMOZ dataset. When applied to the cybersecurity dataset (Phishing) with a BiLSTM model, the proposed technique achieved F1-scores of 0.98 for legitimate sites and 0.99 for phishing URLs. Applying FFUEAT reduces the impact of class imbalance, noise, and ambiguous data in URL classification datasets. The method offers a promising way to improve web classification and cybersecurity threat detection, contributing to better online content management and safety.

Keywords

Class imbalance, Cybersecurity, FastText, RNN and transformers, URL classification

Subject Area

Computer Science

Article Type

Article

First Page

1739

Last Page

1759

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite this Article

Ali, Zafar; Yuhaniz, Siti Sophiayati; Noureen, Noureen; Mujtaba, Ghulam; and Ahmed, Husham M. (2026) "A Novel FFUEAT Technique Enhancing the Performance of Multiple URL Classification and Cybersecurity using RNN and Transformer-based Models," Baghdad Science Journal: Vol. 23: Iss. 5, Article 14.
DOI: https://doi.org/10.21123/2411-7986.5298
