Abstract
Thousands of new websites are published every day, posing significant challenges for web classification and cybersecurity. URL classification datasets, both general-purpose and cybersecurity-specific, suffer from class imbalance, noise, and ambiguous data, all of which can significantly degrade model performance. This study proposes a novel Fine-Tuned FastText Unsupervised Embedding Augmentation Technique (FFUEAT). The technique was evaluated on two datasets, DMOZ and Phishing, achieving F1-scores of 0.8639 with DistilBERT and 0.8891 with BERT. Performance on sparse minority classes also improved, with accuracy increases of 20.88% for the 'home', 58.04% for the 'news', and 11.18% for the 'kids' categories of the publicly available DMOZ dataset. When applied to the cybersecurity dataset (Phishing) with a BiLSTM model, the proposed technique achieved F1-scores of 0.98 for legitimate sites and 0.99 for phishing URLs. These results indicate that FFUEAT reduces the impact of class imbalance, noise, and ambiguous data in URL classification datasets. The method offers a promising way to improve web classification and cybersecurity threat detection, contributing to better online content management and safety.
Keywords
Class imbalance, Cybersecurity, FastText, RNN and transformers, URL classification
Subject Area
Computer Science
Article Type
Article
First Page
1739
Last Page
1759
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
How to Cite this Article
Ali, Zafar; Yuhaniz, Siti Sophiayati; Noureen, Noureen; Mujtaba, Ghulam; and Ahmed, Husham M. (2026) "A Novel FFUEAT Technique Enhancing the Performance of Multiple URL Classification and Cybersecurity using RNN and Transformer-based Models," Baghdad Science Journal: Vol. 23: Iss. 5, Article 14.
DOI: https://doi.org/10.21123/2411-7986.5298
