Abstract

During the COVID-19 pandemic, mass media, especially online news portals, have been essential in disseminating health information and government policies, serving as the general public's primary reference. However, not all news articles are relevant to monitoring COVID-19 cases, and some provide information of little use for tracking the pandemic's progression. A method is therefore needed to identify the news articles that can help stakeholders monitor COVID-19 developments. This study proposes using Deep Learning (DL) models to classify news headlines for this purpose. The aim is to identify suitable and reliable DL models for classifying Indonesian-language news headlines related to COVID-19 by comparing two popular models, a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), under various data imbalance scenarios. To improve model performance and reduce overfitting during training, hyperparameters such as the number of epochs, batch size, dropout rate, and number of LSTM units are tuned. The headlines are represented with CountVectorizer, a Bag-of-Words (BoW) technique that encodes each headline as term counts over the corpus vocabulary. The results indicate that the CNN model outperforms the LSTM model in precision, efficiency, and reliability, and that this advantage holds at every level of data balance tested, including the imbalanced scenarios.
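The abstract describes representing headlines as Bag-of-Words count vectors with CountVectorizer. As a rough illustration, the following is a minimal pure-Python sketch of that representation, assuming it behaves like scikit-learn's `CountVectorizer` with default settings (lowercasing, tokens of two or more word characters, alphabetically ordered vocabulary). The helper names `tokenize`, `fit_vocabulary`, and `transform` and the sample headlines are hypothetical; the study's actual preprocessing pipeline is not reproduced here.

```python
import re
from collections import Counter

# Assumed to mirror scikit-learn CountVectorizer's default token pattern:
# tokens of two or more word characters, extracted after lowercasing.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Lowercase a headline and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(docs: list[str]) -> dict[str, int]:
    """Map each distinct token to a column index, in sorted order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def transform(docs: list[str], vocab: dict[str, int]) -> list[list[int]]:
    """Turn each document into a row of raw term counts (a BoW vector)."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(doc)).items():
            if tok in vocab:  # tokens unseen during fitting are ignored
                row[vocab[tok]] = n
        rows.append(row)
    return rows


# Hypothetical headlines, not taken from the study's dataset.
headlines = [
    "Kasus COVID-19 bertambah di Jakarta",  # "COVID-19 cases rise in Jakarta"
    "Vaksin COVID-19 tiba di Jakarta",      # "COVID-19 vaccine arrives in Jakarta"
]
vocab = fit_vocabulary(headlines)
matrix = transform(headlines, vocab)
print(sorted(vocab, key=vocab.get))
print(matrix)
```

With scikit-learn installed, `CountVectorizer().fit_transform(headlines)` would produce the same counts as a sparse matrix; the pure-Python version only makes the counting explicit. Note that a BoW vector records word counts but discards word order, which is one reason the downstream CNN or LSTM architecture still matters.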

Keywords

COVID-19, CNN, Imbalanced data, LSTM, News headline data classification

Subject Area

Computer Science

Article Type

Article

First Page

2458

Last Page

2468

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
