Abstract
The main goals of this study are to prepare a collection of Punjabi-English social media text and to build a next-word prediction system for Punjabi users. Because the Gurmukhi script has a large character set, typing in Punjabi is time-consuming, so many Punjabi users communicate in Roman script on social media platforms such as WhatsApp, Facebook, and Twitter. For such users, we propose a sequential CNN-BiLSTM architecture that offers word suggestions to improve typing speed and make communication more convenient. In the proposed model, a 1D convolutional layer with 128 filters and a kernel size of 5 is followed by a bidirectional LSTM layer with 150 units. The model was trained with a batch size of 256, and training took 38.22 minutes. This study also discusses the key challenges in collecting and preprocessing social media text. To train the model, we collected 311,271 WhatsApp sequences ranging from 2 to 33 words in length. Experimental evaluation shows that the proposed sequential CNN-BiLSTM model achieved higher accuracy than other LSTM-based approaches, such as LSTM, BiLSTM, and CNN-LSTM. The proposed model efficiently learns, and can therefore reproduce, the common linguistic patterns in the Punjabi-English bilingual social media text written by Punjabi users. The results highlight the important role of deep learning techniques in solving complex linguistic challenges and improving language modeling for multilingual text.
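The abstract frames next-word prediction as learning from word sequences of 2 to 33 words. The sketch below shows one common way to turn messages into training pairs. It assumes, and the abstract does not confirm, that each sequence is a message prefix whose last word is the prediction target, so the longest (33-word) sequence gives 32 context words. The toy messages, the integer-id vocabulary scheme, `PAD_ID`, and the helper names `build_vocab`, `prefix_pairs`, and `left_pad` are hypothetical and for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical toy corpus of Roman-script Punjabi-English messages
# (illustrative only; not taken from the study's WhatsApp dataset).
MESSAGES = [
    "kiddan bro ki haal aa",
    "main kal office nahi jaana",
]

PAD_ID = 0  # assumption: id 0 is reserved for padding


def build_vocab(messages: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to an integer id starting at 1."""
    vocab: Dict[str, int] = {}
    for msg in messages:
        for tok in msg.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1  # 0 stays free for PAD_ID
    return vocab


def prefix_pairs(messages: List[str], vocab: Dict[str, int]) -> List[Tuple[List[int], int]]:
    """Expand each message into (context ids, next-word id) pairs.

    A message w1 ... wn yields n-1 pairs: [w1] -> w2, [w1, w2] -> w3, and so on.
    """
    pairs = []
    for msg in messages:
        ids = [vocab[tok] for tok in msg.lower().split()]
        for i in range(1, len(ids)):
            pairs.append((ids[:i], ids[i]))
    return pairs


def left_pad(seq: List[int], length: int) -> List[int]:
    """Keep the last `length` ids and left-pad with PAD_ID to a fixed length."""
    seq = seq[-length:]
    return [PAD_ID] * (length - len(seq)) + seq


if __name__ == "__main__":
    vocab = build_vocab(MESSAGES)
    pairs = prefix_pairs(MESSAGES, vocab)
    max_len = 32  # assumption: a 33-word sequence leaves 32 context words
    X = [left_pad(ctx, max_len) for ctx, _ in pairs]
    y = [target for _, target in pairs]
    print(len(pairs), len(X[0]))  # prints "8 32"
```

In a setup like this, the padded inputs `X` and integer targets `y` would typically feed an embedding layer ahead of the 1D convolution (128 filters, kernel size 5) and the 150-unit BiLSTM, with a softmax over the vocabulary as the output; the embedding and output layers are assumptions here, not details stated in the abstract. Left padding keeps the most recent words adjacent to the prediction step, which matters for a recurrent layer reading the sequence in order.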
Keywords
BiLSTM, CNN, LSTM, Next-word prediction, Social media text
Subject Area
Computer Science
Article Type
Article
First Page
18190
Last Page
18200
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
How to Cite this Article
Singh, Gurpreet and Kamboj, C P (2025) "Next Word Prediction in Social Media Texts for Punjabi-English Bilingual Users with Sequential CNN-BiLSTM," Baghdad Science Journal: Vol. 22: Iss. 11, Article 28.
DOI: https://doi.org/10.21123/2411-7986.5130
