The Effect Of Optimizers On The Generalizability Additive Neural Attention For Customer Support Twitter Dataset In Chatbot Application

Sinarwati Mohamad Suhaili
https://orcid.org/0000-0002-3354-9679
Naomie Salim
Mohamad Nazim Jambli
https://orcid.org/0000-0002-2117-5964

Abstract

When optimizing the performance of neural network-based chatbots, the choice of optimizer is one of the most important decisions. An optimizer controls how model parameters such as weights and biases are adjusted to minimize a loss function during training. Adaptive optimizers such as ADAM have become a standard choice, widely used because the magnitudes of their parameter updates are invariant to rescaling of the gradient, but they often pose generalization problems. Alternatively, Stochastic Gradient Descent (SGD) with Momentum and ADAMW, an extension of ADAM, offer several advantages. This study compares and examines the effects of these optimizers on the Customer Support Twitter (CST) chatbot dataset. Each optimizer is evaluated by its sparse categorical cross-entropy loss during training and its BLEU score at inference, using a neural generative model whose attention mechanism relies on an additive scoring function. Despite memory constraints that limited ADAMW to ten epochs, this optimizer showed promising results compared with configurations that used early stopping. SGD achieved higher BLEU scores, indicating better generalization, but was very time-consuming. The results highlight the importance of balancing optimization performance against computational efficiency, and they position ADAMW as a promising alternative when training efficiency and generalization are the primary concerns.
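The update rules behind the three optimizers named in the abstract can be sketched in a few lines. The following is a minimal illustration on a single scalar parameter, not the authors' implementation: the function names, the `state` dictionaries, and the hyperparameter values (common library defaults) are assumptions made for the example. It shows why an ADAM step is nearly independent of gradient scale while an SGD step is not, and how ADAMW applies weight decay directly to the weight rather than through the gradient.

```python
import math

def sgd_momentum_step(w, g, state, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a scalar weight w and gradient g."""
    # The velocity accumulates past gradients; the step is proportional to g.
    v = momentum * state.get("v", 0.0) - lr * g
    state["v"] = v
    return w + v

def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam update; weight_decay > 0 gives decoupled (AdamW-style) decay."""
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * g      # first moment
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g  # second moment
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1 ** t)  # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay shrinks the weight itself, not the gradient.
    w = w - lr * weight_decay * w
    # Dividing by sqrt(v_hat) cancels the scale of g.
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

if __name__ == "__main__":
    for g in (0.5, 500.0):  # same direction, 1000x different scale
        adam_delta = 1.0 - adam_step(1.0, g, {})
        sgd_delta = 1.0 - sgd_momentum_step(1.0, g, {})
        print(f"g={g:>6}: adam step={adam_delta:.6f}, sgd step={sgd_delta:.6f}")
```

On the first step the bias-corrected ratio in `adam_step` reduces to roughly `g / |g|`, so both gradients move the weight by about `lr`, whereas the SGD step grows in proportion to the gradient.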

How to Cite
The Effect Of Optimizers On The Generalizability Additive Neural Attention For Customer Support Twitter Dataset In Chatbot Application. Baghdad Sci.J [Internet]. 2024 Feb. 25 [cited 2024 Dec. 19];21(2(SI)):0655. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/9743
Section
article
