The Effect of Optimizers on the Generalizability of Additive Neural Attention for the Customer Support Twitter Dataset in a Chatbot Application
Abstract
Selecting the optimizer is one of the most important decisions when tuning the performance of neural network-based chatbots. Optimizers control how model parameters such as weights and biases are adjusted to minimize a loss function during training. Adaptive optimizers such as ADAM have become a standard choice because the magnitudes of their parameter updates are invariant to the scale of the gradients, but they often generalize poorly. Alternatively, Stochastic Gradient Descent (SGD) with momentum and ADAMW, an extension of ADAM with decoupled weight decay, offer several advantages. This study compares the effects of these optimizers on the Customer Support Twitter (CST) chatbot dataset. The effectiveness of each optimizer is evaluated by its sparse categorical cross-entropy loss during training and its BLEU score in the inference phase, using a neural generative model with an additive attention scoring function. Despite memory constraints that limited ADAMW to ten epochs, this optimizer showed promising results compared to configurations using early stopping. SGD achieved higher BLEU scores, indicating better generalization, but was very time-consuming. The results highlight the importance of balancing optimization performance against computational efficiency, positioning ADAMW as a promising alternative when training efficiency and generalization are primary concerns.
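The three update rules under comparison can be sketched in plain NumPy. This is an illustrative toy on a simple quadratic objective, not the paper's chatbot model; the learning rate, momentum, and weight-decay values below are assumptions chosen for the demo, not the study's hyperparameters. The sketch makes the key structural difference visible: ADAMW applies weight decay decoupled from the gradient-based adaptive step.

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.01, mu=0.9, steps=2000):
    """SGD with momentum: v <- mu*v - lr*g; w <- w + v."""
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        v = mu * v - lr * g
        w = w + v
    return w

def adam(grad_fn, w, lr=0.01, b1=0.9, b2=0.999, eps=1e-8, wd=0.0, steps=2000):
    """ADAM update with bias correction; wd > 0 turns this into ADAMW,
    where weight decay is decoupled from the adaptive gradient step."""
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g          # first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # second-moment estimate
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        # ADAMW: decay is added directly to the update, not to the gradient.
        w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w

# Toy objective: f(w) = 0.5 * ||w||^2, with gradient f'(w) = w.
loss = lambda w: 0.5 * float(np.sum(w * w))
grad = lambda w: w
w0 = np.array([3.0, -2.0])

w_sgd = sgd_momentum(grad, w0.copy())
w_adam = adam(grad, w0.copy())
w_adamw = adam(grad, w0.copy(), wd=0.01)  # decoupled decay -> ADAMW
```

All three variants drive the toy loss toward its minimum; in the study's actual setting the differences show up instead in training loss, BLEU-measured generalization, and wall-clock cost.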
Received 29/09/2023
Revised 10/02/2024
Accepted 12/02/2024
Published 25/02/2024
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.