Research on Emotion Classification Based on Multi-modal Fusion


Zhihua Xiang
https://orcid.org/0009-0007-2135-7390
Nor Haizan Mohamed Radzi
Haslina Hashim
https://orcid.org/0000-0003-0048-719X

Abstract

Nowadays, people's expression on the Internet is no longer limited to text; with the rise of short video in particular, large volumes of multi-modal data such as text, images, audio, and video have emerged. Compared with single-modality data, multi-modal data carry far richer information, and mining that information can help computers better understand human emotional characteristics. However, because multi-modal data exhibit clear dynamic time-series features, the fusion process must resolve dynamic correlations both within a single modality and between different modalities in the same application scene. To address this problem, this paper establishes a three-dimensional, dynamically expanded feature-extraction framework for common multi-modal data such as video, audio, and text. On top of this framework, a multi-modal fusion framework based on spatial and temporal feature enhancement is constructed to resolve the dynamic correlations within and between modalities, and the short- and long-term dynamic correlation information between modalities is then modeled. Multiple groups of experiments on the MOSI dataset show that the emotion recognition model built on the proposed framework makes better use of the complex complementary information between different modalities. Compared with other multi-modal data fusion models, the spatial-temporal attention-based fusion framework proposed in this paper significantly improves the emotion recognition rate and accuracy in multi-modal emotion analysis, demonstrating its feasibility and effectiveness.
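The core fusion idea described above, letting one modality's time steps attend over another modality's time steps to capture cross-modal temporal correlations, can be illustrated with a minimal scaled dot-product attention sketch. This is an illustrative simplification, not the paper's actual model: the feature dimensions, the random "text" and "audio" feature matrices, and the single attention hop are all assumptions made for the example.

```python
import numpy as np

def scaled_dot_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 6, 8                            # 6 time steps, 8-dim features (assumed)
text  = rng.standard_normal((T, d))    # hypothetical text features
audio = rng.standard_normal((T, d))    # hypothetical audio features

# Temporal cross-modal attention: each text step attends over all audio steps,
# producing text features enriched with temporally aligned audio information.
fused, w = scaled_dot_attention(text, audio, audio)
print(fused.shape)   # (6, 8)
```

In a full model, per-modality encoders would produce the feature sequences, attention would run in both directions (text-to-audio, audio-to-text, and likewise with video), and the fused representations would feed a classifier over emotion labels.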

Article Details

How to Cite
1. Research on Emotion Classification Based on Multi-modal Fusion. Baghdad Sci. J [Internet]. 2024 Feb. 25 [cited 2024 Dec. 19];21(2(SI)):0548. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/9454
Section
article


