Short Text Semantic Similarity Measurement Approach Based on Semantic Network

Naamah Hussien Hameed; Adel M. Alimi; Ahmed T. Sadiq

doi:10.21123/bsj.2022.7255

Authors

Naamah Hussien Hameed Computer Science Department, University of Technology, Baghdad, Iraq. https://orcid.org/0000-0002-4552-3927
Adel M. Alimi Computer Science Department, University of Technology, Baghdad, Iraq.
Ahmed T. Sadiq Computer Science Department, University of Technology, Baghdad, Iraq. https://orcid.org/0000-0002-4749-8243

DOI:

https://doi.org/10.21123/bsj.2022.7255

Abstract

Estimating the semantic similarity between short texts plays an increasingly prominent role in many areas of text mining and natural language processing, especially given the large growth in the volume of textual data produced every day. Traditional methods that score the similarity of two texts by the words they share do not work well for short texts, because two similar texts can be written with different wording through the use of synonyms. Short sentences must therefore be compared in terms of their semantic meaning. This paper presents a method for measuring semantic similarity between texts that combines knowledge-based and corpus-based semantic information to build a semantic network representing the relationship between the compared texts, from which the degree of similarity between them is extracted. Representing a text as a semantic network is the knowledge representation that comes closest to the way the human mind understands texts, since the semantic network reflects the semantic, syntactic, and structural knowledge of the sentence. The network representation is a visual representation of knowledge objects, their attributes, and their relationships. The WordNet lexical database was used as the knowledge-based source, while pre-trained GloVe word embedding vectors were used as the corpus-based source. The proposed method was tested on three different datasets: SICK, DSCS, and MOHLER. Good results were obtained in terms of RMSE and MAE.
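Two points in the abstract can be made concrete with a small sketch: why shared-word scoring breaks down when synonyms are used, and what the two reported error measures compute. The code below is an illustration only, not the authors' method. The function names `jaccard`, `rmse`, and `mae`, the example sentences, and the numeric scores are all assumptions introduced here; the scores are made-up values, not results from the paper.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: shared words divided by all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def rmse(predicted, gold):
    """Root mean squared error between predicted and reference scores."""
    if not predicted or len(predicted) != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - g) ** 2 for p, g in zip(predicted, gold)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, gold):
    """Mean absolute error between predicted and reference scores."""
    if not predicted or len(predicted) != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(predicted)


# Two sentences with nearly the same meaning but almost no shared words.
s1 = "the physician examined the child"
s2 = "a doctor checked the kid"
print(jaccard(s1, s2))  # only "the" is shared: 1 of 8 distinct words -> 0.125

# Hypothetical similarity scores (illustrative values, not from the paper).
predicted_scores = [0.8, 0.4, 0.6]
gold_scores = [1.0, 0.5, 0.3]
print(rmse(predicted_scores, gold_scores))
print(mae(predicted_scores, gold_scores))
```

The low overlap score for a clear paraphrase is the failure that motivates comparing meaning rather than surface words. RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are commonly reported together.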

Received 30/3/2022, Accepted 8/8/2022, Published Online First 25/11/2022

Journal Info
Journal: Baghdad Science Journal
Publisher: College of Science for Women/ University of Baghdad
Baghdad Sci. J. is peer-reviewed and open access
Print ISSN: 2078-8665
Electronic ISSN: 2411-7986
Publishing Frequency: Quarterly (2004 to 2021); bi-monthly (from 2022); monthly (from 2024)
Launched Date: 2004
Abbreviation: Baghdad Sci.J.
Each published paper in Baghdad Sci. J. has a digital object identifier (DOI) number