Semantic Model Evaluation Dataset For Indonesian In Al-Qur'an Vocabulary: Similarity and Relatedness
Abstract
The Qur'an has received little attention in linguistic research and has therefore not gained a prominent place in the field. Yet its vocabulary offers ample material for study, particularly for Natural Language Processing tasks such as text classification, document clustering, and text summarization, as well as for semantic similarity and distributional semantic models. This paper builds an evaluation dataset for distributional semantic models in Bahasa Indonesia covering two word classes, nouns and verbs, by collecting similarity and relatedness scores for 500 word pairs. We hope this will advance semantic studies of the Qur'an, especially of its Indonesian translation, and that, like earlier datasets of this kind, it can support future research with other focuses. The study uses 6,236 verses, from which the system extracts 2,193 nouns and 1,733 verbs. These are processed using the Sim-Rel vector method, a questionnaire completed by 15 respondents, and a gold standard; performance, measured with Spearman rank correlation, reaches 0.909.
Keywords — Natural Language Processing; Distribution Semantic Model; Sim-Rel Vector; Spearman Rank
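As an illustration of the evaluation step mentioned in the abstract, the following is a minimal sketch of how a Spearman rank correlation between human ratings and model scores could be computed. The function names (`average_ranks`, `pearson`, `spearman`) and the five example scores are hypothetical and are not taken from the paper; the sketch assumes tied values receive the average of their rank positions, which is one common convention and may differ from what the authors used.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five word pairs (illustrative only, not from the paper).
human_scores = [9.1, 7.4, 3.2, 5.0, 1.5]       # e.g. averaged respondent ratings
model_scores = [0.83, 0.70, 0.35, 0.30, 0.10]  # e.g. similarity from a semantic model

rho = spearman(human_scores, model_scores)
print(f"Spearman rho = {rho:.3f}")
```

A value close to 1 means the model orders the word pairs almost the same way the human raters do, which is how a correlation such as the reported 0.909 is usually read.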
Authors submitting a manuscript do so on the understanding that, if it is accepted for publication, copyright of the article shall be assigned to Jurnal Teknologi Informasi dan Terapan (J-TIT) and the Department of Information Technology, Politeknik Negeri Jember, as publisher of the journal. Copyright encompasses the rights to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations. Authors should sign a copyright transfer agreement when they have approved the final proofs sent by Jurnal Teknologi Informasi dan Terapan (J-TIT) prior to publication.
Jurnal Teknologi Informasi dan Terapan (J-TIT) and Department of Information Technology, Politeknik Negeri Jember and the Editors make every effort to ensure that no wrong or misleading data, opinions or statements be published in the journal. In any way, the contents of the articles and advertisements published in Jurnal Teknologi Informasi dan Terapan (J-TIT) are the sole responsibility of their respective authors and advertisers.
Users of this website will be licensed to use materials from this website following the Creative Commons Attribution 4.0 International License. No fees charged. Please use the materials accordingly.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.