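Implementasi Algoritma Damerau-Levenshtein Untuk Pemeriksaan Dan Koreksi Kesalahan Ejaan Bahasa Indonesia (Implementation of the Damerau-Levenshtein Algorithm for Checking and Correcting Indonesian Spelling Errors)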

Authors

  • Okvi Nugroho Universitas Muhammadiyah Sumatera Utara
  • Ahmad Rahmatika Universitas Muhammadiyah Sumatera Utara
  • Tri Andre Anu Universitas Muhammadiyah Sumatera Utara
  • Maulidya Rahmah Politeknik LP3I

DOI:

https://doi.org/10.59024/jiti.v4i2.1943

Keywords:

Damerau-Levenshtein Algorithm, Words, Errors, Accuracy

Abstract

This study implements the Damerau-Levenshtein algorithm in an Indonesian spell-checking and correction system based on the edit distance approach. The main objective is to develop a system that automatically detects and corrects character-level spelling errors by matching words against the KBBI dictionary and an Indonesian corpus. The methods comprise data collection, text pre-processing, system design, and implementation of the Damerau-Levenshtein algorithm, which covers insertion, deletion, substitution, and transposition operations. Testing used 25 test items consisting of standard words and words modified to contain typographical errors. The results show that the system processed all test items with 100% accuracy on this limited dataset. In addition, the average Damerau-Levenshtein distance of 0.84 indicates that most errors fall into the minor category. Evaluation using a confusion matrix yields precision, recall, and F1-score values of 100% each. These results indicate that the Damerau-Levenshtein algorithm is effective at handling character-based spelling errors. However, the system remains limited in handling complex semantic contexts and language variations. Further research is therefore recommended to integrate language-model-based approaches to improve the system's accuracy and generalization on real-world data.
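The four edit operations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the restricted (optimal string alignment) variant of Damerau-Levenshtein, a unit cost for every operation, and a tiny hypothetical word list standing in for the KBBI dictionary. The names `osa_distance`, `suggest`, `WORDS`, and the `max_distance` threshold are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each with an assumed cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def suggest(word: str, dictionary: List[str],
            max_distance: int = 2) -> Tuple[Optional[str], int]:
    """Return the closest dictionary word and its distance.

    The suggestion is None when even the closest word is farther away
    than max_distance (a hypothetical cut-off, not from the paper).
    Ties are broken alphabetically.
    """
    best = min(dictionary, key=lambda w: (osa_distance(word, w), w))
    dist = osa_distance(word, best)
    return (best, dist) if dist <= max_distance else (None, dist)


# Hypothetical stand-in for a KBBI word list.
WORDS = ["makan", "minum", "belajar", "sekolah"]

print(suggest("sekloah", WORDS))  # ('sekolah', 1): one transposition
print(suggest("mkan", WORDS))     # ('makan', 1): one insertion
```

A distance of 0 means the word already appears in the list. Averaging the distances over a test set gives a figure comparable in kind to the 0.84 average reported above, although the actual value depends on the test data and dictionary used.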

Published

2026-04-30