Text Similarity Analysis for Evaluating Alignment Between Lesson Plans and Teaching Reports
DOI:
https://doi.org/10.31154/cogito.v11i2.976.414-429
Keywords:
Text Similarity algorithms, Class evaluation, Lesson Plans, Teaching Reports
Abstract
The RPS (Rencana Pembelajaran Semester, or Lesson Plan) is a class activity planning document in the higher education learning process that includes learning outcomes, methods, learning strategies, and evaluation criteria. It is created by the lecturers in charge of the course and coordinated with the relevant department. This document needs to be monitored throughout the semester for conformity with the implementation document, the BPP (Borang Pelaksanaan Perkuliahan, or Teaching Report). This monitoring was done manually through our eRPS system, which requires a lot of effort and precision and is not time-efficient. This research evaluated the effectiveness of several content-based text similarity methods for detecting whether the RPS conforms to the BPP. The Boyer-Moore (B), Rabin-Karp (R), Jaccard (JC), Jaro-Winkler (JW), Smith-Waterman (SW), Knuth-Morris-Pratt (K), Levenshtein, cosine similarity (C), Dice (D), Jaro (J), and Soundex (S) algorithms were evaluated. For the vector-based similarity method, TF-IDF was used. The evaluation of the 11 string-matching algorithms across four scenarios showed clear performance trends. Fuzzy algorithms (SW with accuracy 0.845–0.870 and JW with accuracy 0.840–0.850) achieved the highest accuracy in the single-row lecturer scenario, while exact/pattern-based algorithms (B, K, and S with accuracy 0.8625–0.8725) achieved the highest accuracy on the combination of all lecture rows, with minimal variance (≈0.005–0.015). Pre-processing benefits the fuzzy algorithms (+2.5%) but is neutral for the exact/pattern-based algorithms. The combined scenario improves the exact/phonetic algorithms (+6–7%) but reduces the performance of the fuzzy algorithms (−10 to −14%). The optimal thresholds were generally 40–50%, except for JW and J, for which the optimal threshold was 65%.
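The threshold-based conformity check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the word-level tokenization, the sample strings, and the 0.5 threshold (chosen from the reported 40–50% range) are all assumptions made for illustration, and only two of the evaluated measures (Jaccard and Dice) are shown.

```python
import re


def tokens(text: str) -> set[str]:
    """Simple pre-processing (assumed): lowercase, keep alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over word-token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over word-token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def conforms(plan_row: str, report_row: str,
             threshold: float = 0.5, measure=jaccard) -> bool:
    """Flag a lesson-plan row as conforming when similarity reaches the threshold.

    The 0.5 default is a hypothetical choice, not a value taken from the paper's code.
    """
    return measure(plan_row, report_row) >= threshold


# Hypothetical lesson-plan (RPS) and teaching-report (BPP) entries.
plan = "Introduction to text similarity and string matching"
report = "Introduction to string matching algorithms"

print(jaccard(plan, report))   # 4 shared tokens out of 8 distinct tokens
print(dice(plan, report))      # 2 * 4 / (7 + 5)
print(conforms(plan, report))  # compares the Jaccard score with the threshold
```

Passing `measure=dice` swaps the similarity function without changing the decision rule, which mirrors the idea of comparing several algorithms under the same threshold sweep.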
Copyright (c) 2026 CogITo Smart Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.


