Named Entity Recognition in Indonesian History Textbook Using BERT Model

Authors

  • Ichwanul Muslim Universitas Negeri Medan
  • Muliawan Firdaus Universitas Negeri Medan
  • Rizki Habibi Universitas Negeri Medan

DOI:

https://doi.org/10.31154/cogito.v11i1.880.140-151

Keywords:

NER, IOB, BERT, History

Abstract

History is no longer taught as an explicit subject in some primary and secondary education institutions. This can certainly raise concern about the younger generation's knowledge of their nation's history. Although history textbooks are available in digital form and contain a wealth of information, their presentation is still unstructured and difficult to understand. This research aims to develop a model for extracting historical entities from textbooks using a Named Entity Recognition (NER) approach based on BERT (Bidirectional Encoder Representations from Transformers). The text data are taken from the history chapter of the 8th-grade Social Science textbook published by the Ministry of Education. The research stages comprise data extraction, preprocessing, IOB labeling, entity identification with the BERT algorithm, and performance evaluation. Preprocessing successfully reduced irrelevant words and improved analysis efficiency. The BERT model showed high performance, with a precision of 88.68%, a recall of 74.60%, and an F1-score of 81.03%. In addition, training time fluctuated between epochs, influenced by entity variation and sentence complexity. Overall, this research shows that the model can extract historical entities automatically and accurately, potentially enriching historical understanding for students and society through the use of Natural Language Processing technology.
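The abstract does not specify the pretrained checkpoint, entity label set, or training configuration, so the sketch below is only a plausible reconstruction of the pipeline it describes: a Hugging Face token-classification model over IOB tags. The indobenchmark/indobert-base-p1 checkpoint and the PER/LOC/ORG/EVT tag set are illustrative assumptions, not details confirmed by the paper. The final lines also sanity-check that the reported precision and recall are consistent with the reported F1-score.

```python
# Minimal sketch of the described NER pipeline (IOB tags + BERT token classification).
# Checkpoint, label set, and hyperparameters are assumptions; the paper fine-tunes on
# labeled textbook data, whereas the classification head here is untrained.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumed IOB label scheme for historical entities (illustrative only).
LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-EVT", "I-EVT"]
id2label = dict(enumerate(LABELS))
label2id = {v: k for k, v in id2label.items()}

MODEL_NAME = "indobenchmark/indobert-base-p1"  # assumed Indonesian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS), id2label=id2label, label2id=label2id
)

def predict_iob(sentence: str):
    """Tag a preprocessed textbook sentence with IOB labels (subword level)."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits          # shape: (1, seq_len, num_labels)
    pred_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, (id2label[i] for i in pred_ids)))

if __name__ == "__main__":
    print(predict_iob("Proklamasi kemerdekaan Indonesia dibacakan oleh Soekarno di Jakarta."))

    # Consistency check of the reported scores: F1 = 2PR / (P + R)
    p, r = 0.8868, 0.7460
    print(f"F1 = {2 * p * r / (p + r):.4f}")  # ~0.8103, matching the reported 81.03%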

Published

2025-06-30

How to Cite

Muslim, I., Firdaus, M., & Habibi, R. (2025). Named Entity Recognition in Indonesian History Textbook Using BERT Model. CogITo Smart Journal, 11(1), 140–151. https://doi.org/10.31154/cogito.v11i1.880.140-151