Optimizing Heart Disease Prediction with Random Forest and Ensemble Methods

Authors

  • Imam Al Amin, Informatics Engineering Study Program, Universitas Stikubank Semarang
  • Setyawan Wibisono, Informatics Engineering Study Program, Universitas Stikubank Semarang
  • Endang Lestariningsih, Informatics Engineering Study Program, Universitas Stikubank Semarang
  • Muhammad Lutfi M.A, Informatics Engineering Study Program, STIMIK Bina Patria

DOI:

https://doi.org/10.31154/cogito.v11i1.782.80-90

Keywords:

Random Forest, Ensemble Learning, Heart Disease Prediction

Abstract

This study evaluates ensemble learning techniques for heart disease prediction, focusing on Random Forest because of its robustness on complex medical data. The dataset, the "Heart Disease Prediction Dataset" from Kaggle, contains 270 instances and 13 features such as age, cholesterol, and family history. Preprocessing consisted of mean imputation for missing values and min-max normalization. Random Forest is compared with three other ensemble classifiers (AdaBoost, Gradient Boosting, and XGBoost) using 10-fold cross-validation and evaluation metrics including accuracy, precision, recall, and F1 score. Random Forest outperforms the other models, with an accuracy of 87.04%, precision of 85.00%, recall of 80.95%, and F1 score of 82.93%. These findings indicate that Random Forest keeps its predictions stable across varied medical attributes and imbalanced data. Although the results point to Random Forest as a promising method for early heart disease risk prediction, the study is a purely computational evaluation and still requires clinical validation. The results are intended to inform the development of predictive tools that support early diagnosis and preventive strategies in healthcare systems.
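The four reported metrics are tied together by their definitions, so they can be checked against each other. The sketch below is a minimal illustration in plain Python, not the authors' code. The first check uses only the precision and recall stated in the abstract and should agree with the reported F1 score after rounding. The second check uses hypothetical confusion-matrix counts (`tp`, `fp`, `fn`, `tn`) chosen only because they reproduce the reported percentages; they are an assumption for illustration and do not come from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return accuracy, precision, recall and F1 (as percentages)
    for a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + fn + tn),
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * 2 * precision * recall / (precision + recall),
    }


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall
    (result is in the same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Check 1: uses only figures reported in the abstract (percentages).
reported_f1 = f1_from_precision_recall(85.00, 80.95)
print(f"F1 from reported precision and recall: {reported_f1:.2f}%")

# Check 2: HYPOTHETICAL counts, not taken from the paper. They are one
# combination that happens to reproduce the reported percentages.
hypothetical_counts = {"tp": 17, "fp": 3, "fn": 4, "tn": 30}
metrics = classification_metrics(**hypothetical_counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Computing the metrics from counts rather than from rounded percentages avoids compounding rounding error, which is why the helper takes raw counts and converts to percentages only at the end.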

Published

2025-06-30

How to Cite

Al Amin, I., Wibisono, S., Lestariningsih, E., & Lutfi M.A, M. (2025). Optimizing Heart Disease Prediction with Random Forest and Ensemble Methods. CogITo Smart Journal, 11(1), 80–90. https://doi.org/10.31154/cogito.v11i1.782.80-90