Optimizing Heart Disease Prediction with Random Forest and Ensemble Methods
DOI:
https://doi.org/10.31154/cogito.v11i1.782.80-90
Keywords:
Random Forest, Ensemble Learning, Heart Disease Prediction
Abstract
This study evaluates ensemble learning techniques for heart disease prediction, focusing on Random Forest because of its robustness on complex medical data. The dataset, the "Heart Disease Prediction Dataset" from Kaggle, contains 270 instances and 13 features such as age, cholesterol, and family history. Preprocessing consisted of mean imputation for missing values and min-max normalization. Random Forest is compared with three other ensemble classifiers (AdaBoost, Gradient Boosting, and XGBoost) using 10-fold cross-validation, with accuracy, precision, recall, and F1 score as evaluation metrics. Random Forest outperforms the other models, reaching an accuracy of 87.04%, precision of 85.00%, recall of 80.95%, and F1 score of 82.93%. These findings indicate that Random Forest maintains stable predictions across varied medical attributes and imbalanced data. Although the results point to Random Forest as a promising method for early heart disease risk prediction, the study is a computational evaluation and requires clinical validation. The results are intended to inform the development of predictive tools that support early diagnosis and preventive strategies in healthcare systems.
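The preprocessing steps and the F1 score named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`impute_mean`, `min_max_scale`, `f1_from`) and the sample cholesterol values are hypothetical, and only the reported precision and recall figures come from the abstract.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical cholesterol readings with one missing value
# (illustrative only, not taken from the study's dataset).
cholesterol = [200, None, 240, 280]
scaled = min_max_scale(impute_mean(cholesterol))
print(scaled)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from(0.8500, 0.8095)
print(round(reported_f1 * 100, 2))
```

The harmonic mean of the reported precision and recall agrees with the reported F1 score, so the three figures are mutually consistent. In a pipeline evaluated with cross-validation, the imputation mean and the min-max bounds would normally be computed on each training fold only, so that no information from the held-out fold leaks into preprocessing; the abstract does not state how this was handled.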
Copyright (c) 2025 CogITo Smart Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).