Comparison of Naïve Bayes Algorithm and Decision Tree C4.5 for Hospital Readmission Diabetes Patients using HbA1c Measurement

Utomo Pujianto, Asa Luki Setiawan, Harits Ar Rosyid, Ali M. Mohammad Salah


Diabetes is a metabolic disorder in which the pancreas does not produce enough insulin or the body cannot effectively use the insulin it produces. The HbA1c examination, which measures a patient's average glucose level over the preceding 2-3 months, has become an important step in assessing the condition of diabetic patients. Knowledge of the patient's condition can help medical staff predict the likelihood of readmission, that is, a discharged patient requiring hospitalization again. The ability to predict readmissions ultimately helps the hospital measure and manage the quality of patient care. This study compares the performance of the Naïve Bayes method and the C4.5 Decision Tree in predicting readmissions of diabetic patients, specifically patients who have undergone an HbA1c examination. We also compare the classification models across a number of scenarios that combine each of the two classifiers with preprocessing methods, namely the Synthetic Minority Over-Sampling Technique (SMOTE) and Wrapper feature selection. The scenario that combines C4.5 with SMOTE and feature selection produces the best performance in classifying readmissions of diabetic patients, with an accuracy of 82.74%, a precision of 87.1%, and a recall of 82.7%.
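
The three evaluation measures named above can all be derived from a confusion matrix. The following is a minimal sketch, not the study's evaluation code: the function name and the counts are hypothetical, and it assumes class-weighted averaging of precision and recall, which the abstract does not state. Under that assumption, weighted recall is always equal to accuracy, which would be consistent with the nearly identical accuracy and recall values reported.

```python
def weighted_metrics(confusion):
    """Return (accuracy, weighted precision, weighted recall).

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    accuracy = correct / total

    precision = 0.0
    recall = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])  # samples truly in class c
        predicted = sum(confusion[r][c] for r in range(n_classes))  # samples predicted as c
        true_positive = confusion[c][c]
        class_precision = true_positive / predicted if predicted else 0.0
        class_recall = true_positive / support if support else 0.0
        weight = support / total  # class share of the data set
        precision += weight * class_precision
        recall += weight * class_recall
    return accuracy, precision, recall


# Hypothetical counts for illustration only; these are NOT the study's data.
# Row 0 = not readmitted, row 1 = readmitted.
confusion = [[80, 20],
             [10, 90]]
accuracy, precision, recall = weighted_metrics(confusion)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

Because each class's recall is weighted by its share of the samples, the weighted sum reduces to the number of correct predictions divided by the total, which is the definition of accuracy. Precision has no such identity and can differ from both.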





Copyright (c) 2019 Knowledge Engineering and Data Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
