Optimal Strategy for Handling Unbalanced Medical Datasets: Performance Evaluation of K-NN Algorithm Using Sampling Techniques

Yulita Salim; Aulia Putri Utami; Abdul Rachman Manga’; Huzain Aziz; Fadhila Tangguh Admojo

doi:10.17977/um018v7i22024p176-186

Abstract

This study addresses the critical role of medical image classification in improving healthcare effectiveness and the challenges posed by imbalanced medical datasets. It aims to optimize classification performance in two ways: by combining Canny edge detection for segmentation with Hu-moment feature extraction, and by applying oversampling and undersampling techniques to balance the classes. Five medical datasets were used, covering Alzheimer’s disease, Parkinson’s disease, COVID-19, brain tumours, and lung cancer. The K-Nearest Neighbors (K-NN) algorithm was used for classification, with the aim of developing a more robust framework for medical image analysis. The evaluation, conducted using cross-validation, showed notable improvements in key metrics. Oversampling significantly improved lung cancer detection accuracy, while undersampling produced more balanced performance gains for the COVID-19 class. Accuracy, precision, recall, and F1-score were used to assess the model’s effectiveness. These findings highlight the positive impact of data balancing techniques on K-NN performance in imbalanced medical image classification. Further research is needed to refine these techniques and improve medical diagnostics.
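The evaluation stage described above (class balancing, then K-NN, then cross-validated accuracy, precision, recall, and F1-score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- It uses plain random oversampling and random undersampling. The abstract does not name the specific sampling variants.
- It uses a Euclidean K-NN with a hypothetical k = 5.
- It uses stratified 5-fold cross-validation, with predictions pooled across folds.
- Synthetic 2-D points stand in for the Hu-moment feature vectors that the paper extracts from Canny edge maps.

All function names are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples of each smaller class until all classes match the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Keep a random subset of every class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, t in zip(X, y) if t == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def stratified_folds(y, n_folds, rng):
    """Split sample indices into folds that roughly preserve the class ratio."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged (unweighted per-class mean) precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": sum(prec) / n,
            "recall": sum(rec) / n, "f1": sum(f1) / n}


def cross_validate(X, y, sampler=None, k=5, n_folds=5, seed=0):
    """Stratified k-fold evaluation of K-NN; the sampler touches training folds only."""
    rng = random.Random(seed)
    y_true, y_pred = [], []
    for test_idx in stratified_folds(y, n_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        if sampler is not None:
            X_tr, y_tr = sampler(X_tr, y_tr, rng)  # test fold stays untouched
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(knn_predict(X_tr, y_tr, X[i], k))
    # Predictions are pooled across folds before computing the metrics.
    return macro_metrics(y_true, y_pred)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for Hu-moment feature vectors: two overlapping 2-D blobs, 180 vs 20 samples.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(180)]
    X += [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(20)]
    y = ["majority"] * 180 + ["minority"] * 20
    for name, sampler in [("none", None), ("oversample", random_oversample),
                          ("undersample", random_undersample)]:
        scores = cross_validate(X, y, sampler)
        print(name, {key: round(v, 3) for key, v in scores.items()})
```

In this sketch, resampling is applied inside each training fold only. Balancing the full dataset before splitting would let duplicated minority samples appear in both training and test folds, which inflates the reported scores. Whether the paper followed this protocol is not stated in the abstract.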
