The Effect of Resampling on Classifier Performance: an Empirical Study
Abstract
DOI: http://dx.doi.org/10.17977/um018v5i12022p87-100
Copyright (c) 2022 Knowledge Engineering and Data Science
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.