Performance of Ensemble Classification for Agricultural and Biological Science Journals with Scopus Index

Nastiti Susetyo Fanany Putri, Aji Prasetya Wibawa, Harits Ar Rosyid, Agung Bella Putra Utama, Wako Uriu

Abstract


Ensemble methods are considered advanced techniques for both prediction and classification, and they are expected to produce better output than single-classifier approaches. This article aims to evaluate ensemble performance in classifying journal quartiles. The Agricultural and Biological Sciences subject area was chosen because Indonesia is an agricultural country and researcher interest in this field continues to grow. The data, comprising 2144 journal instances, were downloaded from the Scimago Journal and Country Rank (SJR) portal for the year 2020. The label has four classes: Q1, Q2, Q3, and Q4. The ensembles applied are Bagging and Boosting, each with Decision Tree (DT) and Gaussian Naïve Bayes (GNB) as base learners; the Boosting meta-ensembles used are AdaBoost and XGBoost. In this study, the Bagging Decision Tree achieves the highest accuracy at 71.36%, followed by the XGBoost Decision Tree at 69.51%, the XGBoost Gaussian Naïve Bayes at 68.82%, the AdaBoost Decision Tree at 60.42%, the AdaBoost Gaussian Naïve Bayes at 58.2%, and the Bagging Gaussian Naïve Bayes at 56.12%. These results show that the Bagging Decision Tree is the ensemble method that works best for this subject classification, but also that ensemble methods can still fail to produce an ideal outcome that approaches the SJR system.
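The comparison described in the abstract can be sketched with scikit-learn. This is a minimal illustration on synthetic four-class data, not the authors' pipeline or the SJR dataset; the sample count (2144) and the Bagging/AdaBoost pairings with DT and GNB follow the abstract, while the feature set is invented and XGBoost is omitted for brevity.

```python
# Sketch of the bagging/boosting comparison from the abstract, assuming a
# scikit-learn setup on synthetic 4-class (quartile-like) data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 2144 instances, four classes (Q1..Q4 in the paper).
X, y = make_classification(n_samples=2144, n_features=10, n_informative=6,
                           n_classes=4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Base learners wrapped in the two meta-ensembles compared in the article.
ensembles = {
    "Bagging + DT":  BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42),
    "Bagging + GNB": BaggingClassifier(GaussianNB(), n_estimators=50, random_state=42),
    "AdaBoost + DT":  AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=42),
    "AdaBoost + GNB": AdaBoostClassifier(GaussianNB(), n_estimators=50, random_state=42),
}

scores = {}
for name, model in ensembles.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.4f}")
```

On real data the relative ordering would depend on the features extracted from SJR; the point of the sketch is only the structure of the experiment, where the same base learners are swapped between the Bagging and Boosting wrappers and scored on a held-out split.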






DOI: http://dx.doi.org/10.17977/um018v5i22022p137-142


Copyright (c) 2023 Knowledge Engineering and Data Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
