Can Multinomial Logistic Regression Predict Research Group using Text Input?

Harits Ar Rosyid, Aulia Yahya Harindra Putra, Muhammad Iqbal Akbar, Felix Andika Dwiyanto

Abstract


While submitting proposals in SISINTA, students are often confused and mistakenly submit their proposals to a less relevant or incorrect research group. There are 13 research groups to choose from. We propose a text classification method that helps students find the most suitable research group based on the proposal's title, abstract, or both. The study comprises four stages: data collection, data preprocessing, classification using multinomial logistic regression, and evaluation of the results. We test three input scenarios: 1) title only, 2) abstract only, and 3) title and abstract. In our experiments, the title-only scenario performs best overall, with accuracy, precision, recall, and F1-score of 63.68%, 64.91%, 63.68%, and 63.46%, respectively. This result is sufficient to help students find the best research group from the title text alone. In addition, lecturers can give more detailed comments because the proposals they receive fall within their research group's scope.
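
The four stages and three input scenarios can be illustrated with a small sketch. The abstract does not specify the feature extraction, hyperparameters, or averaging scheme, so everything below is an assumption: a plain bag-of-words representation, a from-scratch softmax (multinomial logistic) regression trained with stochastic gradient descent, and support-weighted averaging for precision, recall, and F1. The toy titles, abstracts, three group labels, and all function names are invented for illustration; they are not the SISINTA data or the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: (title, abstract, research-group label). Not SISINTA data.
SAMPLES = [
    ("sentiment classification with neural networks", "reviews labelled by polarity", 0),
    ("neural networks for image classification", "photos recognised by convolution", 0),
    ("wireless sensor routing protocol", "nodes forward packets", 1),
    ("routing protocol latency in wireless mesh", "packets delayed across hops", 1),
    ("educational game design", "students learn through play", 2),
    ("mobile game design evaluation", "usability measured on phones", 2),
]

def build_input(title, abstract, scenario):
    """Select the text for one of the three scenarios named in the abstract."""
    if scenario == "title":
        return title
    if scenario == "abstract":
        return abstract
    return f"{title} {abstract}"  # "title+abstract"

def vectorize(text, vocab):
    """Bag-of-words term counts (an assumed representation)."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(x, W, b):
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def train(X, y, n_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression fitted by plain SGD on the cross-entropy loss."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(scores_for(x, W, b))
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                b[k] -= lr * grad
                for j, v in enumerate(x):
                    W[k][j] -= lr * grad * v
    return W, b

def predict(x, W, b):
    s = scores_for(x, W, b)
    return max(range(len(s)), key=s.__getitem__)

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support, predicted = Counter(y_true), Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += count / n * p
        recall += count / n * r
        f1 += count / n * f
    return accuracy, precision, recall, f1

def run(scenario):
    texts = [build_input(t, a, scenario) for t, a, _ in SAMPLES]
    labels = [g for _, _, g in SAMPLES]
    words = sorted({w for t in texts for w in t.lower().split()})
    vocab = {tok: i for i, tok in enumerate(words)}
    X = [vectorize(t, vocab) for t in texts]
    W, b = train(X, labels, n_classes=3)
    # Scored on the training data only to keep the sketch short;
    # a real experiment needs a held-out test split.
    return labels, [predict(x, W, b) for x in X]

for scenario in ("title", "abstract", "title+abstract"):
    y_true, y_pred = run(scenario)
    acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
    print(f"{scenario:15s} acc={acc:.2f} prec={prec:.2f} rec={rec:.2f} f1={f1:.2f}")
```

Under support-weighted averaging, recall is always equal to accuracy, because each class's recall is weighted by its share of the samples. That is consistent with the identical 63.68% values reported for accuracy and recall, although the abstract does not confirm which averaging was used.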






DOI: http://dx.doi.org/10.17977/um018v5i22022p150-159



Copyright (c) 2022 Knowledge Engineering and Data Science

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
