Hybrid Method for User Review Sentiment Categorization in ChatGPT Application Using N-Gram and Word2Vec Features

Husna Luthfiatun Nisa, Atina Ahdika

Abstract


The rapid development of Artificial Intelligence (AI) has influenced nearly every aspect of life. One AI product widely used around the world is the Chat Generative Pre-trained Transformer (ChatGPT), which answers questions conversationally. Data indicate that ChatGPT use in Indonesia is less widespread than in other countries. However, a Populix survey reveals that half of the respondents have used ChatGPT and use AI more than once a month, which indicates that it plays a significant role among the Indonesian population. ChatGPT is not limited to the browser; it is also available as a downloadable application on the Google Play Store, where it has received many user reviews, particularly from Indonesia. This research therefore employs the Naïve Bayes Classifier and K-Means Clustering to classify the sentiment of, and to group, Indonesian user reviews of the ChatGPT application. The study uses TF-IDF and Word2Vec for feature extraction and combines several N-gram orders during preprocessing, so that the meaning carried by sequences of adjacent words is taken into account. The best classification results are obtained with the trigram model, which achieves precision, recall, and accuracy of 0.99 each and an F1-score of 1. Clustering also yields good results: some clusters overlap, but the words within each cluster are highly similar. The categorization results suggest that Indonesian user reviews of the ChatGPT application tend to be positive. They express satisfaction, offer feedback for feature development, and express hope that the accessible version of ChatGPT will remain available because of its considerable benefits.
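
The classification side of the method can be illustrated with a minimal, dependency-free Python sketch. It assumes that the TF-IDF features feed the Naïve Bayes classifier, that reviews are already cleaned and whitespace-tokenised, and that a multinomial Naïve Bayes with Laplace smoothing is used. These are assumptions, not details confirmed by the abstract. The helper names (`ngrams`, `fit_idf`, `transform`), the toy reviews, and the bigram order are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: all names and data below are hypothetical.
import math
from collections import Counter


def ngrams(text, n):
    """Word n-grams of a lower-cased, whitespace-tokenised review."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fit_idf(docs, n):
    """Smoothed IDF per n-gram: ln((1 + N) / (1 + df)) + 1."""
    df = Counter(g for d in docs for g in set(ngrams(d, n)))
    N = len(docs)
    return {g: math.log((1 + N) / (1 + c)) + 1 for g, c in df.items()}


def transform(text, n, idf):
    """TF-IDF weights for one review; n-grams unseen in training are dropped."""
    counts = Counter(ngrams(text, n))
    return {g: c * idf[g] for g, c in counts.items() if g in idf}


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing over weighted n-grams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        vocab = {g for x in X for g in x}
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            totals = Counter()
            for x, label in zip(X, y):
                if label == c:
                    totals.update(x)  # adds the TF-IDF weights of this review
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {g: math.log((totals[g] + self.alpha) / denom) for g in vocab}
        return self

    def predict(self, x):
        """Return the class with the highest log-posterior for feature dict x."""
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_lik[c][g] for g, w in x.items() if g in self.log_lik[c])
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy reviews (not the study's data).
train = [
    ("very helpful app", "positive"),
    ("really helpful for homework", "positive"),
    ("great answers very fast", "positive"),
    ("cannot log in", "negative"),
    ("app keeps crashing", "negative"),
    ("login error again", "negative"),
]
N_ORDER = 2  # bigrams here; the study reports its best results with trigrams
texts, labels = zip(*train)
idf = fit_idf(texts, N_ORDER)
model = MultinomialNB().fit([transform(t, N_ORDER, idf) for t in texts], list(labels))
print(model.predict(transform("very helpful answers", N_ORDER, idf)))  # positive
```

In practice, scikit-learn's `TfidfVectorizer(ngram_range=(3, 3))` and `MultinomialNB` provide equivalent building blocks. The IDF formula above matches `TfidfVectorizer`'s default `smooth_idf=True`, although that class also L2-normalises each row by default, which this sketch omits.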

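The clustering side can be sketched in the same way. This sketch assumes that each review is represented by the average of its word vectors, and that reviews are grouped with k-means using cosine similarity (so-called spherical k-means). Both choices may differ from the authors' exact configuration. The 2-D embeddings, the reviews, the initial centroids, and the helper names (`mean_vector`, `normalize`, `kmeans_cosine`) are hypothetical stand-ins for a trained Word2Vec model and a real pipeline.

```python
# Illustrative sketch only: all names and data below are hypothetical.
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def normalize(v):
    """Scale v to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def kmeans_cosine(points, init_centroids, max_iter=20):
    """Spherical k-means: assign by highest cosine similarity, then re-average."""
    points = [normalize(p) for p in points]
    centroids = [normalize(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        labels = [
            max(range(len(centroids)), key=lambda k: sum(a * b for a, b in zip(p, centroids[k])))
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(
                normalize([sum(col) / len(members) for col in zip(*members)]) if members else old
            )
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D stand-ins for trained Word2Vec vectors
# (real embeddings typically have 100 or more dimensions).
embeddings = {
    "helpful": [0.9, 0.1], "great": [0.8, 0.2], "fast": [0.7, 0.3],
    "error": [0.1, 0.9], "crash": [0.2, 0.8], "login": [0.3, 0.7],
}
reviews = ["helpful great", "great fast", "login error", "crash error"]
vectors = [mean_vector(r.split(), embeddings) for r in reviews]
labels, _ = kmeans_cosine(vectors, init_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

Fixing the initial centroids keeps this toy run deterministic. On real data, the number of clusters and the initialisation (for example, k-means++) would need to be chosen and validated separately.
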
DOI: http://dx.doi.org/10.17977/um018v7i12024p13-26

Copyright (c) 2024 Knowledge Engineering and Data Science

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
