Sentiment Analysis and Topic Modeling on Arabic Twitter Data during Covid-19 Pandemic
DOI:
https://doi.org/10.47540/ijias.v2i1.432
Keywords:
Latent Dirichlet Allocation, Sentiment Analysis, Topic Modeling, Twitter
Abstract
Twitter sentiment analysis is the task of detecting opinions and sentiments in tweets using different algorithms. In this study, we analyze and compare several machine learning algorithms (MLAs) for the classification task. To that end, we collected 37,875 Moroccan tweets posted during the COVID-19 pandemic, from 01 March 2020 to 28 June 2020. The analysis used six classification algorithms (Naive Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Decision Tree, and Random Forest), with Accuracy, Recall, Precision, and F-Score as evaluation metrics. We then applied topic modeling to the three categories of classified tweets (negative, positive, and neutral) using Latent Dirichlet Allocation (LDA), which is among the most effective approaches for extracting the topics under discussion. The Logistic Regression classifier gave the best sentiment predictions, with an accuracy of 68.80%.
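As a rough illustration of the four evaluation metrics named in the abstract, the sketch below computes accuracy and macro-averaged precision, recall, and F-score for three sentiment classes in plain Python. The macro averaging, the `evaluate` function, and the toy labels are assumptions made for illustration; the abstract does not state how the authors averaged their scores or which tooling produced them.

```python
from collections import Counter

# The three sentiment categories named in the abstract.
LABELS = ("negative", "neutral", "positive")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F-score.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # the true class was missed

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


# Toy labels, invented for this example (not the study's data).
y_true = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "neutral", "positive", "negative"]
scores = evaluate(y_true, y_pred)
print(scores)
```

Macro averaging weights each sentiment class equally regardless of how many tweets it contains; a weighted or micro average would be an equally plausible choice and could give different figures on an imbalanced dataset.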
Copyright (c) 2022 Nassera Habbat, Houda Anoun, Larbi Hassouni

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.