Comparative Study of Various Machine Learning Algorithms for Tweet Classification

Umar Abubakar*, Sulaimon A. Bashir**, Muhammad Bashir Abdullahi***, Olawale S. Adebayo****
*,**,*** Department of Computer Science, Federal University of Technology, Minna, Nigeria.
**** Department of Cyber Security Science, Federal University of Technology, Minna, Nigeria.
Periodicity: December - February 2019
DOI: https://doi.org/10.26634/jcom.6.4.15722

Abstract

Twitter is a social networking platform that has become popular in recent years. It has become a versatile information dissemination tool used by individuals, businesses, celebrities, and news organizations. It allows users to share messages, called tweets, with one another. These messages can contain many types of information, from users' personal opinions and advertisements for products from all kinds of businesses to news. Research has shown that tweets can also contain racist, bigoted, offensive, and extremist messages. Manual identification of such tweets is impractical because hundreds of millions of tweets are posted every day, so Twitter administrators and intelligence and security analysts need a way to automate the identification of these tweets through classification. This paper presents a comparative study of traditional machine learning algorithms and deep learning algorithms for classifying tweets into different categories of abusive language, with the aim of determining which algorithm performs best at detecting the abusive language that is prevalent on social media. Two approaches for building feature vectors were explored: feature vectors based on the bag-of-words method and feature vectors based on word embeddings. Both feature representations were evaluated using tweets representing five abusive language categories. The experiments show that the deep learning algorithms trained with word embeddings outperformed all the other machine learning algorithms that were trained with feature vectors based on the bag-of-words approach.
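
The two feature representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the whitespace tokenizer, the helper names (`build_vocabulary`, `bag_of_words`, `mean_embedding`), the toy tweets, and the two-dimensional toy embeddings are all assumptions made for illustration. Averaging the word vectors is also a simplification, since deep learning models typically consume the sequence of embeddings rather than their mean.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(docs):
    # Map every distinct token to a fixed column index (alphabetical order).
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text, vocab):
    # Bag-of-words: one count per vocabulary word; word order is discarded.
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector


def mean_embedding(text, embeddings, dim):
    # Word embeddings: look up a dense vector per word, then average them.
    # Words without an embedding are skipped.
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy data, invented for illustration only.
tweets = ["good morning world", "good night"]
vocab = build_vocabulary(tweets)  # columns: good, morning, night, world
bow = bag_of_words("good good night", vocab)

toy_embeddings = {"good": [1.0, 0.0], "night": [0.0, 1.0]}
emb = mean_embedding("good night", toy_embeddings, dim=2)

print(bow)  # sparse counts, one column per vocabulary word
print(emb)  # dense vector with a fixed, small dimension
```

The bag-of-words vector grows with the vocabulary and treats every word as unrelated to every other, whereas an embedding-based vector has a fixed dimension and places related words close together. In practice the embeddings would come from a pretrained model rather than a hand-written dictionary.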

Keywords

Social Media, Tweets Classification, Feature Extraction, Machine Learning, Artificial Neural Networks, Deep Learning

How to Cite this Article?

Abubakar, U., Bashir, S. A., Abdullahi, M. B., & Adebayo, O. S. (2019). Comparative Study of Various Machine Learning Algorithms for Tweet Classification. i-manager's Journal on Computer Science, 6(4), 12-24. https://doi.org/10.26634/jcom.6.4.15722
