Sentiment Analysis and Offensive Language Detection in Social Media

M. Shiny*
Department of Computer Science and Engineering, DMI College of Engineering, Tamil Nadu, India.
Periodicity: June - August 2022


Sentiment analysis is the field of study concerned with extracting, identifying, and characterizing the emotions expressed in written text. One of its most common tasks is determining the polarity of a writer's sentiment. Blog posts, tweets, and comments in Indian languages are now widespread online, yet sentiment analysis for Indian languages is a relatively new field, and research in this area is only beginning. Offensive content is also prevalent on social media, which is a concern for businesses and government agencies. This paper presents the methodology of sentiment analysis and offensive language detection in social media.


Sentiment Analysis, Offensive Language, Support Vector Machines, Opinion, Dataset.

How to Cite this Article?

Shiny, M. (2022). Sentiment Analysis and Offensive Language Detection in Social Media. i-manager’s Journal on Computer Science, 10(2), 1-7.

