Hate Speech Detection on Twitter using Machine Learning Techniques

Aastha Joshi*, Nirmal Gaud**
* Samrat Ashok Technological Institute, Vidisha, Madhya Pradesh, India.
** Department of Computer Science and Engineering, Samrat Ashok Technological Institute, Vidisha, Madhya Pradesh, India.
Periodicity: March - May 2022
DOI: https://doi.org/10.26634/jit.11.2.18919

Abstract

As social media platforms have grown in popularity, people can now create, post, and share content to connect with each other. These platforms have, however, also become a forum for hatred and conflict. The rampant spread of hatred on social media has had a significant impact on society, dividing people into opposing camps on topics that concern the status of a person, place, community, or country. Hate speech on social media is difficult to recognize because messages contain paralinguistic signs, jumbled language, and poorly written content. The lack of consensus on what constitutes hate speech, together with the lack of background information, makes it even harder to detect. Creating large annotated corpora with sufficient relevant context is also a difficult task. Although researchers have found that hate is a problem on all social media platforms, there is still no perfect method for detecting it accurately. This paper describes the current state and complexity of the field, as well as the main algorithms, methodologies, and key characteristics used. It focuses on the important areas that have been explored for hate speech detection and applies machine learning algorithms to detect it.

Keywords

Hate Speech, Text Classification, Machine Learning, Bag of Words, Bigram, Social Media, Twitter, Facebook.

How to Cite this Article?

Joshi, A., and Gaud, N. (2022). Hate Speech Detection on Twitter using Machine Learning Techniques. i-manager’s Journal on Information Technology, 11(2), 1-9. https://doi.org/10.26634/jit.11.2.18919
