Impact Analysis of Feature Selection Techniques on Cyberstalking Detection

Arvind Kumar Gautam*, Abhishek Bansal**
*-** Department of Computer Science, Indira Gandhi National Tribal University, Amarkantak, Madhya Pradesh, India.
Periodicity: October - December 2022
DOI: https://doi.org/10.26634/jip.9.4.19138

Abstract

Internet-based applications have become a habitual part of society, and they have also opened new avenues for online crime. Numerous cybercriminals operate across internet platforms, carrying out cybercrimes according to preplanned agendas. As technology advances, cyberstalking, cyberbullying, and other forms of cyber harassment are growing on social media, email, and other online platforms. Cyberstalking uses internet-based technology to harass, intimidate, and undermine individuals online through a variety of approaches. To examine the impact of feature selection strategies on model performance, this paper proposes a machine learning-based cyberstalking detection model. The proposed model used the Term Frequency-Inverse Document Frequency (TF-IDF) method to extract features, and three distinct approaches, including TF-IDF + Chi-Square Test and TF-IDF + Information Gain, were used to select different numbers of relevant features. A Support Vector Machine (SVM) was employed for classification. The impact of each feature selection approach on the various feature sets was assessed based on the SVM classifier's performance. According to the experimental findings, the TF-IDF + Chi-Square Test outperformed the other approaches and improved detection model performance. The findings also show that the TF-IDF + Chi-Square Test approach performs better than the other approaches on a small set of relevant features.
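The feature selection step described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the toy corpus, the whitespace tokenizer, the tf * log(N / df) weighting, and all function names are assumptions made for illustration. Chi-square and information gain are computed from a 2x2 term-presence table in the classic text-categorization form, and the SVM classification step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): label 1 = harassing, 0 = benign.
CORPUS = [
    ("i will find you and hurt you", 1),
    ("i know where you live", 1),
    ("stop ignoring me or you will regret it", 1),
    ("see you at lunch tomorrow", 0),
    ("thanks for the notes from class", 0),
    ("the meeting moved to friday", 0),
]

def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def tfidf(corpus):
    """One {term: weight} dict per document, using tf * log(N / df)."""
    docs = [Counter(tokenize(text)) for text, _ in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    return [{t: c * math.log(n / df[t]) for t, c in doc.items()} for doc in docs]

def contingency(corpus, term):
    """Counts (a, b, c, d): a = present & label 1, b = present & label 0,
    c = absent & label 1, d = absent & label 0."""
    a = b = c = d = 0
    for text, label in corpus:
        present = term in tokenize(text)
        if present and label == 1:
            a += 1
        elif present:
            b += 1
        elif label == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)

def information_gain(a, b, c, d):
    """Reduction in class entropy from knowing whether the term is present."""
    n = a + b + c + d
    prior = entropy([a + c, b + d])
    cond = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return prior - cond

def top_k(corpus, score, k):
    """Rank the vocabulary by a scoring function and keep the k best terms."""
    vocab = sorted({t for text, _ in corpus for t in tokenize(text)})
    ranked = sorted(vocab, key=lambda t: score(*contingency(corpus, t)), reverse=True)
    return ranked[:k]

def select_features(vectors, keep):
    """Project TF-IDF vectors onto the selected terms only."""
    keep = set(keep)
    return [{t: w for t, w in vec.items() if t in keep} for vec in vectors]

if __name__ == "__main__":
    vectors = tfidf(CORPUS)
    for name, score in (("chi-square", chi_square), ("information gain", information_gain)):
        terms = top_k(CORPUS, score, 4)
        print(name, terms, select_features(vectors, terms)[0])
```

In a full pipeline, the reduced vectors would be passed to a classifier such as an SVM, and the value of k would be varied to compare feature-set sizes, as the abstract describes.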

Keywords

Cyberstalking, Cyberbullying, Machine Learning, TF-IDF, Support Vector Machine, Chi-Square, Information Gain, Feature Selection, Feature Extraction.

How to Cite this Article?

Gautam, A. K., and Bansal, A. (2022). Impact Analysis of Feature Selection Techniques on Cyberstalking Detection. i-manager’s Journal on Image Processing, 9(4), 21-34. https://doi.org/10.26634/jip.9.4.19138
