Multilayer Perceptron for Classification of Website Phishing

Maheep Singh *  Roshni Tayal **
*-** Graduate Engineer, Department of Computer Science and Engineering, Central University, Bilaspur, Chhattisgarh, India.

Abstract

Websites are used today for a wide range of purposes, and website phishing is a cybercrime that exploits them. A phishing website tries to steal account passwords or other private information by misleading users into believing that they are on a legitimate website. One can even land on a phishing site by mistyping a URL. Several conventional techniques for detecting phishing websites have been suggested to cope with this problem. In this study, a Multilayer Perceptron learning approach is used for website phishing classification, with 10-fold cross-validation applied as a preprocessing step, and it gives almost 100% accuracy. The experimental results show that the multilayer perceptron classifier improves considerably on the results of earlier methods.

Keywords :

Introduction

The problem of website phishing has grown considerably in recent years, and phishing is considered one of the most hazardous cybercrimes because of its adverse effects on online business. It is used to obtain sensitive financial and personal information from users. In a website phishing attack, the phisher creates a look-alike webpage that leads the victim to believe it is a legitimate website. Phishing is a vital issue in industry, especially in e-banking and e-commerce, because a huge number of online transactions involving payments are made. Website phishing is also considered a major problem in web security (Dedakia and Mistry, 2015).

Most internet resources, such as search engines, email services, social network sites, gaming services, banks, financial organizations, and telecommunication operator websites, are favored targets of cybercriminals, and website phishing is one of the major cybercrimes committed against them (Tan and Chiew, 2014).

Today, almost all information is shared online, which gives criminals the chance to steal personal or financial information such as usernames, passwords, account numbers, and national insurance numbers. In website phishing, the attacker creates a look-alike website that leads people to believe it is the original website, and when a user performs some action on it, the attacker steals the information entered.

Several techniques have been used for the classification of website phishing. In this study, a Multilayer Perceptron algorithm is used, which shows a large increase in accuracy (Panchal et al., 2011). Phishing is a fraud technique that uses a mixture of social engineering and technology to gather sensitive and private information, such as passwords and credit card details, by masquerading as a reliable person or business in an electronic communication. The fake websites are designed to mimic the look of a real company webpage. Phishing attackers employ different social engineering tactics to get users to visit their fake web pages, such as threatening to suspend user accounts unless the users complete an account update process or provide other information to validate their accounts.

Detecting fake websites is a critical step towards protecting online transactions. Website phishing has a large effect on financial services and other online commerce, so detecting this attack is a significant step in defending against it. The motivation for developing this system is that people nowadays depend heavily on e-commerce websites for purchases and, in doing so, share private information such as usernames, passwords, credit card details, and social security numbers.

This system has been developed to detect whether websites are Legitimate, Phishing, or Suspicious. There is a need for an efficient mechanism for the detection of website phishing. Website phishing is a very complex issue to understand and analyze, as it is a mixture of technical and social dynamics for which no single silver bullet is known. Among the many applications available for website phishing detection, some solutions use machine learning and data mining techniques. These increase the level of safety, but not completely; hence, the Multilayer Perceptron technique is used here, which gives strong results and increases the accuracy.

Website phishing is a cybercrime that targets the user rather than the computer system. It is a relatively new Internet crime in comparison with other forms, such as viruses and hacking. Website phishing is a hard problem because it is very easy for an attacker to create a fake website similar to the original one (Kadam and Pawar, 2013).

There are some techniques which use a pictorial form of dataset to detect phishing as shown in Figures 1 and 2 (Fatt and Chiew, 2014).

Figure 1. Pictorial Form-Real and Fake Webpage

Figure 2. Difference between Genuine and Fake Webpage and URL

1. Related Work

Several approaches have been applied so far to the classification of website phishing. The method proposed by Ali (2017) used a feature selection method with a supervised machine learning algorithm. The method proposed by Zhuang et al. (2012) used an ensemble classification algorithm that combines the predictions of different classification algorithms for website phishing.

The method proposed by Singh et al. (2015) used feature selection techniques and a dimensionality reduction approach for the classification of website phishing and obtained an accuracy of 97.5% using Random Forest. Blum et al. (2010) used lexical features of URLs with online learning for phishing URL detection. The method proposed by Basnet et al. (2008) used various types of machine learning classification techniques and achieved an AUC of 98.13%.

The method proposed by James et al. (2013) also used machine learning algorithms. The method proposed by Naresh et al. (2013) used the Link Guard algorithm for website phishing detection and prevention. The method proposed by Nguyen et al. (2013) used a heuristic URL-based approach for classification. The method proposed by Afroz and Greenstadt (2011) used profiles of trusted websites' appearances to detect website phishing.

In addition to the blacklists, whitelists, and classifiers used in existing systems, the method proposed by Jo et al. (2010) considered websites' identity claims, mimicking the way a skilled human judges a site. Given a website, their system learns the identity that the website claims and computes the textual relevance between the claimed identity and other features of the website. Their phishing detection system then uses this textual relevance as one of the features for classifying websites.

Kim et al. (2013) proposed a two-phase, score-based approach to detect a phishing website. The first phase checks the format of the requested URL to detect whether it is suspicious. The second phase applies a series of validations to the URL by checking whether the specified domain name exists in either its blacklist or its whitelist.
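The two-phase idea can be illustrated with a short sketch. This is not the implementation of Kim et al. (2013): the format heuristics, the score weights, the threshold, and the example list entries below are all hypothetical and serve only to show how a format score and list lookups can be combined.

```python
import re
from urllib.parse import urlparse

# Hypothetical lists; a real system would load maintained blacklists/whitelists.
BLACKLIST = {"secure-login.example.net"}
WHITELIST = {"www.example.com"}


def format_score(url: str) -> int:
    """Phase 1: score suspicious URL formats (illustrative heuristics only)."""
    host = urlparse(url).hostname or ""
    score = 0
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):  # raw IPv4 address as host
        score += 2
    if "@" in url:  # '@' can hide the real host from the reader
        score += 2
    if host.count(".") > 3:  # unusually deep subdomain nesting
        score += 1
    if len(url) > 75:  # very long URL
        score += 1
    return score


def classify(url: str, threshold: int = 2) -> str:
    """Phase 1 scores the format; phase 2 checks the domain against the lists."""
    looks_suspicious = format_score(url) >= threshold
    host = urlparse(url).hostname or ""
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "legitimate"
    return "suspicious" if looks_suspicious else "unlisted"


print(classify("http://192.168.0.1/login"))
print(classify("https://www.example.com/"))
```

In this sketch a list hit always overrides the format score; whether the original system weighs the two phases this way is not stated in the text above.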

The method proposed by Ramanathan and Wechsler (2012) used Latent Dirichlet Allocation for extracting the features, and AdaBoost algorithm for the classification of website phishing.

In order to overcome the issue of website phishing, Layton et al. (2009) used a diff which takes the phishing websites and original websites as input and returns the differences between these two. The difference presents a new overview on the data that was previously unused and presents a novel way to increase the capability of clustering algorithms to find good, discrete and separated clusters within the data.
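The differencing step can be sketched with Python's standard `difflib` module. This is a minimal illustration of the idea, not the tool used by Layton et al. (2009); the two small pages and the helper name `page_diff` are made up for the example.

```python
import difflib

# Made-up page fragments: the phishing copy differs only in the form target.
original_page = [
    "<title>Example Bank</title>",
    '<form action="https://bank.example/login">',
    '<input name="user">',
]
phishing_page = [
    "<title>Example Bank</title>",
    '<form action="http://198.51.100.7/steal.php">',
    '<input name="user">',
]


def page_diff(original, candidate):
    """Return only the removed ('- ') and added ('+ ') lines between two pages."""
    return [
        line
        for line in difflib.ndiff(original, candidate)
        if line.startswith(("- ", "+ "))
    ]


for line in page_diff(original_page, phishing_page):
    print(line)
```

Only the lines that distinguish the fake page from the genuine one survive, which is the kind of reduced view that a clustering algorithm could then operate on.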

Shahriar and Zulkernine (2011) classified the existing works based on information sources. The classification not only provides convenient information for developing new anti-phishing approaches or extending existing techniques, but also improves one's understanding of the limitations of the existing techniques.

Aburrous et al. (2008) used Fuzzy Logic operators to characterize the website phishing factors, and they used indicators as fuzzy variables.

2. Methodology

First, the dataset is collected; then 10-fold cross-validation is applied to it, and the Multilayer Perceptron algorithm is used for pattern classification. The performance is then evaluated in terms of accuracy, as shown in Figure 3.
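One way to read this pipeline is sketched below: the data are split into 10 folds, a classifier is trained on nine folds and tested on the remaining one, and the ten accuracies are averaged. The function names are hypothetical, and a majority-class predictor stands in for the Multilayer Perceptron so that the sketch stays self-contained; it is not the code used in this study.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit_predict, X, y, k=10):
    """Average test accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Stand-in classifier: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy data only; any function with the same signature could replace the baseline.
X = [[i] for i in range(20)]
y = [0] * 20
print(cross_val_accuracy(majority_baseline, X, y, k=10))
```

Any classifier that exposes the same train-then-predict signature, including an MLP, could be passed in place of `majority_baseline`.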

Figure 3. Flowchart of Methodology

2.1 Dataset

The dataset used is taken from the UCI repository. It contains 1353 instances collected from different sources, as shown in Table 1. A PHP script was plugged into a browser, and 548 legitimate websites out of the 1353 websites were collected. There are 702 phishing URLs and 103 suspicious URLs.
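As a quick arithmetic check, the three class counts quoted above sum to the stated 1353 instances. The short sketch below also computes the share of each class and the accuracy that a trivial "always predict the largest class" rule would reach, which gives a reference point for the accuracies reported later. The variable names are illustrative only.

```python
# Class counts as stated in the text.
counts = {"legitimate": 548, "phishing": 702, "suspicious": 103}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of always predicting the most frequent class.
majority_accuracy = max(counts.values()) / total

print(total)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline: {majority_accuracy:.1%}")
```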

Table 1. Dataset Representation

2.2 Multilayer Perceptron

Multilayer perceptrons (MLPs) are feed-forward neural networks that are trained using the standard backpropagation algorithm. Since an MLP is a supervised network, it requires a desired response for each training input. MLPs learn how to transform input data into a desired response, and hence they are widely used for pattern classification. The MLP is perhaps the most popular network architecture in use today, and MLPs perform very well in most neural network applications. An important design decision is the choice of the number of hidden layers and the number of units in these layers; the model of an MLP neural network is shown in Figure 4. MLPs are useful in research for their ability to solve problems stochastically. The perceptron equation is given below:

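$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right) = \varphi\left(\mathbf{w}^{T}\mathbf{x} + b\right)$$

where w denotes the vector of weights, x is the vector of inputs, b is the bias, and φ is the activation function.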
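The output of a single perceptron unit can be computed directly from these quantities: the weighted sum of the inputs plus the bias is passed through the activation function. The sketch below assumes a logistic sigmoid as the activation, which is a common choice but is not specified in this paper; the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumption; the paper does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(w, x, b, phi=sigmoid):
    """Return phi(w . x + b) for weight vector w, input vector x, and bias b."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return phi(z)


# 0.5 * 2.0 + (-0.25) * 4.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
y = perceptron([0.5, -0.25], [2.0, 4.0], b=0.0)
print(y)
```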

Figure 4. Model of Multilayer Perceptron Neural Network (Sonawane and Patil, 2014)

3. Results and Discussion

3.1 Performance Measures

The dataset contains many features, such as URL length and URL domain, and the websites are classified based on those features. In this paper, 10-fold cross-validation is used in the preprocessing step, and then the Multilayer Perceptron algorithm, trained by backpropagation (Panchal et al., 2011), is applied. An accuracy of 99.9% is obtained, which is far better than that of the previously used methods. The settings used in this work are: number of hidden layers = 2, number of hidden nodes = 15, alpha = 0.10, verbose = 0, and epoch = 10000.
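To make the stated architecture concrete, the sketch below builds the forward pass of a network with two hidden layers and three outputs, one per class (Legitimate, Phishing, Suspicious). Several details are assumptions, because the text does not specify them: 15 units are placed in each hidden layer, 9 input features are assumed, tanh and softmax are chosen as activations, and the weights are random rather than trained. It illustrates the shape of the model only and does not reproduce the reported accuracy.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def dense(W, b, x, phi):
    """Apply phi(w . x + b) for every unit in the layer."""
    return [phi(sum(wi * xi for wi, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(layers, x):
    """Hidden layers use tanh; the output layer is linear followed by softmax."""
    h = x
    for W, b in layers[:-1]:
        h = dense(W, b, h, math.tanh)
    W, b = layers[-1]
    return softmax(dense(W, b, h, lambda v: v))


rng = random.Random(0)
sizes = [9, 15, 15, 3]  # assumed: 9 inputs, 2 hidden layers of 15, 3 classes
layers = [init_layer(a, c, rng) for a, c in zip(sizes, sizes[1:])]
n_params = sum(len(W) * len(W[0]) + len(b) for W, b in layers)

probs = forward(layers, [1, 0, -1, 1, 1, 0, -1, 1, 0])
print(n_params, probs)
```

Under these assumptions the network has 438 trainable parameters, which is small relative to the 1353 instances in the dataset.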

3.2 Comparison with Other Previous Methods

The results show that the proposed method performs better than all previously applied methods, as tabulated in Table 2. It is also feasible compared with other approaches. The comparison graph of accuracy is shown in Figure 5.

Table 2. Comparison of the Proposed Method with the Previous Methods

Figure 5. Comparison Graph of Accuracy

Conclusion

In this paper, a computational method is used for the classification of website phishing, and the experimental results clearly show the feasibility of the proposed method. The Multilayer Perceptron is more useful when the number of instances is higher. The proposed method performs better than the previous methods. Since website phishing is a huge problem, it requires a technique that performs better on unseen data than traditional classification techniques do. In this paper, 10-fold cross-validation is used along with a Multilayer Perceptron, which gives almost 100% accuracy. From this, it is clear that the Multilayer Perceptron is well suited to this kind of dataset.

References

[1]. Aburrous, M., Hossain, M. A., Thabtah, F., & Dahal, K. (2008). Intelligent phishing website detection system using fuzzy techniques. In Information and Communication Technologies: From Theory to Applications, 2008. ICTTA 2008. 3rd International Conference on (pp. 1-6). IEEE.
[2]. Aburrous, M., Hossain, M. A., Dahal, K., & Thabtah, F. (2010). Predicting phishing websites using classification mining techniques with experimental case studies. In Information Technology: New Generations (ITNG), 2010 Seventh International Conference on (pp. 176-181). IEEE.
[3]. Afroz, S., & Greenstadt, R. (2011). Phishzoo: Detecting phishing websites by looking at them. In Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on (pp. 368-375). IEEE.
[4]. Ali, W. (2017). Phishing Website Detection based on Supervised Machine Learning with Wrapper Features Selection. International Journal of Advanced Computer Science and Applications, 8(9), 72-78.
[5]. Basnet, R., Mukkamala, S., & Sung, A. H. (2008). Detection of phishing attacks: A machine learning approach. In Soft Computing Applications in Industry (pp. 373-383). Springer, Berlin, Heidelberg.
[6]. Blum, A., Wardman, B., Solorio, T., & Warner, G. (2010). Lexical feature based phishing URL detection using online learning. In Proceedings of the 3rd ACM Workshop on Artificial Intelligence and Security (pp. 54-60). ACM.
[7]. Dedakia, M., & Mistry, K. (2015). Phishing detection using content based associative classification data mining. Journal of Engineering Computers & Applied Sciences (JECAS), 4(7), 209-214.
[8]. Fatt, J. C. S., & Chiew, K. L. (2014). Phishdentity: Leverage Website Favicon to Offset Polymorphic Phishing Website. In Availability, Reliability and Security (ARES), 2014 Ninth International Conference on (pp. 114-119). IEEE.
[9]. James, J., Sandhya, L., & Thomas, C. (2013). Detection of phishing URLs using machine learning techniques. In Control Communication and Computing (ICCC), 2013 International Conference on (pp. 304-309). IEEE.
[10]. Jo, I., Jung, E., & Yeom, H. Y. (2010). You're not who you claim to be: Website identity check for phishing detection. In 2010 Proceedings of 19th International Conference on Computer Communications and Networks.
[11]. Kadam, A. S., & Pawar, S. S. (2013). Comparison of association rule mining with pruning and adaptive technique for classification of phishing dataset. Third International Conference on Computational Intelligence and Information Technology (CIIT 2013) 2013 (CP646), 61-67.
[12]. Kim, D., Achan, C., Baek, J., & Fisher, P. S. (2013). Implementation of framework to identify potential phishing websites. In Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on (pp. 268-268). IEEE.
[13]. Layton, R., Brown, S., & Watters, P. (2009). Using differencing to increase distinctiveness for phishing website clustering. In Ubiquitous, Autonomic and Trusted Computing, 2009. UIC-ATC'09. Symposia and Workshops on (pp. 488-492). IEEE.
[14]. Naresh, U., VidyaSagar, U., & Reddy, C. V. M. (2013). Intelligent phishing website detection and prevention system by using link guard algorithm. Proc. IOSR, 14(3), 28-36.
[15]. Nguyen, L. A. T., To, B. L., Nguyen, H. K., & Nguyen, M. H. (2013). Detecting phishing websites: A heuristic URL-based approach. In Advanced Technologies for Communications (ATC), 2013 International Conference on (pp. 597-602). IEEE.
[16]. Panchal, G., Ganatra, A., Kosta, Y. P., & Panchal, D. (2011). Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers. International Journal of Computer Theory and Engineering, 3(2), 332-337.
[17]. Ramanathan, V., & Wechsler, H. (2012). Phishing Website detection using latent Dirichlet allocation and AdaBoost. In Intelligence and Security Informatics (ISI), 2012 IEEE International Conference on (pp. 102-107). IEEE.
[18]. Shahriar, H., & Zulkernine, M. (2011). Information source-based classification of automatic phishing website detectors. In Applications and the Internet (SAINT), 2011 IEEE/IPSJ 11th International Symposium on (pp. 190-195). IEEE.
[19]. Singh, P., Jain, N., & Maini, A. (2015). Investigating the effect of feature selection and dimensionality reduction on phishing website classification problem. In Next Generation Computing Technologies (NGCT), 2015 1st International Conference on (pp. 388-393). IEEE.
[20]. Sonawane, J. S., & Patil, D. R. (2014). Prediction of heart disease using multilayer perceptron neural network. In Information Communication and Embedded Systems (ICICES), 2014 International Conference on (pp. 1-6). IEEE.
[21]. Tan, C. L., & Chiew, K. L. (2014). Phishing website detection using URL-assisted brand name weighting system. In Intelligent Signal Processing and Communication Systems (ISPACS), 2014 International Symposium on (pp. 54-59). IEEE.
[22]. Zhuang, W., Jiang, Q., & Xiong, T. (2012). An intelligent anti-phishing strategy model for phishing website detection. In Distributed Computing Systems Workshops (ICDCSW), 2012 32nd International Conference on (pp. 51-56). IEEE.