Intrusion Detection System Using Data Mining

Minakshi Sahu *  Brojo Kishore Mishra ** Susanta Kumar Das ***  Ashok Mishra ****
* Research Scholar, Department of Computer Science and Engineering, Centurion University of Technology and Management, Odisha, India.
** Associate Professor, Department of Information Technology, C.V. Raman College of Engineering, Bhubaneswar, Odisha, India.
*** Reader, P.G Department of Computer Science, Berhampur University, Odisha, India.
**** Professor, Department of Mathematics, Centurion University of Technology and Management, Odisha, India.

Abstract

Intrusion detection systems (IDS) have become a main research focus in the area of information security. The last few years have witnessed a large variety of techniques and models aimed at providing increasingly efficient intrusion detection solutions. Traditional network IDS are limited and do not provide a comprehensive solution to these serious problems, which cause many types of security breaches and IT service impacts. They search network traffic for potentially malicious or abnormal activity and sometimes succeed in finding true network attacks and anomalies (true positives). However, in many cases they fail to detect malicious network behavior (false negatives), or they raise alarms when nothing is wrong in the network (false positives). In addition, they require extensive and meticulous manual processing and intervention. The authors argue that applying Data Mining (DM) techniques to network traffic data is a promising approach to designing and developing a more efficient intrusion detection system. Data mining methods have been used to build automatic intrusion detection systems. The central idea is to use auditing programs to extract a set of features that describe each network connection or session, and then to apply data mining programs to learn rules that capture intrusive and non-intrusive behavior. This paper focuses on data mining based intrusion detection systems.

Keywords :

Introduction

In recent years, intrusion detection has developed rapidly as a second line of security defense behind the firewall. It plays a very important role in attack detection, security checking and network inspection. However, with the continuing spread of network applications and the rapid broadening of network bandwidth, the problems inherent in intrusion detection (misjudgment, misdetection and the lack of real-time response to attacks) have become more and more prominent, which has badly affected the practical value of intrusion detection products. The root cause can be analyzed in terms of the data sources used by intrusion detection. There are two main data sources: network data packets for the network-based IDS (intrusion detection system) and system audit logs for the host-based IDS. Traffic of the former kind grows faster than the processing capacity of the IDS; the latter is not designed specifically for IDS, so the characteristic variables it records usually cannot fully meet the needs of the IDS and usually require complex processing algorithms for data mining. The enormous volume of data to process and the complicated selection of signatures are therefore the main problems that hamper intrusion detection and directly affect its performance and real-time behavior.
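
The misjudgment and misdetection problems mentioned above can be quantified once alarms are compared with ground truth. The following is a minimal sketch, assuming that misjudgment corresponds to false positives and misdetection to false negatives; the function names and the sample data are hypothetical and serve only as illustration.

```python
def confusion_counts(actual, predicted):
    """Count outcomes; True means 'attack' in both lists."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for is_attack, alarmed in zip(actual, predicted):
        if is_attack and alarmed:
            counts["tp"] += 1      # real attack, alarm raised
        elif is_attack and not alarmed:
            counts["fn"] += 1      # real attack missed (misdetection)
        elif alarmed:
            counts["fp"] += 1      # alarm with no attack (misjudgment)
        else:
            counts["tn"] += 1      # normal traffic, no alarm
    return counts


def detection_rate(counts):
    """Fraction of real attacks that raised an alarm."""
    total = counts["tp"] + counts["fn"]
    return counts["tp"] / total if total else 0.0


def false_alarm_rate(counts):
    """Fraction of normal events that raised an alarm."""
    total = counts["fp"] + counts["tn"]
    return counts["fp"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical labels for eight connections.
    actual = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, True, False, False, False, False]
    counts = confusion_counts(actual, predicted)
    print(counts, detection_rate(counts), false_alarm_rate(counts))
```

A useful IDS needs a high detection rate together with a low false alarm rate; either figure alone says little.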

1. Intrusion Detection

An intrusion is an active sequence of related events that deliberately tries to cause harm, such as rendering a system unusable, accessing unauthorized information or manipulating such information. To record information about both successful and unsuccessful attempts, security professionals deploy devices called sensors that examine the network traffic. These sensors are placed both in front of the firewall (the unprotected area) and behind it (the protected area), and the information recorded by the two is compared.

An Intrusion Detection System (IDS) can be defined as the tools, methods and resources that help identify, assess and report unauthorized activity. Intrusion detection is typically one part of an overall protection system that is installed around a system or device. IDSs work at the network layer of the OSI model, and sensors are placed at choke points on the network. They analyze packets to find specific patterns in the network traffic; if they find such a pattern, an alert is logged and a response can be based on the data recorded.
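
The pattern-matching step described above can be sketched as follows. This is a simplified illustration, not a description of any particular product: the signature names, the byte patterns and the sample payloads are all hypothetical, and a real sensor would parse protocol headers rather than search raw payloads.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    packet_index: int   # position of the packet in the inspected stream
    signature: str      # name of the pattern that matched


# Hypothetical signatures, chosen only for illustration.
SIGNATURES = {
    "path-traversal": b"../..",
    "shell-invocation": b"/bin/sh",
}


def inspect_packets(payloads):
    """Return one Alert for every (packet, signature) pair that matches."""
    alerts = []
    for index, payload in enumerate(payloads):
        for name, pattern in SIGNATURES.items():
            if pattern in payload:
                alerts.append(Alert(index, name))
    return alerts


if __name__ == "__main__":
    traffic = [
        b"GET /index.html HTTP/1.1",
        b"GET /../../etc/passwd HTTP/1.1",
        b"POST /cgi-bin/run?cmd=/bin/sh HTTP/1.1",
    ]
    for alert in inspect_packets(traffic):
        print(f"packet {alert.packet_index}: matched {alert.signature}")
```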

1.1 The need for Intrusion Detection Systems

A computer system should provide confidentiality, integrity and assurance against denial of service. However, due to increased connectivity (especially on the Internet), and the vast spectrum of financial possibilities that are opening up, more and more systems are subject to attack by intruders. These subversion attempts try to exploit flaws in the operating system as well as in application programs and have resulted in spectacular incidents like the Internet Worm incident of 1988 [5].

There are two ways to handle subversion attempts. One way is to prevent subversion itself by building a completely secure system. The authors could, for example, require all users to identify and authenticate themselves, and could protect data by various cryptographic methods and very tight access control mechanisms. However, this is not really feasible in practice.

 

We are therefore stuck with systems that have vulnerabilities for some time to come. If a system is attacked, the attack should be detected as soon as possible (preferably in real time) so that appropriate action can be taken. This is essentially what an Intrusion Detection System (IDS) does. An IDS does not usually take preventive measures when an attack is detected; it is a reactive rather than a proactive agent. It plays the role of an informant rather than a police officer.

The most popular way to detect intrusions has been by using the audit data generated by the operating system. An audit trail is a record of activities on a system that are logged to a file in chronologically sorted order. Since almost all activities are logged on a system, it is possible that a manual inspection of these logs would allow intrusions to be detected. However, the incredibly large sizes of audit data generated (on the order of 100 Megabytes a day) make manual analysis impossible. IDSs automate the drudgery of wading through the audit data jungle. Audit trails are particularly useful because they can be used to establish guilt of attackers, and they are often the only way to detect unauthorized but subversive user activity.
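
As a minimal sketch of how such automated inspection of an audit trail might look, the following counts failed logins per user and reports users at or above a threshold. The record format, the threshold and the sample trail are assumptions made for illustration; real operating-system audit formats differ.

```python
from collections import Counter

# Hypothetical audit-record format: "<timestamp> <user> <event> <status>",
# one record per line, in chronological order.
AUDIT_TRAIL = """\
2024-01-01T08:00:01 alice login success
2024-01-01T08:00:05 mallory login failure
2024-01-01T08:00:06 mallory login failure
2024-01-01T08:00:07 bob login failure
2024-01-01T08:00:08 mallory login failure
2024-01-01T08:00:09 bob login success
"""


def failed_logins(trail_text):
    """Count failed login records per user."""
    counts = Counter()
    for line in trail_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed records
        _timestamp, user, event, status = fields
        if event == "login" and status == "failure":
            counts[user] += 1
    return counts


def suspicious_users(trail_text, threshold=3):
    """Return users with at least `threshold` failed logins, sorted by name."""
    counts = failed_logins(trail_text)
    return sorted(user for user, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    print(suspicious_users(AUDIT_TRAIL))
```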

Many times, even after an attack has occurred, it is important to analyze the audit data so that the extent of damage can be determined, the tracking down of the attackers is facilitated, and steps may be taken to prevent such attacks in the future. An IDS can also be used to analyze audit data for such insights. This makes IDSs valuable as real-time as well as post-mortem analysis tools.

1.2 Classification of Intrusion Detection Systems

Intrusions can be divided into 6 main types [21]

 

However, the techniques of intrusion detection can be divided into two main types: anomaly detection and misuse detection.

 

Figure 1. A Block diagram of a typical anomaly detection system

Figure 2. A Block diagram of a typical misuse detection system
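
The difference between the anomaly detection and misuse detection systems named in Figures 1 and 2 can be illustrated with a small sketch. It assumes that an anomaly detector compares observed behavior with a statistical profile of normal behavior, and that a misuse detector looks for known attack patterns; the threshold of three standard deviations, the event names and the attack pattern are hypothetical.

```python
import statistics


def build_profile(normal_values):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(value, profile, k=3.0):
    """Anomaly detection: flag values more than k deviations from the mean."""
    mean, std = profile
    return abs(value - mean) > k * std


# Hypothetical catalogue of known attack patterns (event sequences).
KNOWN_ATTACKS = {
    "password-guessing": ("login_failure", "login_failure", "login_failure"),
}


def match_misuse(events):
    """Misuse detection: names of known patterns found as contiguous runs."""
    found = []
    for name, pattern in KNOWN_ATTACKS.items():
        n = len(pattern)
        if any(tuple(events[i:i + n]) == pattern
               for i in range(len(events) - n + 1)):
            found.append(name)
    return found


if __name__ == "__main__":
    # Hypothetical logins per hour observed during normal operation.
    profile = build_profile([10, 12, 11, 9, 13, 10, 11])
    print(is_anomalous(12, profile), is_anomalous(40, profile))
    print(match_misuse(["login_success", "login_failure",
                        "login_failure", "login_failure"]))
```

The anomaly detector can flag behavior it has never seen before, at the cost of false alarms when normal behavior drifts; the misuse detector only finds attacks already in its catalogue.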

2. Data Mining

In this information age, the authors believe that information leads to power and success, and thanks to sophisticated technologies such as computers and satellites, we have been collecting tremendous amounts of it. Initially, with the advent of computers and means for mass digital storage, we started collecting and storing all sorts of data, counting on the power of computers to help sort through this amalgam of information. Unfortunately, these massive collections of data stored on disparate structures very rapidly became overwhelming. This initial chaos led to the creation of structured databases and Database Management Systems (DBMS). Efficient database management systems have been very important assets for managing a large corpus of data, and especially for the effective and efficient retrieval of particular information from a large collection whenever needed. The proliferation of database management systems has also contributed to the recent massive gathering of all sorts of information. Today, we have far more information than we can handle: from business transactions and scientific data to satellite pictures, text reports and military intelligence. Information retrieval is simply not enough anymore for decision-making. Confronted with huge collections of data, we now have new needs that must be met in order to make better managerial choices. These needs are automatic summarization of data, extraction of the “essence” of the information stored, and the discovery of patterns in raw data.

With the enormous amount of data stored in files, databases, and other repositories, it is increasingly important, if not necessary, to develop a powerful means for analysis and perhaps interpretation of such data and for the extraction of interesting knowledge that could help in decision-making.

Data Mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. While data mining and knowledge discovery in databases (KDD) are frequently treated as synonyms, data mining is actually one part of the knowledge discovery process. Figure 3 shows data mining as a step in an iterative knowledge discovery process.

In principle, data mining is not specific to one type of media or data. Data mining should be applicable to any kind of information repository. However, algorithms and approaches may differ when applied to different types of data.

Figure 3. Data Mining is the core of Knowledge Discovery process

3. Intrusion Detection Using Data Mining Techniques

Many researchers have investigated the deployment of data mining algorithms and techniques for intrusion detection [1, 10-13, 17-20]. Examples of these techniques are surveyed in [11].
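
As a minimal sketch of the central idea stated in the abstract (describe each connection by a set of features, then learn to separate intrusive from non-intrusive behavior), the following classifies a connection by a majority vote among its k nearest labeled neighbors. The choice of a nearest-neighbor classifier, the three features and all values are illustrative assumptions and are not taken from the cited work.

```python
import math
from collections import Counter

# Hypothetical labeled connections:
# (duration in seconds, kilobytes sent, failed logins) -> label
TRAINING = [
    ((1.0, 2.0, 0), "normal"),
    ((2.0, 3.0, 0), "normal"),
    ((1.5, 2.5, 1), "normal"),
    ((0.1, 0.2, 5), "intrusion"),
    ((0.2, 0.1, 6), "intrusion"),
    ((0.1, 0.3, 4), "intrusion"),
]


def classify(features, training=TRAINING, k=3):
    """Label a connection by majority vote among its k nearest neighbors."""
    neighbours = sorted(
        training,
        key=lambda item: math.dist(features, item[0]),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify((0.15, 0.2, 5)))
    print(classify((1.2, 2.2, 0)))
```

In practice the features would be scaled to comparable ranges before distances are computed; that step is omitted here to keep the sketch short.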

 

Conclusion

Intrusion detection is still a fledgling field of research. However, it is beginning to assume enormous importance in today's computing environment. The combination of facts such as the unbridled growth of the Internet, the vast financial possibilities opening up in electronic trade, and the lack of truly secure systems makes it an important and pertinent field of research. In this paper, the authors have presented intrusion detection systems that apply data mining techniques to efficiently detect various types of network intrusions [9,22].

References

[1]. A. Chauhan, G. Mishra, and G. Kumar, (2011). “Survey on Data mining Techniques in Intrusion Detection”, International Journal of Scientific & Engineering Research Vol.2 Issue 7.
[2]. A. Sharma, A.K. Pujari, and K.K. Paliwal, (2007). "Intrusion detection using text processing techniques with a kernel based similarity measure", presented at Computers & Security, pp.488-495.
[3]. Barton P Miller, David Koski, Cjin Pheow Lee, Vivekananda Maganty, Ravi Murthy, Ajitkumar Natarajan, Jeff Steidl. (1995). “Fuzz Revisited: A Reexamination of the Reliability of UNIX Utilities and Services”. Computer Sciences Department, University of Wisconsin.
[4]. Dartigue, C., Hyun Ik Jang, Wenjun Zeng, (2009). “A New Data-Mining Based Approach for Network Intrusion  Detection”, 7th Annual Communication Networks and Services Research Conference (CNSR), 11-13 May.
[5]. Eugene H Spafford. (1989). “The Internet Worm Program: An Analysis”. In ACM Computer Communication Review, 19(1), pages 17-57, Jan.
[6]. G. Qian, S. Sural, Y. Gu, and S. Pramanik, (2004). "Similarity between Euclidean and cosine angle distance for nearest neighbor queries", in Proc. SAC, pp.1232-1237.
[7]. Gudadhe, M., Prasad, P., Wankhade, K., “A new data mining based network Intrusion Detection model” International Conference on Computer and Communication Technology (ICCCT), 17-19 Sept
[8]. I. Guyon and A. Elisseeff, (2003). “An Introduction to Variable and Feature Selection”, Journal of Machine Learning Research 3, 1157-1182.
[9]. Jack Timofte and Praktiker Romania, (2007). “Securing the Organization with Network Performance Analysis”, Economy Informatics, 1-4.
[10]. Jiawei Han and Micheline Kamber, (2011). “Data Mining: Concepts and Techniques”, Morgan Kaufmann, 2nd edition, 3rd edition.
[11]. M. Hossain, “Data Mining Approaches for Intrusion Detection: Issues and Research Directions”, http://www.cse.msstate.edu/~bridges/papers/iasted.pdf.
[12]. Mohmood Husain, “Data Mining Approaches for Intrusion Detection: Issues and Research Directions”, Department of Computer Science, Mississippi State University, MS 39762, USA.
[13]. P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava, and P. Tan, (2002). “Data Mining for Network Intrusion Detection”, http://minds.cs.umn.edu/papers/nsf_ngdm_ .pdf.
[14]. P. Kumar, M.V. Rao, P.R. Krishna, and R.S. Bapi, (2005). "Using Sub-sequence Information with kNN for Classification of Sequential Data", in Proc. ICDCIT, pp.536-546.
[15]. P. Kumar, P.R. Krishna, B. S Raju and T. M Padmaja, (2008). “Advances in Classification of Sequence Data”, Data Mining and Knowledge Discovery Technologies. IGI Global, pp.143-174.
[16]. P. Kumar, R.S. Bapi, and P.R. Krishna, (2010). "A New Similarity Metric for Sequential Data", presented at IJDWM, pp.16-32.
[17]. S. Axelsson, (2000). “Intrusion Detection Systems: A Survey and Taxonomy”. Technical Report 99-15, Chalmers Univ., March. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.6603.
[18]. S. Mukkamala et al. (2002). “Intrusion detection using neural networks and support vector machines”, IEEE IJCNN.
[19]. S. Terry Brugger, (2004). “Data Mining Methods for Network Intrusion detection”, University of California, Davis. http://www.mendeley.com/research/dataminingmethods-for-network-intrusion-detection/.
[20]. S.J. Stolfo, W. Lee, P. Chan, W. Fan and E. Eskin, (2001). “Data Mining – based Intrusion Detector: An overview of the Columbia IDS Project”, ACM SIGMOD Record Vol. 30, Issue 4.
[21]. Steven E Smaha. (1988). Haystack: An Intrusion Detection System. In Fourth Aerospace Computer Security Applications Conference, pages 37-44, Tracor Applied Science Inc., Austin, Texas, December.
[22]. Weili Han, Dianxun Shuai and Yujun Liu, (2004). “Network Performance Analysis Based on a Computer Network Model”, Lecture Notes in Computer Science, Volume 3033/2004, 418-421, DOI: 10.1007/978-3-540-24680-0_69.