Abstract

As the cost of information processing and Internet access falls, organizations are becoming increasingly vulnerable to cyber threats such as network intrusions. There is therefore a need to run secure and safe transactions through the use of Intrusion Detection Systems, authentication, firewalls and other hardware and software solutions. Existing Intrusion Detection Systems have a very limited ability to adapt. This makes them ineffective at detecting new or unknown attacks and at adapting to an evolving environment. Machine learning approaches offer a potential solution to the adaptation and correctness problems in intrusion detection. Some Intrusion Detection Systems do not deal with real-time, high-speed networks. The high false positive rate is another issue with existing Intrusion Detection Systems.

In this paper, we present a machine learning approach for an Intrusion Detection System which helps to reduce the false positive rate and increase the classification accuracy. We train our system on a real-time data set using the Naïve Bayes machine learning algorithm. The role of our system is to detect an adversary's presence on a compromised network. Our system flags suspicious packets that are trying to enter the network. We capture live packets and extract only the relevant header features. This improves the accuracy of the proposed system. Finally, using the Naïve Bayes off-line trainer, we were able to achieve 90.2233 percent accuracy using 10-fold cross-validation and 76.6812 percent using the supplied test dataset, while maintaining a false positive rate of 0.102.

Introduction

Intrusion detection is a vital part of a complete security policy in information systems. Its function consists of analyzing information collected by security audit mechanisms in order to find possible attacks. In general, IDSs can be divided into two categories based on their detection methods: anomaly detection and misuse (signature) detection. Anomaly detection attempts to determine whether deviations from established normal usage patterns can be identified as intrusions. Misuse detection, on the other hand, uses patterns of well-known attacks.

IDSs have three common problems: temporal complexity, correctness and adaptability. The temporal complexity problem results from the large volume of data that the system must process in order to obtain a complete picture of the situation. The false positive rate and the false negative rate are commonly used to evaluate the correctness of an IDS. False positives are alarms triggered by legitimate activities. False negatives are attacks that are not noticed by the system. An Intrusion Detection System is more accurate if it detects more attacks and raises fewer false alarms. In the case of misuse detection systems, security specialists must observe new attacks in order to add their corresponding signatures. In anomaly detection systems, human experts are necessary to define the relevant attributes that describe normal behavior. This leads us to the adaptability problem.

The field of machine learning is concerned with the higher-level question of how to construct computer programs that automatically improve with experience. Because a machine learning algorithm builds the detection model from training data automatically, this approach saves the human effort of writing attack signatures or specifying the normal behavior of a user of the system. In many cases, the applicability of machine learning principles coincides with that of statistical methods, although machine learning focuses on building a model that improves its performance on the basis of previous results. Hence, an Intrusion Detection System using machine learning can spot new kinds of attacks from the typical features of system users and identify significant deviations from a user's established behavior. Although the ability to identify new attacks makes such systems attractive in all circumstances, their major drawback is their resource-intensive nature.
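The two error rates defined above can be made concrete with a small calculation. The following is a minimal sketch, assuming the standard confusion-matrix definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)); the function name `error_rates` and the label strings are hypothetical and are not taken from the system described in this paper.

```python
def error_rates(true_labels, predicted_labels, attack="attack"):
    """Return (false_positive_rate, false_negative_rate) for an IDS.

    Assumed definitions (standard, not specific to this paper):
      FPR = FP / (FP + TN)  -> share of legitimate traffic that raised an alarm
      FNR = FN / (FN + TP)  -> share of attacks that went unnoticed
    """
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == attack and predicted == attack:
            tp += 1          # attack correctly detected
        elif truth == attack:
            fn += 1          # attack missed
        elif predicted == attack:
            fp += 1          # false alarm on legitimate activity
        else:
            tn += 1          # legitimate activity correctly ignored
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical example: 4 legitimate events and 4 attacks.
truth = ["normal"] * 4 + ["attack"] * 4
predicted = ["normal", "normal", "normal", "attack",
             "attack", "attack", "attack", "normal"]
fpr, fnr = error_rates(truth, predicted)
```

In this made-up example one of the four legitimate events raises an alarm and one of the four attacks is missed, so both rates are 0.25.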

In this paper, we present a real-time anomaly-based intrusion detection system using a machine learning approach. The objective of this paper is to design easy, fast and reliable intrusion detection system software. The Naïve Bayes classifier is used for on-line training. The security mechanisms of a system are designed to prevent unauthorized access to system resources and data. The system is capable of detecting intrusion attempts and taking action so that the damage can be repaired later.

Overview of paper: The rest of this paper is organized as follows. Section 1 discusses related work. Section 2 describes the system architecture of the proposed intrusion detection system. The results are then discussed, and finally the paper is concluded in the Conclusions section.

1. Related Work

We have reviewed various papers; the contributions of their authors are discussed below.

Christine Dartigue, Hyun Ik Jang and Wensum Zeng (2009) have carried out several research works on how Knowledge Discovery and Data Mining (KDD) tasks can help improve Intrusion Detection Systems (IDSs): classification, sequential analysis, time series analysis, prediction, clustering, and association rules. Among these approaches, they were interested in leveraging feature selection to improve the detection of attacks that occur infrequently in the training data, and multiboosting to reduce both variance and bias.

Sandhya Peddabachigari, Ajith Abraham and Johnson Thomas (2008) have proposed decision tree induction, one of the classification algorithms in data mining. The classification algorithm inductively learns a model from the pre-classified data set. Each data item is defined by the values of its attributes. Classification may be viewed as a mapping from a set of attributes to a particular class. The decision tree classifies a given data item using the values of its attributes.

G. Prashanth, V. Prashanth, P. Jayashree and N. Srinivasan (2008) have proposed the Random Forests algorithm for NIDSs (Network Intrusion Detection Systems) to improve detection performance. To increase the detection rate for minority intrusions, they build a balanced dataset by over-sampling the minority classes and down-sampling the majority classes. Random Forests can build patterns efficiently over the balanced dataset, which is much smaller than the original one. Using the Random Forests algorithm, a model is generated from the training set and is then used to classify the test cases.

Mehdi Moradi and Mohammad Zulkernine have proposed a Neural Network based intrusion detection system to classify normal and attack patterns. They applied the early stopping validation method, which increased the generalization capability of the Neural Network and at the same time decreased the training time. It should be mentioned that the long training time of the neural network was mostly due to the huge number of training vectors and the computation facilities available.

Amira Sayed A. Aziz, Mostafa A. Salama, Aboul ella Hassanien and Sanaa El-Ola Hanafi (2012) have proposed the use of a Genetic Algorithm (GA) in an Intrusion Detection System. GAs have been applied to intrusion detection since the 1990s and are still in use today. The Minkowski distance function was applied to detect anomalies, in comparison with the Euclidean distance. The investigation used different values for the Genetic Algorithm parameters to find those that give better results. The system is essentially an Intrusion Detection System which uses detectors generated by a genetic algorithm combined with the deterministic-crowding niching technique.

Jonathan Palmer (2011) has proposed rule based intrusion detection. However, rule based IDSs can sometimes be difficult and time-consuming to maintain. His paper set out to determine whether the KDD '99 data set was indeed suitable for this application.

Salem Benferhat, Abdelhamid Boudjelida and Habiba Drias (2009) presented the most suitable Bayesian classification model for intrusion detection. Naïve Bayesian networks offer several advantages due to their simple structure. They are very simple to construct, and it is always easy to take new scenarios into account. Inference is polynomial, whereas inference in Bayesian networks with general structures is known to be a hard problem.

Amjad Hussain Bhat, Sabyasachi Patra and Dr. Debasish Jena (2013) have proposed a Decision Tree based approach which classifies network connections into intrusion and normal data based on a labelled training dataset that helps it build classification patterns. In the second, anomaly detection part, they have used a hybrid approach combining NB Tree and the Random Forest algorithm.

Upendra (2013) has applied C4.5 to implement an IDS. The 7 selected attributes are used to build the intrusion detection model. The results show that after feature selection reduced the 41 attributes to 11 and then 7 attributes, C4.5 performed better overall than NB.

Mohan Banerjee and Roopali Soni (2013) studied two data mining learning algorithms, K-means and the Naive Bayes classifier. K-means is a clustering algorithm which groups data samples on the basis of their similarities and dissimilarities. The Naive Bayes classifier is a classification algorithm which classifies the intrusions. The two algorithms are combined in order to improve accuracy and precision and to reduce the false positive rate. In their paper, they apply k-means clustering followed by Naïve Bayes classification to anomaly based network intrusion detection. The proposed technique is observed to perform better in terms of detection rate on the KDD'99 data sets than a purely Naïve Bayes based approach.

Neethu B (2009) has discussed IDSs based on human experts and intrusion detection techniques using machine learning. Machine learning is a field of study which gives computers the ability to learn from previous experience. It is based heavily on the statistical analysis of data, and some algorithms can use the patterns found in previous data to make decisions about new data.

Dewan Md. Farid, Nouria Harbi and Mohammad Zahidur Rahman (2010) have introduced a new hybrid learning algorithm for adaptive network intrusion detection using the Naive Bayesian classifier and the ID3 algorithm. It analyzes the large volume of network data and considers the complex properties of attack behaviours in order to improve detection speed and detection accuracy. In their paper, they concentrate on improving the performance of the Naïve Bayesian classifier and the ID3 algorithm. Tests showed that this hybrid algorithm minimized false positives and maximized balanced detection rates on the 5 classes of the KDD99 benchmark dataset.

Mrutyunjaya Panda and Manas Ranjan Patra (2007) have proposed a NIDS framework based on the Naïve Bayes algorithm. The framework builds patterns of the network services over data sets labelled by service. With the built patterns, the framework detects attacks in the datasets using the Naïve Bayes classifier algorithm. Compared to the Neural Network based approach, the proposed approach achieves a higher detection rate, takes less time and has a low cost factor.

D. P. Gaikwad and Dr. R. C. Thool (2010) have surveyed the architecture, taxonomy and products of IDSs. They note a limitation of the various IDSs available on the market: complete attack prevention is not realistically attainable because of configuration and administration issues, system complexity, and abuse by users. They have discussed aspects of IDSs such as the role, categories and modes of an IDS. They have also discussed the general architecture, network parameters and architectural taxonomy. Various features of different IDSs such as Snort, McAfee and Tripwire are discussed in detail.

2. Proposed Intrusion Detection System

The objective of this work is to make networking secure by designing easy, fast security mechanisms for an intrusion detection system. The system is designed to prevent unauthorized access to system resources and data. The proposed intrusion detection system is reliable and robust. In this project, a new learning algorithm for network intrusion detection using the Naive Bayesian classifier is presented, which performs balanced detection and keeps false positives at an acceptable level for different types of network attacks. This real-time Intrusion Detection System is software that provides security to a local network. In this project, a runtime database is created which holds the identities of packets. The identity field gives a detailed identity of the malicious intruding packet. For each incoming packet in the network, the structure of the packet header is checked against each record in the database. If a match is found, the corresponding action is taken to address the intrusion. Through the intrusion detection system, the goal of attaining network security can be fulfilled efficiently. The following sections describe the proposed intrusion detection system in detail.
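The header-to-database lookup described above could look roughly like the following. This is only an illustrative sketch: the record layout (`identity`, `pattern`, `action`) and the header field names are assumptions made for this example, not the schema of the runtime database used in the project.

```python
def record_matches(record, header):
    """True if every field listed in the record's pattern equals the header's value."""
    return all(header.get(field) == value
               for field, value in record["pattern"].items())


def find_matching_record(database, header):
    """Scan the runtime database and return the first matching record, or None."""
    for record in database:
        if record_matches(record, header):
            return record
    return None


# Hypothetical runtime database with two packet identities.
database = [
    {"identity": "syn-probe-to-telnet",
     "pattern": {"protocol": "TCP", "dst_port": 23, "syn": 1, "ack": 0},
     "action": "alarm"},
    {"identity": "oversized-icmp",
     "pattern": {"protocol": "ICMP", "length": 65535},
     "action": "log"},
]

header = {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.1", "protocol": "TCP",
          "dst_port": 23, "syn": 1, "ack": 0, "length": 60}
hit = find_matching_record(database, header)
```

A linear scan is the simplest reading of "against each record in the database"; for a large database, an index keyed on the most selective fields would be a natural refinement.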

2.1 Feature selection for off-line and on-line training

The NSL-KDD99 dataset is used to carry out off-line training and to measure the performance of Naïve Bayes. Feature selection is a pre-processing step applied to the dataset to improve the performance of the classifier. In feature selection, subsets of relevant features are selected and irrelevant features are eliminated. Feature selection is essential to reduce dimensionality, boost generalization capability, accelerate learning and enhance model interpretability. Keeping too many features may lead to a lack of generalization, whereas keeping too few degrades classification quality. Out of the 41 features in the dataset, we select only a few important ones, such as Source IP address, Destination IP address, Source address, Destination address, Packet length and Protocol type. The system can also be trained on-line using real-time live packet capture. On-line training is essential for updating the model to cope with new packets or new threats. The administrator can add a suspicious packet to the database and retrain the system. This approach of on-line training using live packets makes the system very robust and reliable. The parameters considered for on-line training of the system are Source IP address, Destination IP address, Source address, Destination address, Packet length, Protocol type, Flag, SYN, HOP LIMIT, RST, ACK, FIN, etc.
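As an illustration of the selection step, the sketch below projects a decoded packet onto a fixed list of header features and discards everything else. The field names are hypothetical stand-ins for the parameters listed above, and the decoded-packet dictionary is an assumed representation, not the format used by the implementation.

```python
# Hypothetical field names standing in for the header parameters named in the text.
SELECTED_FEATURES = (
    "src_ip", "dst_ip", "length", "protocol",
    "syn", "ack", "fin", "rst", "hop_limit",
)


def select_features(packet, features=SELECTED_FEATURES):
    """Keep only the selected header fields; fields absent from the packet become None."""
    return {name: packet.get(name) for name in features}


# A decoded packet with more fields than the classifier needs.
packet = {
    "src_ip": "192.168.1.10", "dst_ip": "192.168.1.1",
    "length": 74, "protocol": "TCP",
    "syn": 1, "ack": 0, "fin": 0, "rst": 0, "hop_limit": 64,
    "payload": b"...", "checksum": 0x1C46, "window": 29200,
}
row = select_features(packet)
```

Fixing the feature order in one tuple keeps the training rows and the rows seen at prediction time aligned, which matters when the same model is retrained on-line.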

2.2 System Architecture of the Proposed Intrusion Detection System

The rapid growth of the Internet has led to the exploitation of vital resources in networks. Network security has become a major concern in computer networks, and this has given rise to schemes for securing the network. An Intrusion Detection System provides an efficient way to detect malicious activity in a network. We propose an effective intrusion detection system to secure the computers in a network. The system consists of different modules, each of which is described in this section. Figure 1 depicts the system architecture of the proposed intrusion detection system. This architecture is responsible for capturing live packets from the network. The captured packets are passed to the packet pre-processor module, which categorizes them according to protocol, such as TCP, UDP, HTTP, etc. These packets are then passed to the intrusion detector module.
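The categorization performed by the packet pre-processor can be sketched as follows. The sketch assumes that each captured packet has already been decoded into a dictionary with a numeric IP protocol field (`proto`) and optional port fields; these names are hypothetical. Labelling TCP traffic on port 80 as HTTP is a simplifying heuristic adopted for the example, not a rule stated in this paper.

```python
from collections import defaultdict

# IANA-assigned IP protocol numbers for the protocols of interest.
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}
HTTP_PORT = 80  # well-known HTTP port; used here as a simple heuristic


def protocol_of(packet):
    """Return a protocol label for one decoded packet."""
    name = IP_PROTOCOLS.get(packet["proto"], "OTHER")
    ports = (packet.get("src_port"), packet.get("dst_port"))
    if name == "TCP" and HTTP_PORT in ports:
        return "HTTP"
    return name


def categorize(packets):
    """Group packets by protocol label, preserving capture order within each group."""
    groups = defaultdict(list)
    for packet in packets:
        groups[protocol_of(packet)].append(packet)
    return dict(groups)


captured = [
    {"proto": 6, "src_port": 51000, "dst_port": 80},
    {"proto": 6, "src_port": 51001, "dst_port": 22},
    {"proto": 17, "src_port": 5353, "dst_port": 5353},
    {"proto": 1},
    {"proto": 47},
]
groups = categorize(captured)
```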

The intrusion detector checks for intrusion. If a packet is found to be intrusive, the detector creates a log of the attack and generates an alarm. The function of each module of the architecture is described briefly here. The incoming packet module is directly connected to the network; it captures on-line packets and transfers them to the packet scanner. Packet scanning is an important part of our system. After capturing a packet, we use the packet scanner module to scan it. The packet analyzer module is also called the packet sniffer. As data streams flow across the network, the sniffer captures each packet and, if needed, decodes the packet's raw data, showing the values of the various fields in the packet, and analyzes its content according to the specification. The packet analyzer is used specifically for analyzing network problems and detecting network misuse by internal or external users. The signature module is used to define the signature of each packet, using its attribute values and parameters. After the signature is generated, the data set generation module collects the signature of the corresponding packet and generates the data set, which is used for labeling. The labeling module is used to label the corresponding packet. The training module trains the system using the Naïve Bayes classifier. The training model contains a set of trained data which is used to detect attacks in packets. Based on the training model, the prediction module predicts whether a packet is normal or abnormal. The last module, the output module, generates output in the form of an alarm for abnormal packets, based on the prediction (i.e. normal or abnormal).
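The training and prediction modules can be illustrated with a small Naïve Bayes classifier over categorical header features. This is a minimal sketch written for this explanation, not the classifier used in the reported experiments: the class name, the add-one (Laplace) smoothing and the toy data are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(feature_index, label)][value] -> occurrences in training data
        self.value_counts = defaultdict(Counter)
        # values[feature_index] -> set of distinct values seen for that feature
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum_i log P(value_i | label)
            score = math.log(count / self.total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                # One extra bucket in the denominator reserves mass for unseen values.
                score += math.log((seen + 1) / (count + len(self.values[i]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labelled data set: (protocol, TCP flag) -> label. Purely hypothetical.
rows = [("TCP", "S"), ("TCP", "S"), ("TCP", "A"), ("UDP", "A")]
labels = ["abnormal", "abnormal", "normal", "normal"]
model = NaiveBayes().fit(rows, labels)
prediction = model.predict(("TCP", "S"))
```

Because training only updates counts, retraining after the administrator adds a newly labelled packet amounts to calling `fit` again on the extended data set, which is one reason this classifier suits on-line use.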

2.3 Data Flow Diagram of the system

A data flow diagram (DFD) is a graphical representation that depicts information flow and the transforms that are applied as data moves from input to output. A data flow diagram may be used to represent a system or software at any level of abstraction. In fact, DFDs may be partitioned into levels that represent increasing information flow and functional detail. The DFD therefore provides a mechanism for functional modelling as well as information flow modelling. Figure 2 shows the DFD, which represents the entire software element as a single bubble with input and output data indicated by incoming and outgoing arrows, respectively.

Results and Discussions

The performance of the off-line classifier used in this project is evaluated using 10-fold cross-validation. The proposed system is evaluated in terms of false positives, model building, ROC area and classification accuracy. The experimental results are listed in Table 1. The proposed intrusion detection system is designed and implemented to deal with on-line packets over the network. The system is capable of dealing with all types of attacks.
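For readers unfamiliar with the procedure, 10-fold cross-validation can be sketched as follows: the data is split into ten folds, each fold is held out once for testing while the other nine are used for training, and accuracy is computed over all held-out predictions. The helper names and the `train_fn` interface are assumptions made for this sketch, which neither shuffles nor stratifies the data and so will not reproduce the reported figures.

```python
from collections import Counter


def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out


def cross_val_accuracy(rows, labels, train_fn, k=10):
    """train_fn(train_rows, train_labels) must return a function row -> predicted label."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(rows), k):
        predict = train_fn([rows[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct += sum(predict(rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)


def majority_trainer(train_rows, train_labels):
    """Baseline 'classifier' that always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


# Hypothetical data: 16 normal and 4 abnormal samples.
rows = list(range(20))
labels = ["normal"] * 16 + ["abnormal"] * 4
accuracy = cross_val_accuracy(rows, labels, majority_trainer, k=10)
```

Any classifier can be plugged in through `train_fn`; the majority-class baseline is used here only to keep the example self-contained.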

All simulated attacks were classified, according to the actions and goals of the attacker. Each attack type falls into one of the following four main categories:

It is observed that the system is capable of dealing with all of the above-mentioned attack categories. The system is very user-friendly and makes it easy to analyze the packets in future work. Figure 3 depicts the Graphical User Interface of the system.

As the figure shows, the system can easily be started to display the packets captured on-line. All anomalous packets are displayed in the lower window of the system. The display shows the source and destination IP addresses of the anomalous packets. This information can be used for investigation or forensic purposes in the future. The system can also raise an alarm to bring an intrusion to the administrator’s notice. Information about a suspected attack can be sent to the administrator by e-mail.

Conclusions

In this paper, we have briefly introduced the approaches used in intrusion detection systems. Two main approaches are used to find traces of intrusion: misuse detection and anomaly detection. The misuse detection approach uses knowledge of attack signatures. Such systems are very precise in detecting known attacks. Anomaly detection first defines a profile for normal traffic and then checks for deviations from this normal behavior. Many classification approaches try to construct an explicit function from a common set of feature values to instance labels. Several approaches based on statistical learning have been proposed for intrusion detection. In this paper, a machine learning algorithm has been presented for an on-line intrusion detection system. The Naïve Bayes classifier has several advantages due to its simple structure. The model is very easy and simple to construct. The Naïve Bayes classifier is used to model attacks and their temporal evolution. The proposed intrusion detection system works in real time and is trained on-line. The user can directly add new suspicious packets to the database for retraining the system. This adds the capability to deal with new, unknown attacks over the network. Facilities for displaying the details of suspicious packets and e-mailing them to the administrator have been added to make the system very interactive and user-friendly. The system is tested off-line using the Weka tool. The off-line accuracy of the classifier is 90.2233 percent with 10-fold cross-validation and 76.6812 percent on the supplied test data. The proposed system also addresses some difficulties of data mining, such as handling continuous attributes, dealing with missing attribute values, and reducing noise in the training data. Evaluating the on-line performance of the proposed system is very difficult.

Online Anomaly Based Intrusion Detection System Using Machine Learning
