Cloud computing has become popular because of its numerous advantages, which include high scalability, flexibility, and low operational cost. It gives access over the internet to a shared pool of resources and services on a pay-per-use basis and with minimal management effort. Because of its distributed nature, security has become a major concern for both cloud service providers and cloud users. For this reason, Cloud Intrusion Detection Systems (CIDS), which detect and in some cases prevent intrusions, have been widely deployed in cloud computing environments. In this paper, the authors propose a conceptual framework that detects intrusion attacks within the cloud environment using the Ant Lion Optimization (ALO) algorithm for feature selection and a Bayesian classifier for classification. The framework is expected to detect cloud intrusions accurately at low computational cost and with a reduced false alert rate.
Cloud computing is becoming more popular because of its numerous advantages, such as scalability, flexibility, and low operational cost. It is an Information Technology (IT) paradigm that enables ubiquitous access, through the Internet, to shared pools of computing resources and services that can be provisioned quickly with minimal management effort. Cloud services are of three types (Mehmood, Shibli, Kanwal, & Masood, 2015; Latiff, 2017): Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS), and Platform-as-a-Service (PaaS). In SaaS, applications are made available to users through the internet; examples are email management and Google Docs. In IaaS, customers are given access to an entire Virtual Machine; an example is Amazon EC2, which is part of Amazon Web Services (AWS). Finally, PaaS offers tools for developing, deploying, and running applications.
The main objective of a cloud service provider is to utilize resources effectively and efficiently within the limits of the Service Level Agreement (SLA) (Madni, Latiff, & Coulibaly, 2016). Cloud resources are provisioned via the internet for scientific and other uses at low cost, which greatly reduces the cost of installing and maintaining computing resources.
Because of its distributed nature, the cloud is vulnerable to attacks from both outside and inside, and enterprises are worried about the safety of their resources. An insider attack is carried out by an attacker within the cloud network who tries to gain access to cloud users' resources; the attacker could come from the cloud provider's side or be the cloud provider itself. An external attack is carried out by an attacker outside the cloud network, for example through DoS/DDoS or phishing attacks. Attackers tend to exploit the distributed nature of the cloud to launch attacks that affect the integrity, confidentiality, and availability of the services rendered by the cloud (Mahajan & Peddoju, 2017). Intrusion Detection Systems (IDS) are installed in cloud computing environments to address these attacks. Intrusion detection is defined as “the process of monitoring the events occurring in a computer system or network and analyzing them for signals of intrusion, which attempts to protect confidentiality, integrity, availability, or even bypassing the security mechanisms of a computer or network” (Nagar, Nanda, He, & Tan, 2017). An IDS can be either software or hardware placed at strategic points in the system or network, where it detects intrusions automatically and prevents further attacks.
IDS can be divided into two categories based on what they protect: Host-Based Intrusion Detection Systems (HIDS) and Network-Based Intrusion Detection Systems (NIDS). A NIDS monitors network traffic to identify malicious activities, while a HIDS monitors the host machine (patterns of system calls, file accesses, and system logs) for signs of malicious activity. Based on detection technique, IDS fall into three categories: signature-based (misuse) detection, anomaly-based detection, and hybrid detection. The signature technique detects intrusions by matching observed signatures against the attack patterns in a database. It effectively detects every attack whose signature is in the signature database. Conversely, unknown attacks go undetected because their signatures are not in the database, so the database must be updated continuously to detect new attacks. The anomaly detection technique detects intrusions by identifying all activities that deviate from the normal pattern and treating them as threats. It overcomes the shortcoming of the signature-based technique because it can identify unknown attacks, including new ones. However, this technique is likely to generate a high false alarm rate.
The false alarm rate is the rate at which a system raises an intrusion alert when there is no intrusion. The hybrid detection technique combines the misuse and anomaly detection techniques.
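The three detection techniques and the false alarm rate can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the signature strings, the three-standard-deviation threshold, and the function names are hypothetical and are not part of the proposed framework. The false alarm rate is computed here as FP / (FP + TN), the share of benign events that raised an alert.

```python
from statistics import mean, stdev

# Hypothetical signature database: payload fragments assumed to be malicious.
SIGNATURES = {"' OR 1=1 --", "../../etc/passwd"}

def signature_alert(payload: str) -> bool:
    """Misuse detection: alert only if a known signature occurs in the payload."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_alert(value: float, baseline: list, k: float = 3.0) -> bool:
    """Anomaly detection: alert if value lies more than k standard
    deviations away from the mean of the normal baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > k * sigma

def hybrid_alert(payload: str, value: float, baseline: list) -> bool:
    """Hybrid detection: alert if either technique fires."""
    return signature_alert(payload) or anomaly_alert(value, baseline)

def false_alarm_rate(alerts: list, is_attack: list) -> float:
    """FP / (FP + TN): fraction of benign events that triggered an alert."""
    fp = sum(a and not y for a, y in zip(alerts, is_attack))
    tn = sum((not a) and (not y) for a, y in zip(alerts, is_attack))
    return fp / (fp + tn) if (fp + tn) else 0.0
```

The signature check misses any payload that is not in `SIGNATURES`, while the anomaly check can fire on unusual but benign traffic, which is the trade-off described above.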
The major contributions of this research work are chronicled as shown below:
The goal of this research work is to put forward a conceptual framework for an Ant Lion Optimization (ALO)-based feature selection technique for Cloud Intrusion Detection Systems (CIDS) using a Bayesian classifier.
A method for detecting intrusion in the cloud using a genetic algorithm was proposed in (Singh, Verma, Kulshrestha, & Katiyar, 2016). This method optimizes the network path over which data is transmitted, thereby increasing speed and, according to the authors, making intrusion almost impossible. The authors compared four algorithms that have been used to optimize the network path (the Aho-Corasick, Split AC, Genetic, and Rabin-Karp algorithms) and found the genetic algorithm to be the most effective. However, this method is only a network-based system that prevents intrusion within the network; hybridizing NIDS and HIDS would provide more complete security. (Pratik & Madhu, 2013) present a data-mining-based CIDS called “cloud intrusion detection system for masquerade attacks (DCIDSM)”. It has the following components: sensor, Extraction, Translation and Loading (ETL), centralized data warehousing, automated rule generation, real-time and offline detection report and analysis, and automated alert. The CIDD dataset, which contains masquerade-type attacks, was used to test the system, and it was able to detect malicious activities. However, the work covers only one type of attack, masquerading, and does not perform real-time detection. Meanwhile, (Aljurayban & Emam, 2015) investigate an IDS that works across different cloud layers to separate normal from abnormal traffic among the monitored cloud traffic. It is called the “Layered Intrusion Detection Framework (LIDF)”. LIDF uses a feedforward ANN to classify traffic as normal or abnormal. The proposed LIDF consists of a passive traffic-capturing layer, which transfers raw traffic to the reduction layer. The reduction layer filters the traffic and passes the output to the detection engine, which uses the ANN to identify malicious behavior. If abnormal behavior is found, the detection engine signals its existence and alerts an administrator. When tested with real traffic, LIDF was able to detect normal and abnormal instances, with reported results of 80% and, in some cases, 100%. However, the framework cannot specify the type of attack or intrusion behind the abnormal behavior.
In (Salek & Madani, 2016), a multilevel IDS architecture based on the risk level identified for each cloud user is proposed. The authors state that once the risk level of a user is identified, a suitable IDS is chosen and activated on the user's Virtual Machine (VM). The proposed IDS has the following agents: a dispatcher agent, which identifies the risk level of each user; an IDS manager agent, which monitors the performance and detection accuracy of the IDSs assigned to it; and a rule-set manager, which automatically downloads updated rules from the rule-set database, because the proposed architecture is signature based. The simulated environment was built with the Xen Virtual Machine Monitor (VMM), CentOS as the operating system, and Snort 2.9.4.6 as the IDS. Experimental results show that the system decreases execution time and the packet drop rate, with only a small reduction in accuracy. However, scenarios in which an IDS is configured dynamically according to a dynamic security level ought to have been considered. Also, the system cannot efficiently group together services with the same security issues.
(Bhat, Patra, & Jena, 2013) presented a machine learning approach for detecting intrusion in the cloud's Virtual Machines. In their work, the approach based on Naïve Bayes and random forest performs better at detecting intrusion in the cloud's Virtual Machines than the traditional and the extended Naïve Bayes methods, with high accuracy and a low false positive rate. However, this method focuses only on the Virtual Machine, while other areas of the cloud also need to be considered when deploying a cloud IDS. (Xing, Huang, Xu, Chung, & Khatkar, 2013) examine an OpenFlow- and Snort-based Intrusion Prevention System (IPS) called “SnortFlow” that detects intrusion and provides preventive measures in the cloud environment. They point out that an IPS is preferred to an IDS because it acts automatically against attackers. The authors identify latency, accuracy, and flexibility as the challenges faced by current IPS. They emphasize the use of an attack graph to choose the right countermeasures and reconfigure the cloud's virtual networking system to prevent intrusion. However, the alert interpreter module and rule generator need to be optimized by an optimization algorithm so that alerts can be correlated and the network reconfigured without breaking down all identified vulnerable services.
(Yassin, Udzir, Muda, Abdullah, & Abdullah, 2012) presented an intrusion detection service framework capable of identifying malicious activities in the cloud. The framework generates an alert and notifies an administrator if there is an intrusion attempt. The authors explain that the Cloud-Based Intrusion Detection Service (CBIDS) is composed of three major components: the User Data Controller (UDC), the Cloud Service Controller (CSC), and the Cloud Intrusion Detection Component (CIDC). CBIDS matches information from the user against signatures in the database and then analyzes the user through the user console. However, the proposed framework relies only on the signature-based technique, which cannot identify zero-day or new attacks. It is also theoretical and has not been implemented.
(Rajendran, Muthukumar, & Nagarajan, 2015) suggested a systematic approach to a hybrid intrusion detection system in a private cloud. They stressed the need for a hybrid IDS, given that most IDS are based on either the signature technique or the anomaly technique and that combining both increases the efficiency of the IDS. Its major characteristics are a dynamic nature, self-adaptation, scalability, and efficiency. The model was implemented with the .NET Framework as the front end and SQL Server as the back end to store information. It was deployed in the Microsoft Azure cloud environment, where the dynamic, scalable, and self-adaptive properties were achieved. The efficiency property was also achieved by detecting intrusion using both the anomaly and signature techniques. However, the proposed model is only for private clouds, which have a limited number of cloud users.
The proposed framework is intended to overcome some of the challenges faced by the existing approaches discussed above. It first selects the best features using the Ant Lion algorithm and then classifies behavior as normal or abnormal using a Bayesian classifier. This is expected not only to improve the detection rate but also to reduce the false alarm rate. The cloud attack model considered is shown in Figure 1.
Figure 1. Cloud Attack Model
The cloud is threatened by two types of attacks: internal and external. An insider attack is carried out by an attacker within the cloud network who tries to gain access to cloud users' resources; the attacker could come from the cloud provider's side or be the cloud provider itself. An external attack is carried out by an attacker outside the cloud network, for example through DoS/DDoS or phishing attacks.
Ant Lion Optimization (ALO) is a nature-inspired algorithm (Mirjalili, 2015) that imitates the hunting behavior of antlions. An antlion creates a small circular pit by digging backwards into the sand and waits patiently for its prey at the bottom. When an ant or other small insect falls into the pit, the antlion grabs it, pulls it under the sand, and injects a special liquefying agent into it in order to consume it. The whole process is formulated as the following set of conditions. The random walk of ants at every iteration is formulated in equation (1)
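Following Mirjalili (2015), and using the symbols defined in the text that follows, the random walk can be written as:

```latex
\[
X(t) = \bigl[\, 0,\ \mathrm{cumsum}\bigl(2r(t_1) - 1\bigr),\ \mathrm{cumsum}\bigl(2r(t_2) - 1\bigr),\ \ldots,\ \mathrm{cumsum}\bigl(2r(t_n) - 1\bigr) \,\bigr]
\tag{1}
\]
```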
where cumsum denotes the cumulative sum, n is the maximum number of iterations, t is the step of the random walk, and r(t) is a stochastic function that takes the value 1 if a random number is less than 0.5 and 0 otherwise.
Ants use a random walk to update their positions at each step of the optimization. Since every search space has a boundary, however, equation (1) cannot be used directly to update the positions of ants. To keep the random walks within the search space, they are normalized using equation (2):
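With the symbols defined in the text that follows, this normalization is a min-max rescaling of the walk from $[a_i, d_i]$ to $[c_i^t, d_i^t]$. A reconstruction consistent with those definitions is given below; note that $d_i$ without a superscript is the maximum of the walk, while $d_i^t$ is the upper bound of the variable at iteration $t$.

```latex
\[
X_i^t = \frac{\bigl(X_i^t - a_i\bigr)\,\bigl(d_i^t - c_i^t\bigr)}{d_i - a_i} + c_i^t
\tag{2}
\]
```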
where $a_i$ is the minimum of the random walk of the ith variable, $d_i$ is the maximum of the random walk of the ith variable, $c_i^t$ is the minimum of the ith variable at the tth iteration, and $d_i^t$ is the maximum of the ith variable at the tth iteration.
The random walks of ants in the search space are affected by antlions' traps, as modeled in equations (3) and (4).
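Following Mirjalili (2015), and with the symbols defined in the text that follows, the bounds of each ant's walk are shifted so that they are centered on the selected antlion:

```latex
\[
c_i^t = \mathrm{Antlion}_j^t + c^t
\tag{3}
\]
\[
d_i^t = \mathrm{Antlion}_j^t + d^t
\tag{4}
\]
```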
where $c^t$ is the minimum of all variables at the tth iteration, $d^t$ is the vector containing the maximum of all variables at the tth iteration, $c_i^t$ is the minimum of all variables for the ith ant, $d_i^t$ is the maximum of all variables for the ith ant, and $\mathrm{Antlion}_j^t$ is the position of the selected jth antlion at the tth iteration.
The radius of the ants' random walks is reduced using equations (5) and (6) to model the sliding of ants towards the antlion.
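In Mirjalili (2015), this shrinking divides both bounds by a ratio that grows with the iteration count. The symbols $I$, $w$, and $T$ below are taken from that paper and are not defined elsewhere in this text: $T$ is the maximum number of iterations and $w$ is a constant chosen from the current iteration.

```latex
\[
c^t = \frac{c^t}{I}
\tag{5}
\]
\[
d^t = \frac{d^t}{I}
\tag{6}
\]
\[
I = 10^{w}\,\frac{t}{T}, \qquad
w =
\begin{cases}
2 & \text{if } t > 0.1\,T \\
3 & \text{if } t > 0.5\,T \\
4 & \text{if } t > 0.75\,T \\
5 & \text{if } t > 0.9\,T \\
6 & \text{if } t > 0.95\,T
\end{cases}
\]
```

When several conditions on $t$ hold, the largest applicable $w$ is used, so the walk radius shrinks faster as the search approaches its final iterations.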
The hunt ends when an ant is caught in the jaws of an antlion. To model this process, it is assumed that prey is caught when an ant becomes fitter (goes inside the sand) than its corresponding antlion. The antlion then moves to the latest position of the hunted ant to increase its chances of catching new prey. This process is modeled in equation (7):
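Following Mirjalili (2015), the replacement can be written as below. The fitness function $f$ is a symbol added here for illustration; "fitter" means a lower value of $f$ in a minimization problem and a higher value in a maximization problem.

```latex
\[
\mathrm{Antlion}_j^t = \mathrm{Ant}_i^t
\quad \text{if } f\bigl(\mathrm{Ant}_i^t\bigr) \text{ is fitter than } f\bigl(\mathrm{Antlion}_j^t\bigr)
\tag{7}
\]
```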
where t is the current iteration, $\mathrm{Antlion}_j^t$ is the position of the selected jth antlion at the tth iteration, and $\mathrm{Ant}_i^t$ is the position of the ith ant at the tth iteration.
Elitism is a vital property of swarm systems that lets them retain the best solution(s) obtained at any stage of the optimization. Since the elite is the fittest antlion, it should be able to affect the movements of all ants throughout the iterations. Consequently, it is assumed that every ant walks randomly around an antlion selected by the roulette wheel and around the elite simultaneously, as in equation (8):
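Following Mirjalili (2015), the new position of an ant is the average of the two walks. The symbols $R_A^t$ and $R_E^t$ come from that paper: $R_A^t$ is the random walk around the antlion selected by the roulette wheel at the tth iteration, and $R_E^t$ is the random walk around the elite at the tth iteration.

```latex
\[
\mathrm{Ant}_i^t = \frac{R_A^t + R_E^t}{2}
\tag{8}
\]
```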
Hence, the ALO algorithm is defined as follows (Mirjalili, 2015):
Initialize the first population of ants and antlions randomly
Calculate the fitness of ants and antlions
Find the best antlion and assume it as the elite (determined optimum)
while the end criterion is not satisfied
    for every ant
        Select an antlion using Roulette wheel
        Update c and d using Eqs. (5) and (6)
        Create a random walk and normalize it using Eqs. (1) and (2)
        Update the position of ant using Eq. (8)
    end for
    Calculate the fitness of all ants
    Replace an antlion with its corresponding ant if it becomes fitter, Eq. (7)
    Update elite if an antlion becomes fitter than the elite
end while
Return elite
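The listing above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it minimizes a toy sphere function instead of a feature-selection fitness, the roulette weights `1 / (1 + fitness)` assume a non-negative fitness, the shrink ratio is set to 1 for the first tenth of the iterations, and all function names are hypothetical.

```python
import random

def sphere(x):
    """Toy objective to minimize; a stand-in for a feature-subset fitness."""
    return sum(v * v for v in x)

def shrink_ratio(t, T):
    """Ratio I used in Eqs. (5)-(6); I = 1 while t <= 0.1*T is an assumption."""
    w = 0
    for frac, exp in ((0.1, 2), (0.5, 3), (0.75, 4), (0.9, 5), (0.95, 6)):
        if t > frac * T:
            w = exp
    return (10 ** w) * t / T if w else 1.0

def walk_around(center, c, d, t, T, rng):
    """Eqs. (1)-(4): one random walk per dimension, rescaled into the box
    [center + c, center + d]; returns the walk's value at step t."""
    pos = []
    for x in center:
        steps, s = [0.0], 0.0
        for _ in range(T):
            s += 1.0 if rng.random() < 0.5 else -1.0   # 2*r(t) - 1
            steps.append(s)
        a, b = min(steps), max(steps)                   # walk minimum / maximum
        lo, hi = x + c, x + d                           # Eqs. (3)-(4)
        pos.append((steps[t] - a) * (hi - lo) / (b - a) + lo)  # Eq. (2)
    return pos

def alo(fitness, dim, lb, ub, n_agents=10, T=50, seed=0):
    rng = random.Random(seed)
    antlions = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    scores = [fitness(a) for a in antlions]
    best = min(range(n_agents), key=scores.__getitem__)
    elite, elite_score = antlions[best][:], scores[best]
    for t in range(1, T + 1):
        ratio = shrink_ratio(t, T)
        c, d = lb / ratio, ub / ratio                   # Eqs. (5)-(6)
        weights = [1.0 / (1.0 + s) for s in scores]     # assumed roulette weights
        ants = []
        for _ in range(n_agents):
            j = rng.choices(range(n_agents), weights=weights)[0]
            ra = walk_around(antlions[j], c, d, t, T, rng)
            re = walk_around(elite, c, d, t, T, rng)
            # Eq. (8), then clip the averaged position to the search bounds.
            ant = [min(ub, max(lb, (p + q) / 2)) for p, q in zip(ra, re)]
            ants.append((j, ant))
        for j, ant in ants:
            f = fitness(ant)
            if f < scores[j]:                           # Eq. (7)
                antlions[j], scores[j] = ant, f
            if f < elite_score:                         # elite update
                elite, elite_score = ant[:], f
    return elite, elite_score

elite, elite_score = alo(sphere, dim=3, lb=-5.0, ub=5.0)
```

For feature selection, a continuous position would still have to be mapped to a feature subset, for example by thresholding each coordinate; that mapping is not specified in the text and is left out of the sketch.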
A Bayesian classifier is a statistical classifier that predicts the probability that a given network event belongs to a specific class (normal or intrusion) (Shafi'I et al., 2017; Madni, Latiff, Abdullahi, & Usman, 2017; Latiff, Madni, & Abdullahi, 2018). The probability of an event for each class is calculated as shown below:
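A classifier of this kind rests on Bayes' rule, under which the probability of a class given an event is proportional to the likelihood of the event under that class multiplied by the class prior. The sketch below assumes the naive Gaussian variant, in which features are treated as independent and normally distributed within each class; the text does not specify which variant is intended, and the two-feature records are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def posterior(model, x):
    """P(class | x) by Bayes' rule, computed in log space and then normalized."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, var in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_joint[label] = ll
    top = max(log_joint.values())
    unnorm = {k: math.exp(v - top) for k, v in log_joint.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

# Hypothetical records: (connection duration, bytes sent).
X = [[1.0, 200.0], [1.2, 220.0], [0.9, 180.0],
     [9.0, 5000.0], [8.5, 5200.0], [9.5, 4800.0]]
y = ["normal", "normal", "normal", "intrusion", "intrusion", "intrusion"]

model = fit_gaussian_nb(X, y)
probs = posterior(model, [9.1, 5100.0])
prediction = max(probs, key=probs.get)
```

In the proposed framework, the columns of `X` would be the features retained by the ALO selection step rather than the two illustrative ones used here.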
The proposed model, shown in Figure 2, consists of three major components:
Figure 2. Architectural Framework for ALO-BC CIDS Model
This section presents a detailed summary of publicly available research datasets for CIDS. The authors list the reference papers, URLs, and number of instances for each dataset in Table 1.
Preliminary results of an initial experiment were obtained using the BC algorithm without the ALO feature selection part. The KDD'99 dataset (CISDA, 2009; UNB, 2018) used in (Bhat et al., 2013; Idris & Abdulhamid, 2014) was utilized for the experiment. The dataset has a total of 494,021 instances. The simulation was run in MATLAB 7.13 (R2011b) using a standard data mining toolkit. Standard metrics were used to measure performance, as applied in (Madni, Latiff, & Coulibaly, 2017; Abdullahi & Ngadi, 2016; Latiff, Abdul-Salaam, & Madni, 2016). The results of the experiment are presented in Table 2.
Table 2. Initial Results using BC Algorithm
CIDS has helped in the detection of intrusion in the cloud environment. In the proposed conceptual framework, a feature selection technique based on the ALO algorithm is used in CIDS. The authors have also presented an architectural framework that combines ALO and a Bayesian classifier to detect cloud intrusions. A detailed summary of research datasets for CIDS has been outlined for investigators' use.
This framework is expected to improve detection accuracy, reduce detection time, and keep computational cost affordable.
Developing a more detailed and holistic framework for Cloud Intrusion Detection Systems will be considered in the future. The authors will also validate the proposed conceptual framework in a cloud computing environment to verify the expected outcomes.
Security has become a major concern in cloud computing and has slowed the acceptance of cloud technology. That is why CIDS have been widely deployed in the cloud to reduce cloud attacks.
In this research work, a technique for cloud IDS is proposed. It uses the Ant Lion Optimization technique for feature selection and a Bayesian classifier to classify network traffic as normal or abnormal. It is expected to identify intrusions effectively at low computational cost and with a low false alarm rate.