Cloud Computing has revolutionized the Information and Communication Technology (ICT) industry by enabling on-demand provisioning of elastic computing resources on a pay-as-you-go basis. Resource scheduling is a way of determining the schedule on which activities should be performed. It is a complicated task in a Cloud environment because of the heterogeneity of the computing resources. Allocating the best resource to a Cloud job is tedious, and finding the best resource-job pair according to the Cloud consumer's application requirements is an optimization problem. The main goal of the Cloud scheduler is to schedule resources effectively and efficiently. Dispersion, heterogeneity and uncertainty of resources bring challenges to resource allocation that traditional resource allocation policies cannot meet in a Cloud environment. In this research paper, a clustering based cost optimized resource scheduling technique is proposed. In clustering based resource scheduling, Cloud workloads are classified with the k-means clustering algorithm by assigning weights to the different quality attributes. The experimental results gathered in a Cloud environment demonstrate that the proposed technique achieves lower cost than the existing resource scheduling technique.
Cloud Computing infrastructure is a collection of integrated, networked hardware and software together with an Internet infrastructure. Cloud computing is defined as a type of computing that relies on sharing computing resources rather than having local servers or personal devices handle applications. Cloud computing is now everywhere [1]: whichever magazine, website, radio station or TV channel we turn to, the word "cloud" is certain to catch our attention. Cloud computing grew out of advances in grid computing, virtualization and web technologies, and it permits efficient computing through centralized storage and memory. Delivering software as a service has also made software more attractive. Cloud computing is different from Grid Computing: compared to Grid, Cloud has an extra virtualization layer that acts as an execution and hosting environment for cloud-based application services. Resource scheduling is a way of determining the schedule on which activities should be performed, and it is a key technology in cloud computing [2].
Cloud Computing basically has two parts: the client side and the server side. The client sends requests to the server and the server responds to the client. A request from the client first goes to the Master Processor on the server side. The Master Processor has many Slave Processors, and it forwards the request to any Slave Processor that is free at that time [8]. Simulation makes it possible to evaluate a hypothesis prior to actual software development, in an environment where tests can be reproduced. Simulation is required because it provides a repeatable and controllable environment in which to test the services [3].
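The master-slave dispatch described above can be illustrated with a minimal sketch. The class name `Master`, its methods, and the choice to queue requests when no slave is free are assumptions made for illustration; the source only states that a request is forwarded to a slave that is free at that time.

```python
from collections import deque


class Master:
    """Toy master processor that hands requests to free slave processors."""

    def __init__(self, num_slaves):
        self.free = deque(range(num_slaves))  # ids of idle slaves
        self.pending = deque()                # requests waiting for a slave
        self.running = {}                     # slave id -> request in progress

    def submit(self, request):
        """Give the request to a free slave, or queue it (assumption) if none is free."""
        if self.free:
            slave = self.free.popleft()
            self.running[slave] = request
            return slave
        self.pending.append(request)
        return None

    def complete(self, slave):
        """Mark a slave as finished; it picks up the oldest waiting request, if any."""
        self.running.pop(slave)
        if self.pending:
            self.running[slave] = self.pending.popleft()
        else:
            self.free.append(slave)


master = Master(num_slaves=2)
first = master.submit("request-a")   # goes to slave 0
second = master.submit("request-b")  # goes to slave 1
third = master.submit("request-c")   # no free slave, so it waits
master.complete(0)                   # slave 0 now runs request-c
```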
The motivation behind this research work is to propose a clustering based cost optimized resource scheduling technique in which Cloud workloads are classified with the k-means clustering algorithm by assigning weights to the different quality attributes. In this research work, cost is the QoS parameter that the proposed technique optimizes in a cloud environment. The rest of the paper is structured as follows. Related work is presented first, followed by the proposed technique and then the experimental results. Conclusions and future work are presented in the last section.
Mazandarani and Momeni [1] discussed how various scientific applications are cast in the form of workflows in order to run large-scale experiments. Because scientific processes are complex, they have intensive computation and data requirements, and such applications run on the cloud according to the required QoS. The authors propose the QoS-aware Scientific Application Scheduling Algorithm (QSASA), which assigns ranks to the tasks in a workflow and selects an execution plan on cloud resources based on the QoS parameters time and cost, with the aim of reducing the total cost of execution. The results are compared with the Heterogeneous Earliest Finish Time (HEFT) algorithm and show that QSASA improves cost by 15%.
Panchal and Kapoor [2] described Cloud computing as a dynamic service provider that uses highly scalable, virtualized resources over the Internet. These services include IaaS, PaaS, SaaS and DaaS (Data as a Service). VM allocation enables efficient sharing of virtual machines among the available datacenters, and allocation policies help to evaluate and enhance cloud performance.
Han et al. [3] noted that the geographic distribution, heterogeneity and dynamic nature of resources make management and scheduling in a cloud computing environment difficult. Building on existing scheduling strategies and the QoS guided Sufferage-Min heuristic algorithm, they present a new QoS guided task scheduling model. The new model divides tasks and resources into low and high levels and achieves better scheduling efficiency. The proposed scheme is compared with an existing algorithm and is shown to reduce the makespan.
Tripathy and Patra [5] designed a new method to minimize switching time and to improve resource utilization, server performance and throughput. The method schedules jobs in the cloud and addresses the drawbacks of existing methods. Priorities are assigned to the jobs, which gives better performance and minimizes both waiting time and switching time.
Xu et al. [6] proposed the Multiple QoS Constrained Scheduling Strategy of Multi-Workflows (MQMW). Existing algorithms do not handle multiple workflows, whereas this strategy schedules multiple workflows that start at different times. The objective of the work is to improve the success rate of scheduling and to reduce the cost of workflows. The new algorithm takes overall performance into account instead of completion time alone. Comparative experiments showed that the strategy increases the success rate and produces better scheduling results than RANK_HYBD.
Abdullaha and Othman [7] applied Divisible Load Theory (DLT) in a cloud environment to minimize overall processing time. Divisible load theory is a methodology involving the linear and continuous modeling of partitionable computation and communication loads for parallel processing. Load fractions are assigned to homogeneous processors using a closed-form solution. Real-time job restrictions such as machine failure and political concerns are not considered.
Calheiros et al. [9] highlighted the CloudSim architecture. There are two VM allocation policies: Time Shared and Space Shared. CloudSim is evaluated by comparing machines hosted with one and with two data centers, using time, memory and the overall number of machines as parameters. The finding that "2 data centers are better" can be explained by Java's efficient use of a multicore machine.
Sukhpal et al. [10] proposed an efficient allocation algorithm for virtual machines in a Cloud computing environment, which considers all combinations of allocation sequences and chooses a sequence on the basis of its strength of allocation. The proposed model is implemented in Java using the NetBeans IDE and CloudSim. The algorithm is tested with different sets of VM instance requests and computing nodes. The experimental results show that the proposed algorithm can improve resource utilization through efficient VM allocation.
Sukhpal et al. [11] described Cloud computing as a network-based environment that focuses on sharing computations. Cloud computing provides network access to a shared pool of configurable networks, servers, storage, services and applications. The paper discussed the advantages, disadvantages, characteristics, challenges, deployment models, cloud service models, cloud service providers and various application areas of cloud computing, such as small and large scale industries (manufacturing, automation, television, broadcast, construction) and Geographical Information Systems (GIS).
Sukhpal et al. [12] described how video streaming services are implemented on the cloud. The increased use of cloud platforms has led to a large number of cloud providers, and a multi-cloud makes use of more than one data center. The objective of the paper is to use a closed-loop approach with cost as the QoS parameter. A new algorithm is proposed for cloud providers and data centers in a multi-cloud environment, and different video service workloads are used to evaluate its performance. The algorithm is helpful when cloud server costs differ between data centers.
Sukhpal et al. [13] focused on trust in workflows. Conventional scheduling approaches focus only on the QoS requirements of time and cost. The new approach uses two stages of scheduling: macro multi-workflow scheduling and micro single-workflow scheduling. Workflows are of three types: time-sensitive, cost-sensitive and balanced. The new model is verified in NetLogo, which shows that the trust scheme gives a better transaction success rate. A workflow scheduling simulation platform is also designed using CloudSim.
The following gaps, identified from the existing literature [8] [14-22], are discussed below:
Cloud Service Providers (CSPs) need to provision a sufficient amount of resources so that the QoS requirements of Cloud Service Consumers (CSCs), such as cost and budget constraints, are met. Therefore, CSPs need to ensure that QoS violations are avoided or minimized by dynamically provisioning the right amount of resources in a timely manner.
Dispersion, heterogeneity and uncertainty of resources bring challenges to resource allocation that traditional resource allocation policies cannot meet in a Cloud environment. Thus, there is a need to make Cloud services and Cloud-oriented applications efficient by taking these properties of the Cloud environment into account. The aim of resource scheduling is to allocate appropriate resources at the right time to the right workloads, so that applications can utilize the resources effectively. In other words, the amount of resources allocated to a workload should be the minimum needed to maintain a desirable level of service quality, or should maximize the throughput (or minimize the completion time) of the workload. To address this problem, new solutions need to be developed.
Although there are a few algorithms in the literature for heterogeneous resources, they usually incur significantly high scheduling costs and may not deliver good quality schedules at lower cost. The number of tasks and resources in a cloud computing environment is extremely large; especially for big data applications, resource scheduling has become a major challenge. There is, so far, no dedicated scheduling algorithm for time-constrained and cost-constrained cloud workflows. On the one hand, users are always concerned about the execution time of workflows. On the other hand, users are normally sensitive to execution cost, which is another major concern for cloud workflow applications. There is therefore a primary requirement for scheduling algorithms designed for instance-intensive, cost-constrained cloud workflows.
The broad objectives of this research work are:
Cloudlet, status, data center ID, Virtual Machine ID, start time, finish time and cost are the parameters used in both algorithms.
Figure 1 depicts the flow of the resource execution process. Information about the job is collected from the cloud consumer, and the job is then analyzed. A scheduling technique is selected, after which resources are allocated based on the cloud consumer's details and the job is executed. If the actual execution cost (Cmin) is less than the threshold value (Ct), execution continues; otherwise the resources are reallocated.
Figure 1. Process of Execution of Resources
The threshold check compares the actual execution cost (Cmin) with the threshold value of execution cost (Ct). If Cmin is less than Ct, execution of the job continues; otherwise the job is sent back for analysis and the resources are reallocated.
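The comparison of Cmin against Ct can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cost model (execution time = cloudlet length / VM speed, cost = time × price), the function names and the candidate VM tuples are all assumptions introduced here.

```python
def execution_cost(length_mi, mips, price_per_sec):
    """Assumed cost model: run time (length / speed) multiplied by the VM price."""
    return (length_mi / mips) * price_per_sec


def allocate_under_threshold(length_mi, candidates, threshold_cost):
    """Try candidate VMs in order and keep the first one with Cmin < Ct.

    candidates: list of (vm_name, mips, price_per_sec) tuples (hypothetical format).
    Returns (vm_name, c_min), or (None, None) when every candidate exceeds Ct.
    """
    for vm_name, mips, price_per_sec in candidates:
        c_min = execution_cost(length_mi, mips, price_per_sec)
        if c_min < threshold_cost:
            return vm_name, c_min  # Cmin < Ct: continue execution on this VM
        # Cmin >= Ct: fall through and reallocate to the next candidate
    return None, None


# Hypothetical job of 10,000 MI with a cost threshold Ct of 5.0
vms = [("vm-a", 500, 0.5), ("vm-b", 1000, 0.25)]
chosen, cost = allocate_under_threshold(10_000, vms, threshold_cost=5.0)
```

Here the first VM would cost 10.0, which is not below the threshold, so the job is reallocated to the second VM.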
K-Means follows the partitional, or non-hierarchical, clustering approach. It partitions the given data set into a specified number of groups called clusters. Each cluster is associated with a centre point called the centroid, which is the mean of the points in the cluster. Each point is assigned to the cluster with the closest centroid, where closeness is measured by Euclidean distance. Because the initial centroids are chosen randomly, K-Means may generate different clusters in different runs. The proposed dynamic VM allocation algorithm using clustering is given below.
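As a rough illustration of the K-Means steps described above, the following sketch clusters workloads described by numeric quality attributes. How the attribute weights enter the computation is not specified in the text; scaling each squared difference by its weight inside the Euclidean distance is an assumption made here, and the function names and sample workloads are hypothetical.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its attribute weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def kmeans(points, k, weights, iterations=100, seed=0):
    """Plain K-Means: random initial centroids, then assign and update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids chosen randomly
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Hypothetical workloads described by two normalized quality attributes
workloads = [(1.0, 1.0), (1.2, 0.8), (9.0, 9.0), (8.8, 9.2)]
labels, centres = kmeans(workloads, k=2, weights=(0.5, 0.5))
```

Changing `seed` changes the initial centroids, which is why different runs can produce different clusters on less clearly separated data.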
In the FCFS algorithm, the execution cost is calculated and the actual execution cost is compared with the threshold value. The Clustering algorithm first partitions the given data set into a specified number of groups called clusters. Execution cost is higher with FCFS than with the Clustering algorithm. Clustering helps the service provider to increase the availability of software services in the cloud computing environment in line with cloud users' demands and requirements. The Clustering algorithm also improves resource utilization and performs better load balancing.
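A minimal sketch of an FCFS baseline with cost accounting is shown below. It is not the authors' algorithm: sending each cloudlet to the VM that becomes free earliest, the cost model (run time × VM price) and all names are assumptions for illustration. The output records mirror the parameters listed earlier (cloudlet, VM ID, start time, finish time, cost).

```python
def fcfs_schedule(cloudlets, vms):
    """Schedule cloudlets strictly in arrival order.

    cloudlets: list of (cloudlet_id, length_mi) in arrival order.
    vms: list of (vm_id, mips, price_per_sec).
    Each cloudlet goes to the VM that becomes free earliest (assumption);
    ties are broken by the order of the VM list.
    """
    free_at = {vm_id: 0.0 for vm_id, _, _ in vms}  # time at which each VM is idle again
    records = []
    for cloudlet_id, length_mi in cloudlets:
        vm_id, mips, price = min(vms, key=lambda vm: free_at[vm[0]])
        start = free_at[vm_id]
        finish = start + length_mi / mips
        free_at[vm_id] = finish
        records.append({
            "cloudlet": cloudlet_id,
            "vm": vm_id,
            "start": start,
            "finish": finish,
            "cost": (finish - start) * price,  # assumed cost model: run time x price
        })
    return records


# Hypothetical VMs and cloudlets
vms = [(0, 1000, 0.5), (1, 500, 0.25)]
cloudlets = [(0, 2000), (1, 1000), (2, 1000)]
schedule = fcfs_schedule(cloudlets, vms)
total_cost = sum(r["cost"] for r in schedule)
```

The total cost computed this way can then be compared with the threshold value, or with the cost obtained when cloudlets are first grouped by clustering.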
FCFS Algorithm
Clustering Algorithm
This section focuses on the tools for setting up the Cloud environment, the implementation of the heterogeneous workload consolidation technique on the CloudSim toolkit, and the experimental results of this approach.
Cloud applications have different composition, configuration and deployment requirements. The tools required to implement the workload consolidation technique on the Cloud are therefore described below.
CloudSim is an extensible simulation toolkit that enables modeling and simulation of Cloud computing systems and application provisioning environments. The CloudSim toolkit supports both system and behavior modeling of Cloud system components such as data centers, Virtual Machines (VMs) and resource provisioning policies. It implements generic application provisioning techniques that can be extended with ease and limited effort. Currently, it supports modeling and simulation of Cloud computing environments consisting of both single and inter-networked Clouds (federations of Clouds). Moreover, it exposes custom interfaces for implementing policies and provisioning techniques for allocation of VMs under inter-networked Cloud computing scenarios. CloudSim offers the following novel features [4]:
Figure 2 shows the CloudSim life cycle. The CloudSim package is initialized and data centers are created for the simulation. Brokers, virtual machines and cloudlets are then created, and the simulation is started. Brokers are responsible for negotiation between SaaS and Cloud providers. A cloudlet specifies a set of user requests, e.g. the application ID and the size of the execution command.
Figure 2. CloudSim Life Cycle
NetBeans IDE 8.0.1 is a modular, standards-based Integrated Development Environment (IDE) written in the Java programming language and used primarily for Java development [16]. The NetBeans project consists of an open source IDE and an application platform, which can be used as a generic framework to build any kind of application. NetBeans IDE 8.0.1 focuses on improving developer productivity through a smarter, faster editor and the integration of all NetBeans products into one IDE.
Table 1 shows the characteristics of the resources and cloudlets used in all the experiments. User Cloud workloads are modeled as independent, computation-intensive parallel applications. The data dependency among the Cloud workloads in the parallel applications is therefore negligible, and each Cloud workload is considered to be independent of every other Cloud workload.
Table 1. Scheduling Parameters and their Values
Figure 3 shows the simulation result of the FCFS technique. The output is reported in terms of cloudlet ID, status, datacenter ID, VM ID, time, start time and finish time, and cost is the final output obtained from the FCFS technique. Figure 4 shows the simulation result of the Clustering technique in the same terms; the cost obtained from the Clustering technique is lower than that of the FCFS technique. Figure 5 compares the cost of the FCFS and Clustering techniques. The X-axis represents the two techniques and the Y-axis represents the execution cost ($). It clearly shows that cost is reduced with Clustering as compared to FCFS, so the Clustering technique is more cost-efficient than FCFS.
Figure 6 shows the effect of a change in the number of resources submitted on execution time. The X-axis represents the number of resources and the Y-axis represents the execution time. As the number of resources increases, execution time decreases. Execution time is lower for the FCFS technique than for the Clustering technique.
Figure 7 shows the effect of a change in the size of resources submitted on execution cost. The X-axis represents the size of resources and the Y-axis represents the execution cost. As the size of resources increases, execution cost increases for both techniques, but it increases more for FCFS, and the cost remains lower for Clustering than for FCFS.
Figure 8 shows the effect of a change in the number of resources submitted on execution cost. The X-axis represents the number of resources and the Y-axis represents the execution cost. As the number of resources increases, execution cost increases for both techniques, but it increases more for FCFS, and the cost remains lower for Clustering than for FCFS.
Cloud Computing and its vital characteristics have been discussed in this research work, which focuses on the resource scheduling challenges that Cloud Computing faces today. Resource scheduling algorithms are compared with respect to the Cloud workload as an answer to the dynamic scalability of resources. The different Cloud workloads and design patterns have been identified and analyzed, and the key QoS requirements for every Cloud workload have been identified. These Cloud workloads are clustered with the K-Means clustering algorithm by assigning appropriate weights to the different quality attributes, and the clustering results have been presented with the help of the CloudSim tool. Further, a workload based Cloud framework is proposed and implemented in this research work. The experimental results gathered through CloudSim 3.0 demonstrate that the Clustering technique performs better in terms of execution cost than the FCFS scheduling algorithm.