Clustering based Cost Optimized Resource Scheduling Technique in Cloud Computing

Kamalpreet Kaur * Kanwalvir Singh Dhindsa **

* PG Scholar, Department of Computer Science and Engineering, BBSBEC, Fatehgarh Sahib, Punjab, India.

** Associate Professor, Department of Computer Science and Engineering, BBSBEC, Fatehgarh Sahib, Punjab, India.

Abstract

Cloud Computing has revolutionized the Information and Communication Technology (ICT) industry by enabling on-demand provisioning of elastic computing resources on a pay-as-you-go basis. Resource scheduling is the process of determining the schedule on which activities should be performed. It is a complicated task in a Cloud environment because of the heterogeneity of the computing resources. Allocating the best resource to a Cloud job is tedious, and finding the best resource-job pair according to the Cloud consumer's application requirements is an optimization problem. The main goal of the Cloud scheduler is to schedule resources effectively and efficiently. Dispersion, heterogeneity and uncertainty of resources bring challenges to resource allocation that traditional resource allocation policies cannot meet in Cloud environments. In this paper, a clustering based cost optimized resource scheduling technique is proposed. In clustering based resource scheduling, workloads are classified with the k-means clustering algorithm by assigning weights to the different quality attributes. The experimental results gathered through a Cloud environment show that the proposed technique achieves a lower cost than the existing resource scheduling technique.

Keywords: Cloud Computing, Resource Scheduling, Quality of Service (QoS), Cost, Clustering.

Introduction

Cloud Computing infrastructure is a group of integrated and networked hardware, software and Internet infrastructure. Cloud computing is different from Grid Computing. It is defined as a type of computing that relies on sharing computing resources rather than having local servers or personal devices handle applications. Cloud computing is everywhere [1]. It has grown out of advances in grid computing, virtualization and web technologies, and it enables efficient computing through centralized storage and memory. Whichever magazine, website, radio or TV channel one opens, the word “cloud” is certain to catch one's attention. Cloud computing has also made software more attractive as a service. Compared to the Grid, the Cloud has an extra layer, virtualization, that acts as an execution and hosting environment for cloud-based application services. Resource scheduling is the process of determining the schedule on which activities should be performed, and it is a key technology in cloud computing [2].

Cloud Computing basically has two parts: the Client Side and the Server Side. The Client Side sends requests to the Server, and the Server responds to the Clients. A request from a client first goes to the Master Processor on the server side. The Master Processor has many Slave Processors, and it forwards the request to any Slave Processor that is free at that time [8]. Simulation makes it possible to evaluate a hypothesis prior to actual software development, in an environment where tests can be reproduced. Simulation is required because it provides a repeatable and controllable environment in which to test the services [3].

The motivation behind this research work is to propose a clustering based cost optimized resource scheduling technique in which workloads are classified with the k-means clustering algorithm by assigning weights to the different quality attributes. In this research work, cost is the QoS parameter that the proposed technique optimizes in a cloud environment. The rest of the paper is structured as follows. Section 1 presents related work. Section 2 describes the proposed technique, and experimental results are presented in Section 3. Conclusions and future work are presented in the last section.

1. Related Work

Mazandarani and Momeni [1] discussed that, to implement large scale experiments, various scientific applications are cast in the form of workflows. The complexity of scientific processes leads to intensive computation and data requirements. Scientific applications run on the cloud according to the required QoS. The authors proposed the QoS-aware Scientific Application Scheduling Algorithm (QSASA), which schedules scientific workflows on cloud resources. The algorithm assigns ranks to the tasks in a workflow and selects an execution plan based on the QoS parameters time and cost, with the aim of reducing the total cost of execution. The results are compared with the Heterogeneous Earliest Finish Time (HEFT) algorithm and show that QSASA improves cost by 15%.

Panchal and Kapoor [2] described Cloud computing as a dynamic service provider that uses very large, scalable and virtualized resources over the Internet. These services include IaaS, PaaS, SaaS and DaaS (Data as a Service). VM allocation allows efficient assignment of virtual machines to the available datacenters, and allocation policies help to evaluate and enhance cloud performance.

Han et al. [3] stated that the geographic distribution, heterogeneity and dynamic nature of resources make management and scheduling in a cloud computing environment difficult. Building on existing scheduling strategies and the QoS guided Sufferage-Min heuristic algorithm, they presented a new QoS guided task scheduling model. The model divides tasks and resources into low and high levels and achieves better scheduling efficiency. The proposed scheme is compared with an existing algorithm and is shown to reduce the makespan.

Tripathy and Patra [5] designed a new method to minimize the switching time, improve resource utilization, and improve server performance and throughput. The method schedules jobs in the cloud and addresses the drawbacks of existing methods. Priorities are assigned to the jobs, which gives better performance and minimizes the waiting time and switching time.

Xu et al. [6] proposed the Multiple QoS Constrained Scheduling Strategy of Multi-Workflows (MQMW). Existing algorithms do not handle multiple workflows. This strategy schedules multiple workflows that are started at different times and is capable of increasing the success rate. The comparative experiments showed that this strategy produced better scheduling results than RANK_HYBD. The objective of the work is to improve the success rate of scheduling and reduce the cost of workflows. The new algorithm takes overall performance into account instead of completion time alone.

Abdullah and Othman [7] applied Divisible Load Theory (DLT) in a cloud environment to minimize overall processing time. Divisible load theory is a methodology involving the linear and continuous modeling of partitionable computation and communication loads for parallel processing. Load fractions are assigned to homogeneous processors using a closed-form solution. Real-time job restrictions such as machine failure and political concerns are not considered.

Calheiros et al. [9] highlighted the CloudSim architecture. There are two VM allocation policies: Time Shared and Space Shared. CloudSim is evaluated by comparing machines hosted with one and two data centers, using time, memory and the overall number of machines as parameters. The finding that “2 data centers are better” can be explained by Java's efficient use of a multicore machine.

Sukhpal et al. [10] proposed an efficient virtual machine allocation algorithm for the Cloud Computing environment which considers all combinations of allocation sequences and chooses the allocation sequence on the basis of strength of allocation. The proposed model is implemented in Java using the NetBeans IDE and CloudSim. The algorithm is tested for different sets of VM instance requests and computing nodes. The experimental results show that the proposed algorithm can improve resource utilization through efficient VM allocation.

Sukhpal et al. [11] described Cloud computing as a network-based environment that focuses on sharing computations. Cloud computing provides network access to a shared pool of configurable networks, servers, storage, services and applications. The paper discussed the advantages, disadvantages, characteristics, challenges, deployment models, cloud service models, cloud service providers and various application areas of cloud computing, such as small and large scale industries (manufacturing, automation, television, broadcast, construction) and Geographical Information Systems (GIS).

Sukhpal et al. [12] described video streaming services implemented on the cloud. The increased use of cloud platforms has led to a large number of cloud providers. Multi-cloud makes use of more than one data center. The objective of the paper is to use a closed-loop approach with cost as the QoS parameter. A new algorithm is proposed for cloud providers and data centers in a multi-cloud environment. Different video service workloads are used to evaluate the performance of this algorithm. The algorithm is helpful when cloud server costs differ between data centers.

Sukhpal et al. [13] focused on trust in workflows. Conventional scheduling approaches focused only on the QoS requirements time and cost. The new approach uses two stages of scheduling: macro multi-workflow scheduling and micro single-workflow scheduling. Workflows are of three types: time-sensitive, cost-sensitive and balanced. The new model is verified on NetLogo, which shows that the trust scheme gives a better transaction success rate. A workflow scheduling simulation platform is also designed using CloudSim.

The following gaps, identified from the existing literature [8] [14-22], are discussed below:

1.1 Quality of Service (QoS)

Cloud Service Providers (CSPs) need to ensure that sufficient amount of resources are provisioned to ensure that QoS requirements of Cloud Service Consumers (CSCs) such as cost and budget constraints are met. Therefore, CSPs need to ensure that these violations are avoided or minimized by dynamically provisioning the right amount of resources in a timely manner.

1.2 Resource Scheduling

Dispersion, heterogeneity and uncertainty of resources bring challenges to resource allocation that traditional resource allocation policies cannot meet in Cloud environments. Thus, there is a need to make Cloud services and Cloud-oriented applications efficient by taking these properties of the Cloud environment into account. The aim of resource scheduling is to allocate appropriate resources at the right time to the right workloads, so that applications can utilize the resources effectively. In other words, the amount of resources allocated to a workload should be the minimum that maintains a desirable level of service quality, or that maximizes the throughput (or minimizes the completion time) of the workload. To address this problem, new solutions need to be developed.
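The idea of giving a workload the minimum resources that still meet its service-quality target can be illustrated with a small sketch. It is not the technique proposed in this paper; it assumes, purely for illustration, that service quality is a completion deadline, that a resource is described by its speed in MIPS and a price per second, and that cost is runtime multiplied by price. The names `Resource`, `cheapest_feasible`, `length_mi` and `deadline_sec` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    name: str
    mips: float            # processing speed, million instructions per second
    price_per_sec: float   # hypothetical price charged per second of use


def cheapest_feasible(resources: Sequence[Resource],
                      length_mi: float,
                      deadline_sec: float) -> Optional[Resource]:
    """Return the lowest-cost resource that finishes within the deadline."""
    best, best_cost = None, float("inf")
    for r in resources:
        runtime = length_mi / r.mips
        if runtime > deadline_sec:
            continue  # too slow to meet the service-quality target
        cost = runtime * r.price_per_sec  # assumed cost model
        if cost < best_cost:
            best, best_cost = r, cost
    return best


pool = [
    Resource("small", mips=500, price_per_sec=0.01),
    Resource("medium", mips=1000, price_per_sec=0.03),
    Resource("large", mips=2000, price_per_sec=0.08),
]
choice = cheapest_feasible(pool, length_mi=10_000, deadline_sec=12)
print(choice.name if choice else "no feasible resource")
```

With these made-up numbers the slowest resource misses the 12-second deadline, so the sketch falls back to the cheapest of the remaining two.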

2. Proposed Clustering based Cost Optimized Resource Scheduling Technique

Although there are a few algorithms in the literature for heterogeneous resources, they usually incur significantly high scheduling costs and may not deliver good quality schedules at lower cost. The number of tasks and resources in a cloud computing environment is extremely large; especially for big data applications, resource scheduling has become a major challenge. There is, so far, no dedicated scheduling algorithm for time- and cost-constrained cloud workflows. On the one hand, users are always concerned about the execution time of workflows. On the other hand, users are normally sensitive to execution cost, which becomes another major concern for cloud workflow applications. There is therefore a primary requirement for scheduling algorithms designed for instance-intensive cost-constrained cloud workflows.

2.1 Objectives

The broad objectives of this research work are:

To study existing resource scheduling techniques.
To compare the performance of two scheduling techniques (FCFS based, clustering based [Proposed]) on QoS parameter.
To measure and evaluate the performance of both the scheduling techniques in cloud based simulated environment.

2.2 Research Methodology

To achieve first objective: Study existing resource scheduling techniques and explore optimization techniques.
To achieve second objective: Compare selected resource scheduling techniques by fixing QoS parameters like cost.
To achieve third objective: A Cloud test bed will be set up to evaluate both techniques.

Cloudlets, Status, Data center ID, Virtual Machine ID, Start Time, Finish Time, cost are parameters used in both algorithms.

Figure 1 depicts the flow of the resource execution process. Information about the job is collected from the cloud consumer, and the job is then analyzed. A scheduling technique is selected, after which resources are allocated and the job is executed. If the actual execution cost (C_min) is less than the threshold value (C_t), execution continues; otherwise the resources are reallocated.
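The continue-or-reallocate decision in this flow can be sketched as a short loop. This is only an illustration of the control flow, under the assumptions that candidate resources are tried in a fixed order and that a cost estimate is available per resource; `run_with_cost_check`, `estimate_cost` and the VM names are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_with_cost_check(candidates: Sequence[str],
                        estimate_cost: Callable[[str], float],
                        cost_threshold: float
                        ) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Try candidate resources in order until one stays under the threshold.

    Returns the accepted resource (or None) and a log of (resource, cost).
    """
    log = []
    for resource in candidates:
        c_min = estimate_cost(resource)   # actual execution cost on this resource
        log.append((resource, c_min))
        if c_min < cost_threshold:        # C_min < C_t: continue execution
            return resource, log
        # Otherwise reallocate: fall through to the next candidate.
    return None, log


# Made-up costs for three hypothetical virtual machines.
costs = {"vm-a": 12.0, "vm-b": 7.5, "vm-c": 5.0}
accepted, log = run_with_cost_check(["vm-a", "vm-b", "vm-c"], costs.get, cost_threshold=8.0)
print(accepted, log)
```

Here the first allocation exceeds the threshold and triggers one reallocation; the second allocation is accepted, so the third candidate is never tried.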

Figure 1. Process of Execution of Resources

2.3 Resource Scheduling Algorithms

2.3.1 FCFS Algorithm

The FCFS algorithm compares the actual execution cost (C_min) with the threshold value of execution cost (C_t). If C_min is less than C_t, execution of the job continues; otherwise the job is sent back for analysis.
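A minimal sketch of first-come, first-served scheduling with a cost check is given below. It is not the authors' implementation: it assumes that each cloudlet, taken in submission order, goes to whichever VM becomes free first, and that cost is runtime multiplied by a per-second VM price. The function name, the cost model and all numbers are illustrative assumptions.

```python
import heapq
from typing import List, Sequence, Tuple

Row = Tuple[int, int, float, float, float]  # cloudlet_id, vm_id, start, finish, cost


def fcfs_schedule(cloudlet_lengths: Sequence[float],
                  vm_mips: Sequence[float],
                  vm_price_per_sec: Sequence[float]) -> List[Row]:
    """Serve cloudlets in submission order on whichever VM frees up first."""
    # Heap of (time the VM becomes free, vm_id); ties go to the lower vm_id.
    free_at = [(0.0, vm_id) for vm_id in range(len(vm_mips))]
    heapq.heapify(free_at)
    rows = []
    for cloudlet_id, length in enumerate(cloudlet_lengths):
        start, vm_id = heapq.heappop(free_at)
        runtime = length / vm_mips[vm_id]
        finish = start + runtime
        cost = runtime * vm_price_per_sec[vm_id]  # assumed cost model
        rows.append((cloudlet_id, vm_id, start, finish, cost))
        heapq.heappush(free_at, (finish, vm_id))
    return rows


rows = fcfs_schedule([4000, 1000, 2000], vm_mips=[1000, 500],
                     vm_price_per_sec=[0.5, 0.25])
c_min = sum(row[4] for row in rows)   # actual execution cost
c_t = 4.0                             # hypothetical threshold value
for row in rows:
    print(row)
print("continue" if c_min < c_t else "send back for analysis")
```

The rows mirror the kind of per-cloudlet output (VM id, start time, finish time, cost) that both algorithms report.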

2.3.2 Clustering Algorithm

K-Means follows the partitioned, or non-hierarchical, clustering approach. It partitions the given data set into a specific number of groups called clusters. Each cluster is associated with a centre point called the centroid, and each point is assigned to the cluster with the closest centroid. The initial centroids are chosen randomly. The centroid is the mean of the points in the cluster, and Euclidean distance is used to measure closeness. Because of the random initialization, K-Means can generate different clusters in different runs. Proposed dynamic VM allocation algorithm using clustering is given below.
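The clustering step described here can be sketched in plain Python. This is a minimal illustration, not the paper's algorithm: it assumes the attribute weights enter as multipliers inside the Euclidean distance, and the workload attributes, weights and the optional `init` parameter (added so the example is reproducible) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def weighted_distance(a: Point, b: Point, weights: Sequence[float]) -> float:
    """Euclidean distance with a per-attribute weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def kmeans(points: Sequence[Point], k: int, weights: Sequence[float],
           init: Optional[Sequence[Point]] = None,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster points with k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(list(points), k)
    centroids = [list(c) for c in start]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: weighted_distance(p, centroids[j], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical workloads described by two quality attributes.
workloads = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(workloads, k=2, weights=(0.3, 0.7),
                           init=[workloads[0], workloads[3]])
print(labels)
```

Leaving `init` unset falls back to a random choice of starting centroids, in which case different seeds may produce different clusters.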

2.3.3 Analysis

In the FCFS algorithm, the execution cost is calculated and the actual execution cost is compared with the threshold value. The Clustering algorithm partitions the given data set into a specific number of groups called clusters. Execution cost is higher in FCFS than in the Clustering algorithm. Clustering helps the service provider to increase the availability of software services in the cloud computing environment in line with cloud users' demands and requirements. The Clustering algorithm also improves the utilization of resources and performs better load balancing.

FCFS Algorithm

Clustering Algorithm

3. Experimental Setup and Results

This section describes the tools used to set up the Cloud environment, the implementation of the heterogeneous workload consolidation technique on the CloudSim toolkit, and the experimental results of this approach.

3.1 Tools for setting Cloud Environment

Cloud applications have different composition, configuration and deployment requirements. The tools required to implement the workload consolidation technique on the Cloud are described below.

3.1.1 CloudSim

CloudSim is an extensible simulation toolkit that enables modeling and simulation of Cloud computing systems and application provisioning environments. The CloudSim toolkit supports both system and behavior modeling of Cloud system components such as data centers, Virtual Machines (VMs) and resource provisioning policies. It implements generic application provisioning techniques that can be extended with ease and limited effort. Currently, it supports modeling and simulation of Cloud computing environments consisting of both single and inter-networked Clouds (federation of Clouds). Moreover, it exposes custom interfaces for implementing policies and provisioning techniques for allocation of VMs under inter-networked Cloud computing scenarios. CloudSim offers the following novel features [4]:

Support for modeling and simulation of large-scale Cloud computing environments, including data centers, on a single physical computing node.
A self-contained platform for modeling Clouds, service brokers, provisioning and allocation policies.
Support for simulation of network connections among the simulated system elements.
Facility for simulation of federated Cloud environment that internetworks resources from both private and public domains, a feature critical for research studies related to Cloud-Bursts and automatic application scaling.
Availability of a virtualization engine that aids in the creation and management of multiple, independent, and co-hosted virtualized services on a data center node.
Flexibility to switch between space-shared and time-shared allocation of processing cores to virtualized services.
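The difference between the space-shared and time-shared policies in the last feature can be made concrete with a simplified model. The sketch below is not CloudSim code. It assumes a single processing core, tasks that all arrive at time zero, and an equal split of capacity among unfinished tasks in the time-shared case; the function names are hypothetical.

```python
from typing import List, Sequence


def space_shared_finish(lengths: Sequence[float], mips: float) -> List[float]:
    """Tasks own the core one at a time, in submission order."""
    finish, clock = [], 0.0
    for length in lengths:
        clock += length / mips
        finish.append(clock)
    return finish


def time_shared_finish(lengths: Sequence[float], mips: float) -> List[float]:
    """All unfinished tasks split the core's capacity equally."""
    remaining = list(lengths)
    finish = [0.0] * len(lengths)
    active = [i for i, r in enumerate(remaining) if r > 0]
    clock = 0.0
    while active:
        share = mips / len(active)                    # MIPS each task receives
        shortest = min(remaining[i] for i in active)  # work left on the next task to end
        clock += shortest / share
        for i in active:
            remaining[i] -= shortest                  # every active task did this much work
            if remaining[i] <= 1e-9:
                finish[i] = clock
        active = [i for i in active if remaining[i] > 1e-9]
    return finish


lengths = [1000, 2000, 3000]   # task lengths in million instructions
print(space_shared_finish(lengths, mips=1000))
print(time_shared_finish(lengths, mips=1000))
```

In this model both policies finish the whole batch at the same time, but time sharing delays the completion of the shorter tasks because they compete with the longer ones for the core.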

As shown in Figure 2, the CloudSim package is initialized first. Data centers are created for the execution of CloudSim. Brokers, virtual machines and cloudlets are then created, and the simulation is started. Brokers are responsible for negotiation between SaaS and Cloud providers. A cloudlet specifies the set of user requests, i.e., the application id and the size of the execution command.

Figure 2. CloudSim Life Cycle

3.1.2 NetBeans IDE 8.0.1

The NetBeans IDE 8.0.1 is a modular, standards-based Integrated Development Environment (IDE) written in the Java programming language [16]. The NetBeans project consists of an open source IDE and an application platform, which can be used as a generic framework to build any kind of application. NetBeans IDE 8.0.1 focuses on improving developer productivity through a smarter, faster editor and the integration of all NetBeans products into one IDE. It is used primarily for developing with Java.

3.1.3 Simulation Parameters

Table 1 shows the characteristics of the resources and Cloudlets that have been used for all the experiments. User Cloud workloads are modeled as independent, computation-intensive parallel applications. Thus the data dependency among the Cloud workloads in the parallel applications is negligible, and each Cloud workload is considered to be independent of any other Cloud workload.

Table 1. Scheduling Parameters and their Values

3.2 Experimental Results

Figure 3 shows the result of the FCFS technique based on simulation. The output is reported in terms of cloudlet id, status, datacenter id, VM id, time, start time and finish time. Cost is the final output obtained from the FCFS technique. Figure 4 shows the result of the clustering technique based on simulation, reported in terms of the same fields. Cost is the final output obtained from the clustering technique, and it is lower than that of the FCFS technique. Figure 5 compares the cost of the FCFS and clustering techniques. The X-axis represents the two techniques, FCFS and clustering, and the Y-axis represents the execution cost ($). The figure shows that cost is reduced with clustering as compared to FCFS; in terms of cost, the clustering technique is more efficient than FCFS.

Figure 3. Simulation Results of FCFS Technique

Figure 4. Simulation Results of Clustering Technique (Proposed)

Figure 5. Comparison of Cost of FCFS and Clustering Technique

3.2.1 Test Case 1: Effect of change in number of resources submitted on Execution Time

Figure 6 shows the effect of a change in the number of resources submitted on execution time. The X-axis represents the execution time and the Y-axis represents the number of resources. As the number of resources increases, execution time decreases. Execution time is lower for the FCFS technique than for the clustering technique.

Figure 6. Effect of Change in Number of Resources Submitted on Execution Time

3.2.2 Test Case 2: Effect of change in size of resources submitted on Execution Cost

Figure 7 shows the effect of a change in the size of resources submitted on execution cost. The X-axis represents the size of resources and the Y-axis represents the execution cost. As the size of resources increases, execution cost increases for both techniques, but it increases more for the FCFS technique; the cost remains lower for clustering than for FCFS.

Figure 7. Effect of change in size of resources submitted on Execution Cost

3.2.3 Test Case 3: Effect of change in number of resources submitted on Execution Cost

Figure 8 shows the effect of a change in the number of resources submitted on execution cost. The X-axis represents the number of resources and the Y-axis represents the execution cost. As the number of resources increases, execution cost increases for both techniques, but it increases more for the FCFS technique; the cost remains lower for clustering than for FCFS.

Figure 8. Effect of change in number of resources submitted on Execution Cost

Conclusions and Future Scope

Cloud Computing and its vital characteristics have been discussed in this paper, with a focus on the resource scheduling challenges that Cloud Computing is facing today. Resource scheduling algorithms are compared with respect to the Cloud workload as an answer to the dynamic scalability of resources. The different Cloud workloads and design patterns have been identified and analyzed, and the key QoS requirements for every Cloud workload have been identified. The clustering of these Cloud workloads is done through the K-Means clustering algorithm by assigning appropriate weights to the different quality attributes. The results of clustering the Cloud workloads have been presented with the help of the CloudSim tool. Further, a workload based Cloud framework is proposed and implemented in this research work. The experimental results gathered through CloudSim 3.0 demonstrate that the clustering technique performs better in terms of execution cost than the FCFS scheduling algorithm.

Future Scope

This research work will be extended to design autonomic resource scheduling technique which will manage resources efficiently.
The proposed scheduling technique will incorporate important QoS parameters such as execution time, energy etc. other than cost.
Further, the proposed technique will be validated through a cloud based case study.
Both proposed scheduling techniques and case study will be evaluated in cloud based simulated environment.

References

[1]. Ayda Mazandarani and Hossein Momeni, (2013). “QoS-aware Scientific Application Scheduling Algorithm in Cloud Environment”, Computer Engineering and Intelligent Systems, Vol.4, No.12, pp. 17-19.

[2]. Bhupendra Panchal and Prof. R. K. Kapoor, (2013). “Dynamic VM Allocation Algorithm using Clustering in Cloud Computing”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, No. 9, pp.18-23.

[3]. Haiwen Han, Qi Deyu, Weiping Zheng and Feng Bin, (2013). “A QoS Guided task Scheduling Model in cloud computing environment”, Fourth International Conference on Emerging Intelligent Data and Web Technologies, China.

[4]. Jayshri Damodar Pagare and Dr. Nitin A Koli, (2015). “Design and simulate cloud computing environment using CloudSim”, International Journal of Computer Technology & Applications, Vol. 6, No. 1, pp. 35-42.

[5]. Lipsa Tripathy and Rasmi Ranjan Patra, (2014). “Scheduling in cloud computing”, International Journal on Cloud Computing: Services and Architecture, Vol. 4, No. 5, pp. 28-35.

[6]. Meng Xu, Lizhen Cui, Haiyang Wang and Yanbing Bi, (2009). “A Multiple QoS Constrained Scheduling Strategy of Multiple Workflows for Cloud”, IEEE International Symposium on Parallel and Distributed Processing with Applications.

[7]. Monir Abdullah and Mohamed Othman, (2013). “Cost-Based Multi-QoS Job Scheduling using Divisible Load Theory in Cloud Computing”, International Conference on Computational Science, Vol. 18, No. 2, pp. 928-935.

[8]. Ranjan Kumar and G. Sahoo, (2014). “Cloud Computing Simulation Using CloudSim”, International Journal of Engineering Trends and Technology, Vol. 8, No. 2, pp. 1-5.

[9]. Rodrigo N. Calheiros, Rajiv Ranjan, César A. F. De Rose, and Rajkumar Buyya, (2011). “CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms”, Software: Practice and Experience, Vol. 41, No. 1, pp. 23-50.

[10]. Sukhpal Singh and Inderveer Chana, (2015). “Qaware: Quality of service based cloud resource provisioning”, Computers & Electrical Engineering, Vol. 47, pp. 138-160. DOI: http://dx.doi.org/10.1016/j.compeleceng.2015.02.003

[11]. Sukhpal Singh and Inderveer Chana, (2015). “QRSF: QoS-aware resource scheduling framework in cloud computing”, The Journal of Supercomputing, Vol. 71, No. 1, pp. 241-292.

[12]. Sukhpal Singh and Inderveer Chana, (2015). “EARTH: Energy-aware Autonomic Resource Scheduling in Cloud Computing”, Journal of Intelligent and Fuzzy Systems, Preprint: 1-17, IOS Press. DOI: http://dx.doi.org/10.3233/IFS-151866

[13]. Sukhpal Singh, Inderveer Chana and Rajkumar Buyya, (2015). “Agri-Info: Cloud Based Autonomic System for Delivering Agriculture as a Service”, pp. 1-31, Technical Report CLOUDS-TR-2015-2, Cloud Computing and Distributed Systems Laboratory, The University of Melbourne. Retrieved from http://www.cloudbus.org/reports/AgriCloud2015.pdf

[14]. Inderveer Chana and Sukhpal Singh, (2014). “Quality of Service and Service Level Agreements for Cloud Environments: Issues and Challenges”, Cloud Computing-Challenges, Limitations and R&D Solutions. pp. 51-72, Springer International Publishing.

[15]. Sukhpal Singh and Inderveer Chana, (2016). “Cloud Resource Provisioning: Survey, Status and Future Research Directions”, Knowledge and Information Systems.

[16]. Sukhpal Singh and Inderveer Chana, (2016). “A Survey on Resource Scheduling in Cloud Computing Issues and Challenges”, Journal of Grid Computing.

[17]. Sukhpal Singh and Inderveer Chana, (2016). “Resource Provisioning and Scheduling in Clouds: QoS Perspective”, The Journal of Supercomputing.

[18]. Sukhpal Singh and Inderveer Chana, (2015). “QoS-aware Autonomic Cloud Computing for ICT”, In the Proceedings of the International Conference on Information and Communication Technology for Sustainable Development (ICT4SD - 2015), Ahmedabad, India, 3 - 4 July, 2015, Springer International Publishing.

[19]. Sukhpal Singh and Inderveer Chana, (2015). “QoS-aware Autonomic Resource Management in Cloud Computing: A Systematic Review”, ACM Computing Surveys, Vol. 48, No. 3, pp. 1-46.

[20]. E. RaviKondal and B. Mounika, (2015). Data Scheduling and Mapreducing in Big Data. i-manager's Journal on Cloud Computing.,2(2), Feb-Apr 2015, Print ISSN 2349-6835, E-ISSN 2350-1308, pp. 1-6.

[21]. D. R. Robert Joan, (2015). Encroachment of Cloud Education for the Present Educational Institutions. i-manager's Journal on Cloud Computing, 2(2), Feb-Apr 2015, Print ISSN 2349-6835, E-ISSN 2350-1308, pp. 7-13.

[22]. Shalin Elizabeth. S and S. Sarju, (2015). A Scalable and Cost-Effective Data Anonymization over Big Data using Mapreduce on Cloud. i-manager's Journal on Cloud Computing, 2(2), Feb-Apr 2015, Print ISSN 2349-6835, E-ISSN 2350-1308, pp. 31-39.