Applications of Cluster Computing in Data Centers Along with its Advantages and Disadvantages

Abdul Rehman *  M. Shoaib Omer **
*-** Department of Computer Science, Bahria University, Karachi, Pakistan.

Abstract

We have conducted a survey of the field of cluster computing with the intention of examining its applications in data centers, along with its advantages and disadvantages. Although the field of cluster computing is not new, this brief yet comprehensive survey is intended to give novice users a fair idea of data centers and of cluster computing, its applications, advantages and disadvantages. A thorough study of the available resources, i.e., journals, conference proceedings and websites, was carried out to reach our conclusions. A number of free COTS (Commercial-Off-The-Shelf) software packages are also available for setting up cluster computing in a given environment. There is still a need for more middleware to cope with heterogeneous hardware environments.

Keywords :

Introduction

Large-scale data processing is of paramount importance in the field of computer systems. Over time, the amount of data to be processed has grown many times over, in both volume and dimensionality, and it cannot be processed on systems with limited processing power and memory. High Performance Computing (HPC) has therefore emerged over the past decades to process large amounts of data, as in Big Data. Increasing the number of processors, the amount of memory and the interconnection of computer systems were the means considered for achieving high performance, and several classes of systems evolved from HPC. The most common include “vector Computers (VC), Massively Parallel Processors (MPP), Symmetric Multi-Processors (SMP), Cache Coherent Non-Uniform Memory Access (CC-NUMA), distributed systems and clusters” (Jin et al., 2001). A cluster is defined as “a collection of interconnected standalone workstations or PCs cooperatively working together as a single, integrated computing resource” (Jin et al., 2001). Although the field of cluster computing is not new, recent advances in networking have brought it to the forefront as a platform of choice for running all types of parallel and distributed applications (Jin et al., 2001).

A datacenter comprises a building with dedicated rooms for power, servers and communication equipment, a Network Operations Center (NOC), a Security Operations Center (SOC), etc. The ultimate purpose of a datacenter is to provide the infrastructure that lets computer systems run around the clock and deliver seamless services to end users. It is important to understand that the definition of a datacenter covers only this infrastructure, not the computer systems themselves. Datacenters are classified into four tiers, Tier 1 to Tier 4, which specify the minimum uptime required of a datacenter: the higher the tier, the higher the required uptime.
To achieve a higher tier, a datacenter needs multiple power sources, e.g., the main power supply, uninterruptible power supplies (UPS) and generators; sometimes even backups of the backups are acquired (Rashid, 2019).
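To make the tiers concrete, an availability figure can be converted into expected downtime per year. The sketch below is a minimal Python illustration; the per-tier availability values are the figures commonly quoted for Tier 1 to Tier 4 rather than figures from the sources cited here, and the names TIER_AVAILABILITY and annual_downtime_hours are hypothetical.

```python
# Hours in a non-leap year.
HOURS_PER_YEAR = 24 * 365

# Availability figures commonly quoted for the four tiers (illustrative assumption,
# not taken from the sources cited in this paper).
TIER_AVAILABILITY = {
    1: 0.99671,
    2: 0.99741,
    3: 0.99982,
    4: 0.99995,
}


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime per year, in hours, for an availability between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for tier, availability in TIER_AVAILABILITY.items():
        hours = annual_downtime_hours(availability)
        print(f"Tier {tier}: {availability:.3%} uptime, about {hours:.1f} h of downtime per year")
```

Under these assumed figures, a Tier 1 facility may be down for roughly 28.8 hours a year, whereas a Tier 4 facility is down for well under an hour (about 26 minutes).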

1. History

The history of cluster computing starts with the paper presented by Gene M. Amdahl at the AFIPS Spring Joint Computer Conference in 1967, which considered the theoretical speedup of a process executed by multiple resources at the same time and showed that this speedup is limited by the portion of the instructions that must be processed sequentially (Amdahl, 1967). The relationship between sequential and parallel instruction processing is shown in Equation 1. This relationship later became the basis for parallel and cluster computing.

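$$ S(n) = \frac{1}{r_s + \frac{r_p}{n}} \qquad (1) $$

where S(n) is the speedup obtained with n processors, r_s + r_p = 1, r_s represents the ratio of the sequential portion of a program and r_p represents its parallel portion (Amdahl, 1967). The equation shows that, no matter how many processors are used, a program takes at least the time needed to execute its sequential portion, so the speedup can never exceed 1/r_s.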
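Equation 1 can be evaluated directly. The following minimal Python sketch assumes a hypothetical program whose sequential fraction is r_s = 0.1; the function name amdahl_speedup is illustrative. It shows that the speedup grows with n but never exceeds 1/r_s = 10.

```python
def amdahl_speedup(r_s: float, n: int) -> float:
    """Speedup of a program with sequential fraction r_s on n processors (Equation 1)."""
    if not 0.0 <= r_s <= 1.0:
        raise ValueError("r_s must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    r_p = 1.0 - r_s  # parallel fraction, since r_s + r_p = 1
    return 1.0 / (r_s + r_p / n)


if __name__ == "__main__":
    r_s = 0.1  # hypothetical: 10% of the program must run sequentially
    for n in (1, 10, 100, 1000):
        print(f"n = {n:4d}: speedup = {amdahl_speedup(r_s, n):.2f}")
    print(f"upper bound 1/r_s = {1 / r_s:.2f}")
```

With r_s = 0.1 the speedup is about 5.26 on 10 processors and about 9.91 on 1000, so each additional processor yields a diminishing return.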

In 1969, ARPANET came into being. It can be regarded as the first non-commercial cluster, since four different sites were connected together to share their resources, and it later evolved into the Internet. In 1977, the first commercial cluster product, ARCnet, was released. Although ARCnet was not a commercial success, Digital Equipment Corporation (DEC) released its VAXcluster product in 1984 for the VAX/VMS operating system, and it was commercially accepted. Other significant early commercial clusters include the Tandem Himalaya and the IBM S/390 Parallel Sysplex (Wikipedia, n.d.) (Strey, 2001).

In terms of cluster computing systems, a number of COTS operating systems are available to provide this functionality; GLUnix, Solaris MC, UnixWare and MOSIX are some examples (Martincova, 2003). Cluster programming environments are also divided into two categories: those with shared memory and those without. For example, Parallel Virtual Machine (PVM) is a programming environment for clusters without shared memory (Martincova, 2003).

2. Architecture

We now discuss the architecture of cluster computing. The basis of cluster computing is the sharing of resources among multiple systems through a common network switch. As shown in Figure 1, systems are interconnected through a network switch to share their resources. Often, a common disk space known as a Storage Area Network (SAN) is shared among the systems. Most data centers work on this storage-sharing principle: a number of HDDs are placed at a single location, the SAN, and shared among different servers, which store their data on them. This redundant storage provides a backup in case any disk fails. In addition to storage, memory and processors are also shared in cluster computing. Cluster architectures are of the following three types (Educba, n.d.).

Figure 1. Architecture of Cluster Computing

2.1 Types of Cluster Architecture

2.1.1 Load Balancing Cluster

In a load balancing cluster, the workload is divided and allocated equally among multiple computer systems. The work is completed quickly because each node performs a small piece of it rather than the whole task.
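One simple way to divide work among nodes is a greedy least-loaded rule. The Python sketch below is only an illustration under assumed conditions: task costs are known in advance, the node names are hypothetical, and each task goes to the node with the least work assigned so far (largest tasks first).

```python
import heapq
from typing import Dict, List, Sequence


def assign_tasks(task_costs: Sequence[float], nodes: Sequence[str]) -> Dict[str, List[int]]:
    """Greedily give each task to the node with the least work assigned so far."""
    if not nodes:
        raise ValueError("need at least one node")
    # Min-heap of (current load, node name): the lightest node is always on top.
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    assignment: Dict[str, List[int]] = {node: [] for node in nodes}
    # Placing the largest tasks first tends to give a more even final split.
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for task_id in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(task_id)
        heapq.heappush(heap, (load + task_costs[task_id], node))
    return assignment
```

For task costs [5, 3, 3, 2, 2, 1] on three nodes, this rule gives per-node loads of 6, 5 and 5.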

2.1.2 High Availability (HA) Clusters

If a computer node fails, another node takes over its operations with minimal delay. This type of cluster therefore ensures high availability and is a reliable solution for operations that must not stop, even briefly.
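Failover in an HA cluster is often driven by heartbeats: each node periodically reports that it is alive, and a node that misses its heartbeats for longer than a timeout is treated as failed. The minimal Python sketch below assumes a hypothetical active/standby pair and passes time in explicitly (a real implementation would read a monotonic clock such as time.monotonic()); the class, method names and timeout value are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HACluster:
    """Hypothetical active/standby group: the first healthy node in `nodes` is active."""
    nodes: List[str]
    timeout: float = 3.0  # seconds without a heartbeat before a node is presumed failed
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def active_node(self, now: float) -> Optional[str]:
        """Return the highest-priority node whose last heartbeat is recent enough."""
        for node in self.nodes:
            seen = self.last_seen.get(node)
            if seen is not None and now - seen <= self.timeout:
                return node
        return None  # every node has missed its heartbeats
```

If the primary stops sending heartbeats while the standby keeps reporting, active_node returns the standby once the primary's timeout has expired, which is the takeover described above.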

2.1.3 High Performance (HP) Clusters

A high performance cluster combines supercomputers and cluster computing to provide very high processing speed for highly advanced computations.

Clusters can also be classified into two categories: open and closed clusters (Thakur, n.d.).

2.2 Classification of Cluster

2.2.1 Open Cluster

In an open cluster, every node has its own public IP address and is accessible through the Internet/web, which raises greater security concerns.

2.2.2 Closed Cluster

In a closed cluster, by contrast, the nodes are hidden behind a gateway node, which provides better security.
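The practical difference between the two classes is whether nodes have publicly routable addresses. The hypothetical Python helper below uses the standard ipaddress module to list which node addresses fall outside private ranges; the addresses in the usage note are placeholders.

```python
import ipaddress
from typing import Iterable, List


def publicly_exposed(node_addresses: Iterable[str]) -> List[str]:
    """Return the node addresses that are not in private ranges (potentially Internet-reachable)."""
    return [
        address
        for address in node_addresses
        if not ipaddress.ip_address(address).is_private
    ]
```

For a closed cluster such as ["1.2.3.4", "10.0.0.11", "10.0.0.12"], where 1.2.3.4 stands for the gateway, only the gateway is reported; in an open cluster every node would appear in the list.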

3. Cluster Operating System

Besides the hardware architecture, the software environment of a cluster plays an important role in its design and implementation. Cluster operating systems are mainly responsible for failure management, load balancing and parallelization. Two approaches are used for failure management: highly available clusters and fault-tolerant clusters. Load balancing shifts load from busy nodes to idle or less busy nodes. Finally, for parallelization, certain software is designed specifically to execute in a cluster environment (Martincova, 2003).
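The load-shifting role described above can be sketched as a simple rebalancing loop. The Python below is an illustration under assumptions that do not come from the source: jobs are interchangeable, every node has the same capacity, and moving a job costs nothing; the function, node and job names are hypothetical.

```python
from typing import Dict, List, Tuple


def rebalance(queues: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Move jobs from the busiest node to the idlest until queue lengths differ by at most 1.

    Modifies `queues` in place and returns the (job, source, destination) migrations.
    """
    moves: List[Tuple[str, str, str]] = []
    if not queues:
        return moves
    while True:
        busiest = max(queues, key=lambda node: len(queues[node]))
        idlest = min(queues, key=lambda node: len(queues[node]))
        if len(queues[busiest]) - len(queues[idlest]) <= 1:
            return moves
        job = queues[busiest].pop()  # migrate the most recently queued job
        queues[idlest].append(job)
        moves.append((job, busiest, idlest))
```

Starting from queues of 5, 1 and 0 jobs, the loop performs three migrations and leaves every node with 2 jobs.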

4. Cluster Computing Programming Environment

The cluster programming environment depends heavily on how memory is accessed: either the memory is shared, or the nodes communicate through message passing, much as processes do in ordinary operating systems. With shared memory, the options are (1) DSM (Distributed Shared Memory), (2) Threads/OpenMP (enabled for clusters) and (3) Java threads (HKU JESSICA, IBM cJVM). Without shared memory, a message-passing mechanism is used; the most popular environments are (1) PVM (Parallel Virtual Machine) and (2) MPI (Message Passing Interface) (Martincova, 2003).
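The message-passing model can be illustrated with a scatter/gather computation. In this minimal Python sketch, threads in one process stand in for cluster nodes and queue.Queue objects stand in for the network, so nothing is shared except through messages; on a real cluster the same pattern would be written with MPI collectives such as MPI_Scatter and MPI_Gather. The function names are hypothetical.

```python
import queue
import threading
from typing import List


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A stand-in 'node': it receives one message, computes, and replies with a message."""
    rank, chunk = inbox.get()        # blocking receive from the coordinator
    outbox.put((rank, sum(chunk)))   # send the partial result back


def parallel_sum(data: List[int], num_workers: int) -> int:
    """Scatter data to workers, gather their partial sums, and combine them."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    outbox: queue.Queue = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(inbox, outbox)) for inbox in inboxes]
    for thread in threads:
        thread.start()
    # Scatter: worker `rank` receives every num_workers-th element, starting at index `rank`.
    for rank, inbox in enumerate(inboxes):
        inbox.put((rank, data[rank::num_workers]))
    # Gather: one reply per worker, in whatever order they finish.
    replies = [outbox.get() for _ in range(num_workers)]
    for thread in threads:
        thread.join()
    return sum(partial for _, partial in replies)
```

Because CPython threads share one interpreter, this sketch shows the communication pattern rather than a real speedup; separate processes or machines would be needed for that.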

5. Cluster Computing Applications

A cluster computer is a set of computers that are either loosely or tightly connected to each other; from the end user's perspective, they run as a single entity. This concept is most beneficial for the web, where handling millions of requests a day would be a great challenge without a powerful supercomputer. Cluster computers are mostly used for parallel computations and complex computational problems, and many of today's giant websites are hosted on clusters. High performance clusters are used in a wide range of computer system applications, such as:

They are also well suited to life-saving applications, such as:

Clusters also play a fundamental role in running many web applications, for example email, search engines, data security, cloud computing, database servers, proxies and web hosting services.

Business and trading applications are also increasingly deployed on cluster computing architectures. Developers have started to automate business operations and deploy these applications on servers. Automated systems execute operations more accurately and can serve more customers than non-automated ones. However, if the server fails, all transactions and trading processes can stall until it is working properly again, which frustrates customers and can cause the business financial losses. Cluster computers can reduce or even eliminate this problem: when one server stops, the application is not halted, because its operations are automatically shifted to another computer and the failed parts are isolated from the system. As a result, the system continues to serve customers without delay or inconvenience.

Clusters have been applied in many Internet applications, as shown in Figure 2.

Figure 2. Application of Cluster Computing in the Industry

6. Web Servers

Load balancing clusters, such as web server clusters, use cluster technology to run a wide range of web applications and to serve a large number of end users. Normally, each request is forwarded to a particular node, so jobs are assigned to specific nodes and processed in parallel, and users benefit from this parallelism even for the most complex computational problems. When a user requests a specific URL or route, the response is generated by a particular node in the High Performance Distributed Computing (HPDC) system. Compared with a single server, a cluster avoids putting the whole burden on one computer, which improves the performance of the web application and allows hardware and data to be shared with other resources. Another benefit is that if any node fails, another node can immediately take over the responsibilities of the failed component without interrupting the operation (Google Cloud, n.d.).
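The request distribution and failover behaviour described above can be sketched with a round-robin balancer that skips failed nodes. This Python class is hypothetical and deliberately omits health checking: nodes are marked failed or recovered explicitly.

```python
import itertools
from typing import Iterable, Set


class RoundRobinBalancer:
    """Hypothetical front end that hands requests to web-server nodes in turn."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes = list(nodes)
        if not self.nodes:
            raise ValueError("need at least one node")
        self.failed: Set[str] = set()
        self._cycle = itertools.cycle(self.nodes)

    def mark_failed(self, node: str) -> None:
        self.failed.add(node)

    def mark_recovered(self, node: str) -> None:
        self.failed.discard(node)

    def route(self, url: str) -> str:
        """Pick the next healthy node for this request; failed nodes are skipped."""
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.failed:
                return node
        raise RuntimeError(f"no healthy node available for {url}")
```

With nodes web1, web2 and web3, requests first go to each node in turn; after mark_failed("web2"), they alternate between web1 and web3 until web2 is marked as recovered.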

7. Search Engines

Search engines are the fundamental tools for retrieving information from the Internet. When a user enters one or more keywords, a search engine returns a large number of result pages within a few seconds by taking full advantage of parallel computing. Google is the most popular search engine in the world, and Google Cloud includes a simple service for scheduling Docker-based tasks on Compute Engine for high-throughput workloads (Google Cloud, n.d.).
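The parallel search described above can be sketched as a scatter-gather query over a partitioned index. The Python below assumes a hypothetical document-partitioned inverted index, in which each shard maps keywords to the IDs of the pages it holds; it is not a description of how Google or any other engine is actually built.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set

# A shard maps each keyword to the IDs of the pages (held by that shard) containing it.
Shard = Dict[str, Set[int]]


def search_shard(shard: Shard, keywords: List[str]) -> Set[int]:
    """Return the pages in one shard that contain every keyword."""
    hits: Optional[Set[int]] = None
    for word in keywords:
        pages = shard.get(word, set())
        hits = set(pages) if hits is None else hits & pages
    return hits if hits is not None else set()


def search(shards: List[Shard], keywords: List[str]) -> List[int]:
    """Query all shards in parallel (scatter), then merge their hits (gather)."""
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda shard: search_shard(shard, keywords), shards))
    merged: Set[int] = set()
    for hits in per_shard:
        merged |= hits
    return sorted(merged)
```

For example, with shards {"cluster": {1, 2}, "computing": {2, 3}} and {"cluster": {10}, "computing": {10, 11}}, searching for ["cluster", "computing"] returns [2, 10]. In a cluster, each shard would live on its own node and the thread pool would be replaced by network calls.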

8. Email

Hundreds of millions of emails are sent and received every day. Email server nodes split the task of receiving and sending these emails so that it runs in parallel without delay. One of the most used free email websites is outlook.com, which uses Inktomi technologies for its clusters (Jin et al., 2001).
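One way to split email handling across server nodes is to partition mailboxes, so that each server is responsible for a fixed subset of users. The Python sketch below hashes an email address to choose a server; the server names are placeholders, treating addresses case-insensitively is an assumption, and this is not a description of how any particular provider works.

```python
import hashlib
from typing import Sequence


def server_for(address: str, servers: Sequence[str]) -> str:
    """Pick the server responsible for a mailbox by hashing its normalized address."""
    if not servers:
        raise ValueError("need at least one server")
    normalized = address.strip().lower()  # assumption: addresses are case-insensitive
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo the server count.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Taking the hash modulo the number of servers is the simplest scheme, but adding or removing a server reassigns most mailboxes; consistent hashing is the usual refinement when the server pool changes often.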

9. Security

An increasing number of attacks target public network interfaces, one possible reason being the lack of security throughout the Internet. The behavior of an individual component is easy to predict and understand, but combining multiple computers to perform a single task is more complex, and the resulting unintended behavior can be difficult, and sometimes impossible, to understand and predict (Pourzandi et al., 2005).

10. Database Servers

The database is one of the most important components of every application, since it stores the required data and allows it to be manipulated. Grouping two or more database servers together is called database clustering. A cluster makes it possible to isolate a web application from its database, which helps protect the database from unusual activity. When the application runs on a separate node, that node's processor only needs to run the application, while another node is responsible for storing and retrieving information from the database, which accelerates the operation (Google Cloud, n.d.).

10.1 Advantages and Disadvantages of Cluster Computing

Some of the advantages and disadvantages of cluster computing are given in Table 1.

Table 1. Advantages and Disadvantages of Cluster Computing

10.2 Advantages of Cluster Computing

There are several benefits of using cluster computing in applications. Some important advantages are discussed below (Google Cloud, n.d.) (WatElectronics, 2020).

Cost Effectiveness: Cluster computing systems are cost effective compared with other technologies in terms of price relative to processing power and speed. They can deliver better results than even mainframe computers on a much lower budget (WatElectronics, 2020).

Processing Speed: In a cluster environment, numerous computers work on a task simultaneously and share the processing uniformly, so the system can perform faster than other high-end computing devices (WatElectronics, 2020).

Flexibility: Clusters can be upgraded to higher specifications at any time more easily than mainframe computers, and new nodes can be added, so the cluster can be expanded without much difficulty (WatElectronics, 2020).

Extended Resource Availability: When a system relies on a single computer, the failure of that computer halts the whole system. Cluster computers are used to overcome this problem: if a node fails, the other nodes take its place and handle its jobs, which increases the availability of resources (WatElectronics, 2020).
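The availability gain can be quantified under two simplifying assumptions: node failures are independent, and a single surviving node can take over the whole workload. The probability that the cluster is up is then one minus the probability that all n nodes are down at once, A = 1 - (1 - a)^n, where a is the availability of one node. The function name below is hypothetical.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` identical, independent nodes is up.

    Assumes failures are independent and one surviving node can carry the whole load.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    all_down = (1.0 - node_availability) ** nodes  # every node down at the same time
    return 1.0 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} node(s): {cluster_availability(0.99, n):.6f}")
```

With nodes that are each available 99% of the time, two nodes give about 99.99% availability and three give about 99.9999%. Correlated failures, such as a shared power supply or switch, make the real figure lower.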

Single Entity: The entire system operates as a single entity, with multiple nodes tightly or loosely connected as one system. End users cannot tell the difference and enjoy seamless performance. This is the main concept behind cluster computing, i.e., to give the feel of a single system instead of many, as described by Gene M. Amdahl (Amdahl, 1967).

10.3 Disadvantages of Cluster Computing

Every system that has advantages also has disadvantages. Some of the disadvantages of cluster computing are as follows.

Homogeneous Architecture: All connected nodes should be homogeneous, which means that each node must have the same hardware and operating system. In a heterogeneous environment, building a cluster is difficult, if not impossible (WatElectronics, 2020).

Development Difficulty: The most challenging task is to develop software programs for the distributed cluster computing system (Naeem et al., 2016).

Maintenance Difficulty: Because multiple hardware and software resources are combined into a single entity, monitoring and maintenance are complicated.

Conclusion

From the above discussion, we conclude that cluster computing is one of the best alternatives to other parallel and high performance computing technologies. The main benefit of a cluster computer is its cost effectiveness. Besides its other advantages, cluster computing also appears very feasible for datacenters, because datacenters have a homogeneous environment that supports it. However, there is still a dire need to overcome its disadvantages, most importantly the requirement for a homogeneous architecture. As new technologies emerge day by day, implementing cluster computers in heterogeneous environments will become essential. If this limitation is overcome, cluster computing will become an even more useful technology.

References

[1]. Amdahl, G. M. (1967, April). Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, AFIPS '67 (Spring): Spring Joint Computer Conference, (pp. 483- 485). https://doi.org/10.1145/1465482.1465560
[2]. Educba. (n.d.). What is cluster computing - A concise guide to cluster computing. EDUCBA. Retrieved from https://www.educba.com/what-is-cluster-computing/
[3]. Google Cloud. (n.d.). Using clusters for large scale technical computing in the cloud. Google Cloud. Retrieved from https://cloud.google.com/solutions/using-clusters-for-large-scale-technical-computing
[4]. Jin, H., Buyya, R., & Baker, M. (2001). Cluster computing tools, applications, and Australian initiatives for low cost supercomputing. The Institution of Engineers Australia, 25(4). Retrieved from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.4857&rep=rep1&type=pdf
[5]. Mahmud, M. S., Huang, J. Z., Salloum, S., Emara, T. Z., & Sadatdiynov, K. (2020). A survey of data partitioning and sampling methods to support big data analysis. Big Data Mining and Analytics, 3(2), 85-101. https://doi.org/10.26599/BDMA.2019.9020015
[6]. Martincova, P. (2003). Software Environment for Cluster. Journal of Information, Control and Management Systems, 1(1), 67–76.
[7]. Naeem, M. M., Mahar, H., Memon, F., Siddique, M., & Chohan, A. (2016). Cluster computing vs cloud computing a comparison and an overview. Science International (Lahore), 28(6), 5267–5271.
[8]. Pourzandi, M., Gordon, D., Yurcik, W., & Koenig, G. A. (2005). Clusters and security: Distributed security for distributed systems. In 2005 IEEE International Symposium on Cluster Computing and the Grid, 1, 96-104. https://doi.org/10.1109/CCGRID.2005.1558540
[9]. Rashid, A. M. (2019). Data center architecture overview. National Academy for Planning and Development, Dhaka, Bangladesh. Retrieved from https://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/DataCenter/DCInfra25/DC Infra1.pdf
[10]. Strey, A. (2001). High Performance Computing. In Neil J. Smelser and Paul B. Baltes (Eds.), International Encyclopedia of the Social & Behavioral Sciences (1st ed.), (pp. 6693-6697). Pergamon. https://doi.org/10.1016/B0-08-043076-7/00574-X
[11]. Thakur, D. (n.d.). What is cluster computing? - Computer Notes. Retrieved from https://ecomputernotes.com/computernetworkingnotes/network-technologies/
[12]. WatElectronics. (2020). Cluster Computing: Definition, types, advantages & applications. Retrieved from https://www.watelectronics.com/cluster-computing-architecture-its-types/
[13]. History of Computer Clusters. (2020, December 13). In Wikipedia. https://en.wikipedia.org/wiki/History_of_computer_clusters