Amandeep Kaur * Kanwalvir Singh Dhindsa **

* PG Scholar, Department of Computer Science and Engineering, BBSB Engineering College, Fatehgarh Sahib, Punjab, India.

** Professor, Department of Computer Science and Engineering, BBSB Engineering College, Fatehgarh Sahib, Punjab, India.

Abstract

With the rapid growth of the Web and of user-generated content websites such as Facebook and Twitter, there is a need for fast databases that can handle huge amounts of data. To meet this need, a new class of database management systems, collectively called NoSQL, has been developed. There are many types of NoSQL databases with different performance characteristics, so it is important to evaluate their performance. Three major NoSQL databases, MongoDB, Cassandra and Couchbase, are considered here. Different workloads were designed for the performance analysis, and the evaluation is based on read and update operations. This evaluation enables users to choose the most appropriate NoSQL database according to the particular mechanisms and application needs.

Nowadays, databases are considered an important part of organizations and are used all over the globe. Relational databases allow data storage, retrieval and manipulation using the standard SQL language [6]. Until recently, relational databases were the preferred enterprise choice. However, with the constant growth of stored and analyzed data, relational databases exhibit a range of limitations, e.g. restrictions on scalability [7] and storage, and a loss of query efficiency caused by massive volumes of data, which makes the storage and management of larger databases difficult [3]. To overcome these limitations, a new database model with a set of new features, called NoSQL databases, was developed [5]. Non-relational databases emerged as a breakthrough technology and can be used alone or as a complement to a relational database. Note that an RDBMS that does not use SQL is not NoSQL; DBMSs that use models other than the relational one, and hence not SQL, are NoSQL. In short, a NoSQL DBMS is simply a DBMS that does not use the relational model. NoSQL databases usually share the ability to scale horizontally with ease, as well as being non-relational and not requiring fixed schemas [2]. The absence of relations typically means that join operations, which are common in SQL, are unnecessary here. Beyond that, different NoSQL implementations can look very different [9]. NoSQL improves on the performance of relational databases through a set of new characteristics and benefits [17]. Compared to relational databases, NoSQL databases are more flexible and horizontally scalable [11]. NoSQL databases are designed to manage and distribute data automatically, recover from faults and repair the whole system automatically [15]. When NoSQL technology started to emerge, NoSQL databases were best known for the lack of consistency of their stored data [14]. For companies and systems where strong consistency was essential, this lack of consistency could be a major limitation.

A literature review is a survey of published material relevant to a particular issue, theory or area of research; it provides a description, summary and critical evaluation of each work. The existing literature studied during this research is summarized as follows:

Moniruzzaman and Hossain [1] discussed the classification, characteristics and analysis of NoSQL databases for big data analytics. Their report is meant to help users, particularly organizations, gain an independent understanding of the strengths and weaknesses of the various NoSQL database approaches for supporting applications that process huge volumes of data, and to provide a global overview of non-relational NoSQL databases. Lourenço et al. [8] presented a performance comparison of various NoSQL databases. The authors compiled a brief, up-to-date comparison of NoSQL engines, their most suitable use cases from the programmer's viewpoint, and their advantages and disadvantages, based on a survey of the current literature. They also concluded that, although there are a variety of studies and evaluations of NoSQL technology, there is still not enough data to verify how suitable each non-relational database is for a specific scenario or system. Moreover, every system differs from the others, and the required functionality and mechanisms strongly affect the choice of database. Typically, it is not possible to state clearly which database solution is best.

Madden [16] presented a view on databases and big data. He concluded that, although databases do not solve all aspects of the big data problem, many database-based tools get part of the way there. What is missing is twofold. First, statistics and machine learning algorithms need to be made more robust and easier for unsophisticated users to apply, while students are trained in their intricacies. Second, a data management ecosystem needs to be developed around these algorithms so that users can manage and evolve their data, enforce consistency properties over it, and browse, visualize and understand their algorithms' results. Chitra and Jeevarani [10] focused primarily on basically available, scalable and eventually consistent NoSQL databases. The paper analyses the requirements of next-generation data storage for today's large-scale social networking and cloud applications, and also analyses the capabilities of various NoSQL models such as BigTable, Cassandra, CouchDB, Dynamo and MongoDB. The authors concluded that NoSQL databases usually process data faster than relational databases. Developers often do not have their NoSQL databases support ACID properties, in order to increase performance, but this can cause problems when the databases are used for applications that require high precision. Sharma and Dave [19] discussed NoSQL, its background, and fundamentals such as ACID, BASE and the CAP theorem. The main aim of their paper is to give an overview of NoSQL databases, how they have eroded the dominance of SQL, and their background and characteristics. It also notes that ACID properties are not used in NoSQL databases, so NoSQL lags in data consistency. Soni and Yadav [13] presented a quantitative analysis of document-stored databases.
The authors give a short introduction to NoSQL database operations and a comparative study of MongoDB and Cassandra. The operations were performed on an Ubuntu system to compare the two NoSQL databases, and the paper reports the performance of MongoDB and Cassandra. The results show that Cassandra is more powerful than MongoDB for loading and processing big data, completing in less time. The paper also describes the functionality of MongoDB and Cassandra over a massive dataset.

Abramova et al. [18] reviewed the performance of various NoSQL databases. In that paper, five of the most popular NoSQL databases are evaluated: Cassandra, HBase, MongoDB, OrientDB and Redis. They compared these databases in terms of performance, based on reads and updates, considering typical workloads as represented by the Yahoo! Cloud Serving Benchmark. The comparison allows users to choose the most appropriate database according to the specific mechanisms and application needs. They also concluded that MongoDB, Redis and OrientDB are the best databases for scan operations, whereas the column family databases, Cassandra and HBase, perform far better when executing updates.

Truica et al. [4] present a performance analysis of CRUD operations in asynchronously replicated document-oriented databases. The paper examines asynchronous replication, one of the key features of a scalable and flexible system. Three of the most popular document-oriented databases, MongoDB, CouchDB and Couchbase, are examined. They concluded that although CouchDB performs well for insert, update and delete, MongoDB is the fastest when it comes to reading data. Overall, the NoSQL databases performed better than the relational ones. Manoj [12] presented a comparative study of NoSQL databases and explored NoSQL technologies. He compared document and column store NoSQL databases such as Cassandra, MongoDB and HBase across various attributes of relational and distributed database system principles. He also concluded that MongoDB suits use cases involving document storage, document search and mandatory aggregation functions, while HBase suits scenarios where Hadoop MapReduce is useful for bulk read and load operations, since HBase offers optimized scan performance on the Hadoop platform.

For the experimental analysis, the Yahoo! Cloud Serving Benchmark (YCSB) [18] was used, which makes it possible to evaluate and compare the performance of NoSQL databases. The benchmark consists of two components: a data generator and a set of performance tests made up of read, insert and update operations. Each test scenario is called a workload and is defined by a set of parameters, including the percentage of read and update operations, the total number of operations, and the number of records used. The main focus here is on comparing the execution speed and throughput of read and update operations, which are the most commonly used operations. Therefore, workloads A, B and C were executed. Table 1 shows the executed workloads and their operations: workload A (50% reads, 50% updates), workload B (80% reads, 20% updates) and workload C (20% reads, 80% updates).
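As a sketch of how such workloads are typically configured, the snippet below builds YCSB core-workload property files for the three read/update mixes used in this study (A: 50/50, B: 80/20, C: 20/80). The property names (`recordcount`, `operationcount`, `readproportion`, `updateproportion`, `insertproportion`, `scanproportion`, `fieldcount`, `fieldlength`) are standard YCSB CoreWorkload properties. The class path `site.ycsb.workloads.CoreWorkload` is an assumption for recent YCSB releases (older releases use the `com.yahoo.ycsb` package), and the helper `workload_properties` is hypothetical. The paper does not state which key distribution was used, so none is set.

```python
# Hypothetical helper: build YCSB CoreWorkload property files for the
# read/update mixes used in this study. Property names are YCSB core
# properties; the workload class path assumes a recent YCSB release.

WORKLOAD_MIXES = {
    "a": (0.5, 0.5),  # 50% read, 50% update
    "b": (0.8, 0.2),  # 80% read, 20% update
    "c": (0.2, 0.8),  # 20% read, 80% update
}


def workload_properties(name: str, operation_count: int,
                        record_count: int = 500_000) -> str:
    """Return the text of a .properties file for one workload."""
    read, update = WORKLOAD_MIXES[name]
    props = {
        "workload": "site.ycsb.workloads.CoreWorkload",
        "recordcount": record_count,
        "operationcount": operation_count,
        "readproportion": read,
        "updateproportion": update,
        "insertproportion": 0,
        "scanproportion": 0,
        "fieldcount": 10,    # 10 fields per record
        "fieldlength": 100,  # 100 bytes per field
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Print workload B with 30,000 operations.
    print(workload_properties("b", 30_000), end="")
```

A file like this would normally be passed to the YCSB client with `-P`, first in the load phase (`bin/ycsb load <binding> -P <file>`) and then in the run phase (`bin/ycsb run <binding> -P <file>`).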

To evaluate the databases, 500,000 records were randomly generated, each consisting of a key (the record identifier) and 10 fields of 100 bytes, giving roughly 1 KB per record. Each workload was executed with three different operation counts, listed in Table 2: 10,000, 30,000 and 50,000 operations.
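To make the record layout concrete, the following minimal sketch builds one synthetic record of the same shape (a key plus 10 random 100-byte fields) and checks the size arithmetic. The `user<N>` key format and the `field0` to `field9` names are illustrative assumptions; the paper only states the field count and field size.

```python
import random
import string

FIELD_COUNT = 10      # fields per record
FIELD_LENGTH = 100    # bytes per field (ASCII, so 1 char = 1 byte)
RECORD_COUNT = 500_000


def make_record(index: int, rng: random.Random) -> tuple[str, dict[str, str]]:
    """Build one synthetic record: an identifying key plus random fields."""
    key = f"user{index}"  # illustrative key format
    fields = {
        f"field{i}": "".join(rng.choices(string.ascii_letters, k=FIELD_LENGTH))
        for i in range(FIELD_COUNT)
    }
    return key, fields


def payload_bytes(fields: dict[str, str]) -> int:
    """Size of the field values only (keys and field names excluded)."""
    return sum(len(v.encode("ascii")) for v in fields.values())


if __name__ == "__main__":
    rng = random.Random(42)
    key, fields = make_record(0, rng)
    per_record = payload_bytes(fields)   # 10 * 100 = 1000 bytes
    total = per_record * RECORD_COUNT    # 500,000,000 bytes
    print(key, per_record, total)
```

The raw payload is therefore 1,000 bytes per record and 500,000,000 bytes (about 500 MB) for the whole data set, before any per-record overhead added by the database itself.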

Each of these operation counts is used in every workload. The number of operations is the number of requests issued to the database under test, while the number of records stays at 500,000; for example, 10,000 operations means 10,000 requests to the database.
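The sketch below illustrates what "N operations" means for a read/update mix: it draws N requests, each either a read or an update of a key chosen from the 500,000 loaded records. The uniform key choice and the function name `operation_stream` are assumptions for illustration; YCSB lets the key distribution be configured (for example uniform or zipfian), and the paper does not say which was used.

```python
import random
from collections import Counter


def operation_stream(n_ops: int, read_proportion: float,
                     record_count: int = 500_000, seed: int = 0):
    """Yield n_ops (operation, key) requests for a read/update mix.

    Each request is a read with probability read_proportion, otherwise an
    update; keys are drawn uniformly from the loaded records (an assumption).
    """
    rng = random.Random(seed)
    for _ in range(n_ops):
        op = "read" if rng.random() < read_proportion else "update"
        key = f"user{rng.randrange(record_count)}"
        yield op, key


if __name__ == "__main__":
    # 10,000 operations of workload B (80% read, 20% update).
    counts = Counter(op for op, _ in operation_stream(10_000, 0.8))
    print(counts)  # approximately 80% reads, 20% updates
```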

The NoSQL database evaluation involved testing performance for CRUD operations. CRUD is the acronym for Create, Read, Update and Delete; of these, the read and update operations were measured in this study.
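The four CRUD operations (Create, Read, Update, Delete) can be illustrated with a toy in-memory key-value store that exposes one method per operation, roughly the way a benchmark driver talks to a database. The class `InMemoryStore` and its method names are hypothetical; they loosely mirror the shape of a YCSB database binding but are not YCSB's API or that of any real database.

```python
from typing import Dict, Optional

Record = Dict[str, str]


class InMemoryStore:
    """Toy key-value store exposing the four CRUD operations.

    Stands in for a database binding; not tied to any real database API.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Record] = {}

    def insert(self, key: str, record: Record) -> bool:   # Create
        if key in self._data:
            return False
        self._data[key] = dict(record)
        return True

    def read(self, key: str) -> Optional[Record]:          # Read
        record = self._data.get(key)
        return dict(record) if record is not None else None

    def update(self, key: str, changes: Record) -> bool:   # Update
        if key not in self._data:
            return False
        self._data[key].update(changes)  # overwrite only the given fields
        return True

    def delete(self, key: str) -> bool:                     # Delete
        return self._data.pop(key, None) is not None


if __name__ == "__main__":
    store = InMemoryStore()
    store.insert("user1", {"field0": "a"})
    store.update("user1", {"field0": "b"})
    print(store.read("user1"))
```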

All tests were executed on Ubuntu 12.04 (64-bit) with 2 GB of RAM and a 320 GB HDD. The following widely used NoSQL databases were evaluated: MongoDB and Couchbase (document-oriented databases) and Cassandra (a column family database).

The following subsections analyse the execution time and throughput of read and update operations for workloads A, B and C. Performance is considered low when throughput is low and execution time is high, and high when throughput is high and execution time is low. The final results are therefore judged on these two parameters.
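The comparison rule stated above (higher throughput and lower execution time mean better performance) can be sketched as a ranking function. The function name and the tie-breaking order (throughput first, then execution time) are assumptions for illustration, not the authors' procedure.

```python
from typing import Dict, List, Tuple

# (throughput in ops/sec, execution time in ms) for one database
Measurement = Tuple[float, float]


def rank_by_performance(results: Dict[str, Measurement]) -> List[str]:
    """Order databases from best to worst performance.

    Higher throughput ranks first; lower execution time breaks ties.
    This ordering rule is an illustrative assumption.
    """
    return sorted(results, key=lambda db: (-results[db][0], results[db][1]))
```

If throughput is computed as the number of operations divided by execution time, the two criteria rank databases identically for a fixed operation count; they are kept separate here because the tables report both.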

3.1 Evaluation over Workload A

The evaluation of workload A is done by calculating throughput and execution time considering the same number of operations for all three databases.

3.1.1 Throughput

Throughput is defined as the number of operations performed per second. Table 3 gives the throughput (operations/sec) for each operation count in all three databases. Higher throughput indicates better performance.
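Because the tables report execution time in milliseconds, the definition can be written as a small pair of helpers: throughput in operations per second is the operation count divided by the execution time in seconds, and the inverse gives the time needed at a given throughput. The function names are hypothetical, and the 10,000-operation example is illustrative, not a measured value.

```python
def throughput_ops_per_sec(operations: int, execution_time_ms: float) -> float:
    """Throughput = operations completed per second of execution time."""
    if execution_time_ms <= 0:
        raise ValueError("execution time must be positive")
    return operations / (execution_time_ms / 1000.0)


def execution_time_ms(operations: int, throughput: float) -> float:
    """Inverse relation: time (ms) needed to run `operations` at `throughput`."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return operations / throughput * 1000.0


if __name__ == "__main__":
    # Illustrative: 10,000 operations completed in 2,000 ms.
    print(throughput_ops_per_sec(10_000, 2_000))  # 5000.0
```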

Table 3 gives the throughput values of MongoDB, Cassandra and Couchbase. These values are plotted in Figure 1 to give a clear picture of throughput under workload A, which consists of 50% reads and 50% updates over 500,000 records.

Figure 1 shows that, for workload A, Couchbase performs best, with throughput rising as the number of operations increases. Couchbase's throughput increases sharply at 30,000 operations and continues to rise with more operations. At 10,000 operations, both Cassandra and MongoDB gave lower throughput. Cassandra's throughput increases slightly at around 30,000 operations, and in general its performance improves as the number of operations increases. MongoDB, on the other hand, shows the lowest throughput at every operation count, which means its throughput is not much affected by the number of operations under workload A.

3.1.2 Execution Time

Execution time (run time) is the time taken by the system to complete a given task; performance is considered high when execution time is low. Table 4 gives the execution time in milliseconds (ms) for each operation count in all three databases.
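To show how execution time and throughput relate in practice, the following sketch times a batch of reads and updates against a plain Python dict with `time.perf_counter` and reports both values. The dict is only a stand-in for a database, and `run_workload` is a hypothetical helper; real measurements would come from the benchmark client running against each database.

```python
import random
import time


def run_workload(store: dict, n_ops: int, read_proportion: float,
                 seed: int = 0) -> tuple[float, float]:
    """Run n_ops reads/updates on `store`; return (execution_ms, ops_per_sec)."""
    rng = random.Random(seed)
    keys = list(store)
    # Pre-generate the requests so that only the operations are timed.
    requests = [(rng.random() < read_proportion, rng.choice(keys))
                for _ in range(n_ops)]
    start = time.perf_counter()
    for is_read, key in requests:
        if is_read:
            _ = store[key]                 # read
        else:
            store[key] = "updated-value"   # update
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Guard against a zero reading on very fast runs.
    ops_per_sec = n_ops / (elapsed_ms / 1000.0) if elapsed_ms > 0 else float("inf")
    return elapsed_ms, ops_per_sec


if __name__ == "__main__":
    data = {f"user{i}": "x" * 100 for i in range(1_000)}
    for n in (10_000, 30_000, 50_000):
        ms, tput = run_workload(data, n, read_proportion=0.5)
        print(f"{n} ops: {ms:.2f} ms, {tput:.0f} ops/sec")
```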

Table 4 gives the execution time of MongoDB, Cassandra and Couchbase; the values vary with the number of operations. These values are plotted in Figure 2, which shows the results of workload A (50% reads, 50% updates) over 500,000 records.

The results of workload A indicate that MongoDB and Cassandra had slower execution times. At 10,000 operations, MongoDB, Cassandra and Couchbase all had short execution times. At 30,000 operations, MongoDB and Cassandra took longer to execute. At 50,000 operations, MongoDB showed the greatest execution time, meaning it takes much longer to complete as the number of operations increases. Given a large number of records, MongoDB had more difficulty during execution. Cassandra performs well on workload A, as its execution time increases only slightly from 10,000 to 50,000 operations. Figure 2 shows that Couchbase takes the least time to complete 50,000 operations. The best outcome was shown by Couchbase, which had the lowest execution time and the highest performance for workload A.

3.2 Evaluation over Workload B

The evaluation of workload B is done by calculating throughput and execution time considering the same number of operations for all three databases.

3.2.1 Throughput

Workload B was executed to obtain throughput values for the performance evaluation. Table 5 gives the throughput (operations/sec) for each operation count in all three databases.

In Cassandra, the throughput values decrease as the number of read operations increases, but remain higher than MongoDB's. Couchbase again shows the highest throughput values as the number of read operations increases. These values are plotted in Figure 3, which shows the results of workload B (80% reads, 20% updates) over 500,000 records.

Figure 3 shows the throughput of the three databases at different operation counts under workload B (80% reads, 20% updates). Both MongoDB and Cassandra gave low throughput at 10,000 operations and remained low at higher operation counts (30,000 and 50,000). MongoDB's throughput increases slightly as the number of operations increases. Cassandra's throughput also increases, but is lower than under workload A. Couchbase has the highest throughput in Figure 3: five times that of MongoDB and four times that of Cassandra. These results show that Couchbase performs best for workload B, giving the maximum throughput.

3.2.2 Execution Time

Performance depends on execution time: the lower the execution time, the higher the performance. Table 6 gives the execution time (ms) for each operation count in all three databases.

Table 6 gives the execution time in milliseconds (ms) of MongoDB, Cassandra and Couchbase for workload B, i.e. 80% read operations and 20% update operations. For all three NoSQL databases, the execution time increases as the number of operations increases. Overall, Couchbase has the lowest execution time for workload B. These values are plotted in Figure 4.

Figure 4 depicts the execution time for workload B. At 10,000 operations, the execution time of all three databases is low. At 30,000 operations, MongoDB and Cassandra showed larger execution times, and MongoDB had the highest execution time at 50,000 operations. For MongoDB and Cassandra, the execution time is lower than under workload A but still higher than Couchbase's. Couchbase performs better at every operation count and gives higher performance than MongoDB and Cassandra. As the proportion of read operations increases, the execution time decreases, which suggests that in this setup read operations are cheaper than updates and that read-heavy workloads therefore complete faster.

3.3 Evaluation over Workload C

Workload C, i.e. 20% read operations and 80% update operations, was evaluated by measuring throughput and execution time for the same operation counts in all three databases.

3.3.1 Throughput

Table 7 gives the throughput (operations/sec) of MongoDB, Cassandra and Couchbase for each operation count; higher throughput indicates better performance. These values are plotted in Figure 5, which shows the results of workload C (20% reads, 80% updates) over 500,000 records.

Figure 5 shows that, for workload C at 50,000 operations, both Cassandra and MongoDB reach higher throughput than in the previous workloads, which indicates that both perform better with write-heavy workloads. Even so, MongoDB shows the lowest throughput at every operation count. Cassandra's throughput increases slightly as the number of operations increases, meaning its performance improves as the number of write operations grows. Couchbase's throughput is much higher at every operation count (10,000, 30,000 and 50,000). Couchbase therefore still gave the highest values, although its performance drops as the proportion of write operations increases.

3.3.2 Execution Time

Maximum performance is achieved when execution time is minimal. Table 8 gives the execution time (ms) for each operation count in all three databases.

Table 8 gives the execution time in milliseconds (ms) of MongoDB, Cassandra and Couchbase. These values are plotted in Figure 6, which shows the results of workload C (20% reads, 80% updates) over 500,000 records.

The results of workload C indicate that Couchbase and Cassandra had better execution times from 10,000 operations upwards. MongoDB presented the worst result, performing much worse than the column family database Cassandra and the document-oriented database Couchbase. Its execution time increases at 30,000 operations, and it takes the longest to complete 50,000 operations. Given a large number of records, MongoDB had more difficulty during execution as the proportion of write operations increased. Cassandra shows high performance on workload C. The best outcome was again given by Couchbase, which had the lowest execution time and the highest performance, although its performance was lower than under workload B.

3.4 Comparative Results

The comparative results summarize the overall results for the three operation counts (10,000, 30,000 and 50,000 operations) across workloads A, B and C. Table 9 shows the comparative results.

In Table 9, the databases are compared on several parameters. MongoDB has low throughput, medium performance in read and write operations, and high execution time. Cassandra has medium throughput and low execution time, and gives high performance in update operations. Couchbase gives high throughput, high performance in both read and update operations, and very low execution time.

The performance of CRUD operations has been evaluated in three NoSQL databases, MongoDB, Cassandra and Couchbase, in terms of read and update operations. Different workloads were designed to measure throughput and execution time, and each workload was run with three different operation counts. From the results, MongoDB showed no clear advantage for any workload and gave a similar response in all of them. Cassandra performed better in workload C, which means its performance improves as the proportion of update operations increases. Couchbase showed the best performance in all workloads; it performs best for read operations and somewhat less well when update operations dominate. Overall, Couchbase performs best compared with MongoDB and Cassandra.

[1]. A.B.M. Moniruzzaman and S.A. Hossain, (2013). “NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison”. International Journal of Database Theory and Application, Vol. 6, No. 4, pp. 1–14.

[2]. A. Ron, B. Sheba, and A. Shulman-Peleg, (2015). “No SQL, No Injection? Examining NoSQL Security”. 8th International Conference on Databases, IEEE, California, US.

[3]. B. Saraladevi, N. Pazhaniraja, P.V. Paul, M.S.S. Basha, and P. Dhavachelvan, (2015). “Big Data and Hadoop - A Study in Security Perspective”. 2nd International Symposium on Big Data and Cloud Computing, Vol. 50, pp. 596–601.

[4]. C.-O. Truica, F. Radulescu, A. Boicea, and I. Bucur, (2015). “Performance Evaluation for CRUD Operations in Asynchronously Replicated Document Oriented Database”. 20th International Conference on Control Systems and Computer Science, IEEE, pp. 191–196.

[5]. G. Aydin, I.R. Hallac, and B. Karakus, (2015). “Architecture and implementation of a scalable sensor data storage and analysis system using cloud computing and big data technologies”. Hindawi Journal of Sensors, Vol. 9, No. 02, pp. 1-11.

[6]. I.A.T. Hashem, I. Yaqoob, N. Badrul Anuar, S. Mokhtar, A. Gani, and S. Ullah Khan, (2014). “The rise of 'Big Data' on cloud computing: Review and open research issues”. Information Systems, Vol. 47, No. 7, pp. 98–115.

[7]. J. Pokorny, (2011). “NoSQL Databases: A step to database scalability in Web environment”. International Conference on WEB Information Systems, Vol. 9, No. 1, pp. 69-82.

[8]. J.R. Lourenço, V. Abramova, M. Vieira, B. Cabral, and J.B. Bernardino, (2015). “NOSQL databases: A software engineering perspective”. Advances in Intelligent Systems and Computing, Springer, Vol. 353, No. 6, pp. 741–750.

[9]. K. Barmpis and D.S. Kolovos, (2014). “Evaluation of Contemporary Graph Databases for Efficient Persistence of Large-Scale Models”. Journal of Object Technology, Vol. 13, No. 3, pp. 1-26.

[10]. K. Chitra and B. Jeevarani, (2013). “Study on Basically Available, Scalable and Eventually Consistent NOSQL Databases”. International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, No. 5, pp. 991–996.

[11]. K. Zvarevashe and T.T. Gotora, (2014). “A Random Walk through the Dark Side of NoSQL Databases in Big Data Analytics”. International Journal of Science and Research, Vol. 3, No. 6, pp. 506-509.

[12]. Manoj V., (2014). “Comparative Study of NoSQL Document, Column Store Databases and Evaluation of Cassandra”. International Journal of Database Management Systems, Vol. 6, No. 4, pp. 11–26.

[13]. P. Soni and N.S. Yadav, (2015). “Quantitative Analysis of Document Stored Databases”. International Journal of Computer Applications, Vol. 118, No. 20, pp. 37–41.

[14]. R. Aniceto and R. Xavier, (2015). “Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency”. Hindawi Publishing Corporation International Journal of Genomics, Vol. 25, No. 03.

[15]. S. Kaisler, F. Armour, and J. A. Espinosa, (2014). “Introduction to Big Data: Challenges, Opportunities, and Realities Minitrack”. 47th Hawaii International Conference on System Sciences (HICSS), pp. 728–728.

[17]. S.S. Pore and S.B. Pawar, (2015). “Comparative Study of SQL & NoSQL Databases”. International Journal of Advanced Research in Computer Engineering & Technology, Vol. 4, No. 5, pp. 1747–1753.

[19]. V. Sharma and M. Dave, (2012). “SQL and NoSQL Databases”. International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, No. 8, pp. 20–27.

Performance Evaluation for CRUD Operations in NoSQL Databases

Abstract

Keywords :

Introduction

1. Literature Review

2. Experimental Setup

3. Experimental Evaluation