Why are Apache Cassandra writes so slow compared to MongoDB, Redis & MySQL? [closed]

Tags: mongodb, nosql, cassandra, cassandra-2.0

I recently started trying out some NoSQL prototypes for a customer. They have a real-time application that does lots of inserts but fewer reads. (They currently use MySQL and would like to try out some NoSQL solutions.)

Over the weekend I compared Cassandra 2.0, MongoDB 2.4.9 and Redis against a normal MySQL 5.5 DB. All are running on my Windows laptop (Intel Core i3 2.30 GHz, 8 GB RAM), so no high-end fancy machines.

The table structure is a simple one, shown below. Although this is the MySQL DESC output, Cassandra has the same structure, and in MongoDB the data is stored as JSON/BSON with the same structure and indexes. There are two indexes (oneway_id & twoway_id) in all three DBs.

Structure (for all four DBs)

+--------------+---------------------+
| Field        | Type                |
+--------------+---------------------+
| tmstamp      | bigint(20) unsigned |
| field_1      | bigint(20) unsigned |
| field_2      | varchar(64)         |
| field_3      | varchar(64)         |
| field_4      | tinyint(3) unsigned |
| field_5      | bigint(20) unsigned |
| field_6      | varchar(25)         |
| field_7      | varchar(15)         |
| field_8      | varchar(15)         |
| field_9      | varchar(15)         |
+--------------+---------------------+

DB/Environment details

MySQL 5.6 (64-bit) with MySQL Java connector 5.1.28
Apache Cassandra 2.0 with DataStax 2.0 Java driver
MongoDB 2.4.6 with Mongo Java driver 2.12.0
Redis 2.8.17 running on a Linux machine
Oracle Java 1.6 (64-bit)
Microsoft Windows 7 (64-bit)
Intel Core i3 2.30 GHz processor
8 GB RAM

I created some simple Java test cases, and these are the results I got (the numbers are not exactly the same on every run, but the latencies follow the same pattern):

100,000 Records

MySql 1000,000 - 46 secs
Cassandra - 54 secs
MongoDb - 2 secs

500,000 Records

MySql 1000,000 - 142 secs
Cassandra - 299 secs
MongoDb - 41 secs

1,000,000 Records

MySql 1000,000 - 349 secs
Cassandra - 699 secs
MongoDb - 51 secs
Redis - 34 secs

My question is: why does Cassandra take this long for inserts into such a small and simple table?

In Cassandra I tried both inline looped CQL inserts and batch inserts. The funny thing is that the batch inserts took more time. The document I followed for batch inserts is:

http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0

I don't want to use executeAsync, because it doesn't give me the exact insert time.

The batch insert I used is something like this (it takes far longer than the normal insert):

PreparedStatement ps = session.prepare("INSERT INTO some_table (val_1, val_2, val_3, val_4) VALUES (?, ?, ?, ?)");
BatchStatement batch = new BatchStatement();

//for loop start
batch.add(ps.bind(uid, mid1, title1, body1));
//for loop end

session.execute(batch);

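The inline looped insert I used is something like this:

String insertPrefix = "INSERT INTO some_table (val_1, val_2, val_3, val_4) VALUES (";

// for loop start

// Build a fresh statement on every iteration instead of appending to the previous one.
// Assumes title1 and body1 are text values, which must be single-quoted in CQL.
String sqlInsert = insertPrefix + uid + ", " + mid1 + ", '" + title1 + "', '" + body1 + "')";
session.execute(sqlInsert);

// for loop end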

Now, why is Cassandra slower than MySQL and, more importantly, why is MongoDB so much faster than Cassandra? I seriously hope I am doing something wrong.

Is there a way I can insert JSON/BSON objects directly into Cassandra like MongoDB does? I guess that might make it faster. Can some experts please help me with this? If there are no answers I'll conclude that MongoDB is better than Cassandra!


asked Mar 02 '14 13:03

Aneesh Vijendran

1 Answer

Your code is using serial inserts: each insert must wait for the previous one to complete and return an acknowledgement before the next can begin. This is a bad way to benchmark any database that can handle multiple incoming connections. If you really don't want to use executeAsync (the correct approach), you should write a multi-threaded stress program so that the inserts do not block on the client side and you are truly limited by the Cassandra node. Basically, what you are seeing is the speed at which your client program can run rather than the capability of the database.
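To make the point about client-side blocking concrete, here is a minimal sketch of a multi-threaded load generator. It does not talk to Cassandra at all: `fakeInsert` is a hypothetical stand-in for a blocking `session.execute(...)` round trip, and the 4 ms latency, the class name and the thread counts are assumptions chosen only for illustration. It also uses lambdas, so it needs a newer Java than the 1.6 listed in the question.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentLoadSketch {
    // Assumed round-trip latency of one insert; not a measured Cassandra figure.
    static final long SIMULATED_LATENCY_MS = 4;

    // Hypothetical stand-in for a blocking session.execute(...) call.
    static void fakeInsert() throws InterruptedException {
        Thread.sleep(SIMULATED_LATENCY_MS);
    }

    // Runs n blocking inserts across `threads` workers.
    // Returns {completed inserts, elapsed wall-clock milliseconds}.
    static long[] run(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                try {
                    fakeInsert();
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            // Wait until every queued insert has finished before stopping the clock.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for inserts", e);
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new long[] { completed.get(), elapsedMs };
    }

    public static void main(String[] args) {
        long[] serial = run(50, 1);     // one thread: every insert waits for the previous one
        long[] parallel = run(50, 10);  // ten threads: up to ten inserts in flight at once
        System.out.println("1 thread:   " + serial[0] + " inserts in " + serial[1] + " ms");
        System.out.println("10 threads: " + parallel[0] + " inserts in " + parallel[1] + " ms");
    }
}
```

With one thread the elapsed time is roughly the number of inserts times the per-insert latency, which is the client-bound behaviour described above. With more threads the same work finishes sooner because the round trips overlap. In a real test the lambda body would call the driver instead of `fakeInsert`.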

Blog post of interest:

http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra

There are only two principles to doing load generation right:

Feed Cassandra enough work
Generate the workload on separate machines

That’s it! But it’s frequently done wrong, from the extreme case of a single-threaded client running on the same laptop as Cassandra, to more subtle problems with the Python Global Interpreter Lock. It seems that like binary search, it’s surprisingly difficult to build a good load generator. If possible, avoid the temptation of rolling your own and use something battle-tested.
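On the concern in the question that asynchronous execution hides the insert time: one possible way to keep a timing figure is to stop the clock only after every future has completed, and to record a per-request latency in a completion callback. The sketch below illustrates that idea with the standard library only. `fakeExecuteAsync` is a hypothetical stand-in for the driver's `session.executeAsync(...)`, `CompletableFuture` is swapped in for the driver's own future type, and the 2 ms simulated work and pool size are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

class AsyncTimingSketch {
    // Hypothetical stand-in for session.executeAsync(...): finishes after ~2 ms of simulated work.
    static CompletableFuture<Void> fakeExecuteAsync(ExecutorService pool) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, pool);
    }

    // Issues n async inserts and waits for all of them.
    // Returns {completed count, total elapsed nanos, sum of per-request nanos}.
    static long[] timeAsyncInserts(int n, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicLong completed = new AtomicLong();
        AtomicLong summedLatency = new AtomicLong();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            long issued = System.nanoTime();
            // The callback records issue-to-completion time, which includes time spent queued.
            futures.add(fakeExecuteAsync(pool).thenRun(() -> {
                summedLatency.addAndGet(System.nanoTime() - issued);
                completed.incrementAndGet();
            }));
        }
        // Block until every insert and its callback has finished, then stop the clock.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        long elapsed = System.nanoTime() - start;
        pool.shutdown();
        return new long[] { completed.get(), elapsed, summedLatency.get() };
    }

    public static void main(String[] args) {
        long[] r = timeAsyncInserts(1000, 8);
        System.out.println("completed:        " + r[0]);
        System.out.println("total time (ms):  " + r[1] / 1_000_000);
        System.out.println("mean latency (ms): " + (r[2] / (double) r[0]) / 1_000_000);
    }
}
```

The total elapsed time gives the throughput figure a benchmark usually needs, while the summed per-request time gives an average latency. Whether this matches what the asker meant by "exact insert time" is an assumption.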


answered Oct 15 '22 10:10

RussS
