Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra low performance?

I have to choose Cassandra or MongoDB(or another nosql database, I accept suggestions) for a project with a lot of inserts(1M/day). So I create a small test to measure the write performance. Here's the code to insert in Cassandra:

import time
import os
import random
import string
import pycassa

def get_random_string(string_length):
    return ''.join(random.choice(string.letters) for i in xrange(string_length))

def connect():
    """Connect to a test database"""
    connection = pycassa.connect('test_keyspace', ['localhost:9160'])
    db = pycassa.ColumnFamily(connection,'foo')
    return db

def random_insert(db):
    """Insert a record into the database. The record has the following format
    ID timestamp
    4 random strings
    3 random integers"""
    record = {}
    record['id'] = str(time.time())
    record['str1'] = get_random_string(64)
    record['str2'] = get_random_string(64)
    record['str3'] = get_random_string(64)
    record['str4'] = get_random_string(64)
    record['num1'] = str(random.randint(0, 100))
    record['num2'] = str(random.randint(0, 1000))
    record['num3'] = str(random.randint(0, 10000))
    db.insert(str(time.time()), record)

if __name__ == "__main__":
    db = connect()
    start_time = time.time()
    for i in range(1000000):
        random_insert(db)
    end_time = time.time()
    print "Insert time: %lf " %(end_time - start_time)

And the code to insert in Mongo it's the same changing the connection function:

def connect():
    """Connect to a test database"""
    connection = pymongo.Connection('localhost', 27017)
    db = connection.test_insert
    return db.foo2

The results are ~1046 seconds to insert in Cassandra, and ~437 to finish in Mongo. It's supposed that Cassandra it's much faster than Mongo inserting data. So , What i'm doing wrong?

like image 761
fasouto Avatar asked Jan 26 '11 12:01

fasouto


People also ask

Why is Cassandra so slow?

Slow Cassandra database performance encountered in the cassandra-stderr. log file indicates insufficient memory lock settings. Even though most of the time it will not cause any problem, it may potentially affect Cassandra performance in some cases.

Is Cassandra low latency?

Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Failed nodes can be replaced with no downtime.

How much RAM does Cassandra need?

While Cassandra can be made to run on small servers for testing or development environments (including Raspberry Pis), a minimal production server requires at least 2 cores, and at least 8GB of RAM. Typical production servers have 8 or more cores and at least 32GB of RAM.


1 Answers

There is no equivalent to Mongo's unsafe mode in Cassandra. (We used to have one, but we took it out, because it's just a Bad Idea.)

The other main problem is that you're doing single-threaded inserts. Cassandra is designed for high concurrency; you need to use a multithreaded test. See the graph at the bottom of http://spyced.blogspot.com/2010/01/cassandra-05.html (actual numbers are over a year out of date but the principle is still true).

The Cassandra source distribution has such a test included in contrib/stress.

like image 114
jbellis Avatar answered Oct 20 '22 14:10

jbellis