Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra random read speed

We're still evaluating Cassandra for our data store. As a very simple test, I inserted a value for 4 columns into the Keyspace1/Standard1 column family on my local machine amounting to about 100 bytes of data. Then I read it back as fast as I could by row key. I can read it back at 160,000/second. Great.

Then I put in a million similar records all with keys in the form of X.Y where X in (1..10) and Y in (1..100,000) and I queried for a random record. Performance fell to 26,000 queries per second. This is still well above the number of queries we need to support (about 1,500/sec)

Finally I put ten million records in from 1.1 up through 10.1000000 and randomly queried for one of the 10 million records. Performance is abysmal at 60 queries per second and my disk is thrashing around like crazy.

I also verified that if I ask for a subset of the data, say the 1,000 records between 3,000,000 and 3,001,000, it returns slowly at first and then as they cache, it speeds right up to 20,000 queries per second and my disk stops going crazy.

I've read all over that people are storing billions of records in Cassandra and fetching them at 5-6k per second, but I can't get anywhere near that with only 10mil records. Any idea what I'm doing wrong? Is there some setting I need to change from the defaults? I'm on an overclocked Core i7 box with 6gigs of ram so I don't think it's the machine.

Here's my code to fetch records which I'm spawning into 8 threads to ask for one value from one column via row key:

ColumnPath cp = new ColumnPath(); cp.Column_family = "Standard1"; cp.Column = utf8Encoding.GetBytes("site"); string key = (1+sRand.Next(9)) + "." + (1+sRand.Next(1000000)); ColumnOrSuperColumn logline = client.get("Keyspace1", key, cp, ConsistencyLevel.ONE);

Thanks for any insights

like image 720
Jody Powlette Avatar asked Jun 17 '10 12:06

Jody Powlette


People also ask

Is Cassandra DB fast?

Writing to in-memory data structure is much faster than writing to disk. Because of this, Cassandra writes are extremely fast!

How many requests can Cassandra handle?

Yes, Cassandra has Linear Scalability. The scalability is linear as shown in the chart below. Each client system generates about 17,500 write requests per second, and there are no bottlenecks as we scale up the traffic.


2 Answers

purely random reads is about worst-case behavior for the caching that your OS (and Cassandra if you set up key or row cache) tries to do.

if you look at contrib/py_stress in the Cassandra source distribution, it has a configurable stdev to perform random reads but with some keys hotter than others. this will be more representative of most real-world workloads.

like image 120
jbellis Avatar answered Oct 12 '22 22:10

jbellis


Add more Cassandra nodes and give them lots of memory (-Xms / -Xmx). The more Cassandra instances you have, the data will be partitioned across the nodes and much more likely to be in memory or more easily accessed from disk. You'll be very limited with trying to scale a single workstation class CPU. Also, check the default -Xms/-Xmx setting. I think the default is 1GB.

like image 23
Todd Avatar answered Oct 12 '22 23:10

Todd