 

Cassandra's MapReduce Support

I recently ran into a case where Cassandra fits perfectly: storing time-based events with custom TTLs per event type. The alternatives would be to store the events in Hadoop and do the bookkeeping (TTLs and so on) manually, which IMHO is a very complex idea, or to switch to HBase. The question is how well Cassandra's MapReduce support works out of the box, without DataStax Enterprise.

It seems that they have invested a lot in CassandraFS, but I wonder whether the normal Pig CassandraLoader is actively maintained and actually scales, since it seems to do nothing more than iterate over the rows in slices. Does this work for hundreds of millions of rows?

Tobias asked Nov 01 '12 09:11



1 Answer

You can run map/reduce with the random partitioner, but of course the keys you get are in random (token) order rather than key order. You probably want to read at consistency level ONE (CL = ONE) in Cassandra, so that you don't have to read from two nodes each time during map/reduce and each task can read the local data instead. I have not used Pig, though.
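To make the "random order" point concrete, here is a minimal Python sketch, not Cassandra code. It assumes the usual description of RandomPartitioner, where a row's token is the absolute value of the MD5 digest of its key read as a signed big-endian integer. The function names and the sample event keys are made up for illustration.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Approximate a RandomPartitioner token (assumption: abs of signed MD5)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def scan_order(keys):
    """Return the keys in token order, i.e. the order a full scan visits them."""
    return sorted(keys, key=random_partitioner_token)


# Hypothetical time-based row keys, listed here in chronological order.
event_keys = [b"2012-11-01T09:00", b"2012-11-01T09:01",
              b"2012-11-01T09:02", b"2012-11-01T09:03"]

for key in scan_order(event_keys):
    # Token order is determined by the hash, not by the timestamp in the key.
    print(random_partitioner_token(key), key.decode())
```

Because the scan follows the hash, a map/reduce job that needs events in time order has to sort them itself (for example in the shuffle phase) rather than rely on the order the rows are read in.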

Dean Hiller answered Oct 13 '22 11:10
