I recently ran into a case where Cassandra fits perfectly for storing time-based events with a custom TTL per event type (the alternatives would be to store them in Hadoop and do the bookkeeping, TTLs and so on, manually, which is IMHO very complex, or to switch to HBase). The question is how well Cassandra's MapReduce support works out of the box, without the DataStax Enterprise edition.
It seems they have invested a lot in CassandraFS, but I wonder whether the plain Pig CassandraLoader is actively maintained and actually scales (it appears to do nothing more than iterate over the rows in slices). Does this work for hundreds of millions of rows?
You can map/reduce using the RandomPartitioner, but of course the keys you get back are in random order. You probably want to use consistency level ONE in Cassandra so you don't have to read from two nodes on each request while doing map/reduce; it should then read the local data. I have not used Pig, though.
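For what it's worth, reading a column family from Pig with the stock CassandraStorage loader shipped in Cassandra's `contrib` looks roughly like the sketch below. The keyspace (`Events`) and column family (`Log`) are made-up names for illustration; the `PIG_*` environment variables and the `cassandra://` URL scheme are what that loader expects:

```pig
-- Connection details are passed to the loader via environment variables, e.g.:
--   export PIG_INITIAL_ADDRESS=localhost
--   export PIG_RPC_PORT=9160
--   export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

-- 'Events' keyspace and 'Log' column family are hypothetical examples.
rows = LOAD 'cassandra://Events/Log'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();

-- Each row arrives as (key, {(column_name, value), ...}); count the rows
-- to exercise a full scan over the column family.
grouped = GROUP rows ALL;
counts  = FOREACH grouped GENERATE COUNT(rows);
DUMP counts;
```

Whether this iterate-over-slices scan keeps up at hundreds of millions of rows is exactly the open question above; the sketch only shows the mechanics of wiring Pig to Cassandra.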