Some of the answers to this question deal with older versions of Cassandra. The correct answer for this kind of problem depends on the version of Cassandra you are using.
I have a profile column family and want to store a list of skills in each profile. I'm not sure how this is typically accomplished in Cassandra. One option would be to store a serialized Thrift or protobuf message, but I'd prefer not to do this: I believe Cassandra has no knowledge of these formats, so the data in the datastore would not be human-readable or queryable via CQL from the command line. The other solution I thought of would be to use a super column and put the skill as the key with a null value:
skills: {
  'java': '',
  'c++': '',
  'cobol': ''
}
Is this a good way of handling lists in Cassandra? I imagine there's some idiom I'm not aware of. I'm using the Astyanax client library, which supports only composite columns, not super columns, so the solution I proposed above seems quite awkward in that case. I'm also still having some trouble understanding composite columns, as they don't seem to be completely documented yet. Would this solution work with composite columns?
This answer dates to before the release of Cassandra version 1.2, which provided substantially different functionality for handling lists. The answer might be inappropriate if you are using Cassandra 1.2+.
I would encode lists in the column key, using composite columns with the real column name as the first dimension, i.e.:
row_key -> {
     [column_name; entry1] -> "",
     [column_name; entry2] -> "",
     ... 
}
Then, to read the list, you would need to do a get_slice from [column_name; ] to [column_name; ] - note the empty dimensions.
The great thing about this is that it actually implements a set quite nicely: the list cannot contain the same entry twice. I think this works for your use case. The list is also maintained in sorted order.
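To make the set-like behaviour concrete, here is a small in-memory sketch in Python. It is not Cassandra or Astyanax code: the `Row` class and its `add` and `slice` methods are hypothetical names invented for illustration, and the sketch assumes that composite column names sort component by component, which Python tuples happen to do as well.

```python
import bisect


class Row:
    """Toy stand-in for one row whose column names are (column_name, entry) composites."""

    def __init__(self):
        # Column names kept in sorted order, as they would be within a row.
        self._names = []

    def add(self, column_name, entry):
        """Write the composite column; writing the same name twice is a no-op."""
        key = (column_name, entry)
        i = bisect.bisect_left(self._names, key)
        if i == len(self._names) or self._names[i] != key:
            self._names.insert(i, key)

    def slice(self, column_name):
        """Return every entry whose first component equals column_name."""
        # (column_name,) sorts before any (column_name, entry) pair.
        lo = bisect.bisect_left(self._names, (column_name,))
        # column_name + "\x00" is the smallest string greater than column_name.
        hi = bisect.bisect_left(self._names, (column_name + "\x00",))
        return [entry for _, entry in self._names[lo:hi]]


row = Row()
for skill in ["java", "c++", "java", "cobol"]:
    row.add("skills", skill)
row.add("languages", "english")

print(row.slice("skills"))  # ['c++', 'cobol', 'java']
```

The duplicate "java" disappears because both writes target the same column name, and the result comes back sorted because the column names themselves are kept sorted.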
This answer dates to before the release of Cassandra version 1.2, which provided substantially different functionality for handling lists. The answer might be inappropriate if you are using Cassandra 1.2+.
As mentioned on the mailing list, my preference, which has worked very well for me, is to store a single "skills" column whose value is a serialized JSON string.
It really comes down to the usage patterns you have for "skills".
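As a rough sketch of what that looks like on the client side, the Python below serializes the list before it is written and parses it after it is read. The helper names `encode_skills`, `decode_skills`, and `add_skill` are hypothetical, and the actual read and write of the column through your client library is left out.

```python
import json


def encode_skills(skills):
    """Serialize a list of skills into the string stored in the 'skills' column."""
    return json.dumps(list(skills))


def decode_skills(value):
    """Parse the stored column value back into a list; empty or missing means no skills."""
    return json.loads(value) if value else []


def add_skill(value, skill):
    """Read-modify-write: decode the stored value, append if absent, re-encode."""
    skills = decode_skills(value)
    if skill not in skills:
        skills.append(skill)
    return encode_skills(skills)


stored = encode_skills(["java", "c++"])
stored = add_skill(stored, "cobol")
print(decode_skills(stored))  # ['java', 'c++', 'cobol']
```

The trade-off is visible in `add_skill`: every change rewrites the whole value, and the database cannot see individual skills. That is fine if you always read and write the list as a unit, and less so if you need to look profiles up by skill.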
In older versions of Cassandra, you had to serialize the list yourself and store it in a column, or perhaps use a super column.
Since version 1.2 of Cassandra, CQL3 has collection types for columns, so you can give list<text> as the type of a column in your schema. For example:
 CREATE TABLE Person (
    name text,
    skills list<text>,
    PRIMARY KEY (name)
 );
Or you could use set<text> if you want to automatically eliminate duplicates.