
Cassandra: Column Family vs Super Column Family

I have a requirement where I need my database to store the following data:

- For each build, store the results of 3 performance runs. Each result includes TPS and latency.

Reading up on the Cassandra data model, this maps directly to a super column family of the following format:

BenchmarkSuperColumnFamily= {

build_1: {
   Run1: {1000K, 0.5ms}
   Run2: {1000K, 0.5ms}
   Run3: {1000K, 0.5ms}
}

build_2: {
   Run1: {1000K, 0.5ms}
   Run2: {1000K, 0.5ms}
   Run3: {1000K, 0.5ms}
}
...

}

However, I read in the following answer that the use of super column families is discouraged. Is there a better way to model my requirement?

PS: I borrowed the JSON-ish notation from the following article.

Chander Shivdasani asked Jun 25 '12 22:06


1 Answer

The Stack Overflow answer that you linked to is correct. You shouldn't use SuperColumns in new applications. They still exist for backwards compatibility, however.

In general, composite columns can be used to mimic any model provided by super columns. They let you split each column name into multiple parts. So if you specify a comparator of 'CompositeType(UTF8Type, UTF8Type)', your data model would end up looking like this:

BenchmarkColumnFamily= {

   build_1: {
       (Run1, TPS) : 1000K
       (Run1, Latency) : 0.5ms
       (Run2, TPS) : 1000K
       (Run2, Latency) : 0.5ms
       (Run3, TPS) : 1000K
       (Run3, Latency) : 0.5ms
   }

   build_2: {
       ...
   }
   ...

}

With the above model, a single query can fetch one data point for one run (an exact column name), all data points for one run (every column whose first component matches), or all data points for several runs (a range over the first component).
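
To make those three access patterns concrete, here is a minimal in-memory sketch in Python. It does not talk to Cassandra or use any client library: the row is a plain dict keyed by (run, metric) tuples, and the helper names `get_column`, `get_run`, and `get_runs` are hypothetical, chosen only for illustration. Sorting the tuples stands in for the ordering a composite comparator would give the column names; the values are the placeholder figures from the question.

```python
# Hypothetical stand-in for the build_1 row: composite column names are
# modeled as (run, metric) tuples, values are the placeholders from the post.
build_1 = {
    ("Run1", "TPS"): "1000K",
    ("Run1", "Latency"): "0.5ms",
    ("Run2", "TPS"): "1000K",
    ("Run2", "Latency"): "0.5ms",
    ("Run3", "TPS"): "1000K",
    ("Run3", "Latency"): "0.5ms",
}


def get_column(row, run, metric):
    """One data point for one run: exact match on the full composite name."""
    return row[(run, metric)]


def get_run(row, run):
    """All data points for one run: match on the first component only."""
    return [(name, row[name]) for name in sorted(row) if name[0] == run]


def get_runs(row, first_run, last_run):
    """All data points for a range of runs: range over the first component."""
    return [
        (name, row[name])
        for name in sorted(row)
        if first_run <= name[0] <= last_run
    ]


if __name__ == "__main__":
    print(get_column(build_1, "Run1", "TPS"))
    print(get_run(build_1, "Run2"))
    print(get_runs(build_1, "Run1", "Run2"))
```

Because the names sort by their first component and then their second, the columns for each run sit next to each other, which is why the second and third lookups correspond to contiguous slices of the row rather than scattered reads.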

More info on composite columns: http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1

nickmbailey answered Sep 20 '22 14:09