Why do many refer to Cassandra as a column-oriented database?

Tags: data-modeling, nosql, cassandra, column-oriented, wide-column-store

While reading several papers and documents on the internet, I found a lot of contradictory information about the Cassandra data model. Many sources identify it as a column-oriented database, others as row-oriented, and still others define it as a hybrid of both.

From what I know about how Cassandra stores its files, it uses the *-Index.db file to find the right position in the *-Data.db file, where the bloom filter, the column index, and then the columns of the required row are stored.

In my opinion, this is strictly row-oriented. Is there something I'm missing?

asked Oct 22 '12 11:10

cesare

2 Answers

If you take a look at the README file in the Apache Cassandra Git repo, it says:

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns.

Column-oriented (columnar) databases store their data on disk column by column.

For example, take a table called Bonuses:

    ID    Last    First    Bonus
    1     Doe     John     8000
    2     Smith   Jane     4000
    3     Beck    Sam      1000

In a row-oriented database management system, the data would be stored like this:

    1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

In a column-oriented database management system, the data would be stored like this:

    1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

Cassandra is basically a column-family store. It would store the above data as:

    "Bonuses" : {
        row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
        row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
        ...
    }

Also, the number of columns in each row doesn't have to be the same. One row can have 100 columns and the next row can have only 1 column.
Read this for more details.
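To make the difference between the two layouts concrete, here is a minimal Python sketch that serializes the Bonuses table both ways. The function names `row_oriented` and `column_oriented` are made up for this illustration, and the delimited strings are only a toy stand-in for what a real storage engine writes to disk.

```python
# Toy illustration only: real storage engines use binary formats,
# not comma/semicolon-delimited strings.
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]


def row_oriented(rows):
    """Write one complete row after another."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def column_oriented(rows):
    """Write all values of one column, then all values of the next."""
    columns = zip(*rows)  # transpose: rows -> columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


print(row_oriented(rows))
print(column_oriented(rows))
```

The two printed strings should match the row-oriented and column-oriented examples given above, which shows that the same values are stored and only their order on disk differs.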

answered Oct 18 '22 01:10

tharindu_DG

Yes, the "column-oriented" terminology is a bit confusing.

The model in Cassandra is that rows contain columns. To access the smallest unit of data (a column) you have to specify first the row name (key), then the column name.

So in a columnfamily called Fruit you could have a structure like the following example (with 2 rows), where the fruit types are the row keys, and the columns each have a name and value.

    apple  -> colour    weight  price  variety
              "red"     100     40     "Cox"

    orange -> colour    weight  price  origin
              "orange"  120     50     "Spain"

One difference from a table-based relational database is that one can omit columns (orange has no variety), or add arbitrary columns (orange has origin) at any time. You can still imagine the data above as a table, albeit a sparse one where many values might be empty.
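As a rough mental model (not Cassandra's actual API), the Fruit example can be sketched in Python as a dictionary of dictionaries: the outer key is the row key and the inner key is the column name. The helpers `get_column` and `as_sparse_table` are hypothetical names introduced only for this sketch.

```python
# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> value}. Rows need not share the same columns.
fruit = {
    "apple": {"colour": "red", "weight": 100, "price": 40, "variety": "Cox"},
    "orange": {"colour": "orange", "weight": 120, "price": 50, "origin": "Spain"},
}


def get_column(column_family, row_key, column_name):
    """Look up a value by row key first, then by column name (None if absent)."""
    return column_family.get(row_key, {}).get(column_name)


def as_sparse_table(column_family):
    """View the rows as a table over the union of all column names."""
    names = sorted({name for cols in column_family.values() for name in cols})
    table = {
        key: [cols.get(name) for name in names]
        for key, cols in column_family.items()
    }
    return names, table


print(get_column(fruit, "apple", "variety"))   # column present in this row
print(get_column(fruit, "orange", "variety"))  # column omitted in this row

names, table = as_sparse_table(fruit)
print(names)
for key, values in table.items():
    print(key, values)
```

The sparse-table view fills the missing cells with `None`, which is the "table with many empty values" picture described above.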

However, a "column-oriented" model can also be used for lists and time series, where every column name is unique (and here we have just one row, but we could have thousands or millions of columns):

    temperature ->  2012-09-01  2012-09-02  2012-09-03  ...
                    40          41          39          ...

which is quite different from a relational model, where one would have to model the entries of a time series as rows not columns. This type of usage is often referred to as "wide rows".
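A wide row like the temperature series can be sketched as a single row whose columns are kept sorted by name, so that a range of column names can be read in one slice. The `WideRow` class below is a hypothetical toy written for this answer, not part of any Cassandra client library; it relies on ISO-formatted date strings sorting in chronological order.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy wide row: columns are kept sorted by name to allow range slices."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = []  # values, parallel to _names

    def insert(self, name, value):
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value  # overwrite an existing column
        else:
            self._names.insert(i, name)
            self._values.insert(i, value)

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return list(zip(self._names[lo:hi], self._values[lo:hi]))


temperature = WideRow()
# Columns can arrive in any order; the row keeps them sorted by name.
for day, reading in [("2012-09-03", 39), ("2012-09-01", 40), ("2012-09-02", 41)]:
    temperature.insert(day, reading)

print(temperature.slice("2012-09-02", "2012-09-03"))
```

In a relational design each reading would be its own row. Here, adding a reading only adds one more column to the same row.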

answered Oct 18 '22 02:10

DNA
