Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra Wide Row/Dynamic Columns

I'm new to NoSQL; so, I'm trying to understand some of the Cassandra concepts that I can't really get from the dozens of sources that I have studied.

  1. Should I consider wide row, and dynamic columns as synonyms; or are they 2 different concepts?
  2. Am I correct in thinking of columns of collection types as wide row?
  3. It seems to me that wide row is a concept from earlier versions of Cassandra, and can be created only through Thrift API; whereas collection types are the modern versions of wide rows.
  4. Are collection types still limited to 64k elements? Or that after CQL 3, that limitation has been removed?
like image 964
user1888243 Avatar asked Jun 19 '17 15:06

user1888243


People also ask

How are wide rows displayed in CQL?

Cassandra uses a special primary key called a composite key (or compound key) to represent wide rows, also called partitions. The composite key consists of a partition key, plus an optional set of clustering columns.

What is wide row in Apache Cassandra?

When actual data is stored in column names, we end up with wide rows. Benefits of wide rows: Since column names are stored physically sorted, wide rows enable ordering of data and hence efficient filtering (range scans). You'll still be able to efficiently look up an individual column within a wide row, if needed.

How many columns can Cassandra have?

As per Cassandra technical limitation page, total no. of cells together cannot exceed 2 billion cells (rows X columns). You can have a table with (1 row X 2 billion columns) and no more rows will be allowed in that table, so the limit is not 2 billion columns per row but limit is on total no. of cells in a partition.

What is static column in Cassandra?

Static columns store values that are shared by all rows in the same partition.


1 Answers

A common misunderstanding is that CQL does not support dynamic columns or wide rows. On the contrary, CQL was designed to support everything you can do with the Thrift model, but make it easier and more accessible.

Let's take a look at the below cql table.

CREATE TABLE data (
  sensor_id int,
  collected_at timestamp,
  volts float,
  PRIMARY KEY (sensor_id, collected_at)
);

And insert some data

sensor_id | collected_at             | volts
----------+--------------------------+-------
   1      | 2013-06-05 15:11:00-0500 |   3.1
   1      | 2013-06-05 15:11:10-0500 |   4.3
   1      | 2013-06-05 15:11:20-0500 |   5.7
   2      | 2013-06-05 15:11:00-0500 |   3.2
   3      | 2013-06-05 15:11:00-0500 |   3.3
   3      | 2013-06-05 15:11:10-0500 |   4.3

Here clustering column collected_at is similar to Thrift dynamic column.(Q.1)

If we look at the internal structure of this table

RowKey: 1
=> (cell=2013-06-05 15:11:00-0500, value=3.1, timestamp=1370463146717000)
=> (cell=2013-06-05 15:11:10-0500, value=4.3, timestamp=1370463282090000)
=> (cell=2013-06-05 15:11:20-0500, value=5.7, timestamp=1370463282093000)
-------------------
RowKey: 2
=> (cell=2013-06-05 15:11:00-0500, value=3.2, timestamp=1370463332361000)
-------------------
RowKey: 3
=> (cell=2013-06-05 15:11:00-0500, value=3.3, timestamp=1370463332365000)
=> (cell=2013-06-05 15:11:10-0500, value=4.3, timestamp=1370463332368000)

You can see that the clustering column collected_at makes this table table wide row (Q.1).

So we can say that if a table have one or more clustering key, we can called that table wide row.

Let's take another example :

CREATE TABLE example (
    key1 text PRIMARY KEY,
    map1 map<text,text>,
    list1 list<text>,
    set1 set<text>
);

Insert a data :

 key1 | list1             | map1                                         | set1
------+-------------------+----------------------------------------------+-----------------------
 john | ['doug', 'scott'] | {'doug': '555-1579', 'patricia': '555-4326'} | {'patricia', 'scott'}

Now take a look at the internal structure :

RowKey: john
=> (column=, value=, timestamp=1374683971220000)
=> (column=map1:doug, value='555-1579', timestamp=1374683971220000)
=> (column=map1:patricia, value='555-4326', timestamp=1374683971220000)
=> (column=list1:26017c10f48711e2801fdf9895e5d0f8, value='doug', timestamp=1374683971220000)
=> (column=list1:26017c12f48711e2801fdf9895e5d0f8, value='scott', timestamp=1374683971220000)
=> (column=set1:'patricia', value=, timestamp=1374683971220000)
=> (column=set1:'scott', value=, timestamp=1374683971220000)

You can see that map key and set value stored as dynamic column and map value and list value stored as the value of that column. It's similar to wide row (Q.2)

And the last one : Collection type map key and set size is limited to 64k.

  • Collection (List): collection limit: ~2 billion (2^31); values size: 65535 (216-1)
  • Collection (Set): collection limit: ~2 billion (2^31); values size: 65535 (216-1)
  • Collection (Map): collection limit: ~2 billion (2^31); number of keys: 65535 (216-1); values size: 65535 (216-1)

Source :
https://www.datastax.com/blog/2013/06/does-cql-support-dynamic-columns-wide-rows https://teddyma.gitbooks.io/learncassandra/content/model/cql_and_data_structure.html http://docs.datastax.com/en/cql/3.3/cql/cql_reference/refLimits.html

like image 125
Ashraful Islam Avatar answered Oct 30 '22 03:10

Ashraful Islam