
Documentation on Cassandra Composite Columns

When I search for information on composite columns, I cannot find anything newer than 2013 (Google's top result in particular has no CQL code where it discusses composite columns, and apparently uses a very old Java driver). Do composite columns still exist in newer versions of Cassandra, apart from composite keys?

I am new to Cassandra and want to learn whether composite columns are suitable for my use case, described below. Consider a table with 4 double-valued columns, say w, x, y, z. The data are collected from 3 sources, say a, b and c. Each source may be missing some part of the data, so each row of the table holds at most 12 numbers. Instead of creating 3 tables with 4 columns to store values from the different sources, and later merging the tables to fill in the missing fields, I am thinking of a single table that models the 4 data columns as 4 super columns or composite columns. Something like a:w, b:w, c:w, a:x, b:x, c:x, a:y, b:y, c:y, a:z, b:z, c:z. Additionally, every row has a timestamp as the primary key.

What I want to find out is whether I can run a query like SELECT *:w AS w FROM MyTable such that, for every row, one value for w is returned from whichever source is available (it does not matter which one). I also want to keep the ability to retrieve data from a specific source, like SELECT a:w FROM MyTable.

----------------------------------------------------------------
| key | a:w | b:w | c:w | a:x | b:x | c:x | a:y | b:y | c:y | ...
----------------------------------------------------------------
|  1  | 10  |  10 |  -  | ....
|  2  |  -  |  1  |  2  | ....
|  3  | 11  |  -  |  -  | ....
|  4  | 12  |  11 |  11 | ....
-----------------------------------------------------------------

SELECT *:w AS w FROM MyTable
(10, 1, 11, 12)   // would be an acceptable answer

SELECT a:w AS w FROM MyTable
(10, 11, 12)      // would be an acceptable answer
Mahdi asked Mar 13 '23 03:03



1 Answer

"Composite column" is vocabulary from the Thrift protocol. Internally, up to Cassandra 2.2, the storage engine still works with composite columns and translates them into clustering columns, the new vocabulary that came with CQL.

Since Cassandra 3.x, the storage engine has been rewritten, so we no longer store data using composite columns. We aligned the storage engine with the new CQL semantics, i.e. partition key and clustering columns. For backward compatibility, we still translate clustering columns back to composite-column semantics when dealing with the legacy Thrift protocol.
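To make the relationship between the two vocabularies concrete, here is a small in-memory sketch in Python. It is only a conceptual model, not Cassandra's actual storage format or any driver API: the function name to_composite_cells, the sample rows, and the column names ts, source, w and x are all assumptions chosen to mirror the question. It shows how CQL-style rows with a clustering column can be viewed as Thrift-era cells whose names combine the clustering value and the column name (the a:w, b:w naming from the question).

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]
Cells = Dict[Tuple[object, str], object]


def to_composite_cells(
    rows: List[Row], partition_key: str, clustering_key: str
) -> Dict[object, Cells]:
    """Flatten CQL-style rows into {partition: {(clustering, column): value}}.

    Conceptual model only: real storage details (row markers, timestamps,
    tombstones) are deliberately left out.
    """
    partitions: Dict[object, Cells] = {}
    for row in rows:
        cells = partitions.setdefault(row[partition_key], {})
        for name, value in row.items():
            # Key columns are part of the cell name, not separate cells,
            # and a missing value simply produces no cell at all.
            if name in (partition_key, clustering_key) or value is None:
                continue
            cells[(row[clustering_key], name)] = value
    return partitions


# Hypothetical rows: one row per (ts, source), as a clustering column gives you.
rows: List[Row] = [
    {"ts": 1, "source": "a", "w": 10.0, "x": None},
    {"ts": 1, "source": "b", "w": 10.0, "x": 3.5},
    {"ts": 2, "source": "b", "w": 1.0, "x": None},
]

cells = to_composite_cells(rows, "ts", "source")
for ts, partition in cells.items():
    for (source, column), value in partition.items():
        print(f"ts={ts}  {source}:{column} = {value}")
```

In this model a source that did not report a value just has no cell, which is why a per-source row layout handles the missing fields without any placeholder.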

If you are just starting with Cassandra, forget about the old Thrift protocol and use CQL semantics right away.

For your needs, the following schema should do the job:

CREATE TABLE my_data(
   data text,
   source text,
   PRIMARY KEY ((data), source)
);

INSERT INTO my_data(data, source) VALUES('data1','src1');
INSERT INTO my_data(data, source) VALUES('data1','src2');
...
INSERT INTO my_data(data, source) VALUES('dataN','src1');
...
INSERT INTO my_data(data, source) VALUES('dataN','srcN');

//Select all sources for data1
SELECT source FROM my_data WHERE data='data1';

//Select data and source
SELECT * FROM my_data WHERE data='data1' AND source='src1';
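The "one value from whichever source is available" part of the question can be handled on the client once the rows are fetched per source. The following Python sketch assumes each returned row carries a key, a source and a w value; that row shape, the ROWS sample (copied from the table in the question) and the helper name pick_w are illustrative assumptions, not part of the schema above or of any Cassandra driver.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, source, w) tuples; a source with no value for a key has no row.
# Values mirror the example table in the question.
ROWS: List[Tuple[int, str, Optional[float]]] = [
    (1, "a", 10.0), (1, "b", 10.0),
    (2, "b", 1.0), (2, "c", 2.0),
    (3, "a", 11.0),
    (4, "a", 12.0), (4, "b", 11.0), (4, "c", 11.0),
]


def pick_w(
    rows: Iterable[Tuple[int, str, Optional[float]]],
    source: Optional[str] = None,
) -> List[float]:
    """Return one w per key, optionally restricted to a single source."""
    picked: Dict[int, float] = {}
    for key, src, w in rows:
        if w is None:
            continue  # skip rows where this source has no w
        if source is not None and src != source:
            continue  # caller asked for one specific source
        picked.setdefault(key, w)  # keep the first value seen for each key
    return [picked[key] for key in sorted(picked)]


any_source = pick_w(ROWS)
only_a = pick_w(ROWS, source="a")
print(any_source)  # [10.0, 1.0, 11.0, 12.0]
print(only_a)      # [10.0, 11.0, 12.0]
```

Because rows within a partition come back ordered by the clustering column, "first value seen" here means the first source in that order; any other preference would need an explicit ranking of sources.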
doanduyhai answered Mar 14 '23 17:03
