When I try to find information on composite columns, I cannot find anything newer than 2013 (specifically, this one is Google's top link; it has no CQL code when talking about composite columns and apparently uses a very old Java driver). Do composite columns still exist in newer versions of Cassandra, apart from composite keys?
I am new to Cassandra and want to learn whether they are suitable for my use case, described below. Consider a table with 4 double-valued columns, say w, x, y, z. These data are collected from 3 sources, say a, b and c. Each source may be missing some part of the data, so there are a maximum of 12 numbers in each row of the table.
Instead of creating 3 tables with 4 columns to store values from the different sources, and later merging the tables to fill in the missing fields, I am thinking of having a table that models the 4 data columns as 4 super columns or composite columns. Something like a:w, b:w, c:w, a:x, b:x, c:x, a:y, b:y, c:y, a:z, b:z, c:z. Additionally, every row has a timestamp as the primary key.
What I want to find out is whether I can have a query like SELECT *:w AS w FROM MyTable such that, for every row, one value for w is returned from any source that is available (it doesn't matter which source). I also want to preserve the ability to retrieve data from a specific source, like SELECT a:w FROM MyTable.
-----------------------------------------------------------------
| key | a:w | b:w | c:w | a:x | b:x | c:x | a:y | b:y | c:y | ...
-----------------------------------------------------------------
|   1 |  10 |  10 |   - | ....
|   2 |   - |   1 |   2 | ....
|   3 |  11 |   - |   - | ....
|   4 |  12 |  11 |  11 | ....
-----------------------------------------------------------------
SELECT *:w AS w FROM MyTable
(10, 1, 11, 12) // would be an acceptable answer
SELECT a:w AS w FROM MyTable
(10, 11, 12) // would be an acceptable answer
"Composite column" is vocabulary from the Thrift protocol. Internally, up to Cassandra 2.2, the storage engine still dealt with composite columns and translated them into clustering columns, the new vocabulary that came with CQL.
Since Cassandra 3.x, the storage engine has been rewritten, so we no longer store data using composite columns. We aligned the storage engine with the new CQL semantics (partition key, clustering columns). For backward compatibility, we still translate clustering columns back to composite column semantics when dealing with the legacy Thrift protocol.
If you are just starting with Cassandra, forget about the old Thrift protocol and use CQL semantics right away.
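To make the vocabulary change concrete, here is a rough conceptual sketch in plain Python (no Cassandra driver involved). It takes the "source:field" column names from the question and regroups them so that the source becomes part of the row identity, which is roughly the role a clustering column plays in CQL. The names wide_rows and to_cql_rows are made up for this illustration, and this is a mental model only, not how the storage engine is implemented.

```python
from collections import defaultdict

# Thrift-style view: one wide row per key, with composite column
# names of the form "source:field". Only the w values from the
# question's example table are used here.
wide_rows = {
    1: {"a:w": 10.0, "b:w": 10.0},
    2: {"b:w": 1.0, "c:w": 2.0},
    3: {"a:w": 11.0},
    4: {"a:w": 12.0, "b:w": 11.0, "c:w": 11.0},
}


def to_cql_rows(wide):
    """Regroup {key: {"source:field": value}} as {(key, source): {field: value}}.

    In the CQL view, `key` plays the role of the partition key and
    `source` the role of a clustering column.
    """
    rows = defaultdict(dict)
    for key, columns in wide.items():
        for name, value in columns.items():
            source, field = name.split(":", 1)
            rows[(key, source)][field] = value
    return dict(rows)


cql_rows = to_cql_rows(wide_rows)
for (key, source), fields in sorted(cql_rows.items()):
    print(key, source, fields)
```

A source that reported nothing for a given key simply has no row, so missing data costs nothing to represent.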
For your needs, the following schema should do the job:
CREATE TABLE my_data(
data text,
source text,
PRIMARY KEY ((data), source)
);
INSERT INTO my_data(data, source) VALUES('data1','src1');
INSERT INTO my_data(data, source) VALUES('data1','src2');
...
INSERT INTO my_data(data, source) VALUES('dataN','src1');
...
INSERT INTO my_data(data, source) VALUES('dataN','srcN');
// Select all sources for data1
SELECT source FROM my_data WHERE data='data1';
// Select the row for one specific data/source pair
SELECT * FROM my_data WHERE data='data1' AND source='src1';
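CQL has no wildcard over column names like the *:w in the question, so the "any available source" pick would probably have to happen on the client side. Below is a minimal sketch of that step in plain Python, assuming the schema is adapted so that the timestamp is the partition key, source is the clustering column and w is a regular column. That adaptation, the row dictionaries, and the helper names any_source and from_source are my assumptions for illustration; the sample values come from the table in the question.

```python
# Rows as a client might receive them from a table shaped like
# PRIMARY KEY ((ts), source): one row per (timestamp, source) pair.
# This layout is an assumption; values are the w columns from the question.
rows = [
    {"ts": 1, "source": "a", "w": 10.0},
    {"ts": 1, "source": "b", "w": 10.0},
    {"ts": 2, "source": "b", "w": 1.0},
    {"ts": 2, "source": "c", "w": 2.0},
    {"ts": 3, "source": "a", "w": 11.0},
    {"ts": 4, "source": "a", "w": 12.0},
    {"ts": 4, "source": "b", "w": 11.0},
    {"ts": 4, "source": "c", "w": 11.0},
]


def any_source(rows, field):
    """For each timestamp, keep the first non-missing value of `field`."""
    picked = {}
    for row in rows:
        value = row.get(field)
        if value is not None and row["ts"] not in picked:
            picked[row["ts"]] = value
    return [picked[ts] for ts in sorted(picked)]


def from_source(rows, field, source):
    """Return the values of `field` reported by one specific source."""
    ordered = sorted(rows, key=lambda r: r["ts"])
    return [
        r[field]
        for r in ordered
        if r["source"] == source and r.get(field) is not None
    ]


print(any_source(rows, "w"))        # [10.0, 1.0, 11.0, 12.0]
print(from_source(rows, "w", "a"))  # [10.0, 11.0, 12.0]
```

Both results match the answers the question lists as acceptable for SELECT *:w and SELECT a:w.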