I want to insert a single row with 50,000 columns into Cassandra 1.2.8. Before inserting, I have all the data for the entire row ready to go (in memory):
+---------+------+------+------+------+-------+
| | 0 | 1 | 2 | ... | 49999 |
| row_id +------+------+------+------+-------+
| | text | text | text | ... | text |
+---------+------+------+------+------+-------+
The column names are integers, allowing slicing for pagination. Each column value is the text stored at that particular index.
CQL3 table definition:
create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) with compact storage;
As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.
The only approach I can seem to find is to execute the following 50,000 times:
INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
where the first ? is an index counter (i) and the second ? is the text value to store at location i.
This takes a lot of time, even when we put the above INSERTs into a batch.
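For context, the loop looks roughly like this with the DataStax Java Driver (the contact point, keyspace name and placeholder data below are just illustrative):
// DataStax Java Driver (com.datastax.driver.core)
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace");

PreparedStatement insert = session.prepare(
        "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");

String[] values = new String[50000];
for (int i = 0; i < values.length; i++) values[i] = "text" + i; // already in memory in reality

for (int i = 0; i < values.length; i++) {
    // One request per column -- 50,000 round trips is what makes this so slow.
    session.execute(insert.bind("my_row_id", i, values[i]));
}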
Since we have all the data we need (the complete row) in its entirety, I would assume it would be easy to just say "here, Cassandra, store this data as a single row in one request", for example:
//EXAMPLE-BUT-INVALID CQL3 SYNTAX:
insert into results (row_id, (index,value)) values
((0,text0), (1,text1), (2,text2), ..., (N,textN));
This example isn't possible via current CQL3 syntax, but I hope it illustrates the desired effect: everything would be inserted as a single query.
Is it possible to do this in CQL3 and the DataStax Java Driver? If not, I suppose I'll be forced to use Hector or the Astyanax driver and the Thrift batch_insert operation instead?
There is a batch insert operation in Cassandra. You can batch together inserts, even in different column families, to make insertion more efficient. In Hector, you can use HFactory.createMutator, then use the add methods on the returned Mutator to add operations to your batch.
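A rough, untested sketch of that for the 50,000 columns in the question (the cluster address, keyspace and column family names are placeholders):
// Hector (me.prettyprint.hector.api); serializers from me.prettyprint.cassandra.serializers.
Cluster cluster = HFactory.getOrCreateCluster("my-cluster", "127.0.0.1:9160");
Keyspace keyspace = HFactory.createKeyspace("my_keyspace", cluster);
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
for (int i = 0; i < 50000; i++) {
    // Queue one column per index; nothing is sent until execute().
    mutator.addInsertion("my_row_id", "results",
            HFactory.createColumn(i, "text" + i, IntegerSerializer.get(), StringSerializer.get()));
}
mutator.execute(); // one batched call for everything queued above
The point is that all 50,000 insertions are queued client-side and sent together, rather than as 50,000 separate requests.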
When actual data is stored in column names, we end up with wide rows. The benefit of wide rows is that, since column names are stored physically sorted, they enable ordering of data and hence efficient filtering (range scans). You can still efficiently look up an individual column within a wide row if needed.
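For example, paging through one wide row of the question's results table by slicing on the clustering column could look like this (assuming a CQL3 Session as in the question; the page bounds are arbitrary):
// Fetch the first "page" of 1,000 columns from a single wide row.
PreparedStatement page = session.prepare(
        "SELECT index, value FROM results WHERE row_id = ? AND index >= ? AND index < ?");
for (Row r : session.execute(page.bind("my_row_id", 0, 1000))) {
    System.out.println(r.getInt("index") + " -> " + r.getString("value"));
}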
Multiple INSERTs/UPDATEs can be done using the batch_mutate method in the Thrift API, by making use of a mutation map:
// Outer key: the row key; inner key: the column family name
// (the 1.2 Thrift API takes ByteBuffer row keys).
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
List<Mutation> mutationList = new ArrayList<Mutation>();
mutationList.add(mutation); // one Mutation per column to write (see the sketch below)
Map<String, List<Mutation>> m = new HashMap<String, List<Mutation>>();
m.put(columnFamily, mutationList);
mutationMap.put(key, m);
client.batch_mutate(mutationMap, ConsistencyLevel.ALL); // single round trip for all queued mutations
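And a rough sketch of how the row key and a single Mutation could be built with the raw Thrift classes (the ByteBufferUtil encodings are my assumption; any ByteBuffer encoding matching the comparator works):
// org.apache.cassandra.thrift classes; ByteBufferUtil is from org.apache.cassandra.utils.
ByteBuffer key = ByteBufferUtil.bytes("my_row_id");      // row key
Column column = new Column();
column.setName(ByteBufferUtil.bytes(42));                // column name, e.g. the index
column.setValue(ByteBufferUtil.bytes("text42"));         // column value
column.setTimestamp(System.currentTimeMillis() * 1000);  // microseconds by convention
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(column);
Mutation mutation = new Mutation();
mutation.setColumn_or_supercolumn(cosc);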
Edit: only 4 days after I posted this question regarding Cassandra 1.2.9, Cassandra 2.0 final was released. 2.0 supports batch prepared statements, which should be much faster than the non-batched CQL3 required for C* < 2.0. We have not yet tested this to be sure.
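Untested, but with the 2.0 driver I'd expect the batched version to look roughly like this (same session, table and in-memory values as above; UNLOGGED is my assumption since everything targets a single partition):
// DataStax Java Driver 2.0: bind one prepared statement 50,000 times, send as one batch.
PreparedStatement insert = session.prepare(
        "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");
BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
for (int i = 0; i < values.length; i++) {
    batch.add(insert.bind("my_row_id", i, values[i]));
}
session.execute(batch); // a batch this large may still need to be split into chunks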
When this question was posted 4 days ago on 30 August 2013, it was not possible in CQL3 for C* versions less than 2.0. It was only possible via a Thrift client, e.g. Astyanax's MutationBatch.
Per Alex's suggestion, I created CASSANDRA-5959 as a feature request, but it was marked as a duplicate of CASSANDRA-4693, which supposedly solved the issue for C* 2.0.
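For anyone taking the Thrift route in the meantime, a rough sketch of the Astyanax MutationBatch approach (keyspace setup omitted; the column family definition and names are assumptions):
// Astyanax (com.netflix.astyanax); "keyspace" is an already-configured Keyspace instance.
ColumnFamily<String, Integer> CF_RESULTS = new ColumnFamily<String, Integer>(
        "results", StringSerializer.get(), IntegerSerializer.get());
MutationBatch batch = keyspace.prepareMutationBatch();
ColumnListMutation<Integer> row = batch.withRow(CF_RESULTS, "my_row_id");
for (int i = 0; i < 50000; i++) {
    row.putColumn(i, "text" + i, null); // null = no TTL
}
batch.execute(); // all 50,000 columns go to Cassandra in a single batch_mutate call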