I want to insert a single row with 50,000 columns into Cassandra 1.2.8. Before inserting, I have all the data for the entire row ready to go (in memory):
+---------+------+------+------+------+-------+
| | 0 | 1 | 2 | ... | 49999 |
| row_id +------+------+------+------+-------+
| | text | text | text | ... | text |
+---------+------+------+------+------+-------+
The column names are integers, allowing slicing for pagination. Each column value is the text stored at that particular index.
CQL3 table definition:
create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) with compact storage;
As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.
The only approach I can seem to find is to execute the following 50,000 times:
INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
where the first ? is an index counter (i) and the second ? is the text value to store at location i.
This takes a lot of time, even when we put the above INSERTs into a batch.
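For context, the loop looks roughly like this with the DataStax Java Driver (the contact point, keyspace name and placeholder data below are just illustrative):
// DataStax Java Driver (com.datastax.driver.core)
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace");

PreparedStatement insert = session.prepare(
        "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");

String[] values = new String[50000];
for (int i = 0; i < values.length; i++) values[i] = "text" + i; // already in memory in reality

for (int i = 0; i < values.length; i++) {
    // One request per column -- 50,000 round trips is what makes this so slow.
    session.execute(insert.bind("my_row_id", i, values[i]));
}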
Since we have all the data we need (the complete row) in its entirety, I would assume it would be easy to just say "here, Cassandra, store this data as a single row in one request", for example:
//EXAMPLE-BUT-INVALID CQL3 SYNTAX:
insert into results (row_id, (index,value)) values
((0,text0), (1,text1), (2,text2), ..., (N,textN));
This example isn't possible via current CQL3 syntax, but I hope it illustrates the desired effect: everything would be inserted as a single query.
Is it possible to do this in CQL3 and the DataStax Java Driver? If not, I suppose I'll be forced to use Hector or the Astyanax driver and the Thrift batch_insert operation instead?
There is a batch insert operation in Cassandra. You can batch together inserts, even in different column families, to make insertion more efficient. In Hector, you can use HFactory.createMutator, then use the add methods on the returned Mutator to add operations to your batch.
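A rough, untested sketch of that for the 50,000 columns in the question (the cluster address, keyspace and column family names are placeholders):
// Hector (me.prettyprint.hector.api); serializers from me.prettyprint.cassandra.serializers.
Cluster cluster = HFactory.getOrCreateCluster("my-cluster", "127.0.0.1:9160");
Keyspace keyspace = HFactory.createKeyspace("my_keyspace", cluster);
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
for (int i = 0; i < 50000; i++) {
    // Queue one column per index; nothing is sent until execute().
    mutator.addInsertion("my_row_id", "results",
            HFactory.createColumn(i, "text" + i, IntegerSerializer.get(), StringSerializer.get()));
}
mutator.execute(); // one batched call for everything queued above
The point is that all 50,000 insertions are queued client-side and sent together, rather than as 50,000 separate requests.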
When actual data is stored in column names, we end up with wide rows. The benefit of wide rows is that, since column names are stored physically sorted, they enable ordering of data and hence efficient filtering (range scans). You can still efficiently look up an individual column within a wide row if needed.
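For example, paging through one wide row of the question's results table by slicing on the clustering column could look like this (assuming a CQL3 Session as in the question; the page bounds are arbitrary):
// Fetch the first "page" of 1,000 columns from a single wide row.
PreparedStatement page = session.prepare(
        "SELECT index, value FROM results WHERE row_id = ? AND index >= ? AND index < ?");
for (Row r : session.execute(page.bind("my_row_id", 0, 1000))) {
    System.out.println(r.getInt("index") + " -> " + r.getString("value"));
}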
Multiple INSERTs/UPDATEs can be done using the batch_mutate method in the Thrift API, by making use of a mutation map:
// Outer key: the row key; inner key: the column family name
// (the 1.2 Thrift API takes ByteBuffer row keys).
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
List<Mutation> mutationList = new ArrayList<Mutation>();
mutationList.add(mutation); // one Mutation per column to write (see the sketch below)
Map<String, List<Mutation>> m = new HashMap<String, List<Mutation>>();
m.put(columnFamily, mutationList);
mutationMap.put(key, m);
client.batch_mutate(mutationMap, ConsistencyLevel.ALL); // single round trip for all queued mutations
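And a rough sketch of how the row key and a single Mutation could be built with the raw Thrift classes (the ByteBufferUtil encodings are my assumption; any ByteBuffer encoding matching the comparator works):
// org.apache.cassandra.thrift classes; ByteBufferUtil is from org.apache.cassandra.utils.
ByteBuffer key = ByteBufferUtil.bytes("my_row_id");      // row key
Column column = new Column();
column.setName(ByteBufferUtil.bytes(42));                // column name, e.g. the index
column.setValue(ByteBufferUtil.bytes("text42"));         // column value
column.setTimestamp(System.currentTimeMillis() * 1000);  // microseconds by convention
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(column);
Mutation mutation = new Mutation();
mutation.setColumn_or_supercolumn(cosc);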
Edit: only 4 days after I posted this question regarding Cassandra 1.2.9, Cassandra 2.0 final was released. 2.0 supports batch prepared statements, which should be much faster than the non-batched CQL3 required for C* < 2.0. We have not yet tested this to be sure.
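Untested, but with the 2.0 driver I'd expect the batched version to look roughly like this (same session, table and in-memory values as above; UNLOGGED is my assumption since everything targets a single partition):
// DataStax Java Driver 2.0: bind one prepared statement 50,000 times, send as one batch.
PreparedStatement insert = session.prepare(
        "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");
BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
for (int i = 0; i < values.length; i++) {
    batch.add(insert.bind("my_row_id", i, values[i]));
}
session.execute(batch); // a batch this large may still need to be split into chunks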
When this question was posted 4 days ago on 30 August 2013, it was not possible in CQL3 for C* versions less than 2.0. It was only possible via a Thrift client, e.g. Astyanax's MutationBatch.
Per Alex's suggestion, I created CASSANDRA-5959 as a feature request, but it was marked as a duplicate of CASSANDRA-4693, which supposedly solved the issue for C* 2.0.
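For anyone taking the Thrift route in the meantime, a rough sketch of the Astyanax MutationBatch approach (keyspace setup omitted; the column family definition and names are assumptions):
// Astyanax (com.netflix.astyanax); "keyspace" is an already-configured Keyspace instance.
ColumnFamily<String, Integer> CF_RESULTS = new ColumnFamily<String, Integer>(
        "results", StringSerializer.get(), IntegerSerializer.get());
MutationBatch batch = keyspace.prepareMutationBatch();
ColumnListMutation<Integer> row = batch.withRow(CF_RESULTS, "my_row_id");
for (int i = 0; i < 50000; i++) {
    row.putColumn(i, "text" + i, null); // null = no TTL
}
batch.execute(); // all 50,000 columns go to Cassandra in a single batch_mutate call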