
How to efficiently use batch writes to Cassandra using the DataStax Java driver?

I need to write to Cassandra in batches using the DataStax Java driver. This is my first time using batches with the driver, so I have some confusion.

Below is my code, in which I build a Statement object, add it to a Batch, and set the ConsistencyLevel to QUORUM as well.

import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import com.datastax.driver.core.exceptions.QueryExecutionException;
import com.datastax.driver.core.exceptions.QueryValidationException;
import com.datastax.driver.core.querybuilder.Batch;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import static com.datastax.driver.core.querybuilder.QueryBuilder.insertInto;

Session session = null;
Cluster cluster = null;

// we build the cluster and session objects here, and we use DowngradingConsistencyRetryPolicy as well:
// cluster = builder.withSocketOptions(socketOpts).withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)

public void insertMetadata(List<AddressMetadata> listAddress) {
    // what is the purpose of unloggedBatch here?
    Batch batch = QueryBuilder.unloggedBatch();

    try {
        for (AddressMetadata data : listAddress) {
            // Insert (a RegularStatement) rather than Statement, so it can be added to the Batch
            Insert insert = insertInto("test_table").values(
                    new String[] { "address", "name", "last_modified_date", "client_id" },
                    new Object[] { data.getAddress(), data.getName(), data.getLastModifiedDate(), 1 });
            // is this the right way to set consistency level for Batch?
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
            batch.add(insert);
        }

        // now execute the batch
        session.execute(batch);
    } catch (NoHostAvailableException e) {
        // log the exception
    } catch (QueryExecutionException e) {
        // log the exception
    } catch (QueryValidationException e) {
        // log the exception
    } catch (IllegalStateException e) {
        // log the exception
    } catch (Exception e) {
        // log the exception
    }
}

And below is my AddressMetadata class -

import java.util.Date;

public class AddressMetadata {

    private String name;
    private String address;
    private Date lastModifiedDate;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getAddress() {
        return address;
    }

    public void setAddress(String address) {
        this.address = address;
    }

    public Date getLastModifiedDate() {
        return lastModifiedDate;
    }

    public void setLastModifiedDate(Date lastModifiedDate) {
        this.lastModifiedDate = lastModifiedDate;
    }
}

Now my question is: is the way I am using a batch to insert into Cassandra with the DataStax Java driver correct? And what about retry policies? If the batch statement execution fails, will it be retried?

And is there a better way of doing batch writes to Cassandra with the Java driver?

asked Oct 08 '14 by john

People also ask

Why use batch in Cassandra?

In Cassandra, BATCH is used to execute multiple modification statements (insert, update, delete) simultaneously. It is very useful when you have to update some columns and delete some existing data at the same time.

How does Cassandra batch work?

Batches are supported using CQL3 or modern Cassandra client APIs. In each case you'll be able to specify a list of statements you want to execute as part of the batch, a consistency level to be used for all statements and an optional timestamp. You'll be able to batch execute INSERT, DELETE and UPDATE statements.

What is batch statement in Cassandra?

The batch statement combines multiple data modification language statements (such as INSERT, UPDATE, and DELETE) to achieve atomicity and isolation when targeting a single partition or only atomicity when targeting multiple partitions.
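To make that concrete, here is a minimal sketch of a logged batch against the question's test_table, using the driver's core BatchStatement API. It assumes client_id is the table's partition key (the question does not show the schema), and it sets the consistency level once, on the batch itself:

import java.util.List;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public void insertForClient(Session session, int clientId, List<AddressMetadata> addresses) {
    PreparedStatement ps = session.prepare(
            "INSERT INTO test_table (address, name, last_modified_date, client_id) VALUES (?, ?, ?, ?)");

    // LOGGED batches go through the batch log and are atomic.
    BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
    for (AddressMetadata a : addresses) {
        batch.add(ps.bind(a.getAddress(), a.getName(), a.getLastModifiedDate(), clientId));
    }
    // The consistency level is set once, on the batch itself.
    batch.setConsistencyLevel(ConsistencyLevel.QUORUM);
    session.execute(batch);
}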


2 Answers

First a bit of a rant:

The batch keyword in Cassandra is not a performance optimization for batching together large buckets of data for bulk loads.

Batches are used to group together atomic operations, actions that you expect to occur together. Batches guarantee that if a single part of your batch is successful, the entire batch is successful.

Using batches will probably not make your mass ingestion run faster.

Now for your questions:

what is the purpose of unloggedBatch here?

Cassandra uses a mechanism called batch logging in order to ensure a batch's atomicity. By specifying an unlogged batch, you turn off this functionality, so the batch is no longer atomic and may fail with partial completion. Naturally, there is a performance penalty for logging your batches and ensuring their atomicity; using unlogged batches removes this penalty.
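For reference, both variants are available from QueryBuilder; the only difference is whether the batch log is used (a minimal sketch):

import com.datastax.driver.core.querybuilder.Batch;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// Logged: goes through the batch log, atomic, slower.
Batch logged = QueryBuilder.batch();

// Unlogged: skips the batch log, no atomicity guarantee, faster.
Batch unlogged = QueryBuilder.unloggedBatch();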

There are some cases in which you may want to use unlogged batches to ensure that requests (inserts) that belong to the same partition are sent together. If you batch operations together that need to be performed in different partitions / nodes, you are essentially creating more work for your coordinator. See specific examples of this in Ryan's blog:

Read this post
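To illustrate the point, here is a sketch of the asker's insert loop reworked to issue one unlogged batch per partition. It assumes address is the partition key of test_table, which the question does not confirm:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Batch;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import static com.datastax.driver.core.querybuilder.QueryBuilder.insertInto;

public void insertGroupedByPartition(Session session, List<AddressMetadata> listAddress) {
    // Group the rows by their (assumed) partition key.
    Map<String, List<AddressMetadata>> byPartition = new HashMap<String, List<AddressMetadata>>();
    for (AddressMetadata data : listAddress) {
        List<AddressMetadata> group = byPartition.get(data.getAddress());
        if (group == null) {
            group = new ArrayList<AddressMetadata>();
            byPartition.put(data.getAddress(), group);
        }
        group.add(data);
    }

    // One unlogged batch per partition: each batch lands on a single
    // replica set instead of making the coordinator fan out across nodes.
    for (List<AddressMetadata> group : byPartition.values()) {
        Batch batch = QueryBuilder.unloggedBatch();
        for (AddressMetadata data : group) {
            batch.add(insertInto("test_table").values(
                    new String[] { "address", "name", "last_modified_date", "client_id" },
                    new Object[] { data.getAddress(), data.getName(), data.getLastModifiedDate(), 1 }));
        }
        batch.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(batch);
    }
}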

Now my question is: is the way I am using a batch to insert into Cassandra with the DataStax Java driver correct?

I don't see anything wrong with your code here; it just depends on what you're trying to achieve. Dig into the blog post I shared for more insight.

And what about retry policies? If the batch statement execution fails, will it be retried?

A batch will not retry on its own if it fails. The driver does have retry policies, but you have to apply those separately.

The default policy in the java driver only retries in these scenarios:

  • On a read timeout, if enough replicas replied but data was not retrieved.
  • On a write timeout, if we timeout while writing the distributed log used by batch statements.

Read more about the default policy and consider less conservative policies based on your use case.
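For example, a retry policy is configured once on the Cluster, not per batch; here is a minimal sketch (the contact point is a placeholder):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DefaultRetryPolicy;
import com.datastax.driver.core.policies.LoggingRetryPolicy;

// Retry policies apply cluster-wide, not per statement or batch.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")  // placeholder contact point
        // LoggingRetryPolicy wraps another policy and logs each retry
        // decision, which helps when investigating failed batches.
        .withRetryPolicy(new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE))
        .build();

Swapping in DowngradingConsistencyRetryPolicy.INSTANCE, as the question's commented-out builder does, makes the driver retry at a lower consistency level instead of failing outright.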

answered Sep 18 '22 by phact


We debated for a while between using async requests and batches, and tried out both to compare. We got better throughput using "unlogged batches" than with individual "async" requests. We don't know why, but based on Ryan's blog, I am guessing it has to do with the write size: we are probably doing many small writes, so batching them gave us better performance, as it reduces network traffic.

I have to mention that we are not even doing "unlogged batches" in the recommended way. The recommended way is to batch by a single partition key, i.e. to batch all the records which belong to the same partition key. But we were just batching records which probably belong to different partitions.

Someone did some benchmarking to compare async requests and "unlogged batches", and we found it quite useful. Here is the link.
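For comparison, the individual-async approach looks roughly like this (a sketch; real code should also cap the number of in-flight requests):

import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

public void insertAsync(Session session, List<Statement> inserts) {
    // Fire one non-blocking request per statement...
    List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
    for (Statement stmt : inserts) {
        futures.add(session.executeAsync(stmt));
    }
    // ...then wait for all of them to complete.
    for (ResultSetFuture future : futures) {
        future.getUninterruptibly();
    }
}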

answered Sep 21 '22 by Chandra