elasticsearch java bulk batch size

I want to use the Elasticsearch bulk API from Java and am wondering how I can set the batch size.

Currently I am using it as:

BulkRequestBuilder bulkRequest = getClient().prepareBulk();
while(hasMore) {
    bulkRequest.add(getClient().prepareIndex(indexName, indexType, artist.getDocId()).setSource(json));
    hasMore = checkHasMore();
}
BulkResponse bResp = bulkRequest.execute().actionGet();
//To check failures
log.info("Has failures? {}", bResp.hasFailures());

Any idea how I can set the bulk/batch size?

asked Jun 04 '13 by Justin

1 Answer

It mainly depends on the size of your documents, the resources available on the client, and the type of client (transport client or node client).

The node client is aware of the shard allocation over the cluster and sends the documents directly to the nodes that hold the shards where they are supposed to be indexed. The transport client, on the other hand, is a normal client that sends its requests to a list of nodes in round-robin fashion. A bulk request would then be sent to a single node, which becomes your gateway for the indexing.
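
For reference, here's a minimal sketch of how the two client types are typically created with the Java API of that era (the cluster name and address below are placeholders, not something from your setup):

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

// Transport client: round-robins its requests over the listed nodes
Client transportClient = new TransportClient(
        ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster") // placeholder
                .build())
        .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

// Node client: joins the cluster as a client (non-data) node and can
// route documents directly to the shards that will hold them
Node node = nodeBuilder().clusterName("my-cluster").client(true).node();
Client nodeClient = node.client();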

Since you're using the Java API, I would suggest you have a look at the BulkProcessor, which makes it much easier and more flexible to index in bulk. You can define a maximum number of actions, a maximum size, and a maximum time interval since the last bulk execution; the processor executes the bulk automatically for you when any of these thresholds is reached. You can also set a maximum number of concurrent bulk requests.

Once you've created the BulkProcessor like this:

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;

BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        logger.info("Going to execute new bulk composed of {} actions", request.numberOfActions());
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        logger.info("Executed bulk composed of {} actions", request.numberOfActions());
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        logger.warn("Error executing bulk", failure);
    }
}).setBulkActions(bulkSize).setConcurrentRequests(maxConcurrentBulk).build();

You just have to add your requests to it:

bulkProcessor.add(indexRequest);

and close it at the end to flush any remaining requests that haven't been executed yet:

bulkProcessor.close();
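
Put together, the loop from your question could look something like this (just a sketch reusing the variables from your snippet; bulkSize and maxConcurrentBulk are whatever limits you pick):

// The processor fires a bulk request automatically once bulkSize
// actions have accumulated (or the size/time thresholds are hit)
while (hasMore) {
    bulkProcessor.add(getClient().prepareIndex(indexName, indexType, artist.getDocId())
            .setSource(json)
            .request());
    hasMore = checkHasMore();
}
bulkProcessor.close(); // flushes whatever is left over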

To finally answer your question: the nice thing about the BulkProcessor is also that it comes with sensible defaults: 5 MB bulk size, 1000 actions, 1 concurrent request, and no flush interval (which might be useful to set).
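
If you'd rather spell those thresholds out, the builder has a setter for each one. A sketch (the values are just the defaults made explicit, plus an assumed five-second flush interval; listener stands for the BulkProcessor.Listener shown above):

import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

BulkProcessor bulkProcessor = BulkProcessor.builder(client, listener)
        .setBulkActions(1000)                               // flush after 1000 actions...
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or 5 MB of data...
        .setFlushInterval(TimeValue.timeValueSeconds(5))    // ...or 5 seconds (assumed value)
        .setConcurrentRequests(1)                           // one bulk in flight at a time
        .build();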

answered Oct 24 '22 by javanna