
Elasticsearch Bulk API - Index vs Create/Update

I'm using the Elasticsearch Bulk API to create or update documents.

I do actually know if they are creates or updates, but I can simplify my code by just making them all index, or "upserts" in the SQL sense.

Is there any disadvantage in using index (and letting ES figure it out) over using the more explicit create and update?

Kong asked Jan 03 '16 02:01



2 Answers

If you send create, you must make sure the document doesn't exist yet in your index, otherwise that action will fail with a conflict, whereas sending the same document with index will always succeed.

If, for performance reasons, you know you'll create a document once (with either create or index) and afterwards only change a few of its properties, then using update for those changes might make sense, since you only have to send the changed fields.

Otherwise, if you're always sending full documents, I'd use index all the time, for both creating and updating. Whenever it sees an index action, ES either creates the document if it doesn't exist or replaces it if it does, so the action always succeeds.
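
As a minimal sketch of what "index for everything" looks like on the wire: a bulk body is newline-delimited JSON, where each index or create action line is followed by the full document source on the next line. The index name `products`, the ids, the sample fields, and the helper `build_bulk_body` are hypothetical and only there for illustration.

```python
import json


def build_bulk_body(index_name, docs, action="index"):
    """Build an NDJSON body for the _bulk endpoint (illustrative helper).

    docs maps document id -> full document source.
    action is "index" (create or replace) or "create" (fail if the id exists).
    """
    if action not in ("index", "create"):
        raise ValueError("this sketch only handles full-document actions")
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line first, then the document source on its own line.
        lines.append(json.dumps({action: {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "products",  # hypothetical index name
    {"1": {"name": "kettle", "price": 25}, "2": {"name": "teapot", "price": 18}},
)
print(body)
```

The resulting string would be sent as the body of a `POST /_bulk` request with `Content-Type: application/x-ndjson`. Passing `action="create"` changes only the action line, which is why treating everything as index keeps the calling code simple.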

Val answered Sep 18 '22 11:09


The short answer: no, there is no disadvantage.

The create and update actions are special cases. With create, nothing is written if the document is already there. With update, you can provide less data: if you do not have the whole document, you can send just the few fields you want to add or change. You can also use update to make sure a document is only written if it already exists.
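
The difference between the three actions can be sketched with a toy in-memory model. This is not Elasticsearch client code: `apply_action` and the dict-backed `store` are hypothetical, the merge is shallow, and the returned numbers are only meant to mirror the per-item HTTP-style statuses one would expect in a bulk response.

```python
def apply_action(store, action, doc_id, body):
    """Apply one bulk-style action to a dict and return an HTTP-like status."""
    exists = doc_id in store
    if action == "index":
        # Create or fully replace; never fails because of existence.
        store[doc_id] = dict(body)
        return 200 if exists else 201
    if action == "create":
        # Only writes when the id is new; otherwise reports a conflict.
        if exists:
            return 409
        store[doc_id] = dict(body)
        return 201
    if action == "update":
        # Partial update: needs an existing document, merges the given fields.
        if not exists:
            return 404
        store[doc_id] = {**store[doc_id], **body}  # shallow merge in this toy model
        return 200
    raise ValueError(f"unknown action: {action}")


store = {}
results = [
    apply_action(store, "create", "1", {"name": "kettle", "price": 25}),
    apply_action(store, "create", "1", {"name": "kettle", "price": 25}),
    apply_action(store, "index", "1", {"name": "kettle", "price": 30}),
    apply_action(store, "update", "1", {"price": 28}),
    apply_action(store, "update", "2", {"price": 10}),
]
print(results)  # [201, 409, 200, 200, 404]
print(store)    # {'1': {'name': 'kettle', 'price': 28}}
```

Only the second create and the update of a missing id are rejected; every index action goes through, which matches the point that index is the safe default when full documents are always available.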

Jettro Coenradie answered Sep 20 '22 11:09