ElasticSearch bulk insert/update operation

I am not sure whether I am using the upsert operation correctly in bulk indexing.

My request is:

{ "update": {"_id": "610946100"}}\n
{"doc": {"id":"610946100","uri":"/0/0/1/6/4/0/610946100.xml"}, "doc_as_upsert" : true}\n

and the URL is: http://localhost:9200/anIndex/aType/_bulk
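The `\n` markers in the request above stand for real newline characters. The bulk endpoint expects newline-delimited JSON (one action line followed by its source line), and the whole body must end with a newline. Below is a minimal Python sketch that produces the same two-line body with the standard `json` module. `build_bulk_body` is a hypothetical helper name. The code only builds the string and does not send it anywhere.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, source) pairs into a newline-delimited bulk body.

    Each pair becomes two lines: the action metadata, then its payload.
    The body ends with a trailing newline, which the bulk API requires.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    (
        {"update": {"_id": "610946100"}},
        {
            "doc": {"id": "610946100", "uri": "/0/0/1/6/4/0/610946100.xml"},
            "doc_as_upsert": True,  # serialized as JSON true
        },
    ),
])
print(body, end="")
```

Newer Elasticsearch versions also expect a Content-Type header, such as `application/x-ndjson`, when this body is posted.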

I guess I missed something in the documentation, but I still can't find how to perform this operation.

What I want is to create the above document in the index, or update it if it already exists.
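Here is the requested create-or-update behaviour made concrete. With `doc_as_upsert`, an update action merges the partial `doc` into the existing document when the ID is present, and indexes `doc` as a new document otherwise. The sketch below models this with a plain Python dict standing in for the index. `upsert_doc` and `deep_merge` are hypothetical names. This illustrates the semantics only and is not how Elasticsearch implements it. Elasticsearch can also report a no-op when the merge changes nothing, and this toy model ignores that case.

```python
def deep_merge(target, patch):
    """Merge patch into a copy of target: object fields merge recursively, other values overwrite."""
    merged = dict(target)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def upsert_doc(index, doc_id, doc):
    """Toy model of an update action with doc_as_upsert: true on a dict keyed by _id."""
    if doc_id in index:
        index[doc_id] = deep_merge(index[doc_id], doc)  # existing: partial merge
        return "updated"
    index[doc_id] = dict(doc)  # missing: doc itself becomes the new document
    return "created"

index = {}
print(upsert_doc(index, "610946100", {"uri": "/0/0/1/6/4/0/610946100.xml"}))
# "title" is a hypothetical extra field, used only to show the merge
print(upsert_doc(index, "610946100", {"title": "example"}))
print(index["610946100"])
```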

asked by dimzak, Jul 09 '14 10:07


2 Answers

If you add records to the index via the bulk API as

{ "create": {"_id": "someId"}}\n
{"id":"someId","uri":"/0/1/3/2/1/0511912310/511912310.xml"}\n

then, if the ID already exists in the index, that action fails with a version conflict error in the bulk response. If you want to either add or replace a document, depending on whether it already exists, send the request as

{ "index": {"_id": "someId"}}\n
{"id":"someId","uri":"/0/1/3/2/1/0511912310/511912310.xml"}\n

create fails if a document with the same index, type, and ID already exists, whereas index adds the document or replaces the existing one as necessary.
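The difference between the two actions can be sketched with a toy in-memory model. A dict keyed by `_id` stands in for the index. `apply_action` and `VersionConflict` are hypothetical names. The model raises an exception where Elasticsearch would instead report an error for that one item in the bulk response. It also shows that index replaces the whole document rather than merging fields into it.

```python
class VersionConflict(Exception):
    """Stands in for the per-item conflict error that a duplicate create produces."""

def apply_action(index, action, doc_id, source):
    """Toy model of the create and index bulk actions on a dict keyed by _id."""
    if action == "create":
        if doc_id in index:
            raise VersionConflict(f"document {doc_id} already exists")
        index[doc_id] = dict(source)
        return "created"
    if action == "index":
        result = "updated" if doc_id in index else "created"
        index[doc_id] = dict(source)  # whole document replaced, no field merge
        return result
    raise ValueError(f"unsupported action: {action}")

index = {}
source = {"id": "someId", "uri": "/0/1/3/2/1/0511912310/511912310.xml"}
print(apply_action(index, "create", "someId", source))
print(apply_action(index, "index", "someId", {"id": "someId"}))
try:
    apply_action(index, "create", "someId", source)
except VersionConflict as err:
    print("create failed:", err)
```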

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html (version 5.3)

answered by dchar, Oct 23 '22 15:10


The only difference I can see between your request and the Bulk API documentation is that the examples specify the index and type in the update action itself. Based on that, I would try adding those values, like this:

{"update": {"_id": "610946100", "_type": "aType", "_index": "anIndex"}}\n
{"doc": {"uri":"/0/0/1/6/4/0/610946100.xml"}, "doc_as_upsert" : true}\n

Additionally, since you are specifying the document _id in the update action, I would remove it from the partial document or rename it to _id (you were missing the underscore).
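Here is a minimal sketch of the suggested request, again in Python with only the standard library. The metadata line carries `_index`, `_type`, and `_id`. The `id` field is dropped from the partial document because the ID already travels in the action line. `update_upsert_lines` is a hypothetical helper. `_type` matches the older Elasticsearch versions this thread targets and would need to be left out on newer versions that no longer use mapping types.

```python
import json

def update_upsert_lines(index_name, type_name, doc_id, doc):
    """Return the two NDJSON lines for an update action with doc_as_upsert."""
    # Drop the redundant "id" field; the document ID is already in the metadata line.
    partial = {k: v for k, v in doc.items() if k != "id"}
    action = {"update": {"_id": doc_id, "_type": type_name, "_index": index_name}}
    payload = {"doc": partial, "doc_as_upsert": True}
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"

print(update_upsert_lines(
    "anIndex", "aType", "610946100",
    {"id": "610946100", "uri": "/0/0/1/6/4/0/610946100.xml"},
), end="")
```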

answered by Paige Cook, Oct 23 '22 15:10