 

Elasticsearch Bulk Index JSON Data

I am trying to bulk index a JSON file into a new Elasticsearch index and am unable to do so. I have the following sample data inside the JSON file:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}, {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}, {"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}, {"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"}, {"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"}, {"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"}, {"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"}, {"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"}, {"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"}, {"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"}, {"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"}, {"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"}, {"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"}, {"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"}, {"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"}, {"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"}, {"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}] 

I am using

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json  

When I try to use the standard bulk index API from Elasticsearch I get this error

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"} 

Can anyone help with indexing this type of JSON?

asked Oct 26 '15 at 07:10 by Amit P



1 Answer

What you need to do is read that JSON file and build a bulk request in the format expected by the _bulk endpoint: one line for the action (the command) and one line for the document, each terminated by a newline character. Rinse and repeat for each document:

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc. for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.
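The same pairing of action line and document line can be sketched in Python. This is a minimal illustration, not an official client: `build_bulk_body` is a hypothetical helper name, reusing the `Id` field as `_id` is an assumption based on the sample data, and `_type` is only emitted when `doc_type` is passed, since it matters only for older clusters.

```python
import json


def build_bulk_body(docs, index, id_field=None, doc_type=None):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each document produces two lines: an action line telling Elasticsearch
    what to do ("index"), followed by the document source itself.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # only meaningful on pre-7.x clusters
        if id_field is not None:
            # Reuse a field of the document as its Elasticsearch _id.
            meta["_id"] = doc[id_field]
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(doc))              # source line
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
print(build_bulk_body(docs, "your_index", id_field="Id"), end="")
```

Leaving `id_field` unset produces action lines without `_id`, in which case Elasticsearch generates the ids itself.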

UPDATE

Note that the action line can be shortened: you can drop _index and _type if they are already specified in your URL. It is also possible to drop _id if you specify the path to your id field in your mapping (note, though, that this feature is deprecated and goes away in ES 2.0). At the very least, your action line can look like {"index":{}} for every document, in which case Elasticsearch generates an _id for each one. The action line itself is always mandatory, though, since it tells Elasticsearch which operation to perform (in this case, indexing the document).

UPDATE 2

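 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @/home/data1.json

Use --data-binary rather than --data (-d) here: when reading from a file, --data strips the newlines, and the bulk endpoint needs them. The Content-Type header is required as of ES 6.0.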

/home/data1.json should look like this (every line, including the last one, must end with a newline):

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
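A rough standard-library Python equivalent of the curl call above, assuming the file is already in the bulk format. `make_bulk_request` and `send_bulk_file` are hypothetical helper names; the URL is whatever you would pass to curl (on 7.x and later, for example, `http://localhost:9200/index_local/_bulk`). The body is sent byte for byte, which is what `--data-binary` does.

```python
import json
import urllib.request


def make_bulk_request(url, body):
    """Prepare a POST to an Elasticsearch _bulk endpoint.

    The body is encoded as-is, so its newlines are preserved
    (like curl's --data-binary, unlike --data @file).
    """
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk_file(path, url):
    """Send a prepared bulk file and return the parsed JSON response."""
    with open(path, encoding="utf-8") as f:
        body = f.read()
    if not body.endswith("\n"):
        body += "\n"  # the final line must be newline-terminated too
    req = make_bulk_request(url, body)
    with urllib.request.urlopen(req) as resp:
        # The response has a top-level "errors" flag and one entry
        # per document under "items".
        return json.load(resp)


# Usage (requires a running cluster):
# result = send_bulk_file("/home/data1.json",
#                         "http://localhost:9200/index_local/my_doc_type/_bulk")
# print(result["errors"])
```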

UPDATE 3

You can refer to this answer to see how to generate the newline-delimited file shown in UPDATE 2.
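Alternatively, the conversion can be sketched in Python. This assumes the whole input file is a single JSON array that fits in memory, like the sample in the question. `json_array_to_bulk` is a hypothetical name, and passing `id_field="Id"` is optional: it reuses that field as the document `_id` instead of letting Elasticsearch generate one.

```python
import json


def json_array_to_bulk(src_path, dst_path, id_field=None):
    """Rewrite a file holding one JSON array of documents into the
    newline-delimited format expected by the _bulk endpoint."""
    with open(src_path, encoding="utf-8") as f:
        docs = json.load(f)  # the whole array is loaded into memory
    with open(dst_path, "w", encoding="utf-8") as out:
        for doc in docs:
            # Minimal action line: index (and type) come from the URL.
            action = {"index": {}}
            if id_field is not None:
                action["index"]["_id"] = doc[id_field]
            out.write(json.dumps(action) + "\n")
            out.write(json.dumps(doc) + "\n")
```

For example, `json_array_to_bulk("/home/data1.json", "/home/data1_bulk.json")` (output path hypothetical) writes a file in the format shown in UPDATE 2, which can then be sent with `--data-binary`.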

UPDATE 4

As of ES 7.x, mapping types are deprecated: the type in the URL is no longer needed and should simply be _doc instead of my_doc_type (or left out, as in localhost:9200/index_local/_bulk). As of ES 8.x, types have been removed completely. You can read more about this here

answered Oct 01 '22 at 09:10 by Val