 

Elasticsearch Bulk JSON Data

This question arises from this SO thread.

My query is similar to, but not the same as, the one in that thread, so, as @Val suggested, it seems best to ask a separate question that others can benefit from.

As in that thread, I need to insert a massive amount of data into an index (my initial test set is about 10 000 documents, but this is just for a POC; the real data set is much larger). The data I would like to insert is in a .json file and looks something like this (snippet):

[ { "fileName": "filename", "data":"massive string text data here" }, 
  { "fileName": "filename2", "data":"massive string text data here" } ]

By my own admission I am new to Elasticsearch. From reading through the documentation, I assumed I could take a .json file and create an index from the data within it. I have since learnt that each item in the JSON needs to be preceded by a "header" line, something like:

{"index":{}}
{ "fileName": "filename", "data":"massive string text data here" }
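For reference, here is a minimal sketch of such a body for the two documents in the snippet, written by hand with printf. The file name sample_bulk.json is only illustrative. Each action line is immediately followed by its source document, and the Bulk API requires the body to end with a newline, which printf '%s\n' takes care of.

```shell
#!/usr/bin/env bash
# Illustrative output file name (an assumption, not from the question).
out=sample_bulk.json

# One action line per document, immediately followed by that document's source.
# printf '%s\n' appends a newline after every argument, including the last one,
# so the body ends with the trailing newline the Bulk API expects.
printf '%s\n' \
  '{"index":{}}' \
  '{ "fileName": "filename", "data":"massive string text data here" }' \
  '{"index":{}}' \
  '{ "fileName": "filename2", "data":"massive string text data here" }' \
  > "$out"

cat "$out"
```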

In other words, the body is not a JSON document as such, but rather a specially manipulated string of JSON lines.

I would like to know if there is a way to import my JSON data as is, without having to manipulate the text manually first (my test data has 10 000 entries, so I'm sure you can see why I'd prefer not to do this by hand).

Are there any suggestions or automated tools that could help with this?

PS: I am using the Windows installer and Postman for the calls.

Hexie asked Aug 09 '17 22:08




1 Answer

You can transform your file with a single shell command. Assuming your file is called input.json, run:

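# IFS= read -r and the quoted "$line" keep backslash escapes and whitespace inside the JSON intact
jq -c ".[]" input.json | while IFS= read -r line; do echo '{"index":{}}'; printf '%s\n' "$line"; done > bulk.json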
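If you'd rather avoid the shell loop altogether, jq can emit the action line itself. A minimal sketch, assuming the input is a top-level JSON array: for each element, the filter `.[] | {"index": {}}, .` outputs an action object followed by the element, and -c prints each value on its own compact line. The sample file names sample_input.json and sample_output.json are only illustrative, chosen so the example doesn't overwrite your real input.json or bulk.json.

```shell
#!/usr/bin/env bash
# Small sample input matching the snippet in the question (illustrative file name).
cat > sample_input.json <<'EOF'
[ { "fileName": "filename", "data":"massive string text data here" },
  { "fileName": "filename2", "data":"massive string text data here" } ]
EOF

# For every array element, emit {"index":{}} followed by the element itself.
# "," binds tighter than "|", so this reads as .[] | ({"index": {}}, .).
# -c writes each value compactly on one line, newline-terminated.
jq -c '.[] | {"index": {}}, .' sample_input.json > sample_output.json

cat sample_output.json
```

For the real data this becomes `jq -c '.[] | {"index": {}}, .' input.json > bulk.json`.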

After this you'll have a file called bulk.json which is properly formatted to be sent to the bulk endpoint.
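Before sending thousands of documents, a quick sanity check can save a confusing bulk error. A sketch, assuming jq is installed; check_bulk is a hypothetical helper name. It checks that bulk.json has exactly two lines (action plus source) per element of the original array and that the file parses as a stream of JSON values.

```shell
#!/usr/bin/env bash
# check_bulk is a hypothetical helper, not part of jq or Elasticsearch.
# Usage: check_bulk input.json bulk.json
check_bulk() {
  local input=$1 bulk=$2 docs lines
  docs=$(jq 'length' "$input")   # number of elements in the source array
  lines=$(wc -l < "$bulk")       # counts newlines, so a missing final newline also shows up
  # Each document needs one action line plus one source line.
  if [ "$lines" -ne $((docs * 2)) ]; then
    echo "expected $((docs * 2)) lines, found $lines" >&2
    return 1
  fi
  # "jq empty" parses every value and prints nothing; it fails on invalid JSON.
  if ! jq empty "$bulk" 2>/dev/null; then
    echo "$bulk contains invalid JSON" >&2
    return 1
  fi
  echo "ok: $docs documents, $lines lines"
}
```

Usage: `check_bulk input.json bulk.json`.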

Then you can call your bulk endpoint like this:

curl -XPOST localhost:9200/your_index/your_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @bulk.json
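Since the real data set is much larger than the POC, a single request may become too big: Elasticsearch rejects request bodies larger than http.max_content_length (100mb by default). One option is to split bulk.json into chunks and post them one at a time. A sketch under stated assumptions: ES_URL, LINES_PER_CHUNK, split_bulk and post_chunks are hypothetical names, and the URL mirrors the one above, so adjust the index, type and host to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical settings; override them via environment variables.
ES_URL=${ES_URL:-localhost:9200/your_index/your_type/_bulk}
LINES_PER_CHUNK=${LINES_PER_CHUNK:-10000}  # must be even: action + source pairs

# split_bulk FILE PREFIX: split FILE into chunks of LINES_PER_CHUNK lines.
# An even chunk size keeps each action line together with its source line.
split_bulk() {
  if [ $((LINES_PER_CHUNK % 2)) -ne 0 ]; then
    echo "LINES_PER_CHUNK must be even" >&2
    return 1
  fi
  split -a 4 -l "$LINES_PER_CHUNK" "$1" "$2"
}

# post_chunks PREFIX: send each chunk to the bulk endpoint in order.
# A 200 response can still report per-item failures ("errors": true),
# so the response body is checked with jq; curl -f fails on HTTP errors,
# which leaves jq with no input and makes jq -e fail as well.
post_chunks() {
  local chunk
  for chunk in "$1"*; do
    curl -sS -f -XPOST "$ES_URL" -H "Content-Type: application/x-ndjson" \
      --data-binary "@$chunk" | jq -e '.errors == false' > /dev/null || {
      echo "bulk request failed for $chunk" >&2
      return 1
    }
  done
}
```

Usage: `split_bulk bulk.json chunk_ && post_chunks chunk_`.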

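Note: you need to install jq first if you don't have it already. Also, mapping types are deprecated in Elasticsearch 7.x and removed in 8.x; on those versions, drop your_type from the URL and call localhost:9200/your_index/_bulk instead.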

Val answered Oct 09 '22 21:10