 

Insert multiple documents in Elasticsearch

I have to insert a JSON array into Elasticsearch. The accepted answer in the link suggests inserting a header line before each JSON entry. That answer is 2 years old; is there a better solution out there now? Do I need to edit my JSON file manually?

Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server?

[
  {
    "id":9,
    "status":"This is cool."
  },
  ...
]
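
For context, the header line mentioned above refers to Elasticsearch's bulk (NDJSON) format: every document is preceded by an action line telling Elasticsearch what to do with it. A minimal sketch, with placeholder index and type names (not taken from the question):

{ "index" : { "_index" : "myindex", "_type" : "mytype" } }
{ "id": 9, "status": "This is cool." }

Every line, including the last one, must end with a newline character.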
asked Nov 29 '15 by Forkmohit



1 Answer

OK, there's something pretty simple you can do using a small shell script (see below). The idea is not to edit your file manually, but to let Python do it and create another file whose format complies with what the _bulk endpoint expects. It does the following:

  1. First, we declare a little Python script that reads your JSON file and creates a new one in the format the _bulk endpoint expects.
  2. Then, we run that Python script to produce the bulk file.
  3. Finally, we send the file created in step 2 to the _bulk endpoint with a simple curl command.
  4. There you go, you now have a new ES index containing your documents.

bulk.sh:

#!/bin/sh

# 0. Some constants to re-define to match your environment
ES_HOST=localhost:9200
JSON_FILE_IN=/path/to/your/file.json
JSON_FILE_OUT=/path/to/your/bulk.json

# 1. Python code to transform your JSON file
PYTHON="import json,sys;
out = open('$JSON_FILE_OUT', 'w');
with open('$JSON_FILE_IN') as json_in:
    docs = json.loads(json_in.read());
    for doc in docs:
        out.write('%s\n' % json.dumps({'index': {}}));
        out.write('%s\n' % json.dumps(doc, indent=0).replace('\n', ''));
"

# 2. run the Python script from step 1
python -c "$PYTHON"

# 3. send the file from step 2 to the _bulk endpoint (replace index/type with your own index and type names)
curl -s -XPOST $ES_HOST/index/type/_bulk --data-binary @$JSON_FILE_OUT
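
For the sample array in the question, the generated bulk file would look something like this, two lines per document:

{"index": {}}
{"id": 9, "status": "This is cool."}

The action line can stay empty here because the index and type are already given in the _bulk URL. Note that on Elasticsearch 6.0 and later, curl also needs -H 'Content-Type: application/x-ndjson', and on 7.x and later you can drop the type and POST to $ES_HOST/index/_bulk instead.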

You need to:

  1. save the above script to a file named bulk.sh and make it executable (i.e. chmod u+x bulk.sh)
  2. modify the three variables at the top (step 0) to match your environment
  3. run it using ./bulk.sh
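
To check that everything was indexed, you can query the _count endpoint (with the same index/type placeholders as in the script); the returned count should match the number of documents in your original JSON array:

curl -s $ES_HOST/index/type/_count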
answered Oct 19 '22 by Val