
Index JSON files in Elasticsearch using Python?

I have a bunch of JSON files (100), named merged_file 1.json, merged_file 2.json, and so on.

How do I index all these files into Elasticsearch using Python (elasticsearch_dsl)?

I am using this code, but it doesn't seem to work:

from elasticsearch_dsl import Elasticsearch
import json
import os
import sys

es = Elasticsearch()

json_docs =[]

directory = sys.argv[1]

for filename in os.listdir(directory):
    if filename.endswith('.json'):
        with open(filename,'r') as open_file:
            json_docs.append(json.load(open_file))

es.bulk("index_name", "type_name", json_docs)

The JSON looks like this:

{"one":["some data"],"two":["some other data"],"three":["other data"]}

What can I do to make this work?

anshaj asked May 15 '17 13:05





1 Answer

For this task you should be using elasticsearch-py (pip install elasticsearch):

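from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    "Use a generator, no need to load all in memory"
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # os.listdir returns bare file names, so join them with the directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

# Mapping types were deprecated in Elasticsearch 7 and removed in 8; omit doc_type there
helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')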
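If you want control over each document's ID (for example, so that re-running the script overwrites documents instead of duplicating them), the generator can yield explicit bulk actions instead of bare documents. The following is a minimal sketch, not part of the original answer: the function name iter_actions and the choice of the file name (minus its extension) as the _id are assumptions made for illustration. helpers.bulk accepts dicts carrying _index, _id and _source keys.

```python
import json
import os


def iter_actions(directory, index_name):
    """Yield one bulk action per .json file in `directory`.

    Hypothetical helper: uses the file name without its extension as
    the document _id, which is an illustrative choice, not a requirement.
    """
    # sorted() only makes the order deterministic; os.listdir order is arbitrary
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".json"):
            continue
        # Join with the directory so the script works from any working directory
        path = os.path.join(directory, filename)
        with open(path, "r", encoding="utf-8") as open_file:
            doc = json.load(open_file)
        yield {
            "_index": index_name,
            "_id": os.path.splitext(filename)[0],
            "_source": doc,
        }

# Intended usage, assuming a reachable cluster and the elasticsearch package:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch(), iter_actions(sys.argv[1], "my-index"))
```

Because the generator itself never talks to Elasticsearch, you can inspect what it yields (for instance with list(iter_actions(...))) before sending anything to the cluster.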
Honza Král answered Nov 07 '22 09:11
