
Index CSV to ElasticSearch in Python

I want to index a CSV file into Elasticsearch without using Logstash. I am using the high-level elasticsearch-dsl library.

Given a CSV with a header row, for example:

name,address,url
adam,hills 32,http://rockit.com
jane,valleys 23,http://popit.com

What is the best way to index all the data by field? Ultimately, I want each row to be indexed as a document that looks like this:

{
"name": "adam",
"address": "hills 32",
"url":  "http://rockit.com"
}
bluesummers asked Jan 10 '17 16:01



2 Answers

This kind of task is easier with the lower-level elasticsearch-py library:

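from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

with open('/tmp/x.csv') as f:
    # DictReader yields one dict per row, keyed by the header line
    reader = csv.DictReader(f)
    # bulk() sends every row dict to the index as a separate document
    helpers.bulk(es, reader, index='my-index', doc_type='my-type')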
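To make explicit what the bulk helper consumes, here is a minimal sketch that needs no running cluster: csv.DictReader yields one dict per row keyed by the header, and each dict can be wrapped in an action carrying `_index` and `_source`. The function name `csv_actions` and the in-memory sample are illustrative assumptions, not part of the answer. `doc_type` is left out because, as far as I know, newer client releases no longer accept it.

```python
import csv
import io


def csv_actions(lines, index):
    """Yield one bulk action per CSV row, keyed by the header fields."""
    for row in csv.DictReader(lines):
        # "_index" and "_source" are action keys the bulk helpers read.
        yield {"_index": index, "_source": dict(row)}


# In-memory sample using the rows from the question.
sample = io.StringIO(
    "name,address,url\n"
    "adam,hills 32,http://rockit.com\n"
    "jane,valleys 23,http://popit.com\n"
)
actions = list(csv_actions(sample, "my-index"))
print(actions[0]["_source"])
```

With a live cluster, the generator could be passed straight to the helper, for example `helpers.bulk(es, csv_actions(f, 'my-index'))`.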
Honza Král answered Oct 21 '22 11:10


If you want to build an Elasticsearch index from a .tsv/.csv file with strict types and a model for better filtering, you can do something like this:

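from elasticsearch.helpers import bulk
from elasticsearch_dsl import DocType, Text
from elasticsearch_dsl.connections import connections

# DocType is the name used by older elasticsearch-dsl releases;
# later releases renamed it to Document.
class ElementIndex(DocType):
    # Declare one typed field per column (here, the question's columns).
    name = Text()
    address = Text()
    url = Text()

    class Meta:
        index = 'index_name'

def indexing(row):
    # Build a typed document from one row and return it as a bulk action.
    # Do not call obj.save() here, or each row is indexed twice.
    obj = ElementIndex(
        name=str(row['name']),
        address=str(row['address']),
        url=str(row['url'])
    )
    return obj.to_dict(include_meta=True)

def bulk_indexing(result):
    # result: an iterable of row dicts with data from your source

    # Register the default connection; init() below depends on it.
    es = connections.create_connection(hosts=['localhost'])
    # Create the index and its mapping from the model.
    ElementIndex.init()

    bulk(client=es, actions=(indexing(c) for c in result))
    es.indices.refresh()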
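The snippet above leaves open where `result` comes from. One possible way to build it from a .tsv or .csv file, as a sketch only, is the standard csv module. `load_rows` and the tab-separated sample are hypothetical names and data chosen for illustration, and the code assumes well-formed rows with a header line.

```python
import csv
import io


def load_rows(lines, delimiter=","):
    """Return a list of row dicts; pass a tab as delimiter for .tsv input."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Strip stray whitespace so values map cleanly onto the model fields.
    return [
        {key: (value or "").strip() for key, value in row.items()}
        for row in reader
    ]


# Hypothetical tab-separated sample mirroring the question's columns.
tsv = io.StringIO("name\taddress\turl\nadam\thills 32 \thttp://rockit.com\n")
result = load_rows(tsv, delimiter="\t")
print(result)
```

For a real file, opening it with `open(path, newline='')` and passing the file object as `lines` follows the csv module's own recommendation.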
Alex answered Oct 21 '22 12:10