I'm looking to index a CSV file into Elasticsearch without using Logstash. I am using the elasticsearch-dsl high-level library.
Given a CSV with a header, for example:
name,address,url
adam,hills 32,http://rockit.com
jane,valleys 23,http://popit.com
What would be the best way to index all the data by field? Eventually I'd like each row to end up as a document like this:
{
    "name": "adam",
    "address": "hills 32",
    "url": "http://rockit.com"
}
This kind of task is easier with the lower-level elasticsearch-py library:
from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

with open('/tmp/x.csv') as f:
    reader = csv.DictReader(f)  # each row becomes a dict keyed by the header fields
    # note: doc_type is deprecated in Elasticsearch 7+ and can be omitted there
    helpers.bulk(es, reader, index='my-index', doc_type='my-type')
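As a quick sanity check (a sketch assuming the same '/tmp/x.csv' file and 'my-index' index name as above), you can refresh the index and fetch one row back to confirm each CSV row was stored as a field-per-column document:

es.indices.refresh(index='my-index')
res = es.search(index='my-index', body={'query': {'match': {'name': 'adam'}}})
print(res['hits']['hits'][0]['_source'])
# expected, roughly: {'name': 'adam', 'address': 'hills 32', 'url': 'http://rockit.com'}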
If you want to build an Elasticsearch index from a .tsv/.csv file with strict types and a model for better filtering, you can do something like this (using the question's name/address/url columns as the fields):
import csv

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import DocType, Text


class ElementIndex(DocType):
    # one typed field per CSV column
    name = Text()
    address = Text()
    url = Text()

    class Meta:
        index = 'index_name'


def indexing(row):
    # build a document from one CSV row and return the bulk action dict;
    # the actual indexing happens in bulk(), so no obj.save() is needed here
    obj = ElementIndex(
        name=str(row['name']),
        address=str(row['address']),
        url=str(row['url'])
    )
    return obj.to_dict(include_meta=True)


def bulk_indexing(result):
    # create the index and mapping, then bulk-index every row
    # ElementIndex.init(index="index_name")  # or pass the index name explicitly
    ElementIndex.init()
    es = Elasticsearch()
    # result is an iterable of dicts with the data from your .csv/.tsv source
    bulk(client=es, actions=(indexing(row) for row in result))
    es.indices.refresh()
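A minimal way to drive this from the question's CSV (a sketch assuming the hypothetical path '/tmp/x.csv' and the lowercase column names above) is to read the rows with csv.DictReader and pass them to bulk_indexing:

with open('/tmp/x.csv') as f:
    result = csv.DictReader(f)  # yields {'name': ..., 'address': ..., 'url': ...} per row
    bulk_indexing(result)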