
Elasticsearch: create or update a document using Python

I am using elasticsearch-py for my Elasticsearch operations.

I am trying to use elasticsearch.helpers.bulk to create or update multiple records.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print helpers.bulk(es, data)

Is there any way to perform this operation?

As far as I can tell, _op_type can only be set to create or update. If I use update and the record does not exist, it raises an error:

Traceback (most recent call last):
  File "/tmp/test.py", line 37, in <module>
    print helpers.bulk(es, data)
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
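The error list in the traceback above has one entry per failed action, each a one-key dict keyed by the operation type. A minimal sketch, assuming that shape, of pulling out the ids that failed with a 404; missing_document_ids is a hypothetical helper name, and the sample data is copied from the traceback and trimmed to the fields used here:

```python
def missing_document_ids(errors):
    """Return the _id of every failed bulk item whose status is 404."""
    missing = []
    for item in errors:
        # Each entry looks like {"update": {"status": 404, "_id": "3", ...}}
        for details in item.values():
            if details.get("status") == 404:
                missing.append(details.get("_id"))
    return missing


# Two entries from the traceback above, trimmed to the relevant fields
sample_errors = [
    {"update": {"status": 404, "_type": "external", "_id": "3", "_index": "customer"}},
    {"update": {"status": 404, "_type": "external", "_id": "4", "_index": "customer"}},
]

ids = missing_document_ids(sample_errors)

# With a live cluster, the same list can be obtained without an exception:
#   success_count, errors = helpers.bulk(es, data, raise_on_error=False)
```

This only helps with diagnosing which documents were missing; it does not by itself create them.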
Nilesh asked Aug 21 '15 06:08


2 Answers

According to the _bulk endpoint documentation, you can and should use the index action for this, provided your documents always have the same identifiers.

create is useful when creating documents the first time, and update is more meant for doing partial and/or scripted updates.
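If partial updates are what you need but some documents may not exist yet, the bulk update action also accepts a doc_as_upsert flag, which tells Elasticsearch to insert the doc as a new document when the id is missing. A minimal sketch, assuming the index and type names from the question; build_upsert_actions and its input shape are hypothetical, and the helpers.bulk call is left as a comment because it needs a running cluster:

```python
def build_upsert_actions(records, index="customer", doc_type="external"):
    """Yield bulk 'update' actions that insert the doc when the id is missing.

    `records` maps document id -> dict of fields (an assumed input shape).
    """
    for doc_id, fields in records.items():
        yield {
            "_op_type": "update",
            "_index": index,
            "_type": doc_type,      # mapping type, as used in the question
            "_id": doc_id,
            "doc": fields,          # partial document merged into an existing one
            "doc_as_upsert": True,  # use "doc" as the new document if none exists
        }


actions = list(build_upsert_actions({3: {"name": "test"}, 4: {"name": "test"}}))

# With a reachable cluster this could then be sent as:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch(), actions)
```

This keeps the merge semantics of update, whereas index replaces the whole document.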

You can also omit _op_type entirely, in which case index is used by default.
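One detail worth checking when switching from update to index: as far as I know, the bulk helper sends either the _source key or, if there is none, every remaining non-metadata key as the document body. A "doc" wrapper left over from an update action would then be stored as a field named doc. A minimal sketch that puts the fields under _source instead; build_index_actions and its input shape are hypothetical:

```python
def build_index_actions(records, index="customer", doc_type="external"):
    """Yield bulk actions with no _op_type, so the helper defaults to 'index'.

    `records` maps document id -> dict of fields (an assumed input shape).
    """
    for doc_id, fields in records.items():
        yield {
            "_index": index,
            "_type": doc_type,   # mapping type, as used in the question
            "_id": doc_id,
            "_source": fields,   # stored as the document body, created or overwritten
        }


index_actions = list(build_index_actions({3: {"name": "test"}, 4: {"name": "updated"}}))

# With a reachable cluster this could then be sent as:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch(), index_actions)
```

Running the same actions twice creates the documents the first time and overwrites them the second time, which is the create-or-update behaviour asked about.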

Val answered Oct 08 '22 13:10


I tried the solution suggested by @Val and it works like a charm.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print(helpers.bulk(es, data))  # parenthesized form runs on both Python 2 and 3
Mayank answered Oct 08 '22 14:10