I am using elasticsearch-py for elasticsearch operation.
I am trying for elasticsearch.helpers.bulk
to create or update multiple records.
from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()
data = [
{
"_index": "customer",
"_type": "external",
"_op_type": "create",
"_id": 3,
"doc" : {"name": "test"}
},
{
"_index": "customer",
"_type": "external",
"_op_type": "create",
"_id": 4,
"doc" : {"name": "test"}
},
{
"_index": "customer",
"_type": "external",
"_op_type": "create",
"_id": 5,
"doc" : {"name": "test"}
},
{
"_index": "customer",
"_type": "external",
"_op_type": "create",
"_id": 6,
"doc" : {"name": "test"}
},
]
print helpers.bulk(es, data)
Is there any way to perform this operation?
Now we can give only _op_type
as create
or update
. If we give update
and record is not exist, then it will raise error.
Traceback (most recent call last):
File "/tmp/test.py", line 37, in <module>
print helpers.bulk(es, data)
File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
In order to perform any python updates API Elasticsearch you will need Python Versions 2 or 3 with its PIP package manager installed along with a good working knowledge of Python. Once you have the basics requisites you will be able to use python update Elasticsearch documentsx000D in single or multiple calls.
To fully replace an existing document, use the index API. This operation: Gets the document (collocated with the shard) from the index. Runs the specified script.
What is ElasticSearch? ElasticSearch (ES) is a distributed and highly available open-source search engine that is built on top of Apache Lucene. It's an open-source which is built in Java thus available for many platforms.
According to the _bulk
endpoint documentation, you can and should use the index
action for this, provided your documents always have the same identifiers.
create
is useful when creating documents the first time, and update
is more meant for doing partial and/or scripted updates.
You can also not specify any _op_type
at all and index
will be taken by default.
I tried solution suggested by @Val and it works as charm.
from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()
data = [
{
"_index": "customer",
"_type": "external",
"_id": 3,
"doc" : {"name": "test"}
},
{
"_index": "customer",
"_type": "external",
"_id": 4,
"doc" : {"name": "test"}
},
{
"_index": "customer",
"_type": "external",
"_id": 5,
"doc" : {"name": "test"}
},
{
"_index": "customer",
"_type": "external",
"_id": 6,
"doc" : {"name": "test"}
},
]
print helpers.bulk(es, data)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With