 

post request with \n-delimited JSON in python

I'm trying to use the Elasticsearch bulk API, and I see that it can be called with the following request. The request is unusual because the data payload is not a single valid JSON document but a series of JSON objects delimited by \n (newline) characters.

curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'

My question is: how can I perform such a request in Python? The Elasticsearch documentation says not to pretty-print the JSON, but I'm not sure what that means (see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
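"Pretty-printing" here means serializing JSON with indentation and line breaks. In the bulk format every newline ends a document, so each action or source object has to be serialized on a single line. A minimal sketch of the difference, assuming Python's standard json module (the action dict is copied from the curl example):

```python
import json

action = {"index": {"_index": "test", "_type": "type1", "_id": "1"}}

# Compact form (json.dumps default): the whole object stays on one line,
# which is what every line of a bulk body must look like.
compact = json.dumps(action)

# Pretty-printed form: indent= inserts newlines, splitting one object
# across several lines and breaking the newline-delimited format.
pretty = json.dumps(action, indent=2)

print(compact)  # {"index": {"_index": "test", "_type": "type1", "_id": "1"}}
print(pretty)   # spans 7 lines
```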

I know that this is a valid Python request:

import requests
import json

data = json.dumps({"field":"value"})

r = requests.post("http://localhost:9200/_bulk?pretty", data=data)

But what do I do if the JSON is \n-delimited?

asked Jul 17 '17 11:07 by Brian


2 Answers

What you have here is really a set of individual JSON documents joined together with newlines, so you could do something like this:

import json
import requests

data = [
    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } },
    { "field1" : "value1" },
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } },
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } },
    { "field1" : "value3" },
    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} },
    { "doc" : {"field2" : "value2"} }
]

# Serialize each dict on its own line (no pretty-printing) and join with newlines
data_to_post = '\n'.join(json.dumps(d) for d in data)
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post)

However, as pointed out in the comments, the Elasticsearch Python client is likely to be more useful.
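To see that the payload really is just a sequence of standalone JSON documents, here is a small sketch (standard library only, no Elasticsearch server needed) that parses the question's curl payload line by line and rebuilds an equivalent body the same way the answer does:

```python
import json

# The payload from the curl example in the question, one JSON object per line.
bulk_body = """\
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
"""

# Every non-blank line is a complete, independently parseable JSON document.
docs = [json.loads(line) for line in bulk_body.splitlines() if line.strip()]

# Re-serializing each document compactly and joining with "\n" yields an
# equivalent body; the final "\n" terminates the last line.
rebuilt = "\n".join(json.dumps(d) for d in docs) + "\n"

print(len(docs))  # 7
```

Note that the whole body is not valid JSON as a single document; json.loads(bulk_body) would fail, which is why it is split into lines first.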

answered Nov 05 '22 04:11 by Daniel Roseman


As a follow-up to Daniel's answer above: to get it to work with Elasticsearch 6.3, I had to add an extra '\n' to the end of data_to_post and send a "Content-Type: application/x-ndjson" header.

data_to_post = '\n'.join(json.dumps(d) for d in data) + "\n"
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post, headers=headers)

Otherwise, I received the error: "The bulk request must be terminated by a newline [\\n]"
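Both fixes can be folded into a small helper. This is a minimal sketch, assuming the same action/source layout as the data list in the previous answer; build_bulk_body is a hypothetical name, not part of requests or the Elasticsearch client. Actions that carry no document line (such as delete) are passed with None as the source.

```python
import json
from typing import Iterable, Optional, Tuple


def build_bulk_body(operations: Iterable[Tuple[dict, Optional[dict]]]) -> str:
    """Hypothetical helper: serialize (action, source) pairs into a bulk body.

    source is None for actions such as "delete" that take no document line.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))      # action/metadata line
        if source is not None:
            lines.append(json.dumps(source))  # document (or partial doc) line
    # The bulk API rejects bodies that do not end with a newline.
    return "\n".join(lines) + "\n"


ops = [
    ({"index": {"_index": "test", "_type": "type1", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_type": "type1", "_id": "2"}}, None),
]
body = build_bulk_body(ops)
print(body, end="")
```

The returned string can then be passed as data= to requests.post together with the application/x-ndjson header shown above.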

answered Nov 05 '22 03:11 by Kai Peng