
Why does Elasticsearch Bulk-API use "Content-Type: application/json" header?

I am wondering why ES uses that header when the request body is not a single JSON document but multiple lines of text, each of which is a JSON object. For example:

{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "122886" } }
{ "id": "122886", "title" : "Star Wars: Episode VII - The Force Awakens", "year":2015 , "genre":["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "109487" } }
{ "id": "109487", "title" : "Interstellar", "year":2014 , "genre":["Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "58559" } }
{ "id": "58559", "title" : "Dark Knight, The", "year":2008 , "genre":["Action", "Crime", "Drama", "IMAX"] }
{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "1924" } }
{ "id": "1924", "title" : "Plan 9 from Outer Space", "year":1959 , "genre":["Horror", "Sci-Fi"] }

This would be a valid request despite not being well-formed JSON. Is it common in RESTful interfaces to declare something as application/json even if it's not? You can't even send it from Postman, only from cURL, which does not validate the body syntax.

Phil asked Dec 18 '22 18:12



1 Answer

Technically, when calling the _bulk endpoint, the Content-Type header should be application/x-ndjson rather than application/json. As the Elasticsearch docs state:

the final line of data must end with a newline character \n. Each newline character may be preceded by a carriage return \r. When sending requests to this endpoint the Content-Type header should be set to application/x-ndjson.
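As a minimal sketch of what that requirement looks like in practice, the Python snippet below serializes action/document pairs as one JSON object per line, terminates the last line with \n, and prepares (without sending) a request carrying the application/x-ndjson header. The helper names `build_bulk_body` and `make_bulk_request` and the `http://localhost:9200/_bulk` URL are assumptions made for illustration; the `_type` field from the question is left out for brevity.

```python
import json
import urllib.request


def build_bulk_body(operations):
    """Serialize (action, document) pairs as NDJSON: one JSON object per line."""
    lines = []
    for action, document in operations:
        lines.append(json.dumps(action))
        if document is not None:  # some actions (e.g. delete) have no source line
            lines.append(json.dumps(document))
    # The final line of data must also end with a newline character.
    return "\n".join(lines) + "\n"


def make_bulk_request(url, operations):
    """Prepare (but do not send) a POST request for the _bulk endpoint."""
    body = build_bulk_body(operations).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )


operations = [
    (
        {"create": {"_index": "movies", "_id": "135569"}},
        {"id": "135569", "title": "Star Trek Beyond", "year": 2016},
    ),
    (
        {"create": {"_index": "movies", "_id": "109487"}},
        {"id": "109487", "title": "Interstellar", "year": 2014},
    ),
]

# Hypothetical local cluster URL; nothing is sent until urlopen() is called.
request = make_bulk_request("http://localhost:9200/_bulk", operations)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is omitted here because it needs a running cluster.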

The body is not a JSON array because, when the coordinating node receives the bulk request, it can split it into several chunks simply by looking for newline characters and send each chunk to a different node for processing. If the content were a single JSON document, the coordinating node would have to parse all of it first, which would hurt performance for bulk requests of several megabytes.
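To illustrate the idea (this is not Elasticsearch's actual implementation, only a sketch of the line-oriented approach), the snippet below splits a bulk body on newline characters, parses only the small action lines, and passes the document lines along as unparsed text. Grouping by `_index` stands in for the real routing logic, and the names `iter_bulk_items` and `group_by_index` and the `books` index are hypothetical.

```python
import json
from collections import defaultdict


def iter_bulk_items(body):
    """Yield (action, raw_source) pairs, parsing only the action lines."""
    # Split on "\n" only; each newline may be preceded by a carriage return.
    lines = (line.rstrip("\r") for line in body.split("\n"))
    for action_line in lines:
        if not action_line:
            continue  # skip the empty piece after the trailing newline
        action = json.loads(action_line)
        op = next(iter(action))  # "create", "index", "update" or "delete"
        # A delete action has no source line; the others are followed by one.
        raw_source = None if op == "delete" else next(lines, None)
        yield action, raw_source


def group_by_index(body):
    """Group items by target index without parsing the document lines."""
    groups = defaultdict(list)
    for action, raw_source in iter_bulk_items(body):
        op = next(iter(action))
        groups[action[op]["_index"]].append((action, raw_source))
    return dict(groups)


body = (
    '{ "create" : { "_index" : "movies", "_id" : "135569" } }\n'
    '{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 }\n'
    '{ "delete" : { "_index" : "books", "_id" : "1" } }\n'
    '{ "create" : { "_index" : "movies", "_id" : "109487" } }\n'
    '{ "id": "109487", "title" : "Interstellar", "year":2014 }\n'
)
groups = group_by_index(body)
```

The document lines stay as raw strings in `groups`, so the potentially large payloads are never parsed at this stage.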

NDJSON is a convenient format for storing or streaming structured data that may be processed one record at a time.

Val answered Apr 28 '23 21:04
