
What's the difference between source filtering and the fields option in the elasticsearch get API?

I'm confused about the difference between source filtering (i.e. using the _source_include parameter) and the fields option of the GET API in Elasticsearch. How do they differ in terms of performance, and when should each one be used?

vaishaks asked Oct 19 '22 16:10


2 Answers

Update: re: fields

Note that this is the 1.x documentation if you just arrived here from the future.

For backwards compatibility, if the fields parameter specifies fields which are not stored (store mapping set to false), it will load the _source and extract it from it. This functionality has been replaced by the source filtering parameter.

-- https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-request-fields.html#search-request-fields
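
To make the fallback in that quote concrete, here is a toy model in Python. It is only an illustration of the described behaviour, not how Elasticsearch is implemented; the function name `resolve_fields` and the sample document are made up for this example.

```python
def resolve_fields(requested, stored_fields, source):
    """Toy model of the 1.x fallback: prefer stored fields, else extract from _source."""
    result = {}
    for name in requested:
        if name in stored_fields:
            # The field has store enabled in the mapping: read it directly.
            result[name] = stored_fields[name]
        elif name in source:
            # Not stored: fall back to loading _source and extracting the value.
            result[name] = source[name]
    return result


# Hypothetical document in which only "title" is a stored field.
source = {"title": "Hello", "content": "World", "views": 3}
stored = {"title": "Hello"}

print(resolve_fields(["title", "content"], stored, source))
```

Both requested fields come back, but "content" is only available because the whole _source was loaded to extract it.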


AFAICT:

_source tells elasticsearch whether to include the source of matched documents in the response. The "source" is the data in the document as it was inserted.

fields tells Elasticsearch to return only the listed fields rather than the whole source; per the note above, fields that are not stored are extracted from _source.

Performance: unless you have low bandwidth to the Elasticsearch server, the difference is probably negligible.
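
For reference, this is roughly what the two request styles look like against the 1.x GET API. The sketch only builds the URLs and does not contact a server. The host, index, type, id and field names are assumptions chosen for illustration, and `get_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumed local node


def get_url(index, doc_type, doc_id, **params):
    """Build a 1.x-style GET API URL with optional query parameters."""
    # Keep commas and wildcards readable instead of percent-encoding them.
    query = urlencode(params, safe="*,")
    path = f"{BASE}/{index}/{doc_type}/{doc_id}"
    return f"{path}?{query}" if query else path


# Source filtering: return only parts of the original JSON document.
source_filtered = get_url("twitter", "tweet", 1, _source_include="title,content")

# fields: ask for specific fields instead of the whole source.
stored_fields = get_url("twitter", "tweet", 1, fields="title,content")

print(source_filtered)
print(stored_fields)
```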

joar answered Oct 22 '22 08:10


I had the same question; here is what I found that may answer it.

fields restricts the fields whose contents are parsed and returned.

Source filtering restricts the fields which are returned.

Another way of seeing it: fields is used to optimize data transfer and CPU usage, while source filtering only optimizes data transfer.

Source filtering allows us to control which parts of the original JSON document are returned for each hit [...] It's worth keeping in mind that this only saves us on bandwidth costs between the nodes participating in the search as well as the client, not CPU or Disk, as was the case when using fields.
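
A rough way to picture why source filtering only saves bandwidth: the whole document is still read and parsed, and only then trimmed. The Python sketch below is a simplified model of that idea, not Elasticsearch code. `filter_source` is a hypothetical name, and it only matches top-level keys.

```python
import json
from fnmatch import fnmatchcase


def filter_source(raw_json, include):
    """Parse the full document, then keep only keys matching the include patterns."""
    doc = json.loads(raw_json)  # the parsing cost is paid for the whole document
    return {
        key: value
        for key, value in doc.items()
        if any(fnmatchcase(key, pattern) for pattern in include)
    }


raw = '{"title": "Hello", "content": "World", "views": 3}'

# Only the trimmed result would travel over the network.
print(filter_source(raw, ["title", "vi*"]))
```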

In addition:

One feature about fields that's not commonly known is the ability to select metadata-fields as well. Of particular note is its ability to select the _ttl-field, which actually returns the number of milliseconds until the document expires, not the original lifespan of the document. A very handy feature indeed.

Redithion answered Oct 22 '22 09:10