 

Make elasticsearch only return certain fields?

I'm using elasticsearch to index my documents.

Is it possible to instruct it to only return particular fields instead of the entire JSON document it has stored?

user1199438 asked Mar 07 '12 16:03



People also ask

How do I capture a specific field in Elasticsearch?

There are two recommended methods to retrieve selected fields from a search query: Use the fields option to extract the values of fields present in the index mapping. Use the _source option if you need to access the original data that was passed at index time.
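As a sketch of the two approaches, here are both request bodies built as Python dicts. The Python is only for illustration; the JSON they serialize to is what gets sent to the search endpoint. The field names `user` and `message` are assumptions, and the `fields` option only exists in newer Elasticsearch versions.

```python
import json

# `fields` option: return values of mapped fields and skip the original
# document entirely ("_source": false). Field names are hypothetical.
fields_body = {
    "query": {"match_all": {}},
    "fields": ["user", "message"],
    "_source": False,
}

# `_source` option: return a filtered copy of the original JSON document.
source_body = {
    "query": {"match_all": {}},
    "_source": ["user", "message"],
}

print(json.dumps(fields_body))
print(json.dumps(source_body))
```

With `fields`, values appear under each hit's `fields` key (as arrays); with `_source` filtering they keep the original document shape under `_source`.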

How do I get distinct values in Elasticsearch?

Use a terms aggregation on the field whose distinct values you want (for example, a color field). Pay attention to how that field is analyzed: it must not be tokenized at index time (for example, map it as a keyword field), otherwise each bucket in the aggregation will be an individual term from the field's content rather than the whole value.
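A minimal sketch of such a request, assuming a hypothetical index whose `color` field was dynamically mapped as `text` with a `color.keyword` sub-field (the default mapping for strings). The `terms` aggregation returns only the top `size` buckets, so raise `size` if you expect more distinct values. The sample response is hand-written for illustration.

```python
import json

# Aggregate on the untokenized keyword sub-field so "light blue" stays one
# bucket instead of being split into "light" and "blue".
distinct_colors_body = {
    "size": 0,  # no hits needed, only the aggregation
    "aggs": {
        "distinct_colors": {
            "terms": {"field": "color.keyword", "size": 100}
        }
    },
}

def bucket_keys(response, agg_name="distinct_colors"):
    """Collect the distinct values (bucket keys) from a search response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]

# Hand-written sample response, for illustration only.
sample_response = {
    "aggregations": {
        "distinct_colors": {
            "buckets": [
                {"key": "light blue", "doc_count": 3},
                {"key": "red", "doc_count": 1},
            ]
        }
    }
}

print(json.dumps(distinct_colors_body))
print(bucket_keys(sample_response))  # ['light blue', 'red']
```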

What is _source in Elasticsearch query?

The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing fetch requests, like get or search.

What is field mapping in Elasticsearch?

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document.
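For illustration, a mapping definition for a hypothetical index, assuming Elasticsearch 7.x or later (no mapping types). It would be sent as the body of a create-index request such as `PUT /my-index`; the index and field names are made up.

```python
import json

# - `user` is a keyword: exact values, suitable for terms aggregations.
# - `message` is analyzed full text; "store": true also keeps it as a
#   separate stored field, retrievable without reading _source.
# - `posted_at` is a date.
create_index_body = {
    "mappings": {
        "properties": {
            "user": {"type": "keyword"},
            "message": {"type": "text", "store": True},
            "posted_at": {"type": "date"},
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```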


2 Answers

Yes. A better option is source filtering. If you're searching with JSON, the request body looks something like this:

{
    "_source": ["user", "message", ...],
    "query": ...,
    "size": ...
}
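To make the include/exclude semantics concrete, here is a simplified local model of source filtering in Python. It is an assumption-laden sketch, not Elasticsearch's implementation: it matches patterns against dotted paths, treats arrays as leaf values, and lets excludes win over includes.

```python
from fnmatch import fnmatchcase

def filter_source(doc, includes=None, excludes=(), prefix="", included=False):
    """Simplified model of _source filtering (not Elasticsearch's code).

    Patterns are matched against dotted paths such as "user.name".
    Note: fnmatch also treats "?" and "[...]" specially; only "*" is
    intended here. Arrays are treated as leaf values.
    """
    result = {}
    for key, value in doc.items():
        path = prefix + key
        if any(fnmatchcase(path, p) for p in excludes):
            continue  # excluded paths win over includes
        # A path is kept if no includes were given, an ancestor matched,
        # or the path itself matches an include pattern.
        hit = included or includes is None or any(
            fnmatchcase(path, p) for p in includes
        )
        if isinstance(value, dict):
            sub = filter_source(value, includes, excludes, path + ".", hit)
            if sub:
                result[key] = sub
        elif hit:
            result[key] = value
    return result

doc = {"user": {"name": "kimchy", "id": 7}, "message": "hello", "tags": ["a", "b"]}
print(filter_source(doc, includes=["user", "message"]))
print(filter_source(doc, includes=["user.*"], excludes=["user.id"]))
```

Including an object path such as `user` keeps the whole object, while `user.*` combined with an exclude lets you trim individual sub-fields.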

In ES 2.4 and earlier, you could also use the fields option of the search API:

{
    "fields": ["user", "message", ...],
    "query": ...,
    "size": ...
}

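This option is no longer supported in ES 5.0 and later, where stored fields are requested with stored_fields instead. Source filters are more powerful anyway! (Newer versions reintroduced a fields option, but with different semantics: it returns the values of mapped fields.)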
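If you are porting old queries, a hypothetical helper like the one below shows the two replacements for the pre-5.0 `fields` option: a `_source` filter, or `stored_fields` for fields mapped with `"store": true`. It assumes the body was written for a version where `fields` still had its old meaning; newer versions give `fields` a different meaning, so it should not be run on modern queries.

```python
def migrate_legacy_fields(body, stored=False):
    """Rewrite a pre-5.0 search body that used "fields" (hypothetical helper).

    By default the field list becomes a _source filter; pass stored=True to
    request the same names via stored_fields instead (only fields mapped
    with "store": true will come back).
    """
    new_body = dict(body)  # shallow copy; leave the caller's dict untouched
    field_list = new_body.pop("fields", None)
    if field_list is not None:
        new_body["stored_fields" if stored else "_source"] = field_list
    return new_body

legacy = {"fields": ["user", "message"], "query": {"match_all": {}}}
print(migrate_legacy_fields(legacy))
print(migrate_legacy_fields(legacy, stored=True))
```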

kevingessner answered Oct 06 '22 11:10



I found the docs for the Get API helpful, especially two sections, Source filtering and Fields: https://www.elastic.co/guide/en/elasticsearch/reference/7.3/docs-get.html#get-source-filtering

They state about source filtering:

If you only need one or two fields from the complete _source, you can use the _source_include & _source_exclude parameters to include or filter out the parts you need. This can be especially helpful with large documents where partial retrieval can save on network overhead.
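These parameters go in the query string of the Get request. A minimal sketch that builds such a URL, assuming a node at the default `http://localhost:9200`, a hypothetical index `my-index`, and the `_doc` endpoint of 7.x and later. Recent versions spell the parameters `_source_includes`/`_source_excludes`; the singular forms in the quote are the older names.

```python
from urllib.parse import quote, urlencode

def get_doc_url(base, index, doc_id, includes=(), excludes=()):
    """Build a Get API URL that filters _source via query parameters.

    Uses the _source_includes/_source_excludes spelling of recent versions.
    Commas and * wildcards are left unescaped for readability.
    """
    params = {}
    if includes:
        params["_source_includes"] = ",".join(includes)
    if excludes:
        params["_source_excludes"] = ",".join(excludes)
    url = f"{base}/{quote(index)}/_doc/{quote(str(doc_id))}"
    if params:
        url += "?" + urlencode(params, safe=",*")
    return url

print(get_doc_url("http://localhost:9200", "my-index", 1, includes=["user", "message"]))
# http://localhost:9200/my-index/_doc/1?_source_includes=user,message
```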

This fitted my use case perfectly. I ended up simply filtering the source like this (using the shorthand):

{
    "_source": ["field_x", ..., "field_y"],
    "query": {
        ...
    }
}
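Besides the array shorthand, `_source` also accepts an object with `includes` and `excludes` patterns. A sketch of such a body, where `field_x` comes from the answer and the `obj.*`/`obj.secret` names are hypothetical:

```python
import json

# Object form of _source: keep field_x and everything under obj,
# except obj.secret. Wildcards are allowed in both lists.
body = {
    "_source": {
        "includes": ["field_x", "obj.*"],
        "excludes": ["obj.secret"],
    },
    "query": {"match_all": {}},
}

print(json.dumps(body, indent=2))
```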

FYI, they state in the docs about the fields parameter:

The get operation allows specifying a set of stored fields that will be returned by passing the fields parameter.

It seems to be meant for fields that have been explicitly stored, and it returns each field's value in an array. If the specified fields haven't been stored, it fetches each one from the _source, which can make retrieval slower. I also had trouble getting it to return fields of type object.
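Because requested fields come back wrapped in arrays, a small hypothetical helper can unwrap single values. A sketch against a hand-written sample hit, assuming the response puts the values under each hit's `fields` key:

```python
def flatten_fields(hit):
    """Turn a hit's "fields" section into plain values (hypothetical helper).

    Fields come back as arrays even for single values, so unwrap
    one-element arrays and keep genuine multi-value arrays as lists.
    """
    flat = {}
    for name, values in hit.get("fields", {}).items():
        flat[name] = values[0] if len(values) == 1 else values
    return flat

# Hand-written sample hit, for illustration only.
hit = {"_id": "1", "fields": {"user": ["kimchy"], "tags": ["a", "b"]}}
print(flatten_fields(hit))  # {'user': 'kimchy', 'tags': ['a', 'b']}
```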

So, in summary, you have two options: source filtering or [stored] fields.

Markus Coetzee answered Oct 06 '22 11:10
