When doing a search operation in Elasticsearch, I want the metadata to be filtered out so that only "_source" is returned in the response. I'm able to achieve this with "search" in the following way:

    out1 = es.search(index='index.com', filter_path=['hits.hits._id', 'hits.hits._source'])

But when I do the same with the scan method, it just returns an empty list:

    out2 = helpers.scan(es, query, index='index.com', doc_type='2016-07-27', filter_path=['hits.hits._source'])

The problem may be the way I'm processing the response of the 'scan' method, or the way I'm passing the value to filter_path. To check the output, I convert out2 to a list.

Usage of filter_path with helpers.scan in the Elasticsearch client

2 Answers

The scan helper currently doesn't allow passing extra parameters to the scroll API so your filter_path doesn't apply to it. It does, however, get applied to the initial search API call which is used to initiate the scan/scroll cycle. This means that the scroll_id is stripped from the response causing the entire operation to fail.

In your case, even passing the filter_path parameter to the scroll API calls would cause the helper to fail: it would strip the scroll_id that the operation needs, and the helper also relies on the structure of the response.
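To see why this produces an empty list rather than an error, here is a deliberately simplified model of the scan loop. It is a sketch, not the real helpers.scan source: simplified_scan, full_search, filtered_search and fake_scroll are hypothetical stand-ins that only mimic how the helper reads _scroll_id, _shards and hits.hits from each response.

```python
from typing import Any, Callable, Dict, Iterator, List

Response = Dict[str, Any]


def simplified_scan(
    search: Callable[[], Response],
    scroll: Callable[[str], Response],
) -> Iterator[Dict[str, Any]]:
    """Hypothetical, simplified model of a scan/scroll loop (not the real helper)."""
    resp = search()
    scroll_id = resp.get("_scroll_id")
    # If filter_path stripped "_scroll_id", scroll_id is None: the loop never
    # starts, nothing is yielded, and list() gives [] without any error.
    while scroll_id and resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        # The loop also reads "_shards"; a response without it raises KeyError.
        shards = resp["_shards"]
        if shards["successful"] < shards["total"]:
            raise RuntimeError("scroll request failed on some shards")
        resp = scroll(scroll_id)
        scroll_id = resp.get("_scroll_id")


# Two fake pages: one hit, then an empty page that ends the scroll.
PAGES: List[Response] = [
    {"_scroll_id": "abc", "_shards": {"total": 1, "successful": 1},
     "hits": {"hits": [{"_id": "1", "_source": {"a": 1}}]}},
    {"_scroll_id": "abc", "_shards": {"total": 1, "successful": 1},
     "hits": {"hits": []}},
]


def full_search() -> Response:
    return PAGES[0]


def filtered_search() -> Response:
    # Roughly what filter_path=['hits.hits._source'] leaves of PAGES[0].
    return {"hits": {"hits": [{"_source": {"a": 1}}]}}


def fake_scroll(scroll_id: str) -> Response:
    return PAGES[1]


print(list(simplified_scan(full_search, fake_scroll)))      # [{'_id': '1', '_source': {'a': 1}}]
print(list(simplified_scan(filtered_search, fake_scroll)))  # []
```

With the unfiltered response the model yields the hit; with the filtered one it silently yields nothing, which matches the empty list described in the question.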

My recommendation would be to use source filtering if you need to limit the size of the response, or to use a smaller size parameter than the default of 1000.
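A minimal sketch of the source-filtering route, assuming an existing Elasticsearch client object. The field names ("title", "date") and the size of 500 are placeholders, and build_scan_query and iter_sources are illustrative names, not part of elasticsearch-py. The "_source" list in the search body trims each document on the server; the per-hit metadata (_id, _index and so on) is then dropped on the client by keeping only hit["_source"].

```python
from typing import Any, Dict, Iterator, List, Optional


def build_scan_query(
    fields: List[str], query: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a search body that returns only `fields` inside each hit's _source."""
    return {
        "query": query if query is not None else {"match_all": {}},
        # Source filtering: shrinks the documents, not the hit metadata.
        "_source": fields,
    }


def iter_sources(
    client: Any, index: str, fields: List[str], size: int = 500
) -> Iterator[Dict[str, Any]]:
    """Yield only each hit's _source; metadata is discarded client-side."""
    # Imported here so build_scan_query works even without the client installed.
    from elasticsearch import helpers

    # No filter_path here, so the scan helper still sees _scroll_id and _shards.
    # size=500 is an arbitrary example of a value below the default of 1000.
    for hit in helpers.scan(client, query=build_scan_query(fields), index=index, size=size):
        yield hit.get("_source", {})


# Usage (assumes `es` is an existing Elasticsearch client):
# out2 = list(iter_sources(es, "index.com", ["title", "date"], size=500))
```

Because filter_path is not used at all, the helper's own scroll bookkeeping is left untouched; the trade-off is that metadata still travels over the wire and is only discarded in Python.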

Hope this helps, Honza

answered Oct 12 '22 18:10

Honza Král

You could pass filter_path=['_scroll_id', '_shards', 'hits.hits._source'] to the scan helper to get it to work. Obviously that leaves some metadata in the response but it removes as much as possible while allowing the scroll to work. _shards is required because it is used internally by the scan helper.

answered Oct 12 '22 20:10

Jazz Kersell
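A sketch of this suggestion. SCAN_FILTER_PATH, apply_filter_path and scan_sources are hypothetical names; apply_filter_path is only a toy model of the server-side filter for plain dotted paths (no wildcards), used to show which parts of a response survive. Since the first answer notes that filter_path reaches only the initial search request, later scroll pages may still carry full hit metadata, so scan_sources keeps just hit["_source"] on the client side either way.

```python
from typing import Any, Dict, Iterator, List

# Keys the scan helper needs (_scroll_id, _shards) plus each hit's _source.
SCAN_FILTER_PATH: List[str] = ["_scroll_id", "_shards", "hits.hits._source"]


def apply_filter_path(obj: Any, paths: List[str]) -> Any:
    """Toy model of filter_path for plain dotted paths (no wildcards)."""
    return _filter(obj, [p.split(".") for p in paths])


def _filter(obj: Any, paths: List[List[str]]) -> Any:
    if isinstance(obj, list):
        # Arrays are transparent: the remaining path applies to each element.
        return [_filter(item, paths) for item in obj]
    if not isinstance(obj, dict):
        return obj
    out: Dict[str, Any] = {}
    for key, value in obj.items():
        rest = [p[1:] for p in paths if p[0] == key]
        if not rest:
            continue  # no path mentions this key: drop it
        if any(not r for r in rest):
            out[key] = value  # a path ends here: keep the whole subtree
        else:
            out[key] = _filter(value, rest)
    return out


sample = {
    "_scroll_id": "abc",
    "took": 3,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {"total": 1, "hits": [{"_index": "index.com", "_id": "1", "_source": {"a": 1}}]},
}
# Keeps _scroll_id, _shards and hits.hits[*]._source; drops took, hits.total, _index, _id.
print(apply_filter_path(sample, SCAN_FILTER_PATH))


def scan_sources(client: Any, query: Dict[str, Any], index: str) -> Iterator[Dict[str, Any]]:
    """Yield each document's _source via helpers.scan with the filter_path above."""
    from elasticsearch import helpers  # imported lazily; requires elasticsearch-py

    for hit in helpers.scan(client, query=query, index=index, filter_path=SCAN_FILTER_PATH):
        # Later scroll pages may be unfiltered, so pick _source explicitly.
        yield hit["_source"]
```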

Related questions

ElasticSearch : Sorting by nested documents' values
Why is the Elastic Search java API ignoring our query limit?
Logstash/ElasticSearch: guesses wrong for datatype for field
Index fields with hyphens in Elasticsearch
Need concrete documentation / examples of building complex index using NEST ElasticSearch library
How to get a List of Indices from ElasticSearch using Jest
Search keyword using double quotes to get exact match in elasticsearch
Elasticsearch: Learning from clicks (Search result ranking)
Searching multiple strings in all fields in Elasticsearch using Java API
How to paginate results from Elasticsearch DSL in Python
elasticsearch failed to parse date
elasticsearch: "More like this" combined with additional constraint
How to know if a geo coordinate lies within a geo polygon in elasticsearch?
How to set the data directory of ElasticSearch with Spring Boot
Possible to store images in Elasticsearch?
Elastic Search - exclude index and type from json response
Searchkick not searching multiple terms when specify fields
Sort based on length of an array in elasticsearch
How to search array property by array in elasticsearch with nest client
Searching multiple types in elasticsearch

Tags:

elasticsearch

elasticsearch-py

Jai Sharma
