How to exclude a field from getting searched by elasticsearch 6.1?

Tags:

elasticsearch

elasticsearch-6

I have an index with multiple fields in it. I want to filter documents based on the presence of a search string in all the fields except one: user_comments. The search query that I am running is

{
  "from": offset,
  "size": limit,
  "_source": [
    "document_title"
  ],
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "query": "#{query}"
            }
          }
        }
      }
    }
  }
}

The query string searches through all the fields, so it also gives me documents where the string matches only in the user_comments field. I want to query against all the fields except user_comments. The white-list is very big and the field names are dynamic, so it is not feasible to list the white-listed fields using the fields parameter like this:

"query_string": {
  "query": "#{query}",
  "fields": [
    "document_title",
    "field2"
  ]
}

Can anybody please suggest an idea on how to exclude a field from being searched?

asked Oct 11 '18 09:10

Richa Sinha

1 Answer

There is a way to make it work. It's not pretty, but it will do the job. You can achieve your goal by combining the boost and multi-field type parameters of query_string, a bool query to combine the scores, and min_score:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "#{query}",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "#{query}",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

So what happens under the hood?

Let's assume you have the following set of documents:

PUT my-query-string/doc/1
{
  "title": "Prodigy in Bristol",
  "text": "Prodigy in Bristol",
  "comments": "Prodigy in Bristol"
}
PUT my-query-string/doc/2
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Bristol"
}
PUT my-query-string/doc/3
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham and Bristol",
  "comments": "And also in Cardiff"
}
PUT my-query-string/doc/4
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Cardiff"
}

In your search request you would like to see only documents 1 and 3, but your original query will return 1, 2 and 3.

In Elasticsearch, search results are sorted by relevance _score: the bigger the score, the better.

So let's try to boost down the "comments" field so that its impact on the relevance score is cancelled out. We can do this by combining two queries with a should and using a negative boost:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This will give us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      }
    ]
  }
}

Document 2 got penalized, but so did document 1, although it is a desired match for us. Why did that happen?

Here's how Elasticsearch computed _score in this case:

_score = max(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

Document 1 matches the comments:"Bristol" part, and that match also happens to be its best score. According to our formula, the resulting score is 0.
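The arithmetic behind that formula can be replayed offline. The sketch below is a simplified model, not Elasticsearch's actual scorer: it assumes every field containing "Bristol" contributes the same per-field score of 0.2876821 (the value seen in the output above), and the names FIELD_SCORE, DOCS, field_score and score_best_field are made up for illustration.

```python
# Simplified model of the first attempt: the first clause keeps the best
# single-field score, and the negatively boosted clause subtracts the
# "comments" score. FIELD_SCORE is an assumption taken from the sample output.
FIELD_SCORE = 0.2876821

DOCS = {
    1: {"title": "Prodigy in Bristol", "text": "Prodigy in Bristol",
        "comments": "Prodigy in Bristol"},
    2: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
        "comments": "And also in Bristol"},
    3: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham and Bristol",
        "comments": "And also in Cardiff"},
    4: {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
        "comments": "And also in Cardiff"},
}


def field_score(value, term):
    """Return the assumed score if the term occurs as a word in the value, else 0."""
    return FIELD_SCORE if term.lower() in value.lower().split() else 0.0


def score_best_field(doc, term, excluded="comments"):
    """Best score over all fields, minus the score of the excluded field."""
    best = max(field_score(value, term) for value in doc.values())
    return best - field_score(doc[excluded], term)


for doc_id, doc in DOCS.items():
    print(doc_id, score_best_field(doc, "Bristol"))
```

In this model documents 1 and 2 both end up at 0, because the subtracted comments score equals the best field score, while document 3 keeps its full score. Document 4 matches nothing and would not appear in the hits at all.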

What we would actually like to do is boost the first clause (the one over "all" fields) more when more fields match.

Can we boost `query_string` matching more fields?

We can: query_string in multi-field mode has a type parameter that does exactly that. The query will look like this:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "type": "most_fields",
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This will give us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      }
    ]
  }
}

As you can see, the undesired document 2 is at the bottom and has a score of 0. Here's how the score was computed this time:

_score = sum(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

So documents matching "Bristol" in any field are still selected, but the relevance score for comments:"Bristol" is cancelled out, and only documents matching title:"Bristol" or text:"Bristol" get a _score > 0.
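To see why summing the field scores rescues document 1, here is a small offline model of the formula above. It is only a sketch: it assumes each matching field contributes the same score of 0.2876821 (taken from the sample output), and MATCHES, FIELD_SCORE and score_most_fields are illustrative names, not anything Elasticsearch provides.

```python
FIELD_SCORE = 0.2876821  # assumed per-field score, taken from the sample output

# Which fields contain "Bristol" in each of the four sample documents.
MATCHES = {
    1: {"title", "text", "comments"},
    2: {"comments"},
    3: {"text"},
    4: set(),
}


def score_most_fields(matched_fields, excluded="comments"):
    """Sum of the per-field scores, minus the excluded field's score."""
    total = FIELD_SCORE * len(matched_fields)
    penalty = FIELD_SCORE if excluded in matched_fields else 0.0
    # Round to hide floating-point noise from the subtraction.
    return round(total - penalty, 7)


scores = {doc_id: score_most_fields(fields) for doc_id, fields in MATCHES.items()}
print(scores)
```

Document 1 now keeps the score of its two non-comment fields (about 0.575), document 3 keeps one field's score, and document 2 drops to 0 because its only match is the subtracted one.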

Can we filter out those results with undesired score?

Yes, we can, using min_score:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

This works (in our case) because a document's score is 0 if and only if "Bristol" matched the "comments" field and no other field.

The output will be:

{
  "hits": {
    "total": 2,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      }
    ]
  }
}
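If the request is assembled in application code, the same body can be generated for any search string and excluded field. The following is a minimal sketch that only builds the JSON and does not talk to a cluster; build_exclusion_query is a hypothetical helper name, and the default min_score simply mirrors the value used above.

```python
import json


def build_exclusion_query(query_text, excluded_field, min_score=0.00001):
    """Build the bool/should request body with a negatively boosted clause."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Scores every field and sums the per-field scores.
                    {"query_string": {"query": query_text,
                                      "type": "most_fields",
                                      "boost": 1}},
                    # Cancels out the contribution of the excluded field.
                    {"query_string": {"fields": [excluded_field],
                                      "query": query_text,
                                      "boost": -1}},
                ]
            }
        },
        # Drops documents whose remaining score is 0.
        "min_score": min_score,
    }


body = build_exclusion_query("Bristol", "comments")
print(json.dumps(body, indent=2))
```

The printed body can be sent to the _search endpoint as is. Note that this relies on a negative boost, which Elasticsearch 7.0 and later reject, so the sketch applies to 6.x clusters like the one in the question.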

Can it be done in a different way?

Sure. I wouldn't actually advise going with _score tweaking, since it is a pretty complex matter.

I would advise fetching the existing mapping and constructing the list of fields to run the query against beforehand. This will make the code much simpler and more straightforward.
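As a rough illustration of that approach, the sketch below walks the "properties" part of a mapping and collects the text-like fields, skipping the excluded one. It assumes you have already fetched the mapping (for example with GET my-query-string/_mapping) and extracted its "properties" dict; collect_text_fields and sample_properties are hypothetical names, the sample mapping is invented, and nested fields and multi-fields are deliberately ignored.

```python
def collect_text_fields(properties, excluded=("user_comments",), prefix=""):
    """Return the full paths of text/keyword fields, skipping excluded ones."""
    fields = []
    for name, spec in properties.items():
        path = prefix + name
        if path in excluded:
            continue
        if "properties" in spec and spec.get("type", "object") == "object":
            # Object field: recurse into its sub-fields.
            fields.extend(
                collect_text_fields(spec["properties"], excluded, path + ".")
            )
        elif spec.get("type") in ("text", "keyword"):
            fields.append(path)
    return fields


# Invented mapping fragment, shaped like the "properties" section of a mapping.
sample_properties = {
    "document_title": {"type": "text"},
    "user_comments": {"type": "text"},
    "discount_percent": {"type": "integer"},
    "author": {"properties": {"name": {"type": "keyword"}}},
}

fields = collect_text_fields(sample_properties)
query = {"query_string": {"query": "Bristol", "fields": fields}}
print(query)
```

Because only text and keyword fields are kept, numeric fields never reach query_string, and the white-list stays in sync with dynamic field names as long as the mapping is re-read before the list is built.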

Original solution proposed in the answer (kept for history)

Originally it was proposed to use this kind of query with exactly the same intent as the solution above:

POST my-query-string/doc/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "fields" : ["*", "comments^0"],
              "query": "#{query}"
            }
          }
        }
      }
    }
  },
  "min_score": 0.00001
}

The only problem is that if the index contains any numeric fields, this part:

"fields": ["*"]

raises an error, since a textual query string cannot be applied to a number.
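One option that may avoid that error is the lenient parameter of query_string, which tells Elasticsearch to ignore format-based failures such as text sent to a numeric field. The sketch below only builds the request body, with the function_score and bool wrappers dropped for brevity. Whether "*" combined with a ^0 boost then behaves as intended on a given cluster is an assumption that has not been verified here, and build_lenient_query is a hypothetical helper name.

```python
import json


def build_lenient_query(query_text, excluded_field, min_score=0.00001):
    """Build a wildcard-field query_string body that tolerates numeric fields."""
    return {
        "query": {
            "query_string": {
                # All fields, with the excluded field boosted down to 0.
                "fields": ["*", excluded_field + "^0"],
                "query": query_text,
                # Ignore format errors, e.g. text applied to a number.
                "lenient": True,
            }
        },
        "min_score": min_score,
    }


lenient_body = build_lenient_query("Bristol", "comments")
print(json.dumps(lenient_body, indent=2))
```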

answered Oct 22 '22 03:10

Nikolay Vasiliev
