Elasticsearch multiple simple_query_string with boost

I have an index set up for all my documents:

{
  "mappings": {
    "book": {
      "_source": { "enabled": true },
      "properties": {
        "title": { "type": "string", "analyzer": "standard", "search_analyzer": "standard" },
        "description": { "type": "string", "analyzer": "standard", "search_analyzer": "standard" },
        "author": { "type": "string", "analyzer": "standard", "search_analyzer": "standard" }
      }
    }
  }
}

I apply this mapping to an index called "library".

What I want to do is execute a search with the following requirements, assuming the user entered something like "simple yellow shovel":

  1. Execute a search of the user-entered keywords in three ways:
    1. As-is, as a whole phrase: "simple yellow shovel"
    2. As a set of AND keywords: "simple+yellow+shovel"
    3. As a set of OR keywords: "simple|yellow|shovel"
  2. Ensure that the keyword sets are executed in order of priority (boosted?):
    1. Whole phrase first
    2. AND'd second
    3. OR'd third
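
To make the three forms concrete, here is a minimal Python sketch that derives them from the raw user input. The helper name `build_variants` is hypothetical, and it assumes the keywords are separated by whitespace and contain no simple_query_string operator characters (no escaping is attempted).

```python
def build_variants(user_input):
    """Return the phrase, AND and OR forms of a whitespace-separated keyword string."""
    terms = user_input.split()
    return {
        # Double quotes ask simple_query_string for a phrase match.
        "phrase": '"' + " ".join(terms) + '"',
        # "+" joins terms that must all match.
        "and": "+".join(terms),
        # "|" joins terms of which any may match.
        "or": "|".join(terms),
    }


if __name__ == "__main__":
    for name, query in build_variants("simple yellow shovel").items():
        print(name, "->", query)
```

Each of the three strings could then be placed in the "query" field of its own simple_query_string clause.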

Using a simple query string works fine for a single search:

{
  "query": {
    "simple_query_string": {
      "query": "\"simple yellow shovel\""
    }
  }
}

How do I execute the multiple search with boosting? Or should I be using something like a "match" query on the indexed fields?

el n00b asked Oct 28 '15 17:10


2 Answers

I am not sure I got this one right. I have assumed a priority order of author > title > description.

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "simple yellow shovel",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "type": "phrase",
                  "boost": 10
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "simple",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "boost": 5
                }
              },
              {
                "multi_match": {
                  "query": "yellow",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "boost": 5
                }
              },
              {
                "multi_match": {
                  "query": "shovel",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "boost": 5
                }
              }
            ]
          }
        },
        {
          "multi_match": {
            "query": "simple",
            "fields": [
              "author^7",
              "title^3",
              "description"
            ],
            "boost": 2
          }
        },
        {
          "multi_match": {
            "query": "yellow",
            "fields": [
              "author^7",
              "title^3",
              "description"
            ],
            "boost": 2
          }
        },
        {
          "multi_match": {
            "query": "shovel",
            "fields": [
              "author^7",
              "title^3",
              "description"
            ],
            "boost": 2
          }
        }
      ]
    }
  }
}

Could anyone please verify this? You could refer to the Boost Query documentation for more info. Is this what you are looking for?

I hope this helps!
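
Since the query above hard-codes the three keywords, a small generator may be handy. The following Python sketch rebuilds the same bool/should structure from arbitrary user input. The function names are hypothetical, the field weights and boosts (10, 5, 2) are simply copied from the query above, and splitting on whitespace is an assumption about how keywords are entered.

```python
import json

# Field weights copied from the query above (author > title > description).
FIELDS = ["author^7", "title^3", "description"]


def term_query(term, boost):
    """One multi_match clause for a single keyword."""
    return {"multi_match": {"query": term, "fields": list(FIELDS), "boost": boost}}


def build_tiered_query(user_input):
    """Build the phrase / all-terms / any-term tiers as bool 'should' clauses."""
    terms = user_input.split()
    phrase = {
        "multi_match": {
            "query": " ".join(terms),
            "fields": list(FIELDS),
            "type": "phrase",
            "boost": 10,
        }
    }
    # Tier 2: every keyword must match somewhere (AND).
    all_terms = {"bool": {"must": [term_query(t, 5) for t in terms]}}
    # Tier 3: each keyword on its own (OR), lowest boost.
    any_term = [term_query(t, 2) for t in terms]
    should = [{"bool": {"must": [phrase]}}, all_terms] + any_term
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(json.dumps(build_tiered_query("simple yellow shovel"), indent=2))
```

The printed JSON can be sent as the search request body with any client.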

EDIT: Rewritten with dis_max

{
  "query": {
    "bool": {
      "should": [
        {
          "dis_max": {
            "tie_breaker": 0.7,
            "queries": [
              {
                "bool": {
                  "must": [
                    {
                      "multi_match": {
                        "query": "simple yellow shovel",
                        "fields": [
                          "author^7",
                          "title^3",
                          "description"
                        ],
                        "type": "phrase",
                        "boost": 10
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "dis_max": {
                        "tie_breaker": 0.7,
                        "queries": [
                          {
                            "multi_match": {
                              "query": "simple",
                              "fields": [
                                "author^7",
                                "title^3",
                                "description"
                              ],
                              "boost": 5
                            }
                          },
                          {
                            "multi_match": {
                              "query": "yellow",
                              "fields": [
                                "author^7",
                                "title^3",
                                "description"
                              ],
                              "boost": 5
                            }
                          },
                          {
                            "multi_match": {
                              "query": "shovel",
                              "fields": [
                                "author^7",
                                "title^3",
                                "description"
                              ],
                              "boost": 5
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              },
              {
                "multi_match": {
                  "query": "simple",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "boost": 2
                }
              },
              {
                "multi_match": {
                  "query": "yellow",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "boost": 2
                }
              },
              {
                "multi_match": {
                  "query": "shovel",
                  "fields": [
                    "author^7",
                    "title^3",
                    "description"
                  ],
                  "boost": 2
                }
              }
            ]
          }
        }
      ]
    }
  }
}

This seems to give me much better results, at least on my dataset. This is a great source for understanding dis_max.

Please experiment with this and check whether you are getting the expected results. The Explain API can help with that.
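
As a related, low-effort way to inspect scoring: besides the dedicated Explain API, a search request body also accepts a top-level "explain" flag, which attaches a score breakdown to every hit. The Python sketch below only adds that flag to an existing body; the helper name `with_explain` is hypothetical, and sending the request is left to whatever client is in use.

```python
import copy
import json


def with_explain(search_body):
    """Return a copy of a search body with per-hit score explanations enabled."""
    body = copy.deepcopy(search_body)  # leave the caller's dict untouched
    body["explain"] = True
    return body


if __name__ == "__main__":
    query = {"query": {"simple_query_string": {"query": '"simple yellow shovel"'}}}
    print(json.dumps(with_explain(query), indent=2))
```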

ChintanShah25 answered Oct 13 '22 21:10


I've rewritten this using the Dis Max Query. Keep in mind that you could try different multi_match types to get better results. See these:

  1. best_fields
  2. most_fields
  3. cross_fields

Query:

POST /your_index/your_type/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "multi_match": {
            "query": "simple yellow shovel",
            "type": "phrase",
            "boost": 3,
            "fields": [
              "title^3",
              "author^2",
              "description"
            ]
          }
        },
        {
          "multi_match": {
            "query": "simple yellow shovel",
            "operator": "and",
            "boost": 2,
            "fields": [
              "title^3",
              "author^2",
              "description"
            ]
          }
        },
        {
          "multi_match": {
            "query": "simple yellow shovel",
            "fields": [
              "title^3",
              "author^2",
              "description"
            ]
          }
        }
      ]
    }
  }
}

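The Dis Max query scores each document by its best-scoring sub-query; because tie_breaker is 0.7, the other matching sub-queries also contribute 0.7 times their scores. The "type": "phrase" and "operator": "and" queries get an additional boost, while the last (OR) query is left unboosted.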
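
To illustrate how tie_breaker affects the result, here is a small Python sketch of the dis_max combination rule: the best sub-query score, plus tie_breaker times the sum of the other matching sub-query scores. The function name and the sample scores are hypothetical, and the sketch deliberately leaves out the outer "boost": 1.2 and any query normalization Elasticsearch applies.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine the scores of the matching sub-queries dis_max style."""
    if not sub_scores:
        return 0.0  # no sub-query matched
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return best + tie_breaker * others


if __name__ == "__main__":
    # Hypothetical scores for the phrase, AND and OR sub-queries on one document.
    scores = [6.0, 4.0, 2.0]
    print(dis_max_score(scores, tie_breaker=0.7))
    print(dis_max_score(scores))  # tie_breaker 0: only the best sub-query counts
```

With tie_breaker at 0 only the best sub-query counts, and at 1.0 the result is the plain sum of all matching sub-queries, so 0.7 sits closer to summing.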

Evaldas Buinauskas answered Oct 13 '22 23:10