Elasticsearch boosting with function_score in nested properties

In Elasticsearch, given the following document structure:

"workhistory": {
  "positions": [{
    "company": "Some company",
    "position": "Some Job Title",
    "start": 1356998400,
    "end": 34546576576,
    "description": "",
    "source": [
       "some source", 
       "some other source"
    ]
  },
  {
    "company": "Some other company",
    "position": "Job Title",
    "start": 1356998400,
    "end": "",
    "description": "",
    "source": [
       "some other source"
    ]
  }]
}

and mappings for this structure:

  workhistory: {
    properties: {    
      positions: {
        type: "nested", 
        include_in_parent: true, 
        properties: {                 
          company: {
            type: "multi_field",
            fields: {
              company: {type: "string"},
              original: {type : "string", analyzer : "string_lowercase"} 
            }              
          }, 
          position: {
            type: "multi_field",
            fields: {
              position: {type: "string"},
              original: {type : "string", analyzer : "string_lowercase"} 
            }              
          }                                                       
        }
      }        
    }
  }

I want to search on "company" and match the document if company = "some company", and get the TF-IDF _score for that match. I also want to create a function_score query that boosts the score of this match based on the values in the "source" array: basically, if the source contains "some source", boost _score by some amount x. I can change the structure of the "source" property if needed.

This is what I have so far:

{
   "bool": {
      "should": [
         {
            "filtered": {
               "query": {
                  "bool": {
                     "should": [
                        {
                           "bool": {
                              "should": [
                                 {
                                    "match": {
                                       "workhistory.positions.company.original": "some company"
                                    }
                                 }
                              ]
                           }
                        }
                     ],
                     "minimum_should_match": "100%"
                  }
               },
               "filter": {
                  "and": [
                     {
                        "bool": {
                           "should": [
                              {
                                 "term": {
                                    "workhistory.positions.company.original": "some company"
                                 }
                              }
                           ]
                        }
                     }
                  ]
               }
            }
         },
         {
            "function_score": {
               "query": {
                  "bool": {
                     "should": [
                        {
                           "bool": {
                              "should": [
                                 {
                                    "match": {
                                       "workhistory.positions.company.original": "some company"
                                    }
                                 }
                              ]
                           }
                        }
                     ],
                     "minimum_should_match": "100%"
                  }
               },
               "filter": {
                  "and": [
                     {
                        "bool": {
                           "should": [
                              {
                                 "term": {
                                    "workhistory.positions.company.original": "some company"
                                 }
                              }
                           ]
                        }
                     }
                  ]
               }
            }
         }
      ]
   }
}

There are some filters in here as well, because I only want to return documents with the filtered value. In this example the filters and the query are basically the same, but in a bigger version of this query I have a few other "optional" matches to boost optional values, etc. The function_score is not doing much right now, since I can't really figure out how to work with it. The goal is to be able to adjust the boost amount in my application code and pass it in to the search query.

I'm using Elasticsearch version 1.3.4.

asked Nov 01 '22 16:11 by Øyvind


1 Answer

I am not sure why you repeated all those filters and queries in there, to be honest. Maybe I'm missing something, but by your description I believe all you need is a "function_score". From the documentation:

The function_score allows you to modify the score of documents that are retrieved by a query.

So, you define a query (for example, one matching the company name) and then define a list of functions that boost the _score for a certain subset of documents. From the same documentation:

Furthermore, several functions can be combined. In this case one can optionally choose to apply the function only if a document matches a given filter

So, you have the query that looks for companies with a certain name, and then each function has a filter that selects the documents whose _score it should manipulate. Your filter, in this case, is that "source" should contain something. The function itself is a script: _score + 2. In the end, this would be my idea:

    {
      "query": {
        "bool": {
          "should": [
            {
              "function_score": {
                "query": {
                  "bool": {
                    "should": [
                      {
                        "bool": {
                          "should": [
                            {
                              "match": {
                                "workhistory.positions.company.original": "some company"
                              }
                            }
                          ]
                        }
                      }
                    ],
                    "minimum_should_match": "100%"
                  }
                },
                "functions": [
                  {
                    "filter": {
                      "nested": {
                        "path": "workhistory.positions",
                        "query": {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "workhistory.positions.source": "some source"
                                }
                              }
                            ]
                          }
                        }
                      }
                    },
                    "script_score": {
                      "script": "_score + 2"
                    }
                  },
                  {
                    "filter": {
                      "nested": {
                        "path": "workhistory.positions",
                        "query": {
                          "bool": {
                            "should": [
                              {
                                "match": {
                                  "workhistory.positions.source": "xxx"
                                }
                              }
                            ]
                          }
                        }
                      }
                    },
                    "script_score": {
                      "script": "_score + 4"
                    }
                  }
                ],
                "max_boost": 5,
                "score_mode": "sum",
                "boost_mode": "sum"
              }
            }
          ]
        }
      }
    }
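
Since the goal is to set the boost amounts from application code, here is a rough Python sketch of how a request body like the one above could be generated. It only builds a dict and prints it as JSON; it does not talk to Elasticsearch. The function name build_boosted_company_query and its arguments are made up for illustration. I also dropped the single-clause bool/should wrappers to keep it short, and I am assuming that script_score accepts a "params" object for passing values into the script, so check that against the documentation for your version (1.3.4) before relying on it.

```python
import json


def build_boosted_company_query(company, source_boosts, max_boost=5):
    """Build a function_score request body as a plain dict.

    company:       text to match against the company field.
    source_boosts: maps a source name to the amount added to _score when
                   a nested position contains that source (hypothetical
                   structure, chosen for this sketch).
    max_boost:     cap passed through to function_score.
    """
    functions = []
    for source, boost in source_boosts.items():
        functions.append({
            # Apply this function only to documents that have a nested
            # position whose "source" matches the given value.
            "filter": {
                "nested": {
                    "path": "workhistory.positions",
                    "query": {
                        "match": {"workhistory.positions.source": source}
                    },
                }
            },
            # Assumption: the boost is handed to the script via "params"
            # instead of being concatenated into the script string.
            "script_score": {
                "script": "_score + boost",
                "params": {"boost": boost},
            },
        })

    return {
        "query": {
            "function_score": {
                "query": {
                    "match": {
                        "workhistory.positions.company.original": company
                    }
                },
                "functions": functions,
                "max_boost": max_boost,
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        }
    }


body = build_boosted_company_query(
    "some company", {"some source": 2, "xxx": 4}
)
print(json.dumps(body, indent=2))
```

Passing the number as a script parameter rather than building the string "_score + 2" in code keeps the script text constant, so only the values change between searches.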
answered Nov 15 '22 07:11 by Andrei Stefan