
Elasticsearch: Order by date field (descending): gauss or field_value_factor?

I am having trouble adjusting a document's score according to its creation date. I have tried both the gauss decay function and field_value_factor.

The first one is (the full query clause):

@search_definition[:query] = {
  function_score: {
    query: {
      bool: {
        must: [
          {
            query_string: {
              query: <query_term>,
              fields: %w( field_1^2
                          field_2^3
                          ...
                          field_n^2),
              analyze_wildcard: true,
              auto_generate_phrase_queries: false,
              analyzer: 'brazilian',
              default_operator: 'AND'
            }
          }
        ],
        filter: {
          bool: {
            should: [
              { term: {"boolean_field": false}},
              { terms: {"array_field_1": options[:key].ids}},
              { term: {"array_field_2.id": options[:key].id}}
            ]
          }
        }
      }
    },
    gauss: {
      date_field: {
        scale: "1d",
        decay: "0.5"
      }
    }
  }
}

With this configuration, I am telling Elasticsearch that the most recent documents should get a higher score. When I run the query, the result is the exact opposite: the oldest documents are returned first. Even if I change the origin to

origin: "2010-05-01 00:00:00"

which is the date of the first document, the oldest ones are still returned first. What am I doing wrong?

With field_value_factor, things are better, but still not what I expect (the full query clause is):

@search_definition[:query] = {
  function_score: {
    query: {
      bool: {
        must: [
          {
            query_string: {
              query: <query_term>,
              fields: %w( field_1^2
                          field_2^3
                          ...
                          field_n^2),
              analyze_wildcard: true,
              auto_generate_phrase_queries: false,
              analyzer: 'brazilian',
              default_operator: 'AND'
            }
          }
        ],
        filter: {
          bool: {
            should: [
              { term: {"boolean_field": false}},
              { terms: {"array_field_1": options[:key].ids}},
              { term: {"array_field_2.id": options[:key].id}}
            ]
          }
        }
      }
    },
    field_value_factor: {
      field: "date_field",
      factor: 100,
      modifier: "sqrt"
    }
  }
}

With this second configuration, documents from 2016 and 2015 are returned first. However, many documents from 2016 still score lower than others from 2015, even with the "sqrt" modifier and factor: 100.

I suppose the gauss function would be the appropriate solution. How can I invert this gauss result? Or how can I strengthen the field_value_factor boost so that 2016 documents come before 2015 ones?
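A possible reason for the small gap, assuming field_value_factor reads a date field as its epoch-millisecond value: the square roots of two timestamps one year apart differ by only about one percent. Since function_score multiplies the function result with the query score by default, a slightly better text match can easily outweigh that. A rough Ruby sketch of the arithmetic (the helper name is hypothetical and the two dates are arbitrary examples):

```ruby
# Assumption: field_value_factor sees a date field as epoch milliseconds.
# Mirrors modifier "sqrt" with a configurable factor: sqrt(factor * value).
def field_value_factor_sqrt(time, factor: 100)
  millis = time.to_i * 1000
  Math.sqrt(factor * millis)
end

boost_2015 = field_value_factor_sqrt(Time.utc(2015, 1, 1))
boost_2016 = field_value_factor_sqrt(Time.utc(2016, 1, 1))

# Relative advantage of the newer document from the date boost alone.
ratio = boost_2016 / boost_2015
puts format("2016 vs 2015 boost ratio: %.4f", ratio)
```

Raising factor does not change this ratio, because the same factor is applied to both documents before the square root is taken.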

Thanks a lot,

Guilherme

gui_maranhao asked Oct 10 '16 18:10


1 Answer

You might want to try putting the gauss function inside the functions parameter and giving it a weight, as in the following query. I also think the scale is too small, which could be driving the decay value of many documents to zero. I have also increased decay to 0.8 and given a higher weight to recent documents. You could also use the explain API to see how the score is computed.

{
    "function_score": {
        query: {
            bool: {
                must: [{
                    query_string: {
                        query: <query_term>,
                        fields: %w(field_1^2 field_2^3
                                   ... field_n^2),
                        analyze_wildcard: true,
                        auto_generate_phrase_queries: false,
                        analyzer: 'brazilian',
                        default_operator: 'AND'
                    }
                }],
                filter: {
                    bool: {
                        should: [{
                            term: {
                                "boolean_field": false
                            }
                        }, {
                            terms: {
                                "array_field_1": options[:key].ids
                            }
                        }, {
                            term: {
                                "array_field_2.id": options[:key].id
                            }
                        }]
                    }
                }
            }
        },
        # Decay functions go inside the "functions" array, each with its own weight.
        "functions": [{
            "gauss": {
                "date_field": {
                    "origin": "now",  # decay is measured from the current time
                    "scale": "30d",
                    "decay": "0.8"
                }
            },
            "weight": 20
        }]
    }
}
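For the explain suggestion, one way to look at the score breakdown is to set explain: true in the search body. The sketch below only builds that body; the helper name, the match_all placeholder query, and the index name in the comments are hypothetical, and the commented-out client calls assume the elasticsearch-ruby gem and a reachable cluster.

```ruby
# Hypothetical helper: wraps any query hash in a search body with explain enabled,
# so each hit carries an _explanation of how its score was computed.
def explain_body(query, size: 5)
  { explain: true, size: size, query: query }
end

# match_all is only a stand-in; pass the function_score query hash instead.
body = explain_body({ match_all: {} })
puts body.inspect

# With the elasticsearch-ruby gem (an assumption), the body could be sent like this:
#   client = Elasticsearch::Client.new
#   response = client.search(index: 'my_index', body: body)
#   response['hits']['hits'].each { |hit| puts hit['_explanation'] }
```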

Also, the origin should be the latest date, so rather than origin: "2010-05-01 00:00:00", try

origin: "2016-05-01 00:00:00"
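To get a feel for how scale, decay and origin interact, here is a small Ruby sketch of the Gaussian decay formula as given in the Elasticsearch function_score documentation. Distances are expressed in days from the origin; the helper name and the sample ages are illustrative assumptions, not part of the original query.

```ruby
# Gaussian decay as documented for function_score decay functions:
#   score   = exp(-(max(0, |value - origin| - offset))^2 / (2 * sigma^2))
#   sigma^2 = -scale^2 / (2 * ln(decay))
# distance_days is |value - origin| expressed in days.
def gauss_decay(distance_days, scale_days:, decay:, offset_days: 0.0)
  sigma_sq = -(scale_days**2) / (2.0 * Math.log(decay))
  d = [0.0, distance_days.abs - offset_days].max
  Math.exp(-(d**2) / (2.0 * sigma_sq))
end

# Compare the question's settings (1d, 0.5) with the suggested ones (30d, 0.8).
[1, 7, 30, 365].each do |age|
  narrow = gauss_decay(age, scale_days: 1, decay: 0.5)
  wide   = gauss_decay(age, scale_days: 30, decay: 0.8)
  puts format("distance %3dd  scale=1d/decay=0.5: %.3e  scale=30d/decay=0.8: %.3e", age, narrow, wide)
end
```

By definition the result equals decay at a distance of exactly one scale, and it falls off very quickly beyond that. With a 1d scale, documents more than a few days from the origin all end up with a value that is effectively zero, so the date no longer separates them. The same formula shows why an origin in 2010 favours the oldest documents: they are the ones closest to it.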

Does this help?

ChintanShah25 answered Nov 15 '22 06:11