
Sorting by term frequency only in Elasticsearch

I have users with the fields city, country, followersAmount and some others. When I search for "New York, USA" in the city and country fields, sorted by followers amount, I first need to display people from "New York, USA" sorted by followersAmount descending, and after them, people from other cities in the USA, also sorted by followersAmount descending. I think I can do this by scoring only by term frequency and sorting first by score, then by followers amount, but I cannot find how to configure that.

dimka2014 asked Jan 27 '26 14:01


1 Answer

What about something like this:

{
    "query" : {
        "bool" : {
            "should" : [
                {
                    "constant_score" : {
                        "filter" : {
                            "match_phrase" : {
                                "city" : "New York"
                            }
                        }
                    }
                },
                {
                    "constant_score" : {
                        "filter" : {
                            "match" : {
                                "country" : "USA"
                            }
                        }
                    }
                }
            ]
        }
    },
    "sort" : [
        "_score",
        { "followersAmount" : { "order" : "desc" } }
    ]
}

You can expect all the people from "New York, USA" to get the same score, because both clauses match. People from other cities in the USA will also share a single score, which is lower because only the country clause matches. People with the same score are then sorted by followersAmount. Of course, this is just an initial query to get you started; it might need more tweaks.
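To see why this ordering falls out, here is a small Python simulation of the scoring, run on made-up user records rather than a real index. It assumes that each matching `constant_score` clause contributes its default boost of 1.0, that a `bool` query adds up the scores of its matching `should` clauses, and that `_score` sorts descending by default. The `users` data and the `score`/`search` helpers are hypothetical, and plain string equality stands in for Elasticsearch's analyzed text matching.

```python
# Hypothetical user records standing in for indexed documents.
users = [
    {"name": "a", "city": "New York", "country": "USA", "followersAmount": 120},
    {"name": "b", "city": "Boston", "country": "USA", "followersAmount": 900},
    {"name": "c", "city": "New York", "country": "USA", "followersAmount": 450},
    {"name": "d", "city": "Chicago", "country": "USA", "followersAmount": 300},
    {"name": "e", "city": "Paris", "country": "France", "followersAmount": 5000},
]

CONSTANT = 1.0  # assumed default boost of each constant_score clause


def score(user):
    """Add one constant per matching should clause (exact matching is a simplification)."""
    clauses = [user["city"] == "New York", user["country"] == "USA"]
    return sum(CONSTANT for matched in clauses if matched)


def search(users):
    # A bool query with only should clauses requires at least one of them to match.
    hits = [u for u in users if score(u) > 0]
    # Sort by _score descending, then by followersAmount descending.
    return sorted(hits, key=lambda u: (-score(u), -u["followersAmount"]))


print([u["name"] for u in search(users)])
```

The two New York users come first, ordered by followers, then the other US users; the user from France matches neither clause and is not returned at all.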

EDIT: Updated with constant_score

I originally expected the basic TF-IDF algorithm and the field-length norm to help. Generally, I would expect city terms to have a larger IDF than country terms, since each city is shared by fewer users than each country, so giving city matches higher scores seems desirable. In terms of TF and field-length norms, scoring a person with a single matching city higher than a person with, say, two cities (if your fields are arrays that allow multiple cities) also seems favorable. That said, I am not sure what your data looks like, so I have updated the query to use constant_score, which stops Elasticsearch's default scoring from having such an impact.
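The IDF and field-length effects mentioned above can be illustrated with Lucene's classic TF-IDF formulas (the `ClassicSimilarity` idf, `1 + ln(numDocs / (docFreq + 1))`, and length norm, `1 / sqrt(numTerms)`), which were Elasticsearch's default scoring before BM25 replaced them in 5.0. This is only a sketch: the document counts below are invented for illustration, and the helper names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms: int) -> float:
    """Classic field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


NUM_DOCS = 10_000         # hypothetical index size
CITY_DOC_FREQ = 200       # hypothetical: users whose city contains "york"
COUNTRY_DOC_FREQ = 6_000  # hypothetical: users whose country is "usa"

idf_city = classic_idf(CITY_DOC_FREQ, NUM_DOCS)
idf_country = classic_idf(COUNTRY_DOC_FREQ, NUM_DOCS)
print(f"idf(city term)    = {idf_city:.3f}")
print(f"idf(country term) = {idf_country:.3f}")

# One city ("new york" = 2 terms) versus an array of two cities
# ("new york", "boston" = 3 terms): the shorter field gets the larger norm.
print(f"norm, one city   = {length_norm(2):.3f}")
print(f"norm, two cities = {length_norm(3):.3f}")
```

The rarer city term gets the larger IDF, and the single-city field gets the larger norm, which is why the unmodified match queries would already tend to favor city matches.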

eemp answered Jan 30 '26 05:01