
Elasticsearch sort parent by inner hits doc count

Let's say I am indexing a bunch of Products into Elasticsearch, each with the Stores in which the product is available. For example, a document looks something like:

{
  name: "iPhone 6s",
  price: 600.0,
  stores: [
    {
      name: "Apple Store Union Square",
      location: "San Francisco, CA"
    },
    {
      name: "Target Cupertino",
      location: "Cupertino, CA"
    },
    {
      name: "Apple Store 5th Avenue",
      location: "New York, NY"
    }
    ...
  ]
}

and using the nested type, the mappings will be:

"mappings" : {
  "product" : {
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "price" : {
        "type" : "float"
      },
      "stores" : {
        "type" : "nested",
        "properties" : {
          "name" : {
            "type" : "string"
          },
          "location" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

I want to create a query that finds all the products available in a certain location, say "CA", and then sorts them by the number of stores matched. I know Elasticsearch has an inner hits feature which allows me to find hits in the nested Store documents, but is it possible to sort Products based on the doc_count of the inner hits? And to extend the question further, is it possible to sort the parent documents based on some inner aggregation? Thanks in advance.

Jason Cheok Wan asked Feb 08 '23 06:02


1 Answer

What you are trying to achieve is possible. You are currently not getting the expected results because the score_mode parameter of the nested query defaults to avg: the parent's _score is the average of the scores of its matching nested documents, so a product that matches 5 stores may be scored lower than one that matches only 2.
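To make the averaging problem concrete, here is a minimal sketch in plain Python. It does not call Elasticsearch; the function name `nested_score` and the per-store scores are hypothetical values chosen only to illustrate how the two modes combine child scores.

```python
def nested_score(child_scores, score_mode):
    """Combine the scores of matching nested docs as the named score_mode would."""
    if not child_scores:
        return 0.0
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    raise ValueError(f"unsupported score_mode: {score_mode}")


# Hypothetical scores: real Elasticsearch scores will differ.
five_stores = [0.5, 0.5, 0.5, 0.5, 0.5]  # product matched in 5 stores
two_stores = [0.6, 0.6]                  # product matched in 2 stores

# With avg, the product with fewer matching stores ranks higher.
print("avg:", nested_score(five_stores, "avg"), nested_score(two_stores, "avg"))
# With sum, the product with more matching stores ranks higher.
print("sum:", nested_score(five_stores, "sum"), nested_score(two_stores, "sum"))
```

With these made-up numbers, avg ranks the 2-store product first, while sum ranks the 5-store product first.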

This problem can be solved by setting score_mode to sum, which adds up the scores of all matching nested documents. One minor remaining problem is the field-length norm: a match in a shorter field gets a higher score than a match in a longer field, so in your example Cupertino, CA will get a slightly higher score than San Francisco, CA. You can check this behavior with inner hits. To solve this, you need to disable the field norms. Change the location mapping to:

"location": {
    "type": "string",
    "norms": {
        "enabled": false
    }
}
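As a rough illustration of why norms matter here: the classic Lucene TF/IDF similarity computes the field-length norm as 1 / sqrt(number of terms in the field). The sketch below assumes that formula, uses a crude whitespace split in place of the real analyzer, and ignores the lossy compression Lucene applies when storing norms; `field_length_norm` is a hypothetical helper, not an Elasticsearch API.

```python
import math


def field_length_norm(text):
    """Approximate the classic field-length norm: 1 / sqrt(number of terms)."""
    # Assumption: commas are dropped and terms are split on whitespace,
    # which only roughly mimics what the analyzer produces.
    num_terms = len(text.replace(",", " ").split())
    return 1.0 / math.sqrt(num_terms)


for location in ("Cupertino, CA", "San Francisco, CA"):
    print(location, round(field_length_norm(location), 3))
```

The two-term location gets a larger norm than the three-term one, which is why the per-store scores differ until norms are disabled.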

After that, this query will give you the desired results. I included inner hits to demonstrate that every matched nested doc gets an equal score.

{
  "query": {
    "nested": {
      "path": "stores",
      "query": {
        "match": {
          "stores.location": "CA"
        }
      },
      "score_mode": "sum",
      "inner_hits": {}
    }
  }
}

This will sort the products by the number of stores matched.
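For reference, the ordering this should produce can be sketched in plain Python. Since every matching store contributes the same score once norms are disabled, the summed score is proportional to the match count. The sample data is an assumption for illustration: only "iPhone 6s" comes from the question, the other products are hypothetical, and `count_matching_stores` is a stand-in for the match query, not an Elasticsearch call.

```python
def count_matching_stores(product, term):
    """Count stores whose location contains the term as a whole token."""
    return sum(
        1
        for store in product["stores"]
        if term in store["location"].replace(",", " ").split()
    )


products = [
    {"name": "Hypothetical Phone A", "stores": [{"location": "Cupertino, CA"}]},
    {"name": "iPhone 6s", "stores": [
        {"location": "San Francisco, CA"},
        {"location": "Cupertino, CA"},
        {"location": "New York, NY"},
    ]},
    {"name": "Hypothetical Phone B", "stores": [{"location": "New York, NY"}]},
]

# Keep only products with at least one match, then sort by match count, descending.
matched = [(p["name"], count_matching_stores(p, "CA")) for p in products]
ranked = sorted((m for m in matched if m[1] > 0), key=lambda m: m[1], reverse=True)

for name, count in ranked:
    print(name, count)
```

Products with no store in "CA" are excluded, just as the nested query returns only parents with at least one matching nested document.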

Hope this helps!

ChintanShah25 answered Feb 12 '23 15:02