
ElasticSearch: boost document scores based on the results of a query on a different type

I'm prototyping an e-commerce product catalog based on ElasticSearch. Each product is indexed as a document (which contains properties like name and description).

There's one thing I can't tackle: I want to boost the score of certain products based on the user's purchase history.

The only option I can think of is to store the purchase history as child documents of each product, and then use custom_filters_score with a filter that looks for child documents with a given userId. The filter determines whether a given product has been bought by the given user; if so, the score is boosted.
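A minimal sketch of that child-document setup, assuming a pre-6.x ElasticSearch (the era of custom_filters_score) where a child type is declared through the `_parent` mapping field and each child is indexed with a `?parent=` URL parameter. The index and child type names (demoproducts, saleshistory) are taken from the test query in the update; the parent type name `product`, the `userId` field options and the sale ID are assumptions for illustration.

```python
import json

# Index name from the question's test query; "product" as the parent type is assumed.
INDEX = "demoproducts"


def saleshistory_mapping():
    """Mapping body declaring saleshistory as a child type of product.

    Would be sent with PUT /demoproducts/saleshistory/_mapping (pre-6.x syntax).
    """
    return {
        "saleshistory": {
            "_parent": {"type": "product"},
            "properties": {
                # Assumed: userId is stored as an exact (not analyzed) string
                # so that a term query on it matches the whole value.
                "userId": {"type": "string", "index": "not_analyzed"},
            },
        }
    }


def child_index_path(sale_id, product_id):
    """URL path for indexing one sale as a child of the given product."""
    return f"/{INDEX}/saleshistory/{sale_id}?parent={product_id}"


print(json.dumps(saleshistory_mapping(), indent=2))
print(child_index_path(1, 34))
```

Routing each sale to its product through the parent ID keeps the parent and its children on the same shard, which is what lets has_child evaluate the join locally.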

The problem with this approach is that some products might be purchased hundreds of thousands of times each month, and I'm not sure how ElasticSearch will perform under those circumstances.

The ideal solution would be to put the purchase history in a separate index, or in the same index but as a different document type (let's say 'userspurchasehistory'). Example document:

{"userId": 1234, "purchesedProducts": [34, 112323, 1223, 32342, 31234]}

Then use query-time score boosting that expresses something like this: if the term 34 (productId) is present in the 'purchesedProducts' field of the userspurchasehistory (type) document whose 'userId' equals 1234, then boost the score by a factor of 2.
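To pin down the intended semantics, here is the rule evaluated locally in Python against the example document. This is only a client-side model of the desired scoring, not an ElasticSearch API; treating the boost as a multiplier on the base score is an assumption that mirrors how a custom_filters_score filter boost is applied. The function name is hypothetical.

```python
def boosted_score(base_score, product_id, user_id, history_doc, factor=2.0):
    """Apply the question's rule to one product's score.

    history_doc mirrors the example 'userspurchasehistory' document:
    {"userId": ..., "purchesedProducts": [...]}.
    """
    # Boost only if the history belongs to this user and lists the product.
    if history_doc["userId"] == user_id and product_id in history_doc["purchesedProducts"]:
        return base_score * factor
    return base_score


history = {"userId": 1234, "purchesedProducts": [34, 112323, 1223, 32342, 31234]}
print(boosted_score(1.0, 34, 1234, history))  # 2.0
print(boosted_score(1.0, 35, 1234, history))  # 1.0
```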

Any ideas or thoughts here ?

UPDATE:

I've run some tests against a large product catalog with a large amount of sales data:

Product (type) document count: 500 000
SalesHistory (type) document count: 14 000 000
Index size: 2.5 GB
ElasticSearch: one node, all default settings

SalesHistory documents are child documents of Product documents. Distribution of sales entries:

~20% of products: 40 entries 
~20% of products: 30 entries 
~20% of products: 20 entries 
~20% of products: 10 entries 
~20% of products: 5 entries 

200 products with 10 000 sales entries (plus previously added 5-40 entries)
200 products with  5 000 sales entries (plus previously added 5-40 entries)
200 products with  2 500 sales entries (plus previously added 5-40 entries)
200 products with  1 000 sales entries (plus previously added 5-40 entries)
200 products with    500 sales entries (plus previously added 5-40 entries)
1 product with 18 500 entries
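As a sanity check on the distribution above, the entries add up to roughly the stated 14 000 000 SalesHistory documents. This assumes each "~20%" bucket holds exactly 100 000 of the 500 000 products and counts the single 18 500-entry product separately.

```python
# Products per bucket, assuming exactly 20% of 500 000 products in each bucket.
bucket_size = 500_000 // 5

# Base entries: every product gets 40, 30, 20, 10 or 5 sales entries.
base = sum(bucket_size * n for n in (40, 30, 20, 10, 5))

# Extra entries for the five groups of 200 heavily purchased products.
hot = sum(200 * n for n in (10_000, 5_000, 2_500, 1_000, 500))

# The single product with 18 500 entries.
single = 18_500

total = base + hot + single
print(base, hot, single, total)  # 10500000 3800000 18500 14318500
```

The total of 14 318 500 is consistent with the rounded figure of 14 000 000.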

Example query:

curl -XGET "http://localhost:9200/demoproducts/_search" -d'
{
   "query": {
      "custom_filters_score": {
         "query": {
            "match_all": {}
         },
         "filters": [
            {
               "filter": {
                  "has_child": {
                     "type": "saleshistory",
                     "query": {
                        "term": {
                           "userId": {
                              "value": "28875"
                           }
                        }
                     }
                  }
               },
               "boost": 2
            }
         ]
      }
   }
}'
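In practice the userId changes per request, so the body is better generated than hand-written. A minimal Python sketch, assuming the same index and child type as the curl test, with the filters array nested inside custom_filters_score where that query type expects it. The function name is hypothetical, and sending the request (with curl or any HTTP client) is left out.

```python
import json


def boost_purchased_query(user_id, boost=2):
    """Build a custom_filters_score body that boosts products the user bought.

    Products with at least one saleshistory child whose userId matches get
    their match_all score multiplied by `boost`; other products keep their
    match_all score.
    """
    return {
        "query": {
            "custom_filters_score": {
                "query": {"match_all": {}},
                "filters": [
                    {
                        "filter": {
                            "has_child": {
                                "type": "saleshistory",
                                # userId is sent as a string, as in the curl test.
                                "query": {"term": {"userId": {"value": str(user_id)}}},
                            }
                        },
                        "boost": boost,
                    }
                ],
            }
        }
    }


# JSON suitable as the -d body of the curl command for user 28875.
print(json.dumps(boost_purchased_query(28875), indent=3))
```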

Result:

{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 500001,
    "max_score": 2
    ...
  }
}

When I added a filter to the query (almost all of our queries contain some filters), response times were around 7 ms.
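The question doesn't show which filters were used. One plausible shape, sketched in Python under the assumption that the catalog filter is wrapped in the pre-2.x `filtered` query used as the inner query of custom_filters_score; the `category` field, its value and the function name are hypothetical.

```python
import json


def filtered_boost_query(user_id, extra_filter, boost=2):
    """custom_filters_score over a filtered query (pre-2.x 'filtered' syntax).

    extra_filter restricts which products are returned; the has_child filter
    only decides which of those products get boosted.
    """
    return {
        "query": {
            "custom_filters_score": {
                "query": {
                    "filtered": {
                        "query": {"match_all": {}},
                        "filter": extra_filter,
                    }
                },
                "filters": [
                    {
                        "filter": {
                            "has_child": {
                                "type": "saleshistory",
                                "query": {"term": {"userId": {"value": str(user_id)}}},
                            }
                        },
                        "boost": boost,
                    }
                ],
            }
        }
    }


# Hypothetical catalog filter: only products whose "category" field is "books".
body = filtered_boost_query(28875, {"term": {"category": "books"}})
print(json.dumps(body, indent=3))
```

Putting the restriction inside the query, rather than in a top-level post-filter, lets ElasticSearch narrow the candidate products before scoring them.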

Conclusion

There is no point in implementing this case any other way than with child documents.

asked Nov 10 '22 14:11 by Sebastian.Belczyk


1 Answer

Instead of modifying the documents, you could dynamically build a terms query with the user's purchase history in it.

curl -XGET "http://localhost:9200/demoproducts/_search" -d'
{
   "query": {
      "terms": {"id": ["34", "112323", "1223", "32342", "31234"]}
   }
}'
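A minimal sketch of building that query dynamically in Python, assuming the user's history is already available as a document shaped like the question's userspurchasehistory example (how it is fetched is left out). The function name is hypothetical; the `id` field and the string-typed IDs follow the curl example.

```python
import json


def purchased_terms_query(history_doc):
    """Build a terms query over the products listed in a purchase-history doc.

    history_doc is assumed to look like
    {"userId": ..., "purchesedProducts": [...]}.
    """
    # Product IDs are sent as strings, matching the curl example.
    ids = [str(product_id) for product_id in history_doc["purchesedProducts"]]
    return {"query": {"terms": {"id": ids}}}


history = {"userId": 1234, "purchesedProducts": [34, 112323, 1223, 32342, 31234]}
print(json.dumps(purchased_terms_query(history)))
# {"query": {"terms": {"id": ["34", "112323", "1223", "32342", "31234"]}}}
```

On its own, a terms query returns only the listed products rather than boosting them; to boost instead, the same terms clause could serve as the filter inside custom_filters_score, in the same way the question uses has_child.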
answered Nov 15 '22 09:11 by Leo Bartkus