I'm prototyping an e-commerce product catalog based on Elasticsearch. Each product is indexed as a document containing properties such as name and description.
There's one thing I can't figure out: I want to boost the score of certain products based on the user's purchase history.
The only option I can think of is to store the purchase history as child documents of the product and then use custom_filters_score with a filter that looks for child documents with a given userId. The filter determines whether the given product has been bought by the given user and, if so, boosts its score.
The problem with this approach is that some products might be purchased hundreds of thousands of times each month, and I'm not sure how Elasticsearch will perform under those circumstances.
The ideal solution would be to put the purchase history in a separate index, or in the same index as a different document type (let's say 'userspurchasehistory'). Example document:
{userId: 1234, purchesedProducts: [34,112323,1223,32342,31234]}
Then use query score boosting that expresses something like this: if the term 34 (productId) is present in the 'purchesedProducts' field of the userspurchasehistory document whose 'userId' equals 1234, boost the query by a factor of 2.
Any ideas or thoughts?
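Elasticsearch's terms query has a "terms lookup" mode that reads the list of terms from a field of another document at query time, which is close to the behaviour described above. Below is a minimal sketch that builds such a request body, assuming a recent Elasticsearch version (7.x or later, where mapping types are gone) and a hypothetical layout: an index named `userspurchasehistory` with one document per user whose `_id` is the userId, and a hypothetical `productId` field on each product. The lookup is used as a function_score filter with weight 2, so purchased products are multiplied by 2 instead of the other products being filtered out.

```python
import json


def purchase_boost_query(user_query: dict, user_id: str, factor: float = 2.0) -> dict:
    """Wrap user_query so that products bought by user_id score `factor` times higher.

    Hypothetical layout (not from the original setup): one document per user in
    the index "userspurchasehistory", with _id equal to the userId and the bought
    product IDs in "purchesedProducts"; each product has a "productId" field.
    """
    purchased = {
        "terms": {
            "productId": {
                # Terms lookup: Elasticsearch fetches this document at query
                # time and uses the values found at `path` as the term list.
                "index": "userspurchasehistory",
                "id": user_id,
                "path": "purchesedProducts",
            }
        }
    }
    return {
        "query": {
            "function_score": {
                "query": user_query,
                # Matching products get weight `factor`; products with no
                # matching function keep their original query score.
                "functions": [{"filter": purchased, "weight": factor}],
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    body = purchase_boost_query({"match": {"name": "shoes"}}, "1234")
    print(json.dumps(body, indent=2))
```

The printed JSON is the request body for `_search`; whether this outperforms the child-document approach would need the same kind of measurement as in the update below.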
UPDATE:
I've performed some tests with a big product catalog and a large amount of sales data:
Product (type) document count: 500 000
SalesHistory (type) document count: 14 000 000
Index size: 2.5 GB
Elasticsearch: one node, all default settings
SalesHistory documents are child documents of Product documents. Distribution of sales entries:
~20% of products: 40 entries
~20% of products: 30 entries
~20% of products: 20 entries
~20% of products: 10 entries
~20% of products: 5 entries
200 products with 10 000 sales entries (plus previously added 5-40 entries)
200 products with 5 000 sales entries (plus previously added 5-40 entries)
200 products with 2 500 sales entries (plus previously added 5-40 entries)
200 products with 1 000 sales entries (plus previously added 5-40 entries)
200 products with 500 sales entries (plus previously added 5-40 entries)
1 product with 18 500 entries
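As a rough sanity check, the distribution above adds up to the reported SalesHistory count, assuming "~20%" means exactly 100 000 products per bucket and that the 1 000 heavily sold products are among the 500 000 (so their 5-40 entries are already counted in the first line):

$$100\,000 \times (40 + 30 + 20 + 10 + 5) = 10\,500\,000$$
$$200 \times (10\,000 + 5\,000 + 2\,500 + 1\,000 + 500) = 3\,800\,000$$
$$10\,500\,000 + 3\,800\,000 + 18\,500 = 14\,318\,500 \approx 14\,000\,000$$

If the single 18 500-entry product comes on top of the 500 000, that would also be consistent with "total": 500001 in the result.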
Example query:
curl -XGET "http://localhost:9200/demoproducts/_search" -d'
{
  "query": {
    "custom_filters_score": {
      "query": {
        "match_all": {}
      },
      "filters": [
        {
          "filter": {
            "has_child": {
              "type": "saleshistory",
              "query": {
                "term": {
                  "userId": {
                    "value": "28875"
                  }
                }
              }
            }
          },
          "boost": 2
        }
      ]
    }
  }
}'
Result:
{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 500001,
    "max_score": 2
    ...
  }
}
When I added a filter to the query (almost all of our queries contain some filters), response times were around 7 ms.
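custom_filters_score was deprecated in favour of function_score and removed in Elasticsearch 1.0, so the query above will not run on current versions. Below is a minimal sketch of an equivalent request body built with function_score, assuming a recent version in which parent/child is modelled with a join field and "saleshistory" is the child relation name (an assumption about the mapping). As in the original query, products with a matching child should score 2 and all others 1.

```python
import json


def child_purchase_boost_query(user_id: str, factor: float = 2.0) -> dict:
    """function_score version of the custom_filters_score has_child query."""
    has_bought = {
        # Matches products with at least one "saleshistory" child document
        # whose userId equals user_id.
        "has_child": {
            "type": "saleshistory",
            "query": {"term": {"userId": user_id}},
        }
    }
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [{"filter": has_bought, "weight": factor}],
                # match_all scores 1.0, so bought products end up at `factor`
                # and every other product stays at 1.0.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(child_purchase_boost_query("28875"), indent=2))
```

The printed body can be sent with the same curl command as above; on Elasticsearch 6.0 and later curl also needs -H 'Content-Type: application/json'.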
Conclusion
There is no point in implementing this case any other way than with child documents.
Instead of modifying the documents, you could dynamically build a terms query containing the user's purchase history:
curl -XGET "http://localhost:9200/demoproducts/_search" -d'
{
  "query": {
    "terms": {"id": ["34", "112323", "1223", "32342", "31234"]}
  }
}'
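A bare terms query like this returns only the products the user has already bought. If the goal is to boost them while keeping the rest of the catalog, one option is to put the terms clause in a bool should next to the main query. Below is a minimal sketch that builds such a body from a purchase-history list already fetched by the application (how that list is fetched is left open; `history_boost_query` is a hypothetical helper, and "id" is the field used in the query above). This boost is additive rather than the multiplicative "factor of 2" from the question; a function_score with a filter and weight would be the multiplicative variant.

```python
import json
from typing import Iterable


def history_boost_query(user_query: dict, purchased_ids: Iterable, boost: float = 2.0) -> dict:
    """Rank products from the purchase history higher without hiding the others."""
    ids = [str(pid) for pid in purchased_ids]
    bool_query = {"must": [user_query]}
    if ids:
        # With a "must" clause present, "should" clauses are optional: they only
        # add to the score of the products they match.
        bool_query["should"] = [{"terms": {"id": ids, "boost": boost}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    # Purchase history taken from the example document in the question.
    history = [34, 112323, 1223, 32342, 31234]
    print(json.dumps(history_boost_query({"match_all": {}}, history), indent=2))
```

Building the list client-side keeps the index unchanged, but each search then carries the whole history, so very long histories make for large requests.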