Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch: Learning from clicks (Search result ranking)

I have read over the chapter "Learning from clicks" in the book Programming Collective Intelligence and liked the idea: The search engine there learns on which results the user clicked and use this information to improve the ranking of results.

I think it would improve the quality of the search ranking a lot in my Java/Elasticsearch application if I could learn from the user clicks.

In the book, they build a multiplayer perceptron (MLP) network to use the learned information even for new search phrases. They use Python with a SQL database to calculate the search ranking.

Has anybody implemented something like this already with Elasticsearch or knows an example project? It would be great, if I could manage the clicking information directly in Elasticsearch without needing an extra SQL database.

like image 610
Sonson123 Avatar asked Nov 03 '14 12:11

Sonson123


2 Answers

In the field of Information Retrieval (the general academic field of search and recommendations) this is more generally known as Learning to Rank. Whether its clicks, conversions, or other forms of sussing out what's a "good" or "bad" result for a keyword search, learning to rank uses either a classifier or regression process to learn what features of the query and document correlate with relevance.

Clicks?

For clicks specifically, there's reasons to be skeptical that optimizing clicks is ideal. There's a paper from Microsoft Research I'm trying to dig up that claims that in their case, clicks are only 45% correlated with relevance. Click+dwell is often a more useful general-purpose indicator of relevance.

There's also the risk of self-reinforcing bias in search, as I talk about in this blog article. There's a chance that if you're already showing a user mediocre results, and they keep clicking on those mediocre results, you'll end up reinforcing search to keep showing users mediocre results.

Beyond clicks, there's often domain-specific considerations for what you should measure. For example, clasically in e-commerce, conversions matter. Perhaps a search result click that led to such a purchase should count more. Netflix famously tries to suss out what it means when you watch a movie for 5 minutes and go back to the menu vs 30 minutes and exit. Some search use cases are informational: clicking may mean something different when you're researching and clicking many search results vs when you're shopping for a single item.

So sorry to say it's not a silver bullet. I've heard of many successful and unsuccessful attempts at doing Learning to Rank and it mostly boils down to how successful you are at measuring what your users consider relevant. The difficulty of this problem surprises a lot of peop.le

For Elasticsearch...

For Elasticsearch specifically, there's this plugin (disclaimer I'm the author). Which is documented here. Once you've figured out how to "grade" a document for a specific query (whether its clicks or something more) you can train a model that can be then fed into Elasticsearch via this plugin for your ranking.

like image 82
Doug T. Avatar answered Sep 28 '22 01:09

Doug T.


What you would need to do is store information about the clicks in a field inside the Elasticsearch index. Every click would result in an update of a document. Since an update action is actually a delete and insert Update API, you need to make sure your document text is stored, not only indexed. You can then use a Function Score Query to build a score function reflecting the value stored in the index.

Alternatively, you could store the information in a separate database and use a script function inside the score function to access the database. I wouldn't suggest this solution due to performance issues.

like image 36
Konrad Holl Avatar answered Sep 28 '22 01:09

Konrad Holl