 

Elasticsearch subset filter

I have a dataset about books, each of which can be in one or more languages. Every user is registered as having one or more languages.

When a user searches for books, I'd like to return only those books where they understand all of its languages.

For example, the following two books are in the system:

Book A: English, French, German
Book B: English, Greek

If John is registered as knowing English, German, French, and Italian, then his query results should never include Book B.
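The matching rule is plain set containment: a book matches when its set of languages is a subset of the user's. Here is a minimal Python sketch of that rule on the example data. The dictionary layout and the `visible_books` name are illustrative only and are not part of any Solr or Elasticsearch API.

```python
# Example data from the question; the dict layout is an illustrative assumption.
BOOKS = {
    "Book A": {"English", "French", "German"},
    "Book B": {"English", "Greek"},
}

def visible_books(user_languages, books):
    """Return the names of books whose languages are all known to the user."""
    known = set(user_languages)
    # `<=` on sets is the subset test: every language of the book is in `known`.
    return sorted(name for name, langs in books.items() if langs <= known)

john = ["English", "German", "French", "Italian"]
print(visible_books(john, BOOKS))  # ['Book A']
```

Book B is rejected because of Greek alone, even though John also knows English.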

My system is currently written using Apache Solr, where I ended up writing a plugin to perform a subset operation: a record matches if its languages are a subset of the user's languages, which are declared in the query.

However, I'd like to transition to an Elasticsearch backend, and this particular subsetting behavior doesn't seem to be part of the core filter package. Am I missing something, or should I look at writing a similar plugin or custom filter?

Ryan Kohl asked Apr 11 '26 19:04


1 Answer

This can be done using a script filter. You can pass it the user's languages as a comma-separated list of strings in a param, then use a for loop to check that each of the book's languages is contained in that list. If even one is not, break out and return false; if all are present, the loop exits and the script returns true.
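Here is a minimal sketch of such a script filter, built as a Python dict that can be sent as the JSON request body. It assumes a recent Elasticsearch version with Painless scripting and a hypothetical `keyword` field named `languages`, because script doc values are not available on analyzed `text` fields. The user's languages are passed as a JSON array in `params` rather than as a comma-separated string, which avoids splitting the string inside the script. `subset_filter` is a hypothetical helper name.

```python
import json

# Painless: reject the book as soon as one of its languages is not in params.allowed.
# Assumes `languages` is mapped as a keyword field (hypothetical name) so doc values exist.
SUBSET_SCRIPT = """
for (def lang : doc['languages']) {
  if (!params.allowed.contains(lang)) {
    return false;
  }
}
return true;
"""

def subset_filter(user_languages):
    """Build a `script` query clause keeping books whose languages are all in user_languages."""
    return {
        "script": {
            "script": {
                "lang": "painless",
                "source": SUBSET_SCRIPT,
                "params": {"allowed": list(user_languages)},
            }
        }
    }

# Filter context: the clause only includes or excludes documents; it does not score them.
body = {"query": {"bool": {"filter": [subset_filter(["English", "German", "French", "Italian"])]}}}
print(json.dumps(body, indent=2))
```

Instead of an explicit `break`, the script returns `false` from inside the loop, which ends it at the first unknown language. A book with no values in `languages` should pass, because the loop body never runs.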

I'm not sure how efficient this is, but in theory it can be done in Elasticsearch. Ideally, first apply an optimized filter to narrow down the set of books, and then run the script filter only on that subset; see https://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets and the docs on post_filter. The efficiency should be tested over a range of queries, since this filter will perform better once its results begin to be cached.
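One way to express "narrow down first" in current Elasticsearch is a `bool` query whose `filter` clauses pair a cheap `terms` filter with the expensive script clause. Any non-empty subset of the user's languages must share at least one language with the user. The `terms` filter (the book has at least one of the user's languages) therefore never removes a valid match, except books with no languages at all. Below is a minimal sketch that assumes a hypothetical `languages` keyword field and a hypothetical `title` text field. The script clause is taken as an argument, for example a `script` query like the one this answer describes, and `build_search_body` is an illustrative name. Elasticsearch decides internally how to execute filter clauses, so whether this actually reduces script evaluations is something to measure rather than assume.

```python
import json

def build_search_body(text_query, user_languages, subset_clause):
    """Combine a full-text query with a cheap pre-filter and the expensive subset clause.

    subset_clause is expected to be an Elasticsearch `script` query clause (a dict).
    The field names `title` and `languages` are illustrative assumptions.
    """
    return {
        "query": {
            "bool": {
                # Scored part of the search.
                "must": [{"match": {"title": text_query}}],
                # Non-scoring filter context.
                "filter": [
                    # Cheap: the book shares at least one language with the user.
                    {"terms": {"languages": list(user_languages)}},
                    # Expensive: every language of the book is known to the user.
                    subset_clause,
                ],
            }
        }
    }

# Usage with a placeholder script clause standing in for the real subset script.
stub = {"script": {"script": {"source": "return true;"}}}
print(json.dumps(build_search_body("history", ["English", "German"], stub), indent=2))
```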

dsathe answered Apr 17 '26 13:04


