Given the following example Solr documents:
<doc>
<field name="guid">1</field>
<field name="name">Harry Potter</field>
<field name="friends">ron</field>
<field name="friends">hermione</field>
<field name="friends">ginny</field>
<field name="friends">dumbledore</field>
</doc>
<doc>
<field name="guid">2</field>
<field name="name">Ron Weasley</field>
<field name="friends">harry</field>
<field name="friends">hermione</field>
<field name="friends">lavender</field>
</doc>
<doc>
<field name="guid">3</field>
<field name="name">Hermione Granger</field>
<field name="friends">harry</field>
<field name="friends">ron</field>
<field name="friends">ginny</field>
<field name="friends">dumbledore</field>
</doc>
and the following query (or filter query):
friends:ron OR friends:hermione OR friends:ginny OR friends:dumbledore
all three documents will be returned since they each have at least one of the specified friends.
However, I'd like to set a minimum (and maximum) threshold for how many friends are matched. For example, only return documents that have at least 2 but no more than 3 of the specified friends.
Such a query would only return the third document (Hermione Granger) as she has 3 of the 4 friends specified, while the first (Harry Potter) matches all 4 and the second (Ron Weasley) matches only 1.
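To make the intended behaviour concrete, here is a minimal Python sketch of the thresholding logic applied to the three sample documents outside of Solr. The names `docs`, `wanted`, `match_count`, and `within_threshold` are made up for illustration and are not part of any Solr API.

```python
# Sample data mirroring the three documents above (name -> friends).
docs = {
    "Harry Potter": ["ron", "hermione", "ginny", "dumbledore"],
    "Ron Weasley": ["harry", "hermione", "lavender"],
    "Hermione Granger": ["harry", "ron", "ginny", "dumbledore"],
}

# The friends named in the query.
wanted = {"ron", "hermione", "ginny", "dumbledore"}


def match_count(friends, wanted):
    """Count how many of the wanted friends appear in a document's friends."""
    return len(wanted.intersection(friends))


def within_threshold(docs, wanted, lower, upper):
    """Return names of documents matching between lower and upper friends (inclusive)."""
    return [
        name
        for name, friends in docs.items()
        if lower <= match_count(friends, wanted) <= upper
    ]


print(within_threshold(docs, wanted, 2, 3))  # ['Hermione Granger']
```

Harry Potter matches 4 of the friends and Ron Weasley matches 1, so both fall outside the range of 2 to 3.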
Is this possible in a Solr query?
Standard Solr queries use the "q" parameter in a request. Filter queries use the "fq" parameter. The primary difference is that filter queries do not affect relevance scores; they function purely as a filter (essentially a docset intersection).
The fq (Filter Query) Parameter
The fq parameter defines a query that can be used to restrict the superset of documents that can be returned, without influencing score. It can be very useful for speeding up complex queries, since the queries specified with fq are cached independently of the main query.
You'll want to use a function query, termfreq, and count the number of terms (aka "friends" in your case) matched. You can sum up the results, then only return documents within your threshold, using frange, like this:
{!frange l=2 u=3}sum(termfreq(friends,'ron'),termfreq(friends,'hermione'),termfreq(friends,'ginny'),termfreq(friends,'dumbledore'))
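termfreq(...) returns the number of times the term occurs in the field for that document, which here is 1 for each friend found and 0 otherwise. The sum of those values is what gets tested against your threshold: the lower (l) and upper (u) bounds specified at the start of the {!frange} expression, both of which are inclusive by default.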
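If the list of friends varies per request, the expression can be generated rather than typed by hand. The following is a rough Python sketch; `build_frange_query` is a hypothetical helper, not part of Solr or any client library, and it assumes the terms contain no single quotes or other characters that would need escaping.

```python
def build_frange_query(field, terms, lower, upper):
    """Build a {!frange} expression that sums termfreq() over the given terms.

    Assumption: terms are plain tokens with no quotes needing escaping.
    """
    parts = ",".join(f"termfreq({field},'{term}')" for term in terms)
    # Doubled braces produce the literal { and } around the local params.
    return f"{{!frange l={lower} u={upper}}}sum({parts})"


query = build_frange_query(
    "friends", ["ron", "hermione", "ginny", "dumbledore"], 2, 3
)
print(query)
```

With these arguments the helper produces the same expression as the one written out above.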
You can place this in either the q parameter or the fq parameter.
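As a sketch of what the two placements might look like in a request, the Python snippet below only builds the URL and does not contact a server. The base URL and the core name `characters` are assumptions for illustration, as is the helper `build_select_url`; when the expression is used as a filter, the main query is set to `*:*` to match all documents.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

FRANGE = (
    "{!frange l=2 u=3}sum(termfreq(friends,'ron'),termfreq(friends,'hermione'),"
    "termfreq(friends,'ginny'),termfreq(friends,'dumbledore'))"
)


def build_select_url(base_url, frange, as_filter=True):
    """Build a /select URL with the frange expression in fq or in q."""
    if as_filter:
        # Match everything in q, then restrict the result set with fq.
        params = {"q": "*:*", "fq": frange}
    else:
        # Use the function range expression as the main query.
        params = {"q": frange}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name.
url = build_select_url("http://localhost:8983/solr/characters", FRANGE)
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["fq"][0] == FRANGE)
```

URL-encoding matters here because the expression contains braces, quotes, and spaces.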