
How to query for documents where two fields are equal?

Tags:

solr

lucene

There are two text fields in Solr. Both are whitespace-tokenized and have a lowercase filter. Below is the schema:

<fieldType name="text_ac" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="field1" type="text_ac" indexed="true" stored="true" required="false" omitNorms="true" default=""/>

<field name="field2" type="text_ac" indexed="true" stored="true" required="false" omitNorms="true" default=""/>

How can I query Solr at query time to return only documents where the whole string of field1 is the same as field2 (field1==field2)?

Thanks.

Henry asked Nov 29 '22 01:11


2 Answers

For the correct way to query Solr for equality between two fields, please see Nicholas DiPiazza's answer.

Given that the question specifies comparing the full contents of two text (that is, analyzed) fields, I believe that approach won't work well with function queries and the like. That leaves two approaches:

  • Rethink what you are trying to do, or change the index structure. Should those fields be strings instead of text? If so, change them, then refer, as above, to Nicholas DiPiazza's answer.

  • (Original Answer here) A simple way to accomplish this would be to perform the comparison at index time and store the result in the index. That is, if you have field1 and field2, create a field 1_equals_2 and, when adding the document, index it with true if the two fields are equal according to your comparison. Then you can simply search for 1_equals_2:true.
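
A minimal Python sketch of that index-time comparison, assuming documents are built client-side as dicts before being sent to Solr. The helper names (`normalize`, `add_equality_flag`) are hypothetical, and `normalize` only approximates the `text_ac` analysis chain (whitespace split plus lowercasing); Python's `str.split()` and `str.lower()` may differ from Lucene's tokenizer and filter on unusual Unicode input. The `1_equals_2` field would also need to be declared in the schema, for example as a boolean.

```python
def normalize(text):
    """Approximate the text_ac analyzer: split on whitespace, lowercase each token."""
    return [token.lower() for token in text.split()]


def add_equality_flag(doc):
    """Return a copy of doc with a 1_equals_2 flag computed from field1 and field2."""
    flagged = dict(doc)  # copy so the caller's document is not mutated
    # Missing fields fall back to "", matching default="" in the schema.
    tokens_1 = normalize(doc.get("field1", ""))
    tokens_2 = normalize(doc.get("field2", ""))
    flagged["1_equals_2"] = tokens_1 == tokens_2
    return flagged


# Hypothetical documents, flagged just before they would be indexed.
docs = [
    {"id": "1", "field1": "Foo  Bar", "field2": "foo bar"},
    {"id": "2", "field1": "foo bar", "field2": "foo baz"},
]
flagged_docs = [add_equality_flag(d) for d in docs]
```

After indexing `flagged_docs`, the query `1_equals_2:true` would match only the first document. Comparing token lists rather than raw strings is a choice made here so that case and spacing differences are ignored, the same way the analyzer ignores them; compare the raw strings instead if you need exact equality.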

femtoRgon answered Dec 01 '22 16:12


Method 1 - frange parser

As mentioned by @dduo, you can use the Function Range Query Parser (https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-FunctionRangeQueryParser). Here's how Trey Grainger (one of the authors of Solr in Action) said to do it:

q=*:*&fq={!frange l=1 u=1 v=$equals}&equals=if(eq(field1,field2),1,0)

I tested this on a collection with 140 million documents: the query took about 10 seconds and returned 600,000 documents in the result set.

So this works, but it's kinda slow.
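
If you are sending the frange request from code, the parameters need URL-encoding (the braces, `$`, and `*:*` are not URL-safe). Here is a small sketch using only the Python standard library; the function name `build_frange_params` is hypothetical, and the field names are the ones from the question.

```python
from urllib.parse import urlencode


def build_frange_params(field_a="field1", field_b="field2"):
    """Build the Solr request parameters for the frange equality filter."""
    return {
        "q": "*:*",
        # Keep only documents where the $equals function evaluates to exactly 1.
        "fq": "{!frange l=1 u=1 v=$equals}",
        # 1 when the two field values are equal, 0 otherwise.
        "equals": f"if(eq({field_a},{field_b}),1,0)",
    }


# URL-encoded query string, ready to append to the collection's /select URL.
query_string = urlencode(build_frange_params())
```

Keeping the function in its own `equals` parameter and referencing it with `v=$equals`, as in the original query, avoids having to escape it inside the local-params braces.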

Method 2 - Use a streaming expression

The following expression seems to do what we are looking for here:

having(search(your_collection_name, q="*:*", sort="id asc"), eq(field1, field2))

This seems to be much faster, as it returns results instantly. So if you can use streaming expressions, this is probably the quicker way to get what you are looking for.
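
Streaming expressions are sent to the collection's `/stream` handler as an `expr` parameter. A rough sketch of preparing that request in Python follows; the function name `build_stream_request` and the base URL `http://localhost:8983/solr` are assumptions for illustration, and the expression is the one above with the collection name substituted in.

```python
from urllib.parse import urlencode


def build_stream_request(collection, base_url="http://localhost:8983/solr"):
    """Return (url, encoded_body) for posting the having/eq expression to /stream."""
    # Same expression as above; field1 and field2 come from the question's schema.
    expr = (
        f'having(search({collection}, q="*:*", sort="id asc"), '
        "eq(field1, field2))"
    )
    url = f"{base_url}/{collection}/stream"
    body = urlencode({"expr": expr})
    return url, body


# The collection name is a placeholder, as in the expression above.
url, body = build_stream_request("your_collection_name")
```

The returned `url` and `body` can then be sent as a form-encoded POST with whatever HTTP client you already use.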

Nicholas DiPiazza answered Dec 01 '22 15:12