I use Solr's proximity search quite often to search for words within a specified range of each other, like so:
"Government Spending" ~2
I was wondering whether there is a way to perform a proximity search using a phrase and a word, or two phrases. Is this possible? If so, what is the syntax?
PhraseQuery in Lucene matches documents containing a particular sequence of terms. It uses the positional information of each term, which is stored in the index. The number of other words permitted between the words of the query phrase is called the "slop". To set it, append a tilde (~) followed by a number to the end of the phrase.
Solr queries require escaping special characters that are part of the query syntax. The special characters are: + - && || ! ( ) " ~ * ? and : . To escape one of these characters, put a backslash ( \ ) before it.
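If you build queries from user input, the escaping rule above can be sketched in Python roughly like this. The function name `escape_solr_term` is made up for illustration, and the character set is an assumption on my part: it follows what I understand Lucene's own escape routine to cover, which is somewhat broader than the list above (it also includes \ ^ [ ] { } and /).

```python
import re

# Assumed set of special characters for the standard query parser.
# Broader than the list quoted above: also covers \ ^ [ ] { } and /.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_solr_term(text: str) -> str:
    """Put a backslash in front of every special character in text."""
    return _SPECIAL.sub(r'\\\1', text)


# Illustrative input containing '(', '+', ')' and ':'.
escaped = escape_solr_term("(1+1):2")
```

Only escape the user-supplied parts of a query this way. If you run it over the whole query string, the operators you actually want (quotes, the tilde, parentheses) get escaped too.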
In the Lucidworks Solr training they suggested using multiple fq parameters instead of a single one joined with AND, for performance reasons. So in the sample it would be q=*:*&fq=(catid:90 OR catid:81)&fq=priceEng:[38 TO 40]&fq=.... etc.
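As far as I know, the usual reason is that Solr caches each fq separately in its filter cache, so individual filters can be reused across requests. A minimal Python sketch of building such a request's query string follows; `build_select_params` is a hypothetical helper, not part of any Solr client, and the field names are just the ones from the sample above.

```python
from urllib.parse import parse_qs, urlencode


def build_select_params(main_query, filters):
    """Return a URL-encoded query string with one fq parameter per filter."""
    params = [("q", main_query)]
    # A list of pairs (rather than a dict) lets the fq key repeat.
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


query_string = build_select_params(
    "*:*",
    ["(catid:90 OR catid:81)", "priceEng:[38 TO 40]"],
)

# Decoding it again shows the two filters as separate fq values.
decoded = parse_qs(query_string)
```

The resulting string would be appended after the `?` of whatever select URL your Solr core exposes.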
Out of the box, I have discovered a way to perform a Solr proximity search using more than one word, or phrases; see below.
E.g. with 3 words:
"(word1) (word2) (word3)"~10
E.g. with 2 phrases (note that the inner double quotes need to be escaped):
"(\"phrase1\") (\"phrase2\")"~10
Since Solr 4 this is possible with the SurroundQueryParser.
E.g. to query where "phrase two" follows "phrase one" no more than 3 words later:
3W(phrase W one, phrase W two)
To query "phrase two" within 5 words of "phrase one", in either order:
5N(phrase W one, phrase W two)
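For anyone generating these queries from code, here is a rough Python sketch that assembles both forms suggested in this thread: the quoted form with a trailing ~N, and the surround form with the W (ordered) and N (unordered) operators. The helper names `nested_proximity` and `surround_proximity` are made up for illustration. They only build strings, so I have not tied them to any particular Solr version, and the surround form presumably still has to be routed to the surround parser in your request (e.g. with a `{!surround}` prefix).

```python
def nested_proximity(items, slop):
    """Build "(a) (b)"~N, escaping the quotes around multi-word phrases."""
    parts = []
    for item in items:
        if " " in item.strip():
            # Multi-word phrase: wrap in escaped double quotes.
            parts.append('(\\"' + item + '\\")')
        else:
            parts.append("(" + item + ")")
    return '"' + " ".join(parts) + '"~' + str(slop)


def surround_proximity(left, right, distance, ordered=True):
    """Build e.g. 3W(phrase W one, phrase W two) in surround syntax."""
    def as_phrase(text):
        # Inside a phrase, consecutive words are joined with the W operator.
        return " W ".join(text.split())

    operator = "W" if ordered else "N"
    return f"{distance}{operator}({as_phrase(left)}, {as_phrase(right)})"


three_words = nested_proximity(["word1", "word2", "word3"], 10)
ordered_query = surround_proximity("phrase one", "phrase two", 3)
unordered_query = surround_proximity("phrase one", "phrase two", 5, ordered=False)
```

Neither helper escapes special characters inside the words themselves, so that still needs to happen before the strings are assembled.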