
Solr: Using a wildcard on a string with whitespace

I have basically the same problem as discussed here: Solr wildcard query with whitespace, but this question was not answered.

I'm using a wildcard in a filter query on a field called "brand."

I'm having trouble when the brand name contains whitespace. For instance, filtering on the brand "Lexington" works fine with fq={!tag=brand}brand:Lexing*n. A multi-word brand like "Authentic Models" causes problems, however: it seems the name must be wrapped in double quotes.

Inside double quotes, wildcards have no effect: brand:"Authentic Mode*" or brand:"Lexingt*" won't match anything. Without quotes and without a space, brand:Authen* works and matches Authentic Models. But once the query includes whitespace, Solr seems to consider only the part of the string up to the first space when matching.

The brand field is of type

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

which, to my understanding, is not whitespace tokenized. It is populated by a copyField from a whitespace-tokenized field, though.

Is there something I can do to stop Solr from tokenizing the filter query without using double quotes?

Jon B asked Sep 11 '12 22:09



2 Answers

As Rob said in his answer, I've posted an answer of my own on the question he linked to.

All you need to do is escape the space in your query with a backslash (as in, customer_name:Pop *Tart --> customer_name:Pop\ *Tart). In my experience, this works no matter where you place the wildcard, which is consistent with how Solr reports parsing the query. Something like:

customer_name:Pop\ *Tart

is parsed as:

customer_name:Pop *Tart
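
If the filter query is built in client code, the escaping can be applied before the request is sent. The following is a minimal sketch in Python, not part of the original answer: the helper names escape_whitespace and brand_filter are hypothetical, and it only escapes whitespace, so other characters that are special in Solr query syntax are left untouched. Whether the escaped query matches still depends on your schema and query parser.

```python
import re
from urllib.parse import urlencode


def escape_whitespace(term: str) -> str:
    """Put a backslash before each whitespace character.

    The * and ? wildcards are deliberately left alone so they keep
    their wildcard meaning in the query.
    """
    return re.sub(r"(\s)", r"\\\1", term)


def brand_filter(pattern: str, field: str = "brand") -> str:
    """Build a tagged filter query like the one in the question (hypothetical helper)."""
    return f"{{!tag={field}}}{field}:{escape_whitespace(pattern)}"


fq = brand_filter("Authentic Mode*")
print(fq)  # {!tag=brand}brand:Authentic\ Mode*

# URL-encode the parameters before appending them to a select request.
print(urlencode({"q": "*:*", "fq": fq}))
```

Escaping on the client keeps the field as a string type, so the whole brand name is still matched as a single term.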
Aubergine answered Nov 10 '22 14:11



Try changing the field type from string to a text type. The string type is not tokenized, so when a value in a string field contains whitespace, Solr tries to match your query against the whole value, whitespace included.

In the default schema file you can see this line just above the string field type:

<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->

Using a text type such as text_general, or a similar one, should fix your problem.
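
To make the difference concrete, here is a toy model in Python of how a wildcard behaves against an untokenized versus a tokenized field. It is only an illustration under simplifying assumptions, not how Solr is implemented: the functions indexed_terms and wildcard_matches are hypothetical, the "text" analysis is assumed to be a plain whitespace split plus lowercasing, and fnmatchcase stands in for Solr's wildcard matching.

```python
from fnmatch import fnmatchcase


def indexed_terms(value: str, tokenized: bool) -> list[str]:
    """Toy analysis: a string-like field keeps the value as one term;
    a text-like field splits on whitespace and lowercases each token."""
    if tokenized:
        return [token.lower() for token in value.split()]
    return [value]


def wildcard_matches(pattern: str, terms: list[str]) -> bool:
    """A wildcard pattern matches if it matches any single indexed term."""
    return any(fnmatchcase(term, pattern) for term in terms)


value = "Authentic Models"

# Untokenized: the only term is the full value, so the pattern must cover it.
print(wildcard_matches("Authen*", indexed_terms(value, tokenized=False)))  # True
print(wildcard_matches("Mode*", indexed_terms(value, tokenized=False)))    # False

# Tokenized: each word is its own (lowercased) term.
print(wildcard_matches("mode*", indexed_terms(value, tokenized=True)))     # True
```

Note the trade-off this suggests: with a tokenized field a wildcard is matched against individual words rather than the whole brand name.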

denizdurmus answered Nov 10 '22 15:11
