
How to use regex for querying in Solr 4

Tags: regex, solr

I've reached the point of desperation, so I'm asking for help. I'm trying to query results from a Solr 4 engine using regex.

Let's assume the document I want to query is:

<str name="text">description: best company; name: roca mola</str>

And I want to query using this regex:

description:(.*)?company(.*)?;

I read in some forums that using regex in Solr 4 was as easy as adding slashes, like:

localhost:8080/solr/q=text:/description\:(.*)?company(.*)?;/

but it isn't working. And this one doesn't work either:

localhost:8080/solr/q=text:/description(.*)?company(.*)?;/

I don't want a simple query like:

localhost:8080/solr/q=text:*company*

since that would also match documents I don't want, like:

<str name="text">description: my home; name: mother company</str>

If I'm not clear please let me know.

Cheers from Chile :D

NOTE: I was using text_general fields in my schema. As @arun pointed out, string fields can handle the type of regex I'm using.

cgajardo asked Feb 14 '13 19:02



1 Answer

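Instead of running the regex search against a text field type, try it against a string field type, since your regex spans more than one word. Solr matches a regex against individual indexed terms: a text field is tokenized into roughly one term per word, while a string field keeps the whole value as a single term. (If your regex only needs to match a single word, a text field works.)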
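To see why the field type matters, here is a rough Python sketch of term-level matching. It is only a simulation under stated assumptions: Python's `re` stands in for Lucene's regexp dialect (the two differ in details), the lowercase-and-split step is a crude stand-in for what a text_general analyzer does, and `matches_any_term` is a hypothetical helper, not a Solr API.

```python
import re

# The regex from the question, written in Python syntax.
# Assumption: Python's `re` only approximates Lucene's regexp dialect.
PATTERN = re.compile(r"description:.*company.*;.*")


def matches_any_term(terms, pattern=PATTERN):
    """Return True if the pattern matches at least one term in full.

    Hypothetical helper: it mimics the idea that a regex query is
    tested against each indexed term as a whole, not the raw text.
    """
    return any(pattern.fullmatch(term) is not None for term in terms)


def as_string_field(value):
    """A string field keeps the whole value as a single term."""
    return [value]


def as_text_field(value):
    """Crude stand-in for tokenization: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


wanted = "description: best company; name: roca mola"
unwanted = "description: my home; name: mother company"

print(matches_any_term(as_string_field(wanted)))    # True
print(matches_any_term(as_text_field(wanted)))      # False
print(matches_any_term(as_string_field(unwanted)))  # False
```

In the simulated text field no single term contains both "description" and "company", so a multi-word regex has nothing to match, while the single-term string field matches the wanted document and rejects the unwanted one.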

Also percent-encode the special characters, to rule them out as the cause of the mismatch:

q=strfield:/description%3A(.*?)company(.*?)%3B.*/

Update: I just tried it on a string field, and the regex above works. It also works without the percent-encoding, i.e.:

q=strfield:/description:.*?company.*?;.*/
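If you build the request from code, letting a library do the percent-encoding avoids hand-escaping mistakes. Below is a minimal Python sketch using only the standard library. The base URL (including the `/select` handler path), the field name `strfield`, and the helper names are assumptions for illustration; adjust them to your setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: host, port, and the /select handler path are placeholders.
BASE_URL = "http://localhost:8080/solr/select"


def build_regex_query_url(field, regex, base=BASE_URL):
    """Build a URL whose q parameter is field:/regex/, percent-encoded."""
    query = f"{field}:/{regex}/"
    # urlencode percent-encodes reserved characters such as ':' and ';'.
    return f"{base}?{urlencode({'q': query})}"


def decode_q(url):
    """Recover the original q parameter from a URL (sanity check)."""
    return parse_qs(urlsplit(url).query)["q"][0]


url = build_regex_query_url("strfield", "description:.*?company.*?;.*")
print(url)
print(decode_q(url))
```

Decoding the URL again should give back exactly `strfield:/description:.*?company.*?;.*/`, which is a quick way to confirm that nothing was mangled on the way.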
arun answered Sep 20 '22 19:09
