Handling + as a special character in Lucene search

How do I make sure Lucene gives me back relevant search results when my input string contains terms like c++? Lucene seems to ignore the ++ characters.

Code details: When I execute this line, I get a blank search query.

queryField = multiFieldQueryParser.Parse(inpKeywords);

keywordsQuery.Add(queryField, BooleanClause.Occur.SHOULD);

And here is my custom analyzer:

public class CustomAnalyzer : Analyzer
{
    private static readonly WhitespaceAnalyzer whitespaceAnalyzer = new WhitespaceAnalyzer();

    public override TokenStream TokenStream(String fieldName, System.IO.TextReader reader)
    {
        TokenStream result = whitespaceAnalyzer.TokenStream(fieldName, reader);
        result = new StandardTokenizer(reader);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stop_words);
        return result;
    }
}

And I'm executing the search query this way:

indexSearcher.Search(searchQuery, collector);

I did try queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords)); but it still does not work. Here is the query that gets executed and returns zero hits: "+(())"

Thanks.

Ed. asked Oct 21 '09 02:10

2 Answers

Since + is a special character in the query syntax, it needs to be escaped. The full list of characters that need to be escaped is at the bottom of the Lucene query parser syntax documentation page.
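To make the escaping step concrete, here is a minimal, self-contained sketch of what escaping does to an input such as c++. It is not Lucene's own code: the class name QueryEscapeSketch and the character list are assumptions based on the classic query syntax, and in a real project the library's own QueryParser.Escape is the safer choice.

```csharp
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class QueryEscapeSketch
{
    // Assumed set of characters the classic query syntax treats as operators.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    // Puts a backslash in front of every special character in the input.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, QueryEscapeSketch.Escape("c++") produces c\+\+, so the query parser reads the plus signs as literal text instead of as the "required" operator. Escaping only affects parsing; whether the plus signs survive afterwards still depends on the analyzer.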

You also need to be careful about the analyzer you use while indexing. For example, StandardAnalyzer will skip +. You may need to use something like WhitespaceAnalyzer while indexing and searching, which will preserve special characters in the token stream. Keep in mind that you need to use the same analyzer for both indexing and searching.
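The difference between the two tokenization styles can be illustrated with a toy model. The sketch below does not use Lucene's tokenizers. TokenizerSketch and both of its methods are hypothetical stand-ins: one splits on whitespace only, the other treats every non-alphanumeric character as a token boundary. That second rule is a deliberate simplification of what a standard-style tokenizer does.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical toy tokenizers for illustration; not Lucene's implementations.
public static class TokenizerSketch
{
    // Splits on whitespace only, so punctuation stays inside the token.
    public static List<string> SplitOnWhitespace(string text)
    {
        return new List<string>(
            text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }

    // Keeps only runs of letters and digits; any other character ends the token.
    public static List<string> SplitOnNonAlphanumeric(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

For the input "c++ developer", SplitOnWhitespace yields the tokens c++ and developer, while SplitOnNonAlphanumeric yields c and developer. If the index was built one way and the query is analyzed the other way, the terms no longer match, which is why the same analyzer should be used on both sides.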

Shashikant Kore answered Sep 21 '22 20:09


In addition to choosing the right analyzer, you can use QueryParser.Escape(string s) to ensure all special characters are properly escaped.

Because this is a static function, you can use it even if you're using MultiFieldQueryParser.

For example, you can try something like this:

queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords));

Jesse MacVicar answered Sep 21 '22 20:09