Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

lucene wildcard query with space

I have Lucene index which has city names. Consider I want to search for 'New Delhi'. I have string 'New Del' which I want to pass to Lucene searcher and I am expecting output as 'New Delhi'. If I generate query like Name:New Del* It will give me all cities with 'New and Del'in it. Is there any way by which I can create Lucene query wildcard query with spaces in it? I referred and tried few solutions given @ http://www.gossamer-threads.com/lists/lucene/java-user/5487

like image 372
Ashishkumar Chougule Avatar asked Dec 30 '15 12:12

Ashishkumar Chougule


People also ask

How do you use the wildcard in Lucene?

Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries). To perform a single character wildcard search use the "?" symbol. To perform a multiple character wildcard search use the "*" symbol. You can also use the wildcard searches in the middle of a term.

What are Lucene special characters?

Special characters( ) { } [ ] ^ “ ~ * ? : \ / are reserved for the Lucene Query String parser, so you'll need to escape them with \ before the character if you need to use it. For example, f-150 should be wrapped up as f\-150 , or wrapped inside double quotes as "f-150" .

What is the difference between Lucene and Elasticsearch?

Lucene or Apache Lucene is an open-source Java library used as a search engine. Elasticsearch is built on top of Lucene. Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally.

What is Lucene query syntax?

What is Lucene Query Syntax? Lucene is a query language that can be used to filter messages in your PhishER inbox. A query written in Lucene can be broken down into three parts: Field The ID or name of a specific container of information in a database.


1 Answers

It sounds like you have indexed your city names with analysis. That will tend to make this more difficult. With analysis, "new" and "delhi" are separate terms, and must be treated as such. Searching over multiple terms with wildcards like this tends to be a bit more difficult.

The easiest solution would be to index your city names without tokenization (lowercasing might not be a bad idea though). Then you would be able to search with the query parser simply by escaping the space:

QueryParser parser = new QueryParser("defaultField", analyzer);
Query query = parser.parse("cityname:new\\ del*");

Or you could use a simple WildcardQuery:

Query query = new WildcardQuery(new Term("cityname", "new del*"));

With the field analyzed by standard analyzer:

You will need to rely on SpanQueries, something like this:

SpanQuery queryPart1 = new SpanTermQuery(new Term("cityname", "new"));
SpanQuery queryPart2 = new SpanMultiTermQueryWrapper(new WildcardQuery(new Term("cityname", "del*")));
Query query = new SpanNearQuery(new SpanQuery[] {query1, query2}, 0, true);

Or, you can use the surround query parser (which provides query syntax intended to provide more robust support of span queries), using a query like W(new, del*):

org.apache.lucene.queryparser.surround.parser.QueryParser surroundparser = new org.apache.lucene.queryparser.surround.parser.QueryParser();
SrndQuery srndquery = surroundparser.parse("W(new, del*)");
query = srndquery.makeLuceneQueryField("cityname", new BasicQueryFactory());
like image 188
femtoRgon Avatar answered Nov 01 '22 03:11

femtoRgon