How does Lucene work with quotes and wildcards?

Tags: quotes, lucene, wildcard

When I search in Lucene for the Dutch word bieten, is there a difference between the following: bieten, "bieten", "*bieten*" and *bieten*, when using the DutchAnalyzer and allowing leading wildcards?

As far as I can tell from the parser syntax, the quotes are there just to handle spaces, and all words are always searched as if there were wildcards around them.

The reason I ask is that I found out that the DutchAnalyzer strips the plural ending from all words before they are entered in the index. In my case that means biet is stored in the index and not bieten. When searching with bieten, "bieten" or "*bieten*", it also modifies the query to biet.
But when I use *bieten*, the query doesn't change and stays plural, which doesn't give any results.
So:

  bieten   -->> biet 
 "bieten"  -->> biet
"*bieten*" -->> biet 
 *bieten*  -->> *bieten*

Why is the last search translated to a different query than the others?

Queryparser syntax: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
Screenshot Lucene: http://oi63.tinypic.com/1z5krdg.jpg

asked Mar 04 '16 13:03

Jeroen

1 Answer

Wildcard, regex, and fuzzy queries are not analyzed by the query parser; that is why the last query is handled differently.

Words are definitely not searched with wildcards around them. The query *bieten* would be intended to match things like "xxbietenxx". Finding words within a sentence does not involve wildcards, though. That's what analysis is for. It splits the text into single-word terms.
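To make that concrete, here is a minimal sketch in plain Java. It does not use Lucene at all; it only models the idea that a wildcard pattern is compared against whole indexed terms. The class name, the helper methods, and the list of "indexed terms" are all hypothetical, and the terms are simply assumed to be what analysis left in the index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy model only: NOT Lucene code. All names here are hypothetical.
final class WildcardOverTerms {

    // Translate a wildcard pattern into a regex:
    // '*' matches any run of characters, '?' matches exactly one.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // The pattern has to match an entire term, not a piece of the original text.
    static boolean matchesAnyTerm(String wildcard, List<String> indexedTerms) {
        Pattern p = toRegex(wildcard);
        return indexedTerms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        // Assumed post-analysis terms: the index holds "biet", not "bieten".
        List<String> terms = Arrays.asList("biet", "tuin");
        System.out.println(matchesAnyTerm("*bieten*", terms)); // false
        System.out.println(matchesAnyTerm("*biet*", terms));   // true
    }
}
```

Under these assumptions, *bieten* finds nothing because no stored term contains "bieten", while *biet* with wildcards around it would match the stemmed term.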

To explain each of those queries:

bieten - Simple term query. Search for the given word.
"bieten" - Phrase query. Analyze and find the given multi-term phrase. In this case the phrase is one term long, and so the same as a term query.
"*bieten*" - Again, a phrase query. Not a wildcard query in any way. You can't use wildcards in phrases. The analyzer will remove the punctuation, making this identical to the previous one.
*bieten* - Wildcard query. This will match "bietenxx", "xxbieten", and "xxbietenxx", but will not be analyzed, and so won't match the post-analysis term "biet".
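The four cases above can be sketched as a toy rewrite function in plain Java. This is not Lucene's query parser: the class and method names are hypothetical, the "analyzer" is a crude stand-in that lowercases, drops non-letters, and strips a trailing "en" (the real DutchAnalyzer stemmer is far more involved), and only single-term queries are handled.

```java
import java.util.Locale;

// Toy model only: NOT Lucene code. All names here are hypothetical.
final class ToyQueryRewrite {

    // Stand-in for analysis: lowercase, drop anything that is not a letter,
    // then strip a trailing "en". This is an assumption, not the real Dutch stemmer.
    static String analyze(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]", "");
        return cleaned.endsWith("en") && cleaned.length() > 3
                ? cleaned.substring(0, cleaned.length() - 2)
                : cleaned;
    }

    static String rewrite(String query) {
        boolean quoted = query.length() >= 2
                && query.startsWith("\"") && query.endsWith("\"");
        if (quoted) {
            // Phrase: the contents are analyzed, so '*' is dropped like any punctuation.
            return analyze(query.substring(1, query.length() - 1));
        }
        if (query.indexOf('*') >= 0 || query.indexOf('?') >= 0) {
            // Wildcard: passed through as written, without analysis.
            return query;
        }
        // Plain term: analyzed.
        return analyze(query);
    }

    public static void main(String[] args) {
        String[] queries = {"bieten", "\"bieten\"", "\"*bieten*\"", "*bieten*"};
        for (String q : queries) {
            System.out.println(q + " -->> " + rewrite(q));
        }
    }
}
```

Running it prints the same four mappings listed in the question: the first three queries end up as biet, and only the unquoted *bieten* keeps its plural form.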

answered Nov 24 '22 04:11

femtoRgon
