Lucene search and underscores

Tags: lucene, lucene.net

When I use Luke to search my Lucene index with a standard analyzer, I can see that the field I am searching contains values of the form MY_VALUE. However, when I search for field:"MY_VALUE", the query is parsed as field:"my value".

Is there a simple way to escape the underscore (_) character so that it will search for it?

EDIT:

4/1/2010 11:08AM PST

I think there is a bug in the tokenizer in Lucene 2.9.1, and it has probably been there for a while. Load up Luke and search for "BB_HHH_FFFF5_SSSS". When the value contains a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that this is because of the number. If I input

"BB_HHH_FFFF_SSSS", I get

"bb hhh ffff ssss"

At this point, I'm leaning towards a tokenizer bug, unless the presence of the number is supposed to cause this behavior, though I fail to see why it would.

Can anyone confirm this?
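For what it's worth, this looks consistent with the "product number" rule of the StandardTokenizer grammar used by Lucene 2.9 (the same grammar later became ClassicTokenizer). Its javadoc says words are split at punctuation, but a token containing a number is treated as a product number and kept whole. As far as I can tell from the JFlex grammar, the NUM rule's punctuation class includes `_`, and it requires every other punctuation-separated segment to contain a digit. That is why "BB" is split off while "HHH_FFFF5_SSSS" stays together. The Java below is a hypothetical, simplified model (not Lucene code) of just the ALPHANUM and NUM rules. It assumes the input contains only ASCII letters, digits and underscores. It lowercases as StandardAnalyzer does but skips stop-word removal and every other token type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the ALPHANUM and NUM rules of the classic
// (Lucene 2.9) StandardTokenizer grammar. Not Lucene code: it only handles
// ASCII letters, digits and '_', and skips stop-word removal.
public class ProductNumberRuleDemo {

    // Maximal runs of alphanumeric segments joined by single underscores;
    // any other character acts as a plain separator in this model.
    private static final Pattern GROUP = Pattern.compile("[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*");

    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                return true;
            }
        }
        return false;
    }

    // NUM: at least two segments, and every other segment (all even offsets
    // or all odd offsets, counted from the first) contains a digit.
    static boolean isNum(String[] seg, int from, int to) {
        if (to - from < 2) {
            return false;
        }
        boolean evenHaveDigits = true;
        boolean oddHaveDigits = true;
        for (int k = from; k < to; k++) {
            boolean d = hasDigit(seg[k]);
            if ((k - from) % 2 == 0) {
                evenHaveDigits &= d;
            } else {
                oddHaveDigits &= d;
            }
        }
        return evenHaveDigits || oddHaveDigits;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = GROUP.matcher(text);
        while (m.find()) {
            String[] seg = m.group().split("_");
            int i = 0;
            while (i < seg.length) {
                // Longest match wins: prefer the longest NUM starting here,
                // otherwise emit a single ALPHANUM segment.
                int end = i + 1;
                for (int j = seg.length; j > i + 1; j--) {
                    if (isNum(seg, i, j)) {
                        end = j;
                        break;
                    }
                }
                String token = String.join("_", Arrays.copyOfRange(seg, i, end));
                tokens.add(token.toLowerCase(Locale.ROOT)); // StandardAnalyzer also lowercases
                i = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("BB_HHH_FFFF5_SSSS")); // [bb, hhh_ffff5_ssss]
        System.out.println(tokenize("BB_HHH_FFFF_SSSS"));  // [bb, hhh, ffff, ssss]
        System.out.println(tokenize("MY_VALUE"));          // [my, value]
    }
}
```

In this model "BB" cannot start a NUM token, because neither "BB" nor "HHH" contains a digit. Starting from "HHH", the odd-offset segment "FFFF5" does contain one, so the three remaining segments form a single token.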


asked Mar 26 '10 00:03

Matt

2 Answers

It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.

Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.
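Here is a toy illustration of why the query-side analyzer has to match the index-side one. It is plain Java with no Lucene classes. `keyword`, `crudeStandard` and `matches` are hypothetical stand-ins, and `crudeStandard` just lowercases and splits on underscores rather than reproducing StandardAnalyzer. If the field was indexed as the single term MY_VALUE, a query analyzed into "my" + "value" cannot find it. Analyzing the query the same way as the index does find it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Toy model only: hypothetical "analyzers" written as plain functions,
// not Lucene's KeywordAnalyzer/StandardAnalyzer.
public class AnalyzerMismatchDemo {

    // Keyword-style analysis: the whole value becomes a single term, case preserved.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Crude standard-style analysis: lowercase, then split on underscores/whitespace.
    // Ignores the product-number rule and everything else the real tokenizer does.
    static List<String> crudeStandard(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[_\\s]+"));
    }

    // Simplified match: every query term must be among the indexed terms
    // (positions are ignored, so this is looser than a real phrase query).
    static boolean matches(Set<String> indexedTerms, String query,
                           Function<String, List<String>> queryAnalyzer) {
        return indexedTerms.containsAll(queryAnalyzer.apply(query));
    }

    public static void main(String[] args) {
        // Field indexed with keyword-style analysis: terms = {"MY_VALUE"}
        Set<String> indexed = new HashSet<>(keyword("MY_VALUE"));

        // Query analyzed differently from the index: no match.
        System.out.println(matches(indexed, "MY_VALUE", AnalyzerMismatchDemo::crudeStandard)); // false
        // Same analysis on both sides: match.
        System.out.println(matches(indexed, "MY_VALUE", AnalyzerMismatchDemo::keyword));       // true
    }
}
```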


answered Sep 27 '22 19:09

bajafresh4life

I don't think you'll be able to use the standard analyser for this use case.

Judging by what I think your requirements are, the keyword analyser should work fine with little effort (the whole field becomes a single term).

I think some of the confusion arises from looking at the field in Luke. Queries don't use the stored value; what matters are the terms. I suspect that when you look at the terms stored for your field, they'll be "my" and "value".
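Here is a toy model of the stored-value versus terms distinction. It is plain Java: `ToyField` is a hypothetical class, and its lowercase-and-split-on-underscore step is a crude stand-in for StandardAnalyzer, not the real tokenizer. The stored value still reads MY_VALUE, but a lookup only ever consults the analyzed terms.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class StoredVsIndexedDemo {

    // Hypothetical toy field: the stored value is kept verbatim for display,
    // while lookups only consult the analyzed terms.
    static final class ToyField {
        final String stored;
        final Set<String> terms;

        ToyField(String value) {
            this.stored = value;
            // Crude stand-in for StandardAnalyzer: lowercase, split on underscores
            this.terms = new LinkedHashSet<>(
                    Arrays.asList(value.toLowerCase(Locale.ROOT).split("_+")));
        }

        boolean hasTerm(String term) {
            return terms.contains(term);
        }
    }

    public static void main(String[] args) {
        ToyField f = new ToyField("MY_VALUE");
        System.out.println(f.stored);              // MY_VALUE (what a document view shows)
        System.out.println(f.terms);               // [my, value] (what queries are matched against)
        System.out.println(f.hasTerm("MY_VALUE")); // false
        System.out.println(f.hasTerm("my"));       // true
    }
}
```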

Hope this helps,

answered Sep 27 '22 18:09

Adrian Conlon

