Lucene RangeQuery doesn't filter appropriately

I'm using RangeQuery to get all the documents whose amount is between, say, 0 and 2. When I execute the query, Lucene also returns documents whose amount is greater than 2. What am I missing here?

Here is my code:

Term lowerTerm = new Term("amount", minAmount);
Term upperTerm = new Term("amount", maxAmount);

RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);

finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);

And here is what goes into my index:

doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));

asked Apr 02 '09 02:04 by user40907


1 Answer

UPDATE: As @basZero noted in a comment, starting with Lucene 2.9 you can add numeric fields to your documents. Just remember to search them with NumericRangeQuery instead of RangeQuery.

Original answer

Lucene treats numbers as words, so their order is alphabetic (lexicographic), not numeric:

0
1
12
123
2
22

That means that, for Lucene, 12 and 123 both fall between 0 and 2. If you want a proper numerical range, you need to index the numbers zero-padded to a fixed width, then do a range search such as [0000 TO 0002]. (The amount of padding you need depends on the expected range of values.)
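
To see the effect without touching an index, here is a minimal sketch in plain Java (no Lucene dependency). The class ZeroPadDemo and its methods pad and inRange are hypothetical names made up for this illustration; inRange only imitates an inclusive range over string terms, it is not how Lucene is implemented. A width of 4 is assumed, which only works while every value has at most 4 digits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class ZeroPadDemo {

    // Left-pads a non-negative integer with zeros to a fixed width so that
    // string order matches numeric order (as long as the value fits the width).
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Returns, in sorted order, the terms that lie lexicographically within
    // [lower, upper], inclusive. This imitates a range over string terms.
    static List<String> inRange(List<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                hits.add(term);
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        int[] amounts = {0, 1, 12, 123, 2, 22};
        List<String> raw = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (int amount : amounts) {
            raw.add(Integer.toString(amount));
            padded.add(pad(amount, 4));
        }
        // Unpadded terms: 12 and 123 slip into the range 0..2.
        System.out.println(inRange(raw, "0", "2"));
        // Padded terms: only 0000, 0001 and 0002 match.
        System.out.println(inRange(padded, pad(0, 4), pad(2, 4)));
    }
}
```

The same padding has to be applied both when the field is indexed and when the range bounds are built, otherwise the bounds and the stored terms are compared in different formats.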

If you have negative numbers, just add another zero for non-negative numbers. (EDIT: WRONG WRONG WRONG. See update)

If your numbers include a fraction part, leave it as is, and zero-pad the integer part only.

Example:

-00002.12
-00001

000000
000001
000003.1415
000022
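
For the non-negative values in the example above, padding only the integer part could look roughly like this. It is a sketch in plain Java; FractionPadDemo and padIntegerPart are hypothetical names, the input is assumed to be a plain decimal string such as "3.1415" (no sign, no exponent), and the width of 6 matches the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class FractionPadDemo {

    // Zero-pads the integer part of a non-negative decimal string to `width`
    // characters and leaves the fraction part (if any) untouched.
    static String padIntegerPart(String number, int width) {
        if (number.startsWith("-")) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        int dot = number.indexOf('.');
        String intPart = dot < 0 ? number : number.substring(0, dot);
        String fracPart = dot < 0 ? "" : number.substring(dot); // keeps the '.'
        if (intPart.length() > width) {
            throw new IllegalArgumentException("integer part is wider than " + width);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = intPart.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(intPart).append(fracPart).toString();
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (String n : new String[] {"22", "3.1415", "1", "0"}) {
            terms.add(padIntegerPart(n, 6));
        }
        // String sort now agrees with numeric order.
        Collections.sort(terms);
        System.out.println(terms);
    }
}
```

Because every integer part has the same width, the decimal point always lands in the same position, so the fraction digits are compared left to right just as they would be numerically.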

UPDATE: Negative numbers are a bit tricky, since -1 comes before -2 alphabetically. This article gives a complete explanation about dealing with negative numbers and numbers in general in Lucene. Basically, you have to "encode" numbers into something that makes the order of the items make sense.
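
As one illustration of such an encoding (not necessarily the scheme the article describes), you can shift every value by a fixed offset so that nothing is negative, then zero-pad the result. The sketch below is plain Java and assumes 32-bit int values; OffsetEncodingDemo, encode and decode are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class OffsetEncodingDemo {

    // Maps an int to a 10-digit string whose lexicographic order matches
    // the numeric order of the original values, negatives included.
    static String encode(int value) {
        // Subtracting Integer.MIN_VALUE shifts the range to 0 .. 4294967295.
        long shifted = (long) value - Integer.MIN_VALUE;
        return String.format(Locale.ROOT, "%010d", shifted);
    }

    // Reverses encode().
    static int decode(String term) {
        return (int) (Long.parseLong(term) + Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Plain strings: "-1" sorts before "-2", which is numerically backwards.
        System.out.println("-1".compareTo("-2") < 0);
        // Encoded strings: -2 sorts before -1, and -1 before 0.
        System.out.println(encode(-2).compareTo(encode(-1)) < 0);
        System.out.println(encode(-1).compareTo(encode(0)) < 0);
    }
}
```

Range bounds have to go through the same encode step as the indexed values. The encoded terms are no longer human-readable, so if you need to display the amount, store the original value separately or decode it.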

answered Sep 19 '22 18:09 by itsadok