How to search an int field in Lucene 4?

Tags: java, lucene

I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:

Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"),
        Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"),
        Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"),
        Field.Store.YES));
w.addDocument(doc);

It seems I can't query the ticket_id field at all, while id_s works just fine.

One of the documents is (I added whitespace for readability):

Document<
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W> 
    stored<ticket_id:152> 
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>

So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.

What am I doing wrong? How can I add such a field to the index and make it searchable?

asked Dec 28 '12 19:12

Konrad Garus

3 Answers

Below works for me:

    RAMDirectory idx = new RAMDirectory();
    IndexWriter writer = new IndexWriter(
            idx,
            new IndexWriterConfig(Version.LUCENE_40, new ClassicAnalyzer(Version.LUCENE_40))
    );
    Document document = new Document();
    document.add(new StringField("ticket_number", "t123", Field.Store.YES));
    document.add(new IntField("ticket_id", 234, Field.Store.YES));
    document.add(new StringField("id_s", "234", Field.Store.YES));
    writer.addDocument(document);
    writer.commit();

    IndexReader reader = DirectoryReader.open(idx);
    IndexSearcher searcher = new IndexSearcher(reader);

    Query q1 = new TermQuery(new Term("id_s", "234"));
    TopDocs td1 = searcher.search(q1, 1);
    System.out.println(td1.totalHits);  // prints "1"

    // Exact match on the int field: precisionStep = 1, min = max = 234, both bounds inclusive
    Query q2 = NumericRangeQuery.newIntRange("ticket_id", 1, 234, 234, true, true);
    TopDocs td2 = searcher.search(q2, 1);
    System.out.println(td2.totalHits);  // prints "1"

As femtoRgon pointed out, for numeric values (longs, dates, floats, etc.) you need to use a NumericRangeQuery and specify the precision step. Otherwise Lucene has no idea how you want to define similarity.

answered Oct 21 '22 07:10

mindas

Another answer comes from this thread (third answer): "Lucene 4.0 IndexWriter updateDocument for Numeric Term".

Basically, you create a Term with your int value like this:

String field = "myfield";
int value = 4711;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
NumericUtils.intToPrefixCoded(value, 0, bytes);
Term term = new Term(field, bytes);

Then you can use this term for searching, or for deleting/updating documents in your index. In a first test, this worked fine for me. I can't tell whether this is the "right" way to do things, however. I've used NumericRangeFilter before for filtering IntFields, but now I'm inclined to use this approach with a regular TermsFilter or TermQuery instead.

answered Oct 21 '22 07:10

D.Ogranos

Numeric Fields can be queried with a NumericRangeQuery. For an exact match, simply set the max and min to equal values.

Your output indicating that the field is not indexed could be due to the difference in how a numeric value is indexed compared to a text value. Because the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed as a term.
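To make that point a bit more concrete, here is a rough, library-free sketch of the idea. It is not Lucene code: the class `NumericTermSketch`, the marker constant, and the 7-bit grouping are assumptions modeled loosely on how prefix-coded int terms are usually described, so the bytes should not be relied on to match a real index. It only shows that an encoded int term and the text term "152" are different byte sequences, which is why a plain text query on the numeric field finds nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Simplified illustration only. The byte layout is an assumption for this
 * sketch and is not guaranteed to match what Lucene writes to an index.
 */
class NumericTermSketch {

    // Assumed marker byte identifying an int term; hypothetical value for this sketch.
    static final int SHIFT_START_INT = 0x60;

    /** Encodes an int as one marker byte followed by 7-bit groups, most significant first. */
    static byte[] encodeInt(int value, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be in 0..31");
        }
        int groups = (31 - shift) / 7 + 1;
        byte[] out = new byte[groups + 1];
        out[0] = (byte) (SHIFT_START_INT + shift);
        // Flip the sign bit so negative values sort before positive ones.
        int sortableBits = (value ^ 0x80000000) >>> shift;
        for (int i = groups; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return out;
    }

    /** Encodes a value the way a plain text term would be stored: its UTF-8 bytes. */
    static byte[] encodeText(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] numericTerm = encodeInt(152, 0);
        byte[] textTerm = encodeText("152");
        System.out.println("numeric term bytes: " + Arrays.toString(numericTerm));
        System.out.println("text term bytes:    " + Arrays.toString(textTerm));
        System.out.println("same term? " + Arrays.equals(numericTerm, textTerm));
    }
}
```

A query parsed as text looks for the second byte sequence, while the numeric field only contains terms shaped like the first one.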

At a glance, however, it's possible that your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as just simple identifiers that happen to be represented with digits. If you don't need numeric sorting or range querying on the field, indexing as a StringField certainly makes more sense.
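As a small illustration of what you give up by treating the ID as a string, the sketch below compares string ordering with numeric ordering using plain Java collections. It does not use Lucene at all; the class name `IdOrderingSketch` and the sample IDs are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class IdOrderingSketch {

    /** Orders ids character by character, as a plain string comparison does. */
    static List<String> sortAsStrings(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    /** Orders ids by their integer value. */
    static List<String> sortAsNumbers(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }

    /** Inclusive range check that uses string comparison only. */
    static List<String> stringRange(List<String> ids, String min, String max) {
        List<String> hits = new ArrayList<>();
        for (String id : ids) {
            if (id.compareTo(min) >= 0 && id.compareTo(max) <= 0) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("9", "152", "23", "1000");
        System.out.println(sortAsStrings(ids));             // [1000, 152, 23, 9]
        System.out.println(sortAsNumbers(ids));             // [9, 23, 152, 1000]
        System.out.println(stringRange(ids, "100", "200")); // [152, 1000]
    }
}
```

Exact lookups behave the same either way, so if the ID is only ever matched exactly, the string form loses nothing. Sorting and range checks are where the two diverge.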

answered Oct 21 '22 08:10

femtoRgon
