
How to search an int field in Lucene 4?

Tags:

java

lucene

I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:

Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"),
        Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"),
        Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"),
        Field.Store.YES));
w.addDocument(doc);

It seems I can't query the ticket_id field at all, while id_s works just fine.

One of the documents is (I added whitespace for readability):

Document<
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W> 
    stored<ticket_id:152> 
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>

So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.

What am I doing wrong? How can I add such a field to the index and make it searchable?

Konrad Garus asked Dec 28 '12 19:12



3 Answers

The following works for me:

    RAMDirectory idx = new RAMDirectory();
    IndexWriter writer = new IndexWriter(
            idx,
            new IndexWriterConfig(Version.LUCENE_40, new ClassicAnalyzer(Version.LUCENE_40))
    );
    Document document = new Document();
    document.add(new StringField("ticket_number", "t123", Field.Store.YES));
    document.add(new IntField("ticket_id", 234, Field.Store.YES));
    document.add(new StringField("id_s", "234", Field.Store.YES));
    writer.addDocument(document);
    writer.commit();

    IndexReader reader = DirectoryReader.open(idx);
    IndexSearcher searcher = new IndexSearcher(reader);

    Query q1 = new TermQuery(new Term("id_s", "234"));
    TopDocs td1 = searcher.search(q1, 1);
    System.out.println(td1.totalHits);  // prints "1"

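    // Exact match: min == max, both bounds inclusive. This overload uses the
    // default precision step, the same one IntField uses when indexing.
    Query q2 = NumericRangeQuery.newIntRange("ticket_id", 234, 234, true, true);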
    TopDocs td2 = searcher.search(q2, 1);
    System.out.println(td2.totalHits);  // prints "1"

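As femtoRgon pointed out, numeric values (ints, longs, floats, dates, etc.) have to be queried with a NumericRangeQuery (optionally specifying a precision step); for an exact match, pass the same value as both bounds. A TermQuery built from the text "234" does not match, because an IntField is not indexed under that literal term.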

mindas answered Oct 21 '22 07:10


Another answer comes from this thread (third answer): Lucene 4.0 IndexWriter updateDocument for Numeric Term

Basically, you create a Term with your int value like this:

String field = "myfield";
int value = 4711;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
NumericUtils.intToPrefixCoded(value, 0, bytes);
Term term = new Term(field, bytes);

You can then use this term for searching, or for deleting/updating documents in your index. In an initial test, this worked fine for me. I can't tell whether this is the "right" way to do things, however. I've used NumericRangeFilter before for filtering IntFields, but now I'm inclined to use this approach with a regular TermsFilter or TermQuery instead.

D.Ogranos answered Oct 21 '22 07:10


Numeric Fields can be queried with a NumericRangeQuery. For an exact match, simply set the max and min to equal values.

Your output indicating that the field is not indexed could be due to the difference in how a numeric value is indexed compared to a text value. Because the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed as a term.
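To make the point about the numeric representation concrete, here is a minimal plain-Java sketch. It does not use Lucene and does not reproduce Lucene's actual byte layout; the class `NumericTermSketch` and the `encodeSortable` encoding are hypothetical, chosen only to show that an order-preserving binary form of 152 is a different term from the text "152".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NumericTermSketch {
    // Hypothetical, simplified encoding (NOT Lucene's real format): flip the
    // sign bit so unsigned byte order matches signed int order, then write
    // the value as 4 big-endian bytes.
    static byte[] encodeSortable(int value) {
        int bits = value ^ 0x80000000;
        return new byte[] {
            (byte) (bits >>> 24), (byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits
        };
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] numericTerm = encodeSortable(152);
        byte[] textTerm = "152".getBytes(StandardCharsets.UTF_8);

        // The encoded number and the text "152" are different byte sequences.
        System.out.println(Arrays.equals(numericTerm, textTerm)); // prints "false"

        // Byte order of the encoded form follows numeric order, even for negatives.
        System.out.println(compareUnsigned(encodeSortable(-5), encodeSortable(152)) < 0); // prints "true"
    }
}
```

Under this assumption, a text query for the term "152" looks up bytes that were never written for the numeric field, which is consistent with the empty result described in the question.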

At a glance, however, it's possible that your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as just simple identifiers that happen to be represented with digits. If you don't need numeric sorting or range querying on the field, indexing as a StringField certainly makes more sense.
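The trade-off can be seen without Lucene at all. The sketch below (plain Java; the class `IdOrderingSketch` and its helper methods are made up for illustration) sorts the same IDs once as strings and once as numbers. If the string ordering is acceptable and you only need exact lookups, a string field is likely enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IdOrderingSketch {
    // Sort IDs the way string terms are ordered: character by character.
    static List<String> sortAsStrings(List<Integer> ids) {
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            out.add(Integer.toString(id));
        }
        Collections.sort(out);
        return out;
    }

    // Sort IDs by numeric value.
    static List<Integer> sortAsNumbers(List<Integer> ids) {
        List<Integer> out = new ArrayList<>(ids);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(152, 23, 9);
        System.out.println(sortAsStrings(ids)); // prints "[152, 23, 9]"
        System.out.println(sortAsNumbers(ids)); // prints "[9, 23, 152]"
    }
}
```

The same mismatch applies to ranges: as strings, "23" falls outside the range "100" to "200" only by coincidence of its first character, so range queries over string IDs are not reliable for numeric intent.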

femtoRgon answered Oct 21 '22 08:10