
How do you index and search numbers in Lucene 4.1?

In my 3.6 code I was adding a numeric field to my index as follows:

public void addNumericField(IndexField field, Integer value) {
        addField(field, NumericUtils.intToPrefixCoded(value));
    }

However, intToPrefixCoded now needs a BytesRef argument, and it's totally unclear what you are meant to do with the value next, so instead I've changed it to (work in progress):

public void addNumericField(IndexField field, Integer value) {
        FieldType ft = new FieldType();
        ft.setStored(true);
        ft.setIndexed(true);
        ft.setNumericType(FieldType.NumericType.INT);
        doc.add(new IntField(field.getName(), value, ft));
    }

which seemed neater.

In 3.6 I also had to override the query parser to make it work for numeric range searches:

package org.musicbrainz.search.servlet;

import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.util.NumericUtils;
import org.musicbrainz.search.LuceneVersion;
import org.musicbrainz.search.index.LabelIndexField;
import org.musicbrainz.search.servlet.mmd1.LabelType;

public class LabelQueryParser extends MultiFieldQueryParser {

    public LabelQueryParser(java.lang.String[] strings, org.apache.lucene.analysis.Analyzer analyzer)
    {
        super(LuceneVersion.LUCENE_VERSION, strings, analyzer);
    }

     protected Query newTermQuery(Term term) {

        // Compare field names with equals(); == only checks reference identity
        if (term.field().equals(LabelIndexField.CODE.getName())) {
            try {
                int number = Integer.parseInt(term.text());
                TermQuery tq = new TermQuery(new Term(term.field(), NumericUtils.intToPrefixCoded(number)));
                return tq;
            }
            catch (NumberFormatException nfe) {
                //If not provided numeric argument just leave as is, 
                //won't give matches
                return super.newTermQuery(term);
            }
        } else {
            return super.newTermQuery(term);

        }
    }

    /**
     *
     * Convert Numeric Fields
     *
     * @param field
     * @param part1
     * @param part2
     * @param inclusive
     * @return
     */
    @Override
    public Query newRangeQuery(String field,
                               String part1,
                               String part2,
                               boolean inclusive) {

        if (
                (field.equals(LabelIndexField.CODE.getName()))
            )
        {
            part1 = NumericUtils.intToPrefixCoded(Integer.parseInt(part1));
            part2 = NumericUtils.intToPrefixCoded(Integer.parseInt(part2));
        }
        TermRangeQuery query = (TermRangeQuery)
                super.newRangeQuery(field, part1, part2,inclusive);
        return query;
    }

}

So I took all this out, figuring I didn't need it anymore, but unfortunately no queries on this IntField work now.

Reading further, it seems IntFields are only used for range queries, so I don't know how you are meant to do plain match queries, or whether NumericRangeQuery is compatible with the classic query parser, which I am using.

So I then went back to trying to add my numeric values as an encoded string:

public void addNumericField(IndexField field, Integer value) {

    FieldType fieldType = new FieldType();
    fieldType.setStored(true);
    fieldType.setIndexed(true);
    BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
    NumericUtils.intToPrefixCoded(value, 0, bytes);
    doc.add(new Field(field.getName(),bytes, fieldType));
}

But at runtime I'm now getting this error:

java.lang.IllegalArgumentException: Fields with BytesRef values cannot be indexed

But I need to index the field, so how can I index numeric fields as I did in 3.6 so that I can search them?

Paul Taylor asked Oct 22 '22 16:10


2 Answers

Just a heads up on how to do it using Lucene 4.7:

When indexing I just do as follows:

document.add(new IntField("int_field", int_value, Field.Store.YES));

And for searching:

public class MyQueryParser extends QueryParser {

public MyQueryParser(Version matchVersion, String field, Analyzer analyzer) {
    super(matchVersion, field, analyzer);
}

@Override
protected Query getRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive) throws ParseException {
    if ("int_field".equals(field)) {
        return NumericRangeQuery.newIntRange(field, Integer.parseInt(part1), Integer.parseInt(part2), startInclusive, endInclusive);
    } else {
        return super.getRangeQuery(field, part1, part2, startInclusive, endInclusive);
    }
}

@Override
protected Query newTermQuery(Term term)
{
    if ("int_field".equals(term.field())) {
        try {
            int number = Integer.parseInt(term.text());
            BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
            NumericUtils.intToPrefixCoded(number, 0, bytes);
            TermQuery tq = new TermQuery(new Term(term.field(), bytes.utf8ToString()));
            return tq;
        } catch (NumberFormatException nfe) {
            //If not provided numeric argument just leave as is, won't give matches
            return super.newTermQuery(term);
        }
    } else {
        return super.newTermQuery(term);
    }
}

}

With this in place, queries like

 int_field: 1
 int_field: [1 TO 5]

work as expected.
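
The newTermQuery override is needed because the parser hands over the raw text "1", while a numeric field is indexed under encoded terms. As a rough illustration of the idea behind such an encoding, here is a Lucene-free sketch: flip the sign bit and store 7 bits per byte, so that byte-wise order matches numeric order. The class and method names are hypothetical, and the output is not claimed to be byte-identical to what NumericUtils.intToPrefixCoded produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SortableIntSketch {

    // Hypothetical helper, not Lucene's NumericUtils: encodes an int so that
    // unsigned byte-wise comparison of the results matches numeric order.
    public static byte[] encode(int value) {
        int sortable = value ^ 0x80000000; // flip the sign bit so negatives sort first
        byte[] out = new byte[5];          // 32 bits at 7 bits per byte needs 5 bytes
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = (byte) (sortable & 0x7f); // lowest 7 bits go into the last free byte
            sortable >>>= 7;
        }
        return out;
    }

    // Lexicographic comparison of equal-length arrays, treating bytes as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // The encoded form of 1 and the UTF-8 bytes of the text "1" differ,
        // which is why a plain term query on the text finds nothing.
        System.out.println(Arrays.toString(encode(1)));
        System.out.println(Arrays.toString("1".getBytes(StandardCharsets.UTF_8)));
        System.out.println(compareBytes(encode(-5), encode(1)) < 0);
    }
}
```

The point of the sketch is only that the indexed term and the query text are different byte sequences, so the query side has to apply the same encoding as the indexing side.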

Jonas Arêas answered Oct 31 '22 20:10


Just use the appropriate field. For example IntField, LongField, etc.

See for example http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/document/IntField.html

For querying these fields, see the question "Lucene LongField exact search with Query".
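
One way to get an exact match on such a field is an inclusive range whose two bounds are equal. A small sketch, assuming a parser whose range handling builds a numeric range query (as the getRangeQuery override in the other answer does with NumericRangeQuery.newIntRange); the class and method names here are hypothetical:

```java
public class ExactMatchAsRange {

    // Hypothetical helper: builds classic query parser syntax for an inclusive
    // range whose lower and upper bounds are the same value.
    public static String exactMatch(String field, int value) {
        return field + ":[" + value + " TO " + value + "]";
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("int_field", 5)); // prints int_field:[5 TO 5]
    }
}
```

Passing the resulting string to such a parser would then go through the range path rather than the term path.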

Rob Audenaerde answered Oct 31 '22 19:10