
How to use TermVector in Lucene 4.0

In the indexing method I use the following line:

Field contentsField = new Field("contents", new FileReader(f), Field.TermVector.YES);

However, in Lucene 4.0 this constructor is deprecated and new TextField should be used instead of new Field.

But the problem with TextField is that it doesn't accept a TermVector in its constructors.

Is there a way to include the Term Vector in my indexing in Lucene 4.0 with the new constructors?

Thanks

user692704 asked Aug 14 '12 04:08



2 Answers

I had the same problem, so I simply created my own Field subclass:

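import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

/** A TextField-like field that also stores term vectors with positions. */
public class VecTextField extends Field {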

/* Indexed, tokenized, not stored. */
public static final FieldType TYPE_NOT_STORED = new FieldType();

/* Indexed, tokenized, stored. */
public static final FieldType TYPE_STORED = new FieldType();

static {
    TYPE_NOT_STORED.setIndexed(true);
    TYPE_NOT_STORED.setTokenized(true);
    TYPE_NOT_STORED.setStoreTermVectors(true);
    TYPE_NOT_STORED.setStoreTermVectorPositions(true);
    TYPE_NOT_STORED.freeze();

    TYPE_STORED.setIndexed(true);
    TYPE_STORED.setTokenized(true);
    TYPE_STORED.setStored(true);
    TYPE_STORED.setStoreTermVectors(true);
    TYPE_STORED.setStoreTermVectorPositions(true);
    TYPE_STORED.freeze();
}

// TODO: add sugar for term vectors...?

/** Creates a new VecTextField with Reader value. */
public VecTextField(String name, Reader reader, Store store) {
    super(name, reader, store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED);
}

/** Creates a new VecTextField with String value. */
public VecTextField(String name, String value, Store store) {
    super(name, value, store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED);
}

/** Creates a new un-stored VecTextField with TokenStream value. */
public VecTextField(String name, TokenStream stream) {
    super(name, stream, TYPE_NOT_STORED);
}

}

Hope this helps

amas answered Oct 18 '22 20:10



TextField is a convenience class for users who need indexed fields without term vectors. If you need term vectors, just use a Field. It takes a few more lines of code, since you first need to create a FieldType instance, set storeTermVectors and tokenized to true, and then pass this FieldType instance to the Field constructor.
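Here is a minimal sketch of that approach, assuming the Lucene 4.x document API is on the classpath. Instead of setting every flag by hand, it starts from a copy of TextField.TYPE_NOT_STORED, so the indexed and tokenized settings are already in place, and then switches on term vectors. The class name TermVectorFieldExample, the constant CONTENTS_TYPE, and the helper contentsField are made up for illustration; only the "contents" field name comes from the question.

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

public class TermVectorFieldExample {

    // Hypothetical shared type: TextField's un-stored settings plus term vectors.
    public static final FieldType CONTENTS_TYPE =
            new FieldType(TextField.TYPE_NOT_STORED);

    static {
        CONTENTS_TYPE.setStoreTermVectors(true);
        // Freeze so the type cannot be changed by accident after first use.
        CONTENTS_TYPE.freeze();
    }

    // Hypothetical helper: builds the "contents" field from any Reader.
    public static Field contentsField(Reader reader) {
        return new Field("contents", reader, CONTENTS_TYPE);
    }

    public static void main(String[] args) {
        Document doc = new Document();
        // A StringReader stands in for the FileReader used in the question.
        doc.add(contentsField(new StringReader("some text to index")));
        // Reports whether the field added above will store term vectors.
        System.out.println(doc.getField("contents").fieldType().storeTermVectors());
    }
}
```

In the indexing method from the question, this would be called as contentsField(new FileReader(f)). If positions or offsets are needed as well, FieldType also has setStoreTermVectorPositions and setStoreTermVectorOffsets, which must be called before freeze().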

jpountz answered Oct 18 '22 20:10
