I'm building a Lucene index and adding Documents to it.
I have a multi-valued field; for this example I'll use categories.
An item can have many categories. For example, Jeans can fall under Clothing, Pants, Men's, Women's, etc.
When adding the field to a document, do commas make a difference? Will Lucene simply ignore them? If I change the commas to spaces, will there be a difference? Does this automatically make the field multi-valued?
String categoriesForItem = getCategories(); // returns "category1, category2, cat3" from a DB call
categoriesForItem = categoriesForItem.replaceAll(",", " ").trim(); // not sure whether to remove the commas
doc.add(new StringField("categories", categoriesForItem, Field.Store.YES)); // doc is a Document
Am I doing this correctly, or is there another way to create multi-valued fields?
Any help/advice is appreciated.
This would be a better way to index multi-valued fields per document:
String categoriesForItem = getCategories(); // returns "category1, category2, cat3" from a DB call
// Split on commas and drop the surrounding whitespace, since StringField indexes each value verbatim
String[] categoriesForItems = categoriesForItem.trim().split("\\s*,\\s*");
for (String cat : categoriesForItems) {
    doc.add(new StringField("categories", cat, Field.Store.YES)); // doc is a Document
}
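One detail worth checking before adding the values: a StringField is indexed as a single, unanalyzed token, so stray whitespace around a category becomes part of the indexed term. Below is a minimal plain-Java sketch of a splitting helper that trims each value and skips empty ones. The class name CategorySplitter and the method splitCategories are hypothetical names for illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene class): turns "a, b, c" into clean values.
class CategorySplitter {

    // Splits a comma-separated string into trimmed, non-empty values.
    static List<String> splitCategories(String raw) {
        List<String> values = new ArrayList<>();
        if (raw == null) {
            return values;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // A plain split(",") would keep the leading space in " category2".
        System.out.println(splitCategories("category1, category2, cat3"));
        // prints [category1, category2, cat3]
    }
}
```

Each returned value can then be passed to its own doc.add(new StringField("categories", value, Field.Store.YES)) call.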
Whenever multiple fields with the same name appear in one document, both the inverted index and term vectors will logically append the tokens of the field to one another, in the order the fields were added.
Also, during analysis, consecutive values of the same field are separated by a position increment gap, which the analyzer supplies through getPositionIncrementGap(). Note that Lucene's base Analyzer returns 0 by default, so you may need to configure a larger gap (Solr schemas typically set positionIncrementGap to 100). Let me explain why this is needed.
Suppose the field "categories" in document D1 has two values, "foo bar" and "foo baz". A phrase query for "bar foo" should not match D1, because neither value contains that phrase. This is ensured by adding an extra increment between two values of the same field.
If you concatenate the field values yourself and rely on the analyzer to split them into tokens, "bar foo" would match D1, which would be incorrect.
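To make the effect of the gap concrete, here is a small plain-Java simulation of how token positions are assigned across several values of one field. It does not use Lucene; PositionGapDemo, positions and matchesPhrase are hypothetical names, and tokenizing on whitespace is a simplifying assumption. Keep in mind that this only matters for analyzed fields such as TextField, since a StringField value is a single token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of token positions; not Lucene's implementation.
class PositionGapDemo {

    // Maps each token to its positions. `gap` extra positions are skipped
    // between consecutive values of the same field.
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> index = new HashMap<>();
        int pos = 0;
        boolean first = true;
        for (String value : values) {
            if (!first) {
                pos += gap; // the position increment gap between two values
            }
            first = false;
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
                pos++;
            }
        }
        return index;
    }

    // True if `second` occurs at the position directly after `first`.
    static boolean matchesPhrase(Map<String, List<Integer>> index,
                                 String first, String second) {
        List<Integer> secondPositions =
                index.getOrDefault(second, Collections.emptyList());
        for (int p : index.getOrDefault(first, Collections.emptyList())) {
            if (secondPositions.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add("foo bar");
        values.add("foo baz");
        // Gap 0 behaves like one concatenated value: "bar foo" matches.
        System.out.println(matchesPhrase(positions(values, 0), "bar", "foo"));   // true
        // Gap 100 keeps the values apart: "bar foo" no longer matches.
        System.out.println(matchesPhrase(positions(values, 100), "bar", "foo")); // false
        // Phrases inside a single value still match.
        System.out.println(matchesPhrase(positions(values, 100), "foo", "baz")); // true
    }
}
```

With a gap of 0 the tokens sit at positions 0 to 3, so "bar" (1) is directly followed by "foo" (2). With a gap of 100 the second value starts at position 102, so the cross-value phrase can no longer match.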