I'm trying to create a Lucene 4.10 index. I just want to store in the index the exact strings that I put into the document, without tokenization.
I'm using the StandardAnalyzer.
Directory dir = FSDirectory.open(new File("myDire"));
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_0, analyzer);
iwc.setOpenMode(OpenMode.CREATE);
IndexWriter writer = new IndexWriter(dir, iwc);
Document doc = new Document();
StringField field1 = new StringField("1", content1, Store.YES);
StringField field2 = new StringField("2", content2, Store.YES);
StringField field3 = new StringField("3", content3, Store.YES);
doc.add(field1);
doc.add(field2);
doc.add(field3);
writer.addDocument(doc, analyzer);
writer.close();
If I print the index's content, I can see my data being stored, for example, my document has this "field 3":
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<3:"Fuel Tank Capacity"@en>
I'm trying to query the index in order to get it back:
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("3", analyzer);
String queryString = "\"\"Fuel Tank Capacity\"@en\"";
Query query = parser.createPhraseQuery("3", QueryParser.escape(queryString));
TopDocs docs = searcher.search(query, null, 20);
I'm trying to search for the term "Fuel Tank Capacity"@en (quotation marks included), so I tried to escape the quotes and wrapped the term in another pair of quotes so that Lucene understands I'm searching for the entire text.
If I print the query, I get 3:"fuel tank capacity en", but I don't want the text to be split on the @ symbol.
I think that my first problem is the StandardAnalyzer, because it seems to tokenize, if I'm not mistaken. However, I cannot understand how to query the index in order to get exactly "Fuel Tank Capacity"@en (quotation marks included).
Thank you
However, Lucene's regular expression patterns are always anchored: the pattern provided must match the entire string. For the string abcde, for example, a pattern has to cover all of abcde, not just a part of it, to match. Any Unicode characters may be used in the pattern, but certain characters are reserved and must be escaped.
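To illustrate what "anchored" means for the abcde example, here is a minimal sketch that uses java.util.regex as a stand-in. This is an analogy only: Lucene's regular expression syntax is not the same as java.util.regex, and the class and method names below are made up for this illustration.

```java
import java.util.regex.Pattern;

// Analogy only: java.util.regex, not Lucene's own regular expression engine.
class AnchoredRegexDemo {

    // Anchored semantics: the pattern has to match the entire input.
    static boolean anchored(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Unanchored semantics: the pattern may match any substring.
    static boolean unanchored(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(anchored("ab.*", "abcde"));   // true
        System.out.println(anchored("abcd", "abcde"));   // false
        System.out.println(unanchored("abcd", "abcde")); // true
    }
}
```

With anchored matching, abcd is rejected because it leaves the trailing e uncovered, while ab.* succeeds because it spans the whole string.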
Note that Lucene doesn't support using a * symbol as the first character of a search. Lucene supports finding words that are within a specific distance of each other. For example, "foo bar"~4 searches for foo and bar within 4 words of each other. Note that for proximity searches, exact matches are proximity zero, and word transpositions (bar foo) are proximity 1.
However, Lucene syntax is not able to search nested objects or scripted fields. To perform a free text search, simply enter a text string. For example, if you're searching web server logs, you could enter safari to search all fields.
The full Lucene syntax is used for query expressions passed in the search parameter of the Search Documents API, not to be confused with the OData syntax used for the $filter parameter of that API. These different syntaxes have their own rules for constructing queries, escaping strings, and so on.
You could simplify matters and just cut the QueryParser out of the equation entirely. Since you are using a StringField, the whole content of the field is a single term, so a simple TermQuery should work well:
Query query = new TermQuery(new Term("3","\"Fuel Tank Capacity\"@en"));
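To see why the exact term matches while the analyzed query does not, here is a toy model in plain Java. It is not Lucene code: toyAnalyze is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters (the real StandardAnalyzer is more elaborate), and the set of strings is a simplified picture of the terms a StringField would put in the index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: plain Java, no Lucene classes involved.
class ExactTermDemo {

    // Hypothetical stand-in for an analyzer: lowercase the text, then
    // split on every run of characters that are not letters or digits.
    static List<String> toyAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "\"Fuel Tank Capacity\"@en";

        // StringField-style indexing: the whole value is a single term.
        Set<String> indexedTerms = new HashSet<>();
        indexedTerms.add(value);

        // Analyzing the query text produces terms that were never indexed.
        List<String> queryTokens = toyAnalyze(value);
        System.out.println(queryTokens); // [fuel, tank, capacity, en]
        for (String token : queryTokens) {
            System.out.println(token + " indexed? " + indexedTerms.contains(token)); // false each time
        }

        // TermQuery-style lookup: the unmodified value is compared as-is.
        System.out.println("exact term indexed? " + indexedTerms.contains(value)); // true
    }
}
```

The analyzed tokens line up with the 3:"fuel tank capacity en" query printed in the question, none of which exists as a term in the index, whereas the untouched string is exactly the one term that was stored.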