Creating a Lucene.net Custom Analyzer

I am trying to create a custom analyzer in Lucene.net 4.8, but I am running into an error I can't fathom.

My analyzer code:

public class SynonymAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        String base1 = "lawnmower";
        String syn1 = "lawn mower";
        String base2 = "spanner";
        String syn2 = "wrench";

        SynonymMap.Builder sb = new SynonymMap.Builder(true);
        sb.Add(new CharsRef(base1), new CharsRef(syn1), true);
        sb.Add(new CharsRef(base2), new CharsRef(syn2), true);
        SynonymMap smap = sb.Build();

        Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);

        TokenStream result = new StandardTokenizer(Version.LUCENE_48, reader);
        result = new SynonymFilter(result, smap, true);
        return new TokenStreamComponents(tokenizer, result);
    }
}

My code to build the index is:

var fordFiesta = new Document();
fordFiesta.Add(new StringField("Id", "1", Field.Store.YES));
fordFiesta.Add(new TextField("Make", "Ford", Field.Store.YES));
fordFiesta.Add(new TextField("Model", "Fiesta 1.0 Developing", Field.Store.YES));
fordFiesta.Add(new TextField("FullText", "lawnmower Ford 1.0 Fiesta Developing spanner", Field.Store.YES));

Lucene.Net.Store.Directory directory = FSDirectory.Open(new DirectoryInfo(Environment.CurrentDirectory + "\\LuceneIndex"));

SynonymAnalyzer analyzer = new SynonymAnalyzer();

var config = new IndexWriterConfig(Version.LUCENE_48, analyzer);
var writer = new IndexWriter(directory, config);

writer.UpdateDocument(new Term("Id", "1"), fordFiesta);

writer.Flush(true, true);
writer.Commit();
writer.Dispose();

However, when I run my code it fails at the writer.UpdateDocument line with the following error:

TokenStream contract violation: Reset()/Dispose() call missing, Reset() called multiple times, or subclass does not call base.Reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

I can't figure out where I am going wrong?!

asked by chilluk


1 Answer

The problem is that your TokenStreamComponents is constructed with a different Tokenizer than the one feeding your result TokenStream. The analyzer only ever resets the Tokenizer you register in TokenStreamComponents, so the second StandardTokenizer wrapped by the SynonymFilter is never reset, which is what triggers the contract violation. Changing it to this should fix the issue:

Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);
TokenStream result = new SynonymFilter(tokenizer, smap, true);
return new TokenStreamComponents(tokenizer, result);
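
Putting that change back into the question's analyzer, the whole CreateComponents method would look roughly like this (a sketch that reuses the same synonym map and Version constant as the original code):

protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
{
    // Build the synonym map exactly as in the question.
    SynonymMap.Builder sb = new SynonymMap.Builder(true);
    sb.Add(new CharsRef("lawnmower"), new CharsRef("lawn mower"), true);
    sb.Add(new CharsRef("spanner"), new CharsRef("wrench"), true);
    SynonymMap smap = sb.Build();

    // Create one tokenizer, wrap that same instance in the synonym filter,
    // and hand both the tokenizer and the end of the chain to TokenStreamComponents.
    Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);
    TokenStream result = new SynonymFilter(tokenizer, smap, true);
    return new TokenStreamComponents(tokenizer, result);
}

Because the SynonymFilter now wraps the same Tokenizer instance that TokenStreamComponents knows about, the analyzer can reset and reuse the stream correctly for each field it indexes.
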
answered by femtoRgon