
sharpNLP as .nbin file extension [closed]

Tags: c#, nlp

I've downloaded SharpNLP from http://sharpnlp.codeplex.com/, but the download is a .nbin file, and I don't know what to do with it. Any help, please?

Alaa' asked Jul 01 '12 01:07


1 Answer

I was in the same position once, but after some struggle I found a few ways to use .nbin files. As stated, .nbin files are trained models. You can create one with the BinaryGisModelWriter, but I assume that, like me, you are less interested in creating your own model than in using the existing .nbin files effectively in your project.

For that you need two DLLs:

SharpEntropy.dll
OpenNLP.dll

Apart from this, for a quick start you can download the sample project for SharpNLP from Code Project.

It is better to download the .NET 2.0 version of the sample.

Inside it you will find a project named OpenNLP. Add that project to the solution in which you want to use NLP or the .nbin files, and add a reference from your own project to the "OpenNLP" project.

From your own project you can now initialize the different tools. As an example, here are the field declarations for a sentence detector, a tokenizer and a POS tagger:

 private string mModelPath = @"C:\Users\ATS\Documents\Visual Studio 2012\Projects\Google_page_speed_json\Google_page_speed_json\bin\Release\";
 private OpenNLP.Tools.SentenceDetect.MaximumEntropySentenceDetector mSentenceDetector;
 private OpenNLP.Tools.Tokenize.EnglishMaximumEntropyTokenizer mTokenizer;
 private OpenNLP.Tools.PosTagger.EnglishMaximumEntropyPosTagger mPosTagger;

mModelPath holds the path of the folder that contains the .nbin files you want to use.
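As a minimal sketch, assuming all model files sit in one folder, the full path of each model could be built with the standard `System.IO.Path.Combine` rather than string concatenation. `ModelPaths` and `Resolve` are hypothetical names used only for illustration and are not part of SharpNLP. The helper does not care whether the folder string ends with a backslash, and it fails early if a model file is missing.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of SharpNLP or OpenNLP.
static class ModelPaths
{
    // Joins the model folder and a model file name, and checks that the file exists.
    public static string Resolve(string modelFolder, string fileName)
    {
        // Path.Combine inserts a separator only when one is needed.
        string fullPath = Path.Combine(modelFolder, fileName);

        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("Model file not found.", fullPath);
        }

        return fullPath;
    }
}
```

For example, `ModelPaths.Resolve(mModelPath, "EnglishSD.nbin")` could be passed to a tool constructor in place of `mModelPath + "EnglishSD.nbin"`.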

Next, here is how to load the .nbin files through the constructors of the classes declared above.

For Sentence Detector

private string[] SplitSentences(string paragraph)
    {
        if (mSentenceDetector == null)
        {
            mSentenceDetector = new OpenNLP.Tools.SentenceDetect.EnglishMaximumEntropySentenceDetector(mModelPath + "EnglishSD.nbin");
        }

        return mSentenceDetector.SentenceDetect(paragraph);
    }

For Tokenizer

private string[] TokenizeSentence(string sentence)
    {
        if (mTokenizer == null)
        {
            mTokenizer = new OpenNLP.Tools.Tokenize.EnglishMaximumEntropyTokenizer(mModelPath + "EnglishTok.nbin");
        }

        return mTokenizer.Tokenize(sentence);
    }

And for POSTagger

private string[] PosTagTokens(string[] tokens)
    {
        if (mPosTagger == null)
        {
            // mModelPath already ends with a backslash, so the tag dictionary path must not start with one.
            mPosTagger = new OpenNLP.Tools.PosTagger.EnglishMaximumEntropyPosTagger(mModelPath + "EnglishPOS.nbin", mModelPath + @"Parser\tagdict");
        }

        return mPosTagger.Tag(tokens);
    }

As you can see, I have used EnglishSD.nbin, EnglishTok.nbin and EnglishPOS.nbin for sentence detection, tokenizing and POS tagging respectively. The .nbin files are simply pre-trained models that can be loaded with SharpNLP, or with OpenNLP in general.

You can find the latest set of trained models on the official OpenNLP Tool Models page or in the Codeplex repository of .nbin files for use with SharpNLP.

A sample POS tagger that uses the above methods and .nbin files looks like this:

// Requires: using System.IO;
public void POSTagger_Method(string sent)
    {
        // Start the output file with the original text.
        File.WriteAllText("POSTagged.txt", sent+"\n\n");
        string[] split_sentences = SplitSentences(sent);
        foreach (string sentence in split_sentences)
        {
            File.AppendAllText("POSTagged.txt", sentence+"\n");
            string[] tokens = TokenizeSentence(sentence);
            string[] tags = PosTagTokens(tokens);

            // Write one "token - tag" line per token.
            for (int currentTag = 0; currentTag < tags.Length; currentTag++)
            {
                File.AppendAllText("POSTagged.txt", tokens[currentTag] + " - " + tags[currentTag]+"\n");
            }
            File.AppendAllText("POSTagged.txt", "\n\n");
        }
    }
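The method above reopens POSTagged.txt for every token. As a sketch of an alternative, assuming the token and tag arrays always have the same length, the "token - tag" lines could be built in memory and written in a single call. `TagFormatter` and `FormatPairs` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of SharpNLP or OpenNLP.
static class TagFormatter
{
    // Builds one "token - tag" line per token and returns the lines as a single string.
    public static string FormatPairs(string[] tokens, string[] tags)
    {
        if (tokens.Length != tags.Length)
        {
            throw new ArgumentException("tokens and tags must have the same length.");
        }

        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < tokens.Length; i++)
        {
            builder.Append(tokens[i]).Append(" - ").Append(tags[i]).Append('\n');
        }

        return builder.ToString();
    }
}
```

Inside the loop over sentences, the inner `for` loop could then be replaced by `File.AppendAllText("POSTagged.txt", TagFormatter.FormatPairs(tokens, tags));`.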

You can write similar methods for chunking, parsing etc., by making use of the available Nbin files, or you can train one of your own.

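Though I haven't trained a model of my own, the code for training a model from a well-formed training text file is as follows (trainingDataFile is the path to that file, and mModel is a field that holds the trained model):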

System.IO.StreamReader trainingStreamReader = new System.IO.StreamReader(trainingDataFile);
SharpEntropy.ITrainingEventReader eventReader = new SharpEntropy.BasicEventReader(new SharpEntropy.PlainTextByLineDataReader(trainingStreamReader));
SharpEntropy.GisTrainer trainer = new SharpEntropy.GisTrainer();
trainer.TrainModel(eventReader);
mModel = new SharpEntropy.GisModel(trainer);
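As a hedged sketch of the step that would turn such a trained model into an .nbin file, the BinaryGisModelWriter mentioned earlier could be used roughly as below. The `Persist` call and the `BinaryGisModelReader` constructor are written from memory of the SharpEntropy sample code, so treat their exact signatures as assumptions and check them against your copy of SharpEntropy.dll. `ModelPersistence`, `Save` and `Load` are hypothetical names.

```csharp
// Assumes a project reference to SharpEntropy.dll.
// Hypothetical helper; the SharpEntropy.IO signatures below are assumptions to verify.
static class ModelPersistence
{
    // Writes a trained model to a binary .nbin file.
    public static void Save(SharpEntropy.GisModel model, string modelFile)
    {
        SharpEntropy.IO.BinaryGisModelWriter writer = new SharpEntropy.IO.BinaryGisModelWriter();
        writer.Persist(model, modelFile);
    }

    // Reads a model back from a binary .nbin file.
    public static SharpEntropy.GisModel Load(string modelFile)
    {
        SharpEntropy.IO.BinaryGisModelReader reader = new SharpEntropy.IO.BinaryGisModelReader(modelFile);
        return new SharpEntropy.GisModel(reader);
    }
}
```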

I hope this post helps you get started with SharpNLP. Please feel free to discuss any issues you run into. I will be happy to reply.

Arun Thundyill Saseendran answered Oct 26 '22 12:10