
How to make use of SharpNLP in my C# application

Tags:

c#

nlp

I need POS tagging for the files in my corpus. I have successfully followed the installation instructions for SharpNLP, and I am using the binary version.

I created a new c# project in:       E:\sharp\sharpapp
location of Models Folder is:        E:\sharp\sharpapp\bin\Models
location of my SharpNlp Binary is:   E:\sharp\SharpNLP-1.0.2529-Bin

I have also followed the instructions to modify both .config files, "ParseTree.Exe" and "ToolsExamples.Exe".

Now, in my C# project, I have a class called tagging.cs that has to read my corpus text files and POS-tag them. Can anybody explain how I can use SharpNLP to do so?

Please provide the steps.

amey kerkar asked Dec 21 '22 21:12


1 Answer

In a nutshell, SharpNLP is

  • a port to C# of OpenNLP Tools and OpenNLP MaxEnt
  • a connector to WordNet
  • a set of pre-computed models, mostly for the English language
  • utility modules such as integration with SQLite

It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics, and no apparent connection with the original Java projects' lifecycle. As a result, the OpenNLP portion of SharpNLP will likely drift over time, so that the two end up more like distant cousins than twin sisters.

Nevertheless, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available with SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things and adapt accordingly.

A loose guide could be the study of this particular source file, which makes use of OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP: for example, the POSTaggerME class becomes MaximumEntropyPosTagger, and the Parse() method and its overload turn into TagSentence() and such.

A more general hint is to understand the sequence of steps typically necessary to perform POS tagging.
This is a very high-level, approximate description, but I think a useful one.

  • Get the text to be tagged, as one or more strings of text.
  • Initialize a text parser (tokenizer).
  • Parse the text into an "array" (or other container) of individual tokens, i.e. words and punctuation characters.
  • Initialize the POS tagger; in particular, tell it which model it should use.
  • Feed the [ordered] sequence of tokens to the POS tagger.
  • Ta-da! Use the POS tags for the eventual purpose of your NLP application.
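
The steps above can be sketched in C# roughly as follows. This is a minimal sketch under assumptions, not a verified recipe: it assumes your project references the SharpNLP OpenNLP assembly, that the class names (EnglishMaximumEntropySentenceDetector, EnglishMaximumEntropyTokenizer, EnglishMaximumEntropyPosTagger), their SentenceDetect(), Tokenize() and Tag() methods, and the model file names (EnglishSD.nbin, EnglishTok.nbin, EnglishPOS.nbin, Parser\tagdict) match your SharpNLP build, and that the corpus sits in a hypothetical E:\sharp\corpus folder. Check all of these against the ToolsExample source that ships with SharpNLP before relying on them.

```csharp
using System;
using System.IO;
using System.Linq;
using OpenNLP.Tools.PosTagger;
using OpenNLP.Tools.SentenceDetect;
using OpenNLP.Tools.Tokenize;

public static class Tagging
{
    // Pairs each token with its tag, e.g. "dog/NN", separated by spaces.
    public static string FormatTagged(string[] tokens, string[] tags)
    {
        return string.Join(" ", tokens.Zip(tags, (token, tag) => token + "/" + tag));
    }

    public static void Main()
    {
        // Models folder taken from the question; the corpus folder is hypothetical.
        string modelPath = @"E:\sharp\sharpapp\bin\Models\";
        string corpusPath = @"E:\sharp\corpus";

        // Assumed SharpNLP class and model file names; verify against your build.
        var sentenceDetector = new EnglishMaximumEntropySentenceDetector(modelPath + "EnglishSD.nbin");
        var tokenizer = new EnglishMaximumEntropyTokenizer(modelPath + "EnglishTok.nbin");
        var posTagger = new EnglishMaximumEntropyPosTagger(
            modelPath + "EnglishPOS.nbin", modelPath + @"Parser\tagdict");

        foreach (string file in Directory.GetFiles(corpusPath, "*.txt"))
        {
            // Step 1: get the text to be tagged.
            string text = File.ReadAllText(file);

            foreach (string sentence in sentenceDetector.SentenceDetect(text))
            {
                // Steps 2-3: split the sentence into tokens.
                string[] tokens = tokenizer.Tokenize(sentence);

                // Steps 4-5: feed the ordered tokens to the tagger.
                string[] tags = posTagger.Tag(tokens);

                // Step 6: use the tags; here they are simply printed.
                Console.WriteLine(FormatTagged(tokens, tags));
            }
        }
    }
}
```

The models are loaded once, outside the loops, since loading them is likely the most expensive part of the run.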

Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained by training the tagger on a set of text that has already been tagged.
SharpNLP comes with a model for generic English, but to tag other languages, or if the corpus to be tagged belongs to a particular domain (say medical reports or Tweets or...), it may be preferable to re-train the tagger to improve its precision.
Open/SharpNLP, like most POS taggers, whether stand-alone or used through their API, typically include features to train them (= to produce a model from a sample set of already-tagged text) and also to verify the quality of the model/tagger so produced (= to compare the tags produced on a test set with the tags expected for this set).
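
To make the verification idea concrete, here is a small hand-rolled sketch of comparing produced tags with expected tags. It does not use SharpNLP's own evaluation features; the TaggerEvaluation class and its Accuracy() method are hypothetical names introduced only for illustration, and the tag sequences are made-up sample data.

```csharp
using System;

public static class TaggerEvaluation
{
    // Fraction of positions where the predicted tag equals the expected tag.
    public static double Accuracy(string[] predicted, string[] expected)
    {
        if (predicted.Length != expected.Length)
            throw new ArgumentException("Tag sequences must have the same length.");
        if (expected.Length == 0)
            return 0.0; // nothing to compare

        int correct = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            if (predicted[i] == expected[i])
                correct++;
        }
        return (double)correct / expected.Length;
    }

    public static void Main()
    {
        // Made-up sample: tags expected for a test sentence vs. tags a tagger produced.
        string[] expected = { "DT", "NN", "VBZ", "RB" };
        string[] predicted = { "DT", "NN", "VBD", "RB" };

        // 3 of the 4 tags match.
        Console.WriteLine(Accuracy(predicted, expected));
    }
}
```

Running the same comparison before and after re-training on a domain-specific corpus gives a rough indication of whether the new model is actually an improvement.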

mjv answered Dec 24 '22 09:12