I require POS tagging for my files in the corpus.
I have successfully followed the installation instructions of SharpNlp
I am using the binary version
I created a new c# project in: E:\sharp\sharpapp
location of Models Folder is: E:\sharp\sharpapp\bin\Models
location of my SharpNlp Binary is: E:\sharp\SharpNLP-1.0.2529-Bin
I have also followed the instructions to modify both .config files "ParseTree.Exe" and "ToolsExamples.Exe"
Now in my c# project I have a class called tagging.cs where I have to access my corpus text files and do POS tagging for those files. Can anybody help me how can I make use of SharpNlp to do so
Please provide steps to do so.
In a nutshell, SharpNLP is
It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics and no apparent connection with the original Java projects' lifecycle. This situation will likely ensure that in time the OpenNLP portion of SharpNLP will be more akin to distant cousins than twin sisters...
Never the less, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available with SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things and adapt accordingly.
A loose conductor could be the study of this particular source file which makes use of OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP, for example POSTTaggerME class becomes MaximumEntropyPosTagger and the Parse() method and its overload turn to TagSentence() and such.
A more general hint is to understand...
...the sequence of steps typically necessary to perform POS Tagging.
This is a very high-level approximate description but, I think, useful.
Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained from training the Tagger with a set of text readily tagged.
SharpNLP comes with a model for generic English language, but in order to tag other languages or if the specific corpora to be tagged belongs to a particular domain (say medical reports or Tweets or...) it may be preferable to re-train the tagger to improve its precision.
Open/SharpNLP as most POS Taggers, whether stand-alone or their API, typically include features to train them (= to produce a model given a sample set of text readily tagged) and also to verify the quality of the model/tagger so produced (= to compare the tags produced on a test set, with the tags expected for this set).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With