Is there any way to use the Stanford Tagger in a more performant fashion?
Each call to NLTK's wrapper starts a new Java instance per analyzed string, which is very slow, especially when a larger foreign-language model is used.
http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford
Use nltk.tag.stanford.POSTagger.tag_sents() for tagging multiple sentences.
The tag_sents function has replaced the old batch_tag function, see https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L61
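A minimal sketch of that batching idea, assuming an NLTK release where the class is still named POSTagger (later releases call it StanfordPOSTagger). The helper names tag_all_at_once and build_tagger are made up for illustration, the whitespace tokenization is a naive placeholder, and the paths are the ones used elsewhere in this thread, so adjust them to your install:

```python
def tag_all_at_once(tagger, texts):
    """Tag many strings with a single tag_sents() call instead of one tag() call each."""
    # Naive whitespace tokenization; swap in a real tokenizer for production use.
    sentences = [text.split() for text in texts]
    # One call for the whole batch, rather than one wrapper call per string.
    return tagger.tag_sents(sentences)


def build_tagger():
    """Hypothetical helper: construct the NLTK wrapper (needs NLTK, Java and the tagger files)."""
    # Imported here so the rest of the sketch runs without NLTK installed.
    from nltk.tag.stanford import POSTagger  # StanfordPOSTagger in later NLTK releases

    base = "/var/stanford-postagger-full-2014-01-04"  # assumed install location
    return POSTagger(
        base + "/models/german-dewac.tagger",
        base + "/stanford-postagger.jar",
        encoding="utf8",
    )


if __name__ == "__main__":
    tagger = build_tagger()
    for tagged in tag_all_at_once(tagger, ["die welt ist schön", "das ist ein test"]):
        print(tagged)
```

The point of the design is simply to collect all strings first and hand them to the wrapper once, so the JVM startup and model loading cost is paid per batch rather than per string.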
DEPRECATED:
Tag the sentences using batch_tag instead of tag, see http://www.nltk.org/_modules/nltk/tag/stanford.html#StanfordTagger.batch_tag
Found the solution. It is possible to run the POS Tagger in servlet mode and then connect to it via HTTP. Perfect.
http://nlp.stanford.edu/software/pos-tagger-faq.shtml#d
Example:
start the server in the background
nohup java -mx1000m -cp /var/stanford-postagger-full-2014-01-04/stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -model /var/stanford-postagger-full-2014-01-04/models/german-dewac.tagger -port 2020 >& /dev/null &
adjust firewall to limit access to port 2020 from localhost only
iptables -A INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -A INPUT -p tcp --dport 2020 -j DROP
test it with wget (quote the URL so the shell passes the whole sentence as one argument)
wget "http://localhost:2020/?die welt ist schön"
shut down the server
pkill -f stanford
restore the iptables settings
iptables -D INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -D INPUT -p tcp --dport 2020 -j DROP
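While the server from the example is running, it can also be queried from Python instead of wget. This is only a sketch under assumptions I have not verified against every tagger version: that the server reads one line of text from a plain TCP connection, writes the tagged text back and closes the connection, that it uses UTF-8, and that tokens come back as word_TAG. The function names tag_via_server and parse_tagged are hypothetical.

```python
import socket


def tag_via_server(text, host="localhost", port=2020, encoding="utf-8", timeout=10.0):
    """Send one line of text to the running tagger server and return its raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: the server reads exactly one newline-terminated line.
        conn.sendall((text.strip() + "\n").encode(encoding))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # assumption: the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode(encoding).strip()


def parse_tagged(tagged, separator="_"):
    """Split 'word_TAG word_TAG' output into (word, tag) pairs."""
    pairs = []
    for token in tagged.split():
        if separator in token:
            # Split on the last separator so words containing it stay intact.
            word, _, tag = token.rpartition(separator)
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


if __name__ == "__main__":
    print(parse_tagged(tag_via_server("die welt ist schön")))
```

Because the model stays loaded in the server process, each request only pays for a local socket round trip rather than a JVM start.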