Stanford POS Tagger not tagging Chinese text

Question

I'm using Stanford POS Tagger (for the first time) and while it tags English correctly, it does not seem to recognize (Simplified) Chinese even when changing the model parameter. Have I overlooked something?

I've downloaded and unpacked the latest full version from here: http://nlp.stanford.edu/software/tagger.shtml

Then I put some sample text into "sample-input.txt":

这是一个测试的句子。这是另一个句子。

Then I simply run

./stanford-postagger.sh models/chinese-distsim.tagger sample-input.txt

The expected output is to tag each of the words with a part of speech, but instead it recognizes the entire string of text as one word:

Loading default properties from tagger models/chinese-distsim.tagger

Reading POS tagger model from models/chinese-distsim.tagger ... done [3.5 sec].

這是一個測試的句子。這是另一個句子。#NR

Tagged 1 words at 30.30 words per second.

I appreciate any help.

Ryan Rapp · Accepted Answer

I finally realized that tokenization/segmentation is not included in this POS tagger. It appears the words must be space-delimited before they are fed to the tagger, which is why the whole line was treated as a single word. For those interested in maximum entropy word segmentation of Chinese, there is a separate package available here:

http://nlp.stanford.edu/software/segmenter.shtml
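To illustrate the point about space-delimited input, here is a minimal sketch. The segmentation below is done by hand purely for illustration (a real segmenter may split the sentence differently), and the file name sample-segmented.txt is an assumption, not something from the tagger distribution. The tagger is only invoked if the script from the question is present in the current directory.

```shell
#!/bin/sh
# Unsegmented sentence from the question: it contains no spaces.
raw='这是一个测试的句子。这是另一个句子。'

# Hand-segmented version (illustrative assumption; in practice this file
# would be produced by a word segmenter).
printf '%s\n' '这 是 一个 测试 的 句子 。 这 是 另 一个 句子 。' > sample-segmented.txt

# Count whitespace-delimited tokens, which is what the tagger sees as "words".
raw_tokens=$(printf '%s\n' "$raw" | awk '{ n += NF } END { print n }')
seg_tokens=$(awk '{ n += NF } END { print n }' sample-segmented.txt)
echo "unsegmented: $raw_tokens token(s); segmented: $seg_tokens token(s)"

# Tag the segmented file, using the same command form as in the question.
if [ -x ./stanford-postagger.sh ]; then
  ./stanford-postagger.sh models/chinese-distsim.tagger sample-segmented.txt
fi
```

The unsegmented line counts as a single token, which matches the "Tagged 1 words" message in the question; once the words are separated by spaces, each one can receive its own tag.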

Thanks everyone.

Tags: linux, nlp, stanford-nlp, pos-tagger
