As phrased in the question, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I understand that this is a very difficult task, as there are many ambiguities involved. I know there is Google's API, but it is rather a black box: it exposes very little information about what it is actually doing.
The keyword for "text segmentation for Chinese" should be 中文分词 in Chinese.
Good, actively maintained open-source text-segmentation projects:
C#
, Snapshot
Java
C/C++, Java, C#
, Demo
C, PHP, PostgreSQL
ICTCLAS
, Demo
Java
Java
, Demo
Python, Java
, Demo
Python
Other
Sample
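Google Chrome (Chromium): src, cc_cedict.txt (73,145 Chinese words/phrases)
In a text field or textarea in Google Chrome that contains Chinese sentences, press Ctrl+← or Ctrl+→ to move the cursor word by word.
Double-click on 中文分词指的是将一个汉字序列切分成一个一个单独的词 ("Chinese word segmentation means splitting a sequence of Chinese characters into individual words") to select a single word.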
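To give a feel for what a dictionary-based segmenter does with a word list such as cc_cedict.txt, here is a minimal sketch of forward maximum matching. This is not Chromium's actual implementation, which I have not reproduced here; the function name `segment` and the tiny `TOY_DICT` are made up for illustration, and a real word list would have tens of thousands of entries.

```python
def segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position, take the longest
    dictionary word that starts there; fall back to a single character."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"中文", "分词", "指的是", "一个", "汉字", "序列", "切分", "单独", "词"}

if __name__ == "__main__":
    sentence = "中文分词指的是将一个汉字序列切分成一个一个单独的词"
    print(" / ".join(segment(sentence, TOY_DICT)))
```

Greedy matching like this cannot resolve the ambiguities mentioned in the question (it always prefers the longest match from the left), which is one reason statistical segmenters exist.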
The Stanford Word Segmenter uses a CRF (conditional random field) model.
It is released under the GPL.
The project page is: http://nlp.stanford.edu/software/segmenter.shtml
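CRF-style segmenters generally treat segmentation as labeling each character rather than looking words up. As a rough sketch of that framing, the snippet below converts between words and per-character tags using the common B/M/E/S scheme (begin, middle, end, single). The tag set and the helper names `words_to_tags` and `tags_to_words` are assumptions for illustration; the Stanford segmenter may use a different label set, and the trained model that predicts the tags is not shown.

```python
def words_to_tags(words):
    """Turn a list of words into one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    words = ["中文", "分词", "指的是"]
    tags = words_to_tags(words)
    print(tags)
    print(tags_to_words("".join(words), tags))
```

A trained model would predict the tag sequence for unseen text; the decoding step then works as in `tags_to_words`.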
ICU documents its universal text segmentation (boundary analysis) here: http://userguide.icu-project.org/boundaryanalysis