
How to sort Chinese strings by stroke in Java?

Is there a library for sorting Chinese strings by stroke count in Java?

bydsky asked Jan 12 '12 10:01



2 Answers

Try java.text.Collator with a Chinese Locale.
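As a minimal sketch of that suggestion, the snippet below sorts a few single-character strings with Collator.getInstance for two Chinese locales. The class name CollatorSortSketch and the helper sortWith are made up for illustration. Whether a given locale's rules actually order by stroke count depends on the collation data shipped with your JDK, so check the output on your own runtime before relying on it.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class CollatorSortSketch {

    // Returns a sorted copy of the input, ordered by the locale's collation rules.
    static List<String> sortWith(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("中", "一", "大", "人");
        // The two locales may produce different orders; inspect both.
        System.out.println(sortWith(Locale.TRADITIONAL_CHINESE, words));
        System.out.println(sortWith(Locale.SIMPLIFIED_CHINESE, words));
    }
}
```

If neither order matches the stroke order you expect, the Collator rules in your JDK are probably not stroke-based for that locale, and you would need your own stroke data.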

RokL answered Oct 24 '22 06:10



If you want to roll the code yourself, one source for the data is the Radical-Stroke Counts fields of the Unihan database, published by the Unicode Consortium. The link points to the section of Technical Report 38 that describes those fields.

Note that the stroke count of an ideographic character is based on the structure (or morphology) of the character as displayed, i.e. its glyph. The glyph's morphology depends on the font's design style, especially on whether the font follows traditional Chinese, simplified Chinese, or Japanese conventions. But character codes in Java are based on the Unicode standard, which unifies characters from all these conventions under a single character code.

So, you will need external information to tell you which convention your text is using. This in turn tells you which field of the Unihan database to use. If you know that your Chinese text strings are all simplified, or all traditional Chinese, then you have enough information.
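Here is a rough sketch of the roll-your-own approach, under some assumptions of mine. It uses the Unihan kTotalStrokes field, which is my choice and not necessarily the field you would end up picking. It assumes the tab-separated "U+XXXX, field name, value" line layout of the Unihan data files. As I read the Unihan documentation, when kTotalStrokes carries two values the first is meant for simplified Chinese and the second for traditional Chinese; please verify that against the current report. The class StrokeSortSketch, its methods, and the six-line SAMPLE are illustrative only; real code would read the downloaded data file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and sample data are for illustration only.
class StrokeSortSketch {

    // A few sample lines in the assumed Unihan layout: code point, field, value.
    static final String SAMPLE =
        "U+4E00\tkTotalStrokes\t1\n" +  // 一
        "U+4E8C\tkTotalStrokes\t2\n" +  // 二
        "U+4EBA\tkTotalStrokes\t2\n" +  // 人
        "U+4E09\tkTotalStrokes\t3\n" +  // 三
        "U+5927\tkTotalStrokes\t3\n" +  // 大
        "U+4E2D\tkTotalStrokes\t4\n";   // 中

    // Builds a code point -> stroke count map from Unihan-style lines.
    // With two values on a line, 'traditional' selects the second one (assumption).
    static Map<Integer, Integer> parseStrokes(String data, boolean traditional) {
        Map<Integer, Integer> strokes = new HashMap<>();
        for (String line : data.split("\n")) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            String[] fields = line.split("\t");
            if (fields.length < 3 || !fields[1].equals("kTotalStrokes")) continue;
            int codePoint = Integer.parseInt(fields[0].substring(2), 16); // drop "U+"
            String[] values = fields[2].trim().split(" ");
            String chosen = (traditional && values.length > 1) ? values[1] : values[0];
            strokes.put(codePoint, Integer.parseInt(chosen));
        }
        return strokes;
    }

    // Compares strings character by character: fewer strokes first, then code
    // point as a tie-breaker; characters missing from the map sort last.
    static Comparator<String> byStroke(Map<Integer, Integer> strokes) {
        return (a, b) -> {
            int[] ca = a.codePoints().toArray();
            int[] cb = b.codePoints().toArray();
            int n = Math.min(ca.length, cb.length);
            for (int i = 0; i < n; i++) {
                int sa = strokes.getOrDefault(ca[i], Integer.MAX_VALUE);
                int sb = strokes.getOrDefault(cb[i], Integer.MAX_VALUE);
                if (sa != sb) return Integer.compare(sa, sb);
                if (ca[i] != cb[i]) return Integer.compare(ca[i], cb[i]);
            }
            return Integer.compare(ca.length, cb.length); // shorter string first
        };
    }

    public static void main(String[] args) {
        Map<Integer, Integer> strokes = parseStrokes(SAMPLE, false);
        List<String> words = new ArrayList<>(Arrays.asList("中", "大", "一", "人", "三", "二"));
        words.sort(byStroke(strokes));
        System.out.println(words); // [一, 二, 人, 三, 大, 中]
    }
}
```

The code point tie-breaker is arbitrary; it only makes the order deterministic. A dictionary-style order would more likely break ties by radical, which the radical-stroke fields can supply.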

Also check out the Chinese Character Web API, which serves up data from the Unihan database.

Jim DeLaHunt answered Oct 24 '22 06:10
