Is there a library to sort Chinese strings by stroke count in Java?
Try java.text.Collator for a Chinese Locale.
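A minimal sketch of that suggestion follows. The class and method names are made up for illustration. Whether the resulting order is stroke-based depends on the collation data your JDK ships for the chosen locale, which is not guaranteed, so check the output against a few characters whose stroke counts you know.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CollatorSortExample {
    // Returns a sorted copy of the input, ordered by the JDK's collation
    // rules for the given locale. The ordering itself is whatever the JDK
    // defines for that locale; it is not necessarily stroke order.
    static List<String> sortWithCollator(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("天", "一", "中", "人");
        // Compare the two locales to see whether they order the words differently.
        System.out.println(sortWithCollator(words, Locale.TRADITIONAL_CHINESE));
        System.out.println(sortWithCollator(words, Locale.SIMPLIFIED_CHINESE));
    }
}
```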
If you want to roll the code yourself, one source for the data is the Unihan database's Radical-Stroke Counts fields, from the Unicode Consortium. The link is to the section of Technical Report 38, describing those fields.
Note that the stroke count of an ideographic character is based on the structure (or morphology) of the character as displayed, i.e. its glyph. The glyph's morphology is a function of the font design style, especially whether the font follows traditional Chinese, simplified Chinese, or Japanese conventions. But Java strings are Unicode text, and the Unicode standard unifies characters from all these conventions under a single character code.
So, you will need external information to tell you which convention your text is using. This in turn tells you which field of the Unihan database to use. If you know that your Chinese text strings are all simplified, or all traditional Chinese, then you have enough information.
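If you do roll your own, the sort could look roughly like the sketch below. The tiny hard-coded table stands in for data you would load from the Unihan field that matches your text's convention; the class name, the table, and the tie-breaking rules (code point order for equal stroke counts, unknown characters last) are all assumptions made for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StrokeCountSort {
    // Hypothetical hand-filled table: code point -> total stroke count.
    // In real use, populate this from the Unihan database instead.
    static final Map<Integer, Integer> STROKES = new HashMap<>();
    static {
        STROKES.put("一".codePointAt(0), 1);
        STROKES.put("二".codePointAt(0), 2);
        STROKES.put("人".codePointAt(0), 2);
        STROKES.put("三".codePointAt(0), 3);
        STROKES.put("大".codePointAt(0), 3);
        STROKES.put("天".codePointAt(0), 4);
    }

    // Characters missing from the table get the maximum value, so they sort last.
    static int strokes(int codePoint) {
        return STROKES.getOrDefault(codePoint, Integer.MAX_VALUE);
    }

    // Compares strings character by character: fewer strokes first,
    // then lower code point, then shorter string.
    static final Comparator<String> BY_STROKES = (a, b) -> {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(strokes(x[i]), strokes(y[i]));
            if (c == 0) c = Integer.compare(x[i], y[i]);
            if (c != 0) return c;
        }
        return Integer.compare(x.length, y.length);
    };

    // Returns a sorted copy of the input, leaving the original list untouched.
    static List<String> sortByStrokes(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(BY_STROKES);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByStrokes(List.of("天", "三", "人", "一", "大", "二")));
    }
}
```

Working on code points rather than `char` values keeps the comparison correct for ideographs outside the Basic Multilingual Plane, which occupy two `char`s in a Java string.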
Also check out the Chinese Character Web API, which serves up data from the Unihan database.