
Programmatically determine number of strokes in a Chinese character?

Does Unicode store stroke count information about Chinese, Japanese, or other stroke-based characters?

xkdkxdxc asked Mar 07 '10 22:03



5 Answers

If you want to do character recognition, search for HanziDict.

Also take a look at the Unihan data site:

http://www.unicode.org/charts/unihanrsindex.html

There you can look up characters by stroke count and then get the character info. You might be able to build your own lookup table from that data.

Joe Pitz answered Oct 20 '22 20:10


A little googling came up with Unihan.zip, a file published by the Unicode Consortium which contains several text files including Unihan_RadicalStrokeCounts.txt which may be what you want. There is also an online Unihan Database Lookup based on this data.
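As a rough sketch of how such a table could be built: the Unihan text files are tab-separated lines of the form code point, field name, value, with `#` marking comments. The sketch below assumes the stroke count lives in a field named `kTotalStrokes`; which file inside Unihan.zip carries that field has varied between Unicode versions, so check the release you download. The function name `parse_total_strokes` and the sample lines are illustrative only and are not read from the real archive.

```python
def parse_total_strokes(lines):
    """Map each character to its kTotalStrokes value from Unihan-style lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Keep only well-formed rows for the (assumed) kTotalStrokes field.
        if len(fields) != 3 or fields[1] != "kTotalStrokes":
            continue
        code_point, _, value = fields
        # Code points are written like "U+65E5"; drop the prefix and parse hex.
        char = chr(int(code_point[2:], 16))
        # A value may list more than one count; this sketch keeps the first.
        counts[char] = int(value.split()[0])
    return counts


# Illustrative lines in the Unihan tab-separated layout (not read from disk).
sample = [
    "# comment line",
    "U+65E5\tkTotalStrokes\t4",
    "U+6C38\tkTotalStrokes\t5",
    "U+6C38\tkRSUnicode\t85.1",
]
strokes = parse_total_strokes(sample)
print(strokes["日"], strokes["永"])
```

To use it on real data, pass an open file object for the relevant Unihan text file (opened with `encoding="utf-8"`) instead of the `sample` list.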

Tim answered Oct 20 '22 21:10


In Python there is a library for that, cjklib:

>>> from cjklib.characterlookup import CharacterLookup
>>> cjk = CharacterLookup('C')
>>> cjk.getStrokeCount(u'日')
4

Disclaimer: I wrote it

cburgmer answered Oct 20 '22 21:10


You mean, is it encoded somehow in the actual code point? No. There may well be a table somewhere you can find on the net (or create one) but it's not part of the Unicode mandate to store this sort of metadata.
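A small illustration of that point, using only Python's standard `unicodedata` module: for a CJK ideograph, the properties it exposes are the code point, a name derived from that code point, and a general category, none of which says anything about strokes. The helper name `describe` is made up for this sketch.

```python
import unicodedata


def describe(ch):
    """Return the code point, character name, and general category of ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


info = describe("日")
print(info)  # ('U+65E5', 'CJK UNIFIED IDEOGRAPH-65E5', 'Lo')
```

The name is just the code point restated, so any stroke count has to come from a separate table.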

paxdiablo answered Oct 20 '22 21:10


On iOS, UILocalizedIndexedCollation may be a complete solution.

https://developer.apple.com/library/ios/documentation/iPhone/Reference/UILocalizedIndexedCollation_Class/UILocalizedIndexedCollation.html

Jerry Juang answered Oct 20 '22 19:10