
Where can I get a list of Unicode chars by class?

I'm new to Unicode and not sure how much of my ASCII background carries over. I'm reading the identifier rules in the C# spec to work out which characters are permitted within Azure Table (whose rules are based directly on the C# spec).

Where can I find a list of Unicode characters that fall into these categories:

  • letter-character: A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
  • combining-character: A Unicode character of classes Mn or Mc
  • decimal-digit-character: A Unicode character of the class Nd
  • connecting-character: A Unicode character of the class Pc
  • formatting-character: A Unicode character of the class Cf
makerofthings7 asked Sep 18 '10 16:09



2 Answers

You can retrieve this information in an automated fashion from the official Unicode data file, UnicodeData.txt, which is published here:

  • UnicodeData.txt (at unicode.org)

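Each line of this file holds semicolon-separated fields. The first field is the code point in hexadecimal, the second is the character name, and the third is the character's class (its general category, a two-letter code such as Lu or Nd).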

The benefit of this is that you can get the character name for each character, so you have a better idea of what it is than by just looking at the character itself (e.g. would you know what ბ is? That’s right, it’s Ban. In Georgian. :-))
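
As a rough illustration of the automated route, here is a minimal Python sketch that groups code points by the third field. It is only a sketch under some assumptions: the function name `code_points_by_category` is made up for this example, and `SAMPLE` is a handful of lines excerpted from the file so the snippet runs without a download. One thing to watch for in the real file is that some large blocks are not listed character by character but as a pair of lines whose names end in `, First>` and `, Last>`; the sketch expands those pairs into the full range.

```python
from collections import defaultdict

# A few lines excerpted from UnicodeData.txt for illustration only.
# With the real file you would pass an open file object instead.
SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
005F;LOW LINE;Pc;0;ON;;;;;N;SPACING UNDERSCORE;;;;
00AD;SOFT HYPHEN;Cf;0;BN;;;;;N;;;;;
0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""


def code_points_by_category(lines):
    """Map each general category (e.g. 'Lu') to a list of code points."""
    by_category = defaultdict(list)
    range_start = None  # set while we are between a First/Last pair
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)  # field 1: hex code point
        name = fields[1]                 # field 2: character name
        category = fields[2]             # field 3: general category
        if name.endswith(", First>"):
            range_start = code_point
        elif name.endswith(", Last>") and range_start is not None:
            # Expand the whole range, both ends included.
            by_category[category].extend(range(range_start, code_point + 1))
            range_start = None
        else:
            by_category[category].append(code_point)
    return dict(by_category)


table = code_points_by_category(SAMPLE.splitlines())
for category in sorted(table):
    print(category, len(table[category]))
```

To process the real file, something like `code_points_by_category(open("UnicodeData.txt", encoding="utf-8"))` should work, assuming you have saved it locally under that name.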

Timwi answered Oct 02 '22 17:10


FileFormat.info has a list of Unicode characters by category:

http://www.fileformat.info/info/unicode/category/index.htm
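
If you would rather build such a list locally than browse it, one alternative (not what the site above does, just a different technique) is Python's standard `unicodedata` module, whose `category()` function returns the same two-letter class. The sketch below is only illustrative: the helper names `identifier_class` and `chars_in_categories` are invented for this example, the class table is copied from the question, and the results reflect whichever Unicode version your Python build ships with, which may differ from the version the C# spec or Azure targets.

```python
import sys
import unicodedata

# Character classes as listed in the question (from the C# identifier rules).
IDENTIFIER_CLASSES = {
    "letter-character": {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"},
    "combining-character": {"Mn", "Mc"},
    "decimal-digit-character": {"Nd"},
    "connecting-character": {"Pc"},
    "formatting-character": {"Cf"},
}


def identifier_class(ch):
    """Return the class name from the question for ch, or None if it has none."""
    category = unicodedata.category(ch)
    for class_name, categories in IDENTIFIER_CLASSES.items():
        if category in categories:
            return class_name
    return None


def chars_in_categories(categories):
    """List every character whose general category is in `categories`."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in categories
    ]


connectors = chars_in_categories(IDENTIFIER_CLASSES["connecting-character"])
print(["U+%04X" % ord(ch) for ch in connectors])
```

Scanning all code points takes a moment but only needs to be done once per category set.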

Phil Ross answered Oct 02 '22 16:10