
List of Unicode alphabetic characters

I need the list of ranges of Unicode characters with the property Alphabetic as defined in http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Alphabetic. However, I cannot find them in the Unicode Character Database no matter how I search. Can somebody provide a list of them, or at least a search facility for characters with specified Unicode properties?

asked Jan 30 '11 14:01 by thSoft




4 Answers

The Unicode Character Database comprises all the text files in the distribution. It is not just a single file as it once was long ago.

The Alphabetic property is a derived property.
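The derived properties, Alphabetic among them, are listed as code point ranges in DerivedCoreProperties.txt, one of the text files in the UCD distribution. If you really do need the raw ranges, a minimal Python sketch like the one below can pull them out. It assumes you have downloaded that file locally; the `SAMPLE` lines are an illustrative excerpt in the file's `range ; property # comment` layout, not its real contents, and `parse_property_ranges` is a hypothetical helper name.

```python
def parse_property_ranges(lines, prop):
    """Return inclusive (start, end) code point ranges listed for `prop`
    in UCD-style lines such as '0041..005A ; Alphabetic # ...'."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue  # a line for some other property
        bounds = fields[0].split("..")
        start = int(bounds[0], 16)
        end = int(bounds[-1], 16)  # a single code point gives start == end
        ranges.append((start, end))
    return ranges


# Illustrative excerpt only; the real file has many more lines.
SAMPLE = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Alphabetic # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
002B          ; Math # Sm       PLUS SIGN
""".splitlines()

if __name__ == "__main__":
    # With the real file (assumed to be downloaded next to this script):
    # with open("DerivedCoreProperties.txt", encoding="utf-8") as f:
    #     ranges = parse_property_ranges(f, "Alphabetic")
    ranges = parse_property_ranges(SAMPLE, "Alphabetic")
    print(ranges)  # [(65, 90), (97, 122), (170, 170)]
```

A file object can be passed straight in, since iterating it yields lines and the trailing newline is stripped along with the comment.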

You really do not want to use code point ranges for this; you want to use the property itself. There are simply too many code points involved. Using the unichars script, we learn that there are more than ten thousand in the Basic Multilingual Plane alone, not counting Han or Hangul:

$ unichars '\p{Alphabetic}' | wc -l
   10052

If we include the other 16 astral planes, we're at nearly fifteen thousand:

$ unichars -a '\p{Alphabetic}' | wc -l
   14736

And if we include Han and Hangul, which the Alphabetic property does in fact include, we blow the roof off a hundred thousand code points:

$ unichars -ua '\p{Alphabetic}' | wc -l
  101539

I hope you can see that you do not want to specifically enumerate these using code point ranges. Down that road lies madness.

By the way, if you find the unichars script useful, you might also like the uniprops script and perhaps the uninames script.
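To see why enumerating ranges gets out of hand, a rough Python sketch can collapse every code point matching a predicate into contiguous ranges and count them. It uses `str.isalpha` only as a stand-in: per the Python documentation, that method tests for general categories Lu, Ll, Lt, Lm and Lo, which is not the full Alphabetic property (it misses Nl and Other_Alphabetic). `to_ranges` is a hypothetical helper name, and the counts depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def to_ranges(pred, limit=sys.maxunicode + 1):
    """Collapse the code points below `limit` that satisfy `pred`
    into a sorted list of inclusive (start, end) ranges."""
    ranges = []
    start = None
    for cp in range(limit):
        if pred(chr(cp)):
            if start is None:
                start = cp  # a new run begins
        elif start is not None:
            ranges.append((start, cp - 1))  # the current run just ended
            start = None
    if start is not None:
        ranges.append((start, limit - 1))  # a run reaching the limit
    return ranges


if __name__ == "__main__":
    # str.isalpha is only an approximation of \p{Alphabetic}
    letters = to_ranges(str.isalpha)
    total = sum(end - start + 1 for start, end in letters)
    print(f"Unicode {unicodedata.unidata_version}: "
          f"{total} code points in {len(letters)} separate ranges")
```

The printed range count is what you would have to maintain by hand, and it grows as new Unicode versions add characters.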

answered Oct 21 '22 03:10 by tchrist


Derived Core Properties can be calculated from the other properties.

The Alphabetic property is defined as "Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic".

So, if you take all the characters in Lu, Ll, Lt, Lm, Lo, Nl, and all the characters with the Other_Alphabetic property, you will have the Alphabetic characters.
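That definition translates fairly directly into code. The Python sketch below assumes `unicodedata.category` for the General_Category part and a set of Other_Alphabetic code points parsed from PropList.txt, the UCD file that lists that contributory property. The sample lines are an illustrative excerpt, and `load_other_alphabetic` and `is_alphabetic` are hypothetical names. Results follow the Unicode version of your Python's `unicodedata`, so the PropList.txt you load should match that version.

```python
import unicodedata

# General_Category values named in the derivation (Lu + Ll + Lt + Lm + Lo + Nl)
ALPHA_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def load_other_alphabetic(lines):
    """Collect the code points marked Other_Alphabetic in PropList.txt-style lines."""
    cps = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        rng, _, prop = data.partition(";")
        if prop.strip() != "Other_Alphabetic":
            continue
        first, _, last = rng.strip().partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        cps.update(range(start, end + 1))
    return cps


def is_alphabetic(ch, other_alphabetic):
    """Alphabetic = Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic."""
    return (unicodedata.category(ch) in ALPHA_CATEGORIES
            or ord(ch) in other_alphabetic)


# Illustrative PropList.txt excerpt only
PROPLIST_SAMPLE = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0345          ; Other_Alphabetic # Mn       COMBINING GREEK YPOGEGRAMMENI
""".splitlines()

if __name__ == "__main__":
    other = load_other_alphabetic(PROPLIST_SAMPLE)
    for ch in ["A", "\u216B", "\u0345", "1"]:
        print(f"U+{ord(ch):04X}", is_alphabetic(ch, other), ch.isalpha())
```

With this, U+216B ROMAN NUMERAL TWELVE (category Nl) and U+0345 (category Mn, but Other_Alphabetic) count as alphabetic even though `str.isalpha()` rejects both.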

answered Oct 21 '22 01:10 by Avi


Quoting your source: "Generated from: Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic"

These abbreviations seem to be explained here.
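The two-letter abbreviations are short names of General_Category values (Other_Alphabetic, by contrast, is a separate binary property, not a category). A quick way to see them in practice is Python's `unicodedata.category`; the sample characters below are chosen here purely for illustration, one per category.

```python
import unicodedata

# Short and long General_Category names used in the Alphabetic definition
GENERAL_CATEGORIES = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Lt": "Titlecase_Letter",
    "Lm": "Modifier_Letter",
    "Lo": "Other_Letter",
    "Nl": "Letter_Number",
}

# One illustrative character per category, in the same order as above
EXAMPLES = ["A", "a", "\u01C5", "\u02B0", "\u05D0", "\u216B"]

for ch in EXAMPLES:
    cat = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {cat} ({GENERAL_CATEGORIES[cat]}): {unicodedata.name(ch)}")
```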

answered Oct 21 '22 01:10 by flying sheep


I found the UniView web application which provides a nice search interface. Searching for the Letter property (with Local unchecked) gives 14723 results...
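A rough offline equivalent of that kind of search can be sketched with Python's `unicodedata`, assuming General_Category is the property you want to filter on. If the UniView Letter search corresponds to general category L, keep in mind that it is not the same as Alphabetic, since it leaves out Nl and Other_Alphabetic. The result reflects the Unicode version bundled with your Python (`unicodedata.unidata_version`), so it need not match the 14723 reported above. `find_by_category` is a hypothetical helper name.

```python
import sys
import unicodedata


def find_by_category(prefix, limit=sys.maxunicode + 1):
    """Return the code points below `limit` whose General_Category
    starts with `prefix` ("L" = any letter, "Lu" = uppercase letter, ...)."""
    return [cp for cp in range(limit)
            if unicodedata.category(chr(cp)).startswith(prefix)]


if __name__ == "__main__":
    letters = find_by_category("L")
    print(f"Unicode {unicodedata.unidata_version}: {len(letters)} letters")
    # Show the first few hits with their character names
    for cp in letters[:5]:
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```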

answered Oct 21 '22 03:10 by thSoft