
What is the range of Unicode Printable Characters?

Can anybody please tell me the range of Unicode printable characters? (For example, the ASCII printable character range is \u0020 - \u007f.)

Anindya Chatterjee asked Sep 22 '10 14:09




3 Answers

See http://en.wikipedia.org/wiki/Unicode_control_characters

You might want to look especially at the C0 and C1 control characters: http://en.wikipedia.org/wiki/C0_and_C1_control_codes

According to Wikipedia, the C0 control characters occupy the range U+0000 to U+001F, with U+007F (DEL) usually grouped alongside them; these are the same code points as the ASCII control characters. The C1 control characters occupy the range U+0080 to U+009F.
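As a rough check of those ranges, the sketch below hard-codes them in a hypothetical helper, `isC0OrC1`, and compares it with `unicode.IsControl` from Go's standard library, which reports membership in general category Cc. Both helper names are illustrative assumptions, not part of the answer.

```go
package main

import (
	"fmt"
	"unicode"
)

// isC0OrC1 is a hypothetical helper that hard-codes the ranges quoted above:
// C0 (U+0000 to U+001F), DEL (U+007F), and C1 (U+0080 to U+009F).
func isC0OrC1(r rune) bool {
	return (r >= 0x0000 && r <= 0x001F) || (r >= 0x007F && r <= 0x009F)
}

// agreesWithIsControl scans every code point and reports whether the
// hard-coded ranges match unicode.IsControl everywhere.
func agreesWithIsControl() bool {
	for r := rune(0); r <= unicode.MaxRune; r++ {
		if isC0OrC1(r) != unicode.IsControl(r) {
			return false
		}
	}
	return true
}

func main() {
	// A few spot checks on both sides of each boundary.
	for _, r := range []rune{0x0009, 0x001F, 0x0020, 0x007E, 0x007F, 0x0085, 0x00A0} {
		fmt.Printf("U+%04X manual=%t IsControl=%t\n", r, isC0OrC1(r), unicode.IsControl(r))
	}
	fmt.Println("ranges agree with unicode.IsControl:", agreesWithIsControl())
}
```

In practice `unicode.IsControl` is probably the better choice than a hand-written range test, since it states the intent directly.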

Besides the C0 and C1 control characters, Unicode also has well over a hundred format control characters, e.g. the zero-width non-joiner, which prevents adjacent characters from being joined in a ligature or cursive connection, and the bidirectional text controls. These format control characters are scattered across the code space.
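A minimal sketch of how such format characters could be detected in Go, assuming the `unicode.Cf` range table (general category Cf, "format") is the right notion of "formatting control character" here. The helper `isFormat` and the sample list are illustrative assumptions.

```go
package main

import (
	"fmt"
	"unicode"
)

// isFormat is a hypothetical helper: it reports whether r belongs to
// general category Cf (format characters).
func isFormat(r rune) bool {
	return unicode.Is(unicode.Cf, r)
}

func main() {
	samples := []struct {
		r    rune
		name string
	}{
		{0x00AD, "SOFT HYPHEN"},
		{0x200C, "ZERO WIDTH NON-JOINER"},
		{0x200D, "ZERO WIDTH JOINER"},
		{0x202E, "RIGHT-TO-LEFT OVERRIDE"},
		{0x0041, "LATIN CAPITAL LETTER A"},
	}
	for _, s := range samples {
		// Format characters are not in category Cc, so IsControl is false for them.
		fmt.Printf("U+%04X %-24s format=%t control=%t\n",
			s.r, s.name, isFormat(s.r), unicode.IsControl(s.r))
	}
}
```

Note that a check based only on `unicode.IsControl` would miss all of these, which is one reason a single "non-printable range" does not exist.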

More importantly, what are you doing that requires you to know Unicode's non-printable characters? More likely than not, whatever you're trying to do is the wrong approach to solve your problem.

Lie Ryan answered Oct 04 '22 14:10


First, you should remove the word 'UTF8' from your question; it's not pertinent (UTF-8 is just one of the encodings of Unicode and is orthogonal to your question).

Second, the meaning of "printable/non-printable" is less clear in Unicode. Perhaps you mean a "graphical character"; one can even dispute whether a space is printable/graphical. The non-graphical characters would consist, basically, of control characters: the range 0x00-0x1F plus some others that are scattered.

Anyway, the vast majority of Unicode characters (more than 100,000) are "graphical". But this certainly does not imply that they are printable in your environment.

If you intend to generate a "random printable" Unicode string, it seems to me a bad idea to try to include all "printable" characters.

leonbloy answered Oct 04 '22 13:10


This is an old question, but it is still valid, and I think there is more that can usefully, if briefly, be said on the subject than the existing answers cover.

Unicode

Unicode defines properties for characters.

One of these properties is "General Category", which has major classes and subclasses. The major classes are Letter, Mark, Number, Punctuation, Symbol, Separator, and Other.

By knowing the properties of your characters, you can decide whether you consider them printable in your particular context.
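As an illustration of inspecting that property, here is a small sketch in Go that maps a rune to its major class using the range tables exported by the standard `unicode` package (`unicode.L`, `unicode.M`, and so on). The helper `majorCategory` is a hypothetical name, and the "Unassigned" fallback is an assumption covering code points that Go's tables do not list.

```go
package main

import (
	"fmt"
	"unicode"
)

// majorCategory is a hypothetical helper that names the General Category
// major class of r, based on the range tables in Go's unicode package.
func majorCategory(r rune) string {
	switch {
	case unicode.Is(unicode.L, r):
		return "Letter"
	case unicode.Is(unicode.M, r):
		return "Mark"
	case unicode.Is(unicode.N, r):
		return "Number"
	case unicode.Is(unicode.P, r):
		return "Punctuation"
	case unicode.Is(unicode.S, r):
		return "Symbol"
	case unicode.Is(unicode.Z, r):
		return "Separator"
	case unicode.Is(unicode.C, r):
		return "Other"
	default:
		// Code points absent from Go's tables, such as unassigned ones.
		return "Unassigned"
	}
}

func main() {
	for _, r := range []rune{'A', '\u0301', '7', '!', '+', ' ', '\n', '\u200B'} {
		fmt.Printf("U+%04X %s\n", r, majorCategory(r))
	}
}
```

Whether, say, a Mark or a Separator counts as "printable" is then a decision for your own context rather than something the category settles for you.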

You must always remember that terms like "character" and "printable" are often difficult and have interesting edge-cases.


Programming Language support

Some programming languages assist with this problem.

For example, the Go language has a "unicode" package which provides many useful Unicode-related functions including these two:

func IsGraphic(r rune) bool

IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such  
characters include letters, marks, numbers, punctuation, symbols, and spaces, 
from categories L, M, N, P, S, Zs. 

func IsPrint(r rune) bool

IsPrint reports whether the rune is defined as printable by Go. Such  
characters include letters, marks, numbers, punctuation, symbols, and  
the ASCII space character, from categories L, M, N, P, S and the ASCII  
space character. This categorization is the same as IsGraphic except  
that the only spacing character is ASCII space, U+0020.
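The difference between the two functions can be seen with a short sketch. The helper names `graphicNotPrint` and `onlySpacesDiffer` are illustrative assumptions; the only library calls used are `unicode.IsGraphic`, `unicode.IsPrint`, and `unicode.Is` with the `unicode.Zs` table.

```go
package main

import (
	"fmt"
	"unicode"
)

// graphicNotPrint reports whether r is a Graphic according to Unicode
// but not printable according to Go's stricter definition.
func graphicNotPrint(r rune) bool {
	return unicode.IsGraphic(r) && !unicode.IsPrint(r)
}

// onlySpacesDiffer scans every code point and confirms that the two
// predicates disagree only on Zs characters other than U+0020.
func onlySpacesDiffer() bool {
	for r := rune(0); r <= unicode.MaxRune; r++ {
		if unicode.IsGraphic(r) != unicode.IsPrint(r) {
			if r == ' ' || !unicode.Is(unicode.Zs, r) {
				return false
			}
		}
	}
	return true
}

func main() {
	// ASCII space, no-break space, ideographic space, a letter, a newline.
	for _, r := range []rune{' ', 0x00A0, 0x3000, 'A', '\n'} {
		fmt.Printf("U+%04X IsGraphic=%t IsPrint=%t\n",
			r, unicode.IsGraphic(r), unicode.IsPrint(r))
	}
	fmt.Println("differences are limited to non-ASCII spaces:", onlySpacesDiffer())
}
```

So U+00A0 (no-break space) is Graphic but not printable by Go's definition, while U+0020 is both.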

Notice that it says "defined as printable by Go", not "defined as printable by Unicode". It is almost as if there are some depths the wizards at Unicode dare not plumb.


Printable

The more you learn about Unicode, the more you realise how unexpectedly diverse and unfathomably weird human writing systems are.

In particular, whether a given "character" is printable is not always obvious.

Is a zero-width space printable? When is a hyphenation point printable? Are there characters whose printability depends on their position in a word or on what characters are adjacent to them? Is a combining character always printable?


Footnotes

ASCII printable character range is \u0020 - \u007f

No, it isn't. \u007f is DEL, which is not normally considered a printable character. It is, for example, associated with the keyboard key labelled "DEL", whose earliest purpose was to command the deletion of a character from some medium (display, file etc).

In fact many 8-bit character sets have many non-consecutive ranges which are non-printable. See for example C0 and C1 controls.
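The correction to the question's range can be sketched in a few lines of Go. The helper `isASCIIPrintable` is a hypothetical name; it assumes the conventional definition of printable ASCII as U+0020 through U+007E, which excludes DEL.

```go
package main

import "fmt"

// isASCIIPrintable is a hypothetical helper using the conventional
// printable ASCII range, 0x20 (space) through 0x7E (tilde).
func isASCIIPrintable(b byte) bool {
	return b >= 0x20 && b <= 0x7E
}

// countASCIIPrintable counts the printable characters among the 128 ASCII codes.
func countASCIIPrintable() int {
	n := 0
	for b := 0; b < 128; b++ {
		if isASCIIPrintable(byte(b)) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countASCIIPrintable())  // prints 95
	fmt.Println(isASCIIPrintable(0x7F)) // prints false: DEL is a control character
}
```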

RedGrittyBrick answered Oct 04 '22 13:10