
In Windows, how do you enter a character outside of the Unicode Basic Multilingual Plane?

I know that Windows has supported the supplementary planes since Windows XP.

I have fonts which I know have characters outside the basic multilingual plane (BMP).

For these characters, the Unicode code point consists of five or six hexadecimal digits.

I do not know how to enter these characters in applications.

Windows seems to support keyboard entry only for characters in the BMP: you can enter a decimal number, and some applications also let you enter a four-digit hexadecimal number.

Can someone confirm how entry is managed? I don't care whether it is done directly from the keyboard or is application-assisted. (The default Windows "Character Map" application only supports characters in the BMP, so I need suggestions -- preferably to an application supporting at least Unicode Version 5, if not 6.)

In Java, these characters are managed using "surrogate pairs" in UTF-16. I'm concerned that Windows may also have some of the old "Unicode is 16 bit" legacy, causing it to have a similar issue. Even confirmation that I need to punch in surrogate pair numbers would be an answer.
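For reference, the surrogate-pair arithmetic mentioned above can be sketched in a few lines of Python. This is only a minimal illustration of the UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). The function names are made up for this example, and it says nothing about which input methods on Windows actually accept surrogate values.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1B000)
print(f"U+1B000 -> {high:04X} {low:04X}")   # U+1B000 -> D82C DC00
```

So a five-digit code point such as U+1B000 corresponds to two 16-bit units, D82C and DC00, in UTF-16.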

Thanks!

yam655 asked Mar 18 '12 02:03



2 Answers

OK, I clearly do not know what you are talking about.

Anyway, referring to:

The default Windows "Character Map" application only supports characters in the BMP, so I need suggestions -- preferably to an application supporting at least Unicode Version 5, if not 6.

I've found a link to an application that could help.

http://www.babelstone.co.uk/software/babelpad.html

Download it, then select Tools -> Character map from the menu.

Hope it helps.

If not, sorry for the misunderstanding; I was just trying to help.

Martin answered Oct 18 '22 22:10



At least in MS Word 2007, the Alt+X method works for non-BMP characters, too: enter U+ followed by the Unicode number in hexadecimal, then Alt+X. The characters U+ may be omitted if the preceding character is not a digit or a letter A–F or X. You may need to explicitly select the font of the text (i.e., Word does not necessarily switch to a font that contains the character, as it normally does with BMP characters).
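If you just need the character on the clipboard or in a file, the same "U+ plus hex digits" notation is easy to convert yourself. The following is a rough Python sketch, not a description of how Word implements Alt+X. The helper name `char_from_notation` and the exact pattern it accepts are assumptions made for this example.

```python
import re

# Accepts "U+1D11E", "u+1d11e", or bare hex such as "1D11E".
# This pattern is an illustrative choice, not Word's own rule.
_CODE_POINT = re.compile(r"^(?:U\+)?([0-9A-F]{1,6})$", re.IGNORECASE)


def char_from_notation(text: str) -> str:
    """Turn a 'U+XXXXX' style string into the character it names."""
    match = _CODE_POINT.match(text.strip())
    if match is None:
        raise ValueError(f"not a code point notation: {text!r}")
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:
        raise ValueError("beyond the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("lone surrogates are not characters")
    return chr(code_point)


ch = char_from_notation("U+1D11E")
print(len(ch))                            # 1 (one code point in Python 3)
print(len(ch.encode("utf-16-le")) // 2)   # 2 (two UTF-16 code units)
```

The resulting string can then be written to a UTF-8 or UTF-16 file and pasted into the target application. Whether it displays correctly still depends on the selected font.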

In Word, you can alternatively use the Insert → Symbol command and then, in the insertion window, select a font that contains the character you need.

Using the UnicodeInput program, you can enter a character by pressing Alt++ and then entering the Unicode number. It supports non-BMP characters too, but with an odd restriction due to a program bug: it does not work for non-BMP characters if the fourth digit from the right is a letter (e.g., U+1B000).

BabelPad, mentioned in Martin’s answer, is a great alternative and lets you select characters both by number and by Unicode name.

There are probably other Unicode editors that let you work with non-BMP characters too; check out Alan Wood’s list of Unicode and Multilingual Programs and Utilities.

Jukka K. Korpela answered Oct 18 '22 23:10
