
Is there a known URI scheme or URN namespace for Unicode characters?

I need to reference a Unicode character with a URI. The following IANA registries list multiple schemes and namespaces but do not mention identifiers for Unicode characters. Does anyone know whether something like this already exists?

  • http://www.iana.org/assignments/uri-schemes.html
  • http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xml

I hoped to find something like

  • unicode://U+0394
  • urn:unicode://0394
  • http://unicode.org/unicode/0394

for the Greek capital letter delta, Δ.

In case anyone wonders, this is for a Semantic Web-like application that uses URIs as identifiers for concepts, including the concepts of individual Unicode characters.

Akseli Palén asked Jul 28 '12 09:07


2 Answers

I’m afraid there is no URL or URN for referring to authoritative information on a Unicode character in general. In the Unicode Standard, information about individual characters is partly in the so-called character database (mostly plain text files in specific formats) and partly in the Code Charts (PDF files). Neither of them offers a way to point at an individual character. Moreover, the information there is not exhaustive: important remarks on individual characters are scattered throughout the standard.

The Decodeunicode site has individually addressable items, such as

http://www.decodeunicode.org/en/u+0394

but the amount of information per character varies a lot and is generally very limited. The site is not official, and it currently covers Unicode 5.0 only.

The Fileformat.info site is much more systematic, but it, too, is unofficial. It is basically limited to formal properties and data derivable from them, plus comments extracted from the Code Charts, instructions on typing the character in Windows, and information about font support. That is still quite a lot! Example:

http://www.fileformat.info/info/unicode/char/0394/

Jukka K. Korpela answered Sep 22 '22 21:09


[EDIT]: I found this URL, which matches your needs: http://unicode.org/cldr/utility/character.jsp?a=1F40F

Well, there is a URL that references the authoritative Unicode database, even though (as the other answer says) it does not hold all the information about a specific character.

The following URL points to the latest version of the Unicode database. It is a simple list of the existing valid Unicode characters. Some upcoming characters are missing (㋿), and you should expect the file to change over time.

  • https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

The contents look like the following, which isn't very practical to use as-is.

$ grep -ai kangaroo UnicodeData.txt -C 7
1F991;SQUID;So;0;ON;;;;;N;;;;;
1F992;GIRAFFE FACE;So;0;ON;;;;;N;;;;;
1F993;ZEBRA FACE;So;0;ON;;;;;N;;;;;
1F994;HEDGEHOG;So;0;ON;;;;;N;;;;;
1F995;SAUROPOD;So;0;ON;;;;;N;;;;;
1F996;T-REX;So;0;ON;;;;;N;;;;;
1F997;CRICKET;So;0;ON;;;;;N;;;;;
1F998;KANGAROO;So;0;ON;;;;;N;;;;;
1F999;LLAMA;So;0;ON;;;;;N;;;;;
1F99A;PEACOCK;So;0;ON;;;;;N;;;;;
1F99B;HIPPOPOTAMUS;So;0;ON;;;;;N;;;;;
1F99C;PARROT;So;0;ON;;;;;N;;;;;
1F99D;RACCOON;So;0;ON;;;;;N;;;;;
1F99E;LOBSTER;So;0;ON;;;;;N;;;;;
1F99F;MOSQUITO;So;0;ON;;;;;N;;;;;
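
To make records like these easier to work with, they can be split on the semicolons. The following is only a minimal sketch: the function names `parse_record` and `index_by_code_point` are made up for illustration, and only the first five of the fifteen fields are given names. It assumes every input line is an ordinary single-character record, so the range markers that UnicodeData.txt uses for large blocks (names such as `<CJK Ideograph, First>`) are not expanded.

```python
from typing import Dict, Iterable

# Labels for the first five of the 15 semicolon-separated fields.
# The remaining ten fields are ignored in this sketch.
FIELDS = ("code_point", "name", "general_category", "combining_class", "bidi_class")


def parse_record(line: str) -> Dict[str, str]:
    """Split one UnicodeData.txt line into a dict of its leading fields."""
    parts = line.rstrip("\n").split(";")
    if len(parts) != 15:
        raise ValueError(f"expected 15 fields, got {len(parts)}: {line!r}")
    return dict(zip(FIELDS, parts[:5]))


def index_by_code_point(lines: Iterable[str]) -> Dict[int, Dict[str, str]]:
    """Build a lookup table keyed by the integer code point."""
    index = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_record(line)
        index[int(record["code_point"], 16)] = record
    return index


# Two lines copied from the grep output above.
sample = [
    "1F997;CRICKET;So;0;ON;;;;;N;;;;;",
    "1F998;KANGAROO;So;0;ON;;;;;N;;;;;",
]
index = index_by_code_point(sample)
print(index[0x1F998]["name"])  # prints KANGAROO
```

With a local copy of the file, the same function could be fed `open("UnicodeData.txt", encoding="utf-8")` instead of the sample list.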

You could build a hacky "hash-based" namespace by appending the code point as a fragment, like this, but that's definitely non-standard.

  • https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt#1F998
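
As a rough sketch of how such identifiers could be generated, the snippet below derives both the fragment-style URI and the CLDR utility URL from a single character. The helper names (`hex_code`, `fragment_uri`, `cldr_uri`) are hypothetical, the URL patterns are simply copied from the examples in this answer, and nothing guarantees that either site keeps them stable.

```python
# URL patterns taken from the examples in this answer; neither is an
# official, guaranteed-stable identifier scheme.
UCD_URL = "https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt"
CLDR_URL = "http://unicode.org/cldr/utility/character.jsp"


def hex_code(ch: str) -> str:
    """Return the code point as upper-case hex, zero-padded to at least 4 digits."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"{ord(ch):04X}"


def fragment_uri(ch: str) -> str:
    """Non-standard identifier: UnicodeData.txt plus the code point as a fragment."""
    return f"{UCD_URL}#{hex_code(ch)}"


def cldr_uri(ch: str) -> str:
    """URL of the CLDR character utility page for this character."""
    return f"{CLDR_URL}?a={hex_code(ch)}"


print(fragment_uri("\u0394"))      # Greek capital letter delta
print(cldr_uri("\U0001F998"))      # kangaroo
```

Keeping the hex formatting in one helper means every generated URI spells the code point the same way, which matters if the URIs are compared as plain strings.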
global uuid database answered Sep 25 '22 21:09