
Encoding XML element name beginning with a number?

Tags:

xml

I'm looking at the output of a tool that dumps a database table to XML. One of the columns is named 64kbit; the tool encodes it as follows, and I need to replicate that:

 <_x0036_4kbit>0</_x0036_4kbit>

Is this some sort of standard encoding? Where can I learn more about it?

Anonym asked Jan 18 '10 15:01



2 Answers

The official word is that the restrictions XML imposes on names are inherited from its parent, SGML, with small additions: in XML, names may also begin with an underscore '_' character (or a colon ':', which in practice is reserved for namespaces).

SGML grew out of GML, which was developed at IBM in the 1960s by a group of minds that were thinking '1960s style'.

As a result, the brain-storm that led to the creation of SGML was likely to have been distracted by the overwhelming notion that space-ships, time-travel and flairs made of kitchen foil to protect against 'them aliens' and their fool-hardy attempts at thought-provocation and mind-control were justified thought processes.

So. The question still remains. Why doesn't SGML allow a name to start with a number? Furthermore, why would there be any sort of restriction imposed on the use of any character other than the control characters: <, >, & and empty space? It would be madness, surely, to present the computer geek with so many keys for so many different characters, only to prevent him or her from using them.

The most significant reason is the 1960s-thinking parser, and its following of the complexity rule to a degree of outright pedantry.

'The simpler the parser is, the faster it will perform'

The alphabet is 26 uppercase + 26 lowercase characters big in total, and that's 52. Allowing numbers is an additional ten digits, which is about a fifth more!

In human terms, this would be like having to wash five hideously filth-encrusted pots, each one taking an hour to clean, and then hidden underneath the last pot is an extra bonus pot to wash, and you must wash it! You have to repeat this routine every single day for the rest of your life, and that's exactly what it is like. Precisely!

Mark-up language documents have a tendency to bulge in content. So, the fewer jobs there are for the parser, the faster it performs. The benefits then trickle down through the ranks until they metamorphose into pure lucrative performance.

In the 'Ye olde days of horse, carriage and a Commodore 64' it was far more the user's responsibility to count their bits and bytes manually, in order for the kilobytes to take care of themselves. However, as the modern CPU is more able to cope than its ancient predecessor, the restrictions imposed by the parser have become more significant than the performance issues.

If it's any consolation, if I were to design a mark-up language myself (which, for argument's sake, we will call NAM-LIT-MAML, because Nicholas' awesome mark-up language is the most awesome mark-up language (ever!)), then it would allow you to use any number of all the characters in the entire history of the world, and indeed universe, without exception. I would work really hard to create some never-been-used-before characters for the language's own use, which could still be used within the document by means of its own escape character that looks nothing like any other character that's ever been used before by anyone ever.

The restrictions imposed by XML are inherited from SGML, and we can all agree that in this day and age of space-ship camels and other useful robotic mammals, they are unnecessary, stupid and go against the grain of Object Oriented programming.

Further reading at http://www.w3.org/TR/REC-xml/

Although the simplest way that I have found to make a name XML-compatible is to add a prefix of '_', there is no standard, and as such other methods are in use.

In your example, the first character has been replaced by an escape containing its hex value. The hex value 0036 represents the '6' character in ASCII, Unicode and undoubtedly other code sets.

A good thing about using hex values is that every character in a code set such as Unicode can be represented.

A bad thing is that they aren't as readable at a glance.
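To make the escape readable again, it can be reversed mechanically. The following is a minimal Python sketch, assuming every escape has exactly the `_xHHHH_` shape shown in the question (four hex digits); the function name `decode_name` is made up for illustration and is not part of any tool mentioned here.

```python
import re

# Matches one escape of the form _xHHHH_ (exactly four hex digits),
# which is the shape seen in the question. This shape is an assumption.
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_name(name: str) -> str:
    """Replace each _xHHHH_ escape with the character at that code point."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


print(decode_name("_x0036_4kbit"))  # 64kbit
```

Note that a column whose real name happens to contain text like `_x0036_` would be decoded too, so a round trip is only safe if the encoder also escapes such names.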

14 revs answered Oct 05 '22 19:10


Well, it doesn't seem to be too standard, but XML explicitly disallows digits (and some other things) as the first character of an element name:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                  [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
                  [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                  [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
                  [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
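As a rough illustration, the production above can be turned into a lookup over code point ranges. This is a sketch in Python; the names `NAME_START_RANGES` and `is_name_start_char` are hypothetical, and the ranges are copied by hand from the grammar.

```python
# Code point ranges copied from the NameStartChar production above.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def is_name_start_char(ch: str) -> bool:
    """Return True if the single character ch may start an XML name."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


print(is_name_start_char("6"))  # False
print(is_name_start_char("_"))  # True
```

The digit 6 is code point 0x36, which falls in none of the ranges, so 64kbit cannot be used as an element name unchanged.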

The encoding here just escapes the first character if it doesn't fit those requirements, using the hexadecimal value of that character. _x0036_ corresponds to hexadecimal 0x36, which is 54 in decimal and represents the digit 6.
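To replicate the output, something along these lines might do. It is only a sketch in Python, assuming the tool escapes nothing but an invalid first character and always writes four uppercase hex digits; `encode_name` is a made-up name. The same `_xHHHH_` shape is produced by .NET's `XmlConvert.EncodeName`, although the question does not say which tool generated the dump.

```python
import re

# Character class built from the NameStartChar production quoted above.
NAME_START = re.compile(
    "[:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF]"
)


def encode_name(name: str) -> str:
    """Escape the first character as _xHHHH_ if it cannot start an XML name.

    Assumption: only the first character is checked. Invalid characters
    later in the name (spaces, for example) are NOT handled here.
    """
    if not name or NAME_START.match(name):
        return name
    # ord("6") == 0x36, formatted as four uppercase hex digits: "0036".
    return f"_x{ord(name[0]):04X}_{name[1:]}"


print(encode_name("64kbit"))  # _x0036_4kbit
```

Names that already start with a valid character, such as `kbit64`, pass through unchanged.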

Joey answered Oct 05 '22 21:10