
What would be a regex for valid xml names?

Tags:

xml

[a-zA-Z_:]([a-zA-Z0-9_:.])*

Would this do?

simpatico asked Dec 01 '22 10:12


2 Answers

Do you mean XML element names? If so, no: that pattern is too restrictive. It rejects many valid characters, such as the hyphen and most non-ASCII letters. The XML specification defines names as follows:

NameStartChar    ::=    ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                        [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
                        [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                        [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
                        [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] 

NameChar         ::=    NameStartChar | "-" | "." | [0-9] | #xB7 |
                        [#x0300-#x036F] | [#x203F-#x2040] 

Name             ::=    NameStartChar (NameChar)* 
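
The productions above translate fairly directly into a regular expression. The following is a minimal Python sketch, not a reference implementation: the names `NAME_START_CHAR`, `NAME_CHAR`, `XML_NAME`, `QUESTION_PATTERN` and `is_xml_name` are made up for illustration, and the check covers only the `Name` production (it does nothing namespace-specific with the colon).

```python
import re

# Ranges copied from the NameStartChar production quoted above.
NAME_START_CHAR = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, #xB7 and two more ranges.
# The hyphen is escaped so it is not read as a range operator.
NAME_CHAR = NAME_START_CHAR + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
XML_NAME = re.compile(f"[{NAME_START_CHAR}][{NAME_CHAR}]*")

# The pattern proposed in the question, kept for comparison.
QUESTION_PATTERN = re.compile(r"[a-zA-Z_:]([a-zA-Z0-9_:.])*")


def is_xml_name(candidate: str) -> bool:
    """Return True if the whole string matches the Name production."""
    return XML_NAME.fullmatch(candidate) is not None


if __name__ == "__main__":
    for name in ["item", "my-element", "élément", "1abc", "-abc"]:
        by_spec = is_xml_name(name)
        by_question = QUESTION_PATTERN.fullmatch(name) is not None
        print(f"{name!r}: spec={by_spec} question={by_question}")
```

Names such as `my-element` and `élément` match the spec-based pattern but not the pattern from the question, which is the gap described above.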
T.J. Crowder answered Dec 07 '22 01:12


EDIT:

.NET also has the method XmlConvert.VerifyName(string), which returns the name if it is valid and throws an XmlException otherwise.

From Wikipedia:

Unicode characters in the following code point ranges are valid in XML 1.0 documents:

  • U+0009
  • U+000A
  • U+000D
  • U+0020–U+D7FF
  • U+E000–U+FFFD
  • U+10000–U+10FFFF
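
The XML 1.0 ranges listed above can be checked per character. This is a small Python sketch under the assumption that the input is an ordinary `str`. The helper names `is_xml10_char` and `strip_invalid_xml10` are hypothetical. Note that this tests whether a character may appear anywhere in a document, which is a much looser condition than being allowed in a name.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch falls in the XML 1.0 ranges listed above."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml10(text: str) -> str:
    """Drop every character that is outside the XML 1.0 ranges."""
    return "".join(ch for ch in text if is_xml10_char(ch))


if __name__ == "__main__":
    # NUL and vertical tab are outside the ranges and get removed.
    print(repr(strip_invalid_xml10("a\x00b\x0bc")))
```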

Unicode characters in the following code point ranges are valid in XML 1.1 documents:

  • U+0001–U+D7FF
  • U+E000–U+FFFD
  • U+10000–U+10FFFF

Those ranges contain the following control characters, which are restricted in XML 1.1 and only valid in certain contexts:

  • U+0001–U+0008
  • U+000B–U+000C
  • U+000E–U+001F
  • U+007F–U+0084
  • U+0086–U+009F
JohnB answered Dec 07 '22 01:12