
PHP regex for valid XML tag name

Tags: regex, php, xml

What is a good general regex (in PHP terms) to determine if a string is a valid XML tag name?

I started with /[^>]+/i, but that also matches something like 4 \<<, which obviously isn't a valid tag name.

So I tried combining all valid characters, like /[a-z][a-z0-9_-]*/i, which also isn't quite right, since XML allows virtually any character in tag names, including characters from non-English languages.

I'm stuck on that now. Should I just check whether there are whitespace characters, or is there more to it?

F.P asked Dec 07 '22 19:12


2 Answers

Why don't you just use an XML parser/generator that already knows the rules?

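// Let DOM apply the XML naming rules: constructing a DOMElement
// with an invalid name throws a DOMException.
function isValidXmlElementName($elementName)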
{
    try {
        new DOMElement($elementName);
    } catch (DOMException $e) {
        return false;
    }
    return true;
}

var_dump(isValidXmlElementName(' ')); // false 
var_dump(isValidXmlElementName('1')); // false
var_dump(isValidXmlElementName('-')); // false
var_dump(isValidXmlElementName('a')); // true
Gordon answered Dec 10 '22 12:12


From the XML specification:

[4]   NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]  NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]   Name          ::= NameStartChar (NameChar)*
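
For anyone who wants the regex the question asked for, the three productions above can be transcribed fairly directly into a PCRE pattern. This is only a minimal sketch: the constant and function names (XML_NAME_START_CHAR, XML_NAME_CHAR_EXTRA, isValidXmlName) are made up for illustration, and it assumes the input is a UTF-8 string so the /u modifier can be used.

```php
<?php
// Hypothetical names; the character ranges are copied from productions [4] and [4a].
// These are fragments meant to be placed inside a character class.
const XML_NAME_START_CHAR =
    'A-Za-z_:\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}\x{370}-\x{37D}'
    . '\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}'
    . '\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';

// Characters allowed after the first one, in addition to NameStartChar.
const XML_NAME_CHAR_EXTRA = '\-.0-9\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';

function isValidXmlName(string $name): bool
{
    // Production [5]: Name ::= NameStartChar (NameChar)*
    // \A and \z anchor to the whole string, so a trailing newline is rejected.
    $pattern = '/\A[' . XML_NAME_START_CHAR . ']'
        . '[' . XML_NAME_START_CHAR . XML_NAME_CHAR_EXTRA . ']*\z/u';

    // preg_match() returns false for invalid UTF-8 input under /u,
    // so only an explicit 1 counts as a match.
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidXmlName(' '));            // false
var_dump(isValidXmlName('1'));            // false
var_dump(isValidXmlName('-'));            // false
var_dump(isValidXmlName('a'));            // true
var_dump(isValidXmlName("\u{E9}l\u{E9}ment")); // true
```

Note that the Name production allows ":" anywhere, so this pattern also accepts names such as a:b:c. If the names need to be valid in a namespace-aware document, a stricter check would be required.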
Mark Byers answered Dec 10 '22 13:12