Is the at-sign (@) a valid HTML/XML tag character?

I'm doing some HTML stripping using regular expressions (yes, I know, never parse HTML with regexes, but I'm just stripping it, and I also unfortunately cannot use any external libraries). I'm using a regex from the Regular Expressions Cookbook, and it has worked great, except I just ran into this problem:

In the string Bob Saget <[email protected]>, my regex matches the email address as if it were a tag.

So my question is: is the @ sign a valid character in an XML or HTML tag name? (I'm not asking whether it is valid within an attribute; I know that it is.) If it is not, I can safely exclude it in my regex.

I'm not sure where to look this up. I looked here, and I think it says that the at-sign is not allowed in an XML tag, but I would appreciate some concrete proof.

asked Aug 15 '11 13:08 by NickAldwin



1 Answer

After another look at the XML Specification:

A start tag (the STag production) consists of:

'<' Name (S Attribute)* S? '>'

A Name consists of:

NameStartChar (NameChar)*

A NameStartChar consists of:

":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

A NameChar consists of:

NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
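As a rough cross-check, the NameStartChar and NameChar productions can be transcribed almost literally into Python regular-expression character classes. This is a minimal sketch: the names NAME_START_CHAR, NAME_CHAR and is_xml_name are hypothetical, and it tests only the Name production, not whether a whole tag is well-formed.

```python
import re

# NameStartChar, transcribed range by range from the XML production.
# Python's re module understands \uXXXX and \UXXXXXXXX escapes in str patterns.
NAME_START_CHAR = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)

# NameChar = NameStartChar plus "-", ".", digits, U+00B7 and two more ranges.
NAME_CHAR = NAME_START_CHAR + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile(f"[{NAME_START_CHAR}][{NAME_CHAR}]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if the whole string matches the XML Name production."""
    return NAME_RE.fullmatch(candidate) is not None


for s in ["root", "xs:element", "_id", "bob@example.com", "@"]:
    print(s, is_xml_name(s))
```

The last two candidates (a hypothetical address and a bare @) are rejected, because no character class contains U+0040.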

The @ sign is U+0040. It sits between ":" (U+003A) and "A" (U+0041), so it falls outside every range listed above.

So the @ sign is not valid in a NameChar or a NameStartChar, and thus not valid in a Name.
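Applied to the stripping problem in the question, this means a tag-matching regex can require a Name immediately after `<` or `</`. The Cookbook regex itself is not shown, so the pattern below is a hypothetical stand-in, and for brevity it accepts only the ASCII subset of NameStartChar/NameChar. Like any regex approach it is still a heuristic: a `>` inside a quoted attribute value ends the match early, and comments or doctypes (`<!...>`) are not stripped.

```python
import re

# ASCII-only subset of the XML Name production (assumption for brevity):
# NameStartChar -> ":", "_", letters; NameChar adds "-", ".", digits.
NAME = r"[:A-Za-z_][:A-Za-z_\-.0-9]*"

# Hypothetical tag pattern: "<" or "</", a Name, then optionally whitespace
# plus attributes, an optional "/" (empty-element tag), and finally ">".
TAG_RE = re.compile(rf"</?{NAME}(?:\s[^>]*)?/?>")


def strip_tags(text: str) -> str:
    """Remove anything that looks like a start, end, or empty-element tag."""
    return TAG_RE.sub("", text)


print(strip_tags("<p>Bob Saget <bob@example.com></p>"))
# prints: Bob Saget <bob@example.com>
```

The address survives because after the name `bob` the next character is @, which is neither whitespace, `/`, nor `>`, so the pattern cannot complete.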

answered Nov 07 '22 05:11 by NickAldwin