I made an XML Schema and I have this in it.
<xs:element name="Email">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Some of the emails in one of my XML documents fail validation, and I get this error:
Email' element is invalid - The value '[email protected]' is invalid according to its datatype 'String' - The Pattern constraint failed. LineNumber: 15404 LinePostion: 32
Comparing the emails that passed with the ones that failed, I noticed that every failing address contains an underscore (_). I am not sure whether that is the cause.
Edit
So I changed my regex to this
<xs:pattern value="[\w_]+([-+.'][\w_]+)*@[\w_]+([-.][\w_]+)*\.[\w_]+([-.][\w_]+)*"/>
It now works, but I don't understand why \w does not match the underscore.
The W3C Recommendation on datatypes defines \w as:
[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)
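In Unicode, the underscore is 'LOW LINE' (U+005F), category: punctuation, connector [Pc]. It therefore falls into \p{P}, and \w in XML Schema does not match it. XML Schema handles character classes more in accordance with the Unicode definitions than most other regex flavors, where \w includes the underscore.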
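To check this yourself, here is a minimal Python sketch that looks up the Unicode category of a character and applies the \w definition quoted above. The helper name xsd_word_char is made up for illustration; it only approximates the XML Schema rule using the Unicode tables that ship with your Python version, and it is not part of any XML library.

```python
import unicodedata


def xsd_word_char(ch: str) -> bool:
    """Approximate XML Schema's \\w for a single character.

    Hypothetical helper: returns True unless the character's Unicode
    general category is punctuation (P*), separator (Z*) or other (C*).
    """
    major_class = unicodedata.category(ch)[0]
    return major_class not in ("P", "Z", "C")


if __name__ == "__main__":
    for ch in ["a", "5", "_", "-", " "]:
        # Print the character, its Unicode category, and the verdict.
        print(repr(ch), unicodedata.category(ch), xsd_word_char(ch))
```

Note that Python's own re module treats the underscore as a word character, so a pattern that passes in Python (or Perl, Java, .NET) can still fail in an XML Schema validator.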
But for an e-mail regexp you should use strict ASCII, like [0-9A-Za-z_-], instead of \w (I bet an email address with non-Latin characters is invalid :) ). Better still, find a proven regexp, or look into the RFC to see what the proper e-mail format is.
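As a rough illustration of the ASCII-only idea, the sketch below rebuilds the pattern from the question with [0-9A-Za-z_] in place of \w and tries it in Python. The names WORD, EMAIL_PATTERN and looks_like_email are made up for this example, and the pattern keeps the question's overall shape, so it is not a complete implementation of the RFC address grammar. re.fullmatch is used because XML Schema patterns always have to match the whole value.

```python
import re

# Assumption: explicit ASCII letters, digits and underscore replace \w.
WORD = r"[0-9A-Za-z_]"

# Same structure as the pattern in the question, with WORD substituted.
EMAIL_PATTERN = re.compile(
    WORD + r"+([-+.']" + WORD + r"+)*"   # local part
    r"@"
    + WORD + r"+([-.]" + WORD + r"+)*"   # first domain labels
    r"\." + WORD + r"+([-.]" + WORD + r"+)*"  # at least one dot in the domain
)


def looks_like_email(value: str) -> bool:
    """Return True if the whole value matches the ASCII-only pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["john_doe@example.com", "john doe@example.com"]:
        print(candidate, looks_like_email(candidate))
```

The same character class can be pasted into the xs:pattern value, since [0-9A-Za-z_] means the same thing in XML Schema as it does here.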