As far as I know, \d should match non-English digits, e.g. ۱۲۳۴۵۶۷۸۹۰, but it doesn't work properly in JavaScript.
See this jsFiddle: http://jsfiddle.net/xZpam/
Is this normal behavior?
The [0-9] expression matches any single character between the brackets, i.e. any digit from 0 to 9. The digits inside the brackets can be individual digits or ranges of digits from 0 to 9. Tip: Use [^0-9] to match any character that is NOT a digit.
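A minimal JavaScript sketch of the two bracket expressions described above; the sample string is an arbitrary choice for illustration:

```javascript
// [0-9] matches one ASCII digit; [^0-9] matches any character that is NOT one.
const digit = /[0-9]/g;
const nonDigit = /[^0-9]/g;

const sample = "Room 42B";

// Collect every ASCII digit in the sample.
const digits = sample.match(digit);
// Collect everything else (letters and the space).
const others = sample.match(nonDigit);

console.log(digits.join("")); // 42
console.log(others.join("")); // Room B
```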
In later versions of Perl, \d is not the same as [0-9]: \d matches any Unicode character that has the digit property, whereas [0-9] matches only the characters '0', '1', '2', ..., '9'.
So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] cannot be misinterpreted; [[:digit:]] is available in more utilities and in some cases means only [0123456789]; \d is supported by only a few utilities.
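For comparison with the Perl behavior, the sketch below checks whether JavaScript's \d and [0-9] ever disagree on a single UTF-16 code unit. It assumes a spec-conforming engine such as Node.js; since ECMAScript defines \d as the ten ASCII digits, no mismatches are expected, and adding the u flag does not widen \d either.

```javascript
// Compare \d with [0-9] for every UTF-16 code unit (U+0000..U+FFFF).
const d = /^\d$/;
const range = /^[0-9]$/;

let mismatches = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (d.test(ch) !== range.test(ch)) mismatches++;
}

console.log(mismatches); // 0

// U+06F5 is EXTENDED ARABIC-INDIC DIGIT FIVE (Persian five).
console.log(d.test("\u06F5"));      // false
console.log(/^\d$/u.test("\u06F5")); // false: the u flag does not change \d
```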
The regular expression ^[0-9]+$ will match a non-empty contiguous string of digits, i.e. a non-empty line that is composed of nothing but digits.
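A short sketch of that pattern in JavaScript; the test strings are made up for illustration. Without the m flag, ^ and $ refer to the whole string; with it, they also match at line boundaries, which is closer to the "non-empty line" reading above:

```javascript
// ^ and $ anchor the match; [0-9]+ requires one or more ASCII digits.
const onlyDigits = /^[0-9]+$/;

const inputs = ["2024", "", "12a", " 42", "۱۲۳"];
const results = inputs.map((s) => onlyDigits.test(s));
console.log(results.join(", ")); // true, false, false, false, false

// With the m flag, any single line consisting only of digits is enough.
const perLine = /^[0-9]+$/m.test("abc\n123");
console.log(perLine); // true
```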
It seems that JavaScript does not support this (along with other weaknesses of its RegExp implementation). However, there's a library called XRegExp with a Unicode add-on that enables Unicode support through \p{} category definitions. For example, if you use \p{Nd} instead of \d, it will match non-English digits too:
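<script src="xregexp-all.js" type="text/javascript"></script>
<script type="text/javascript">
  // xregexp-all.js bundles XRegExp together with its Unicode add-ons,
  // which is what makes \p{...} available.
  var englishDigits = '123123';
  var nonEnglishDigits = '۱۲۳۱۲۳';

  // \p{Nd} is the Unicode "decimal digit number" category, in any script.
  // The backslash is doubled because the pattern is passed as a string.
  var digitsPattern = XRegExp('\\p{Nd}+');

  if (digitsPattern.test(nonEnglishDigits)) {
    alert('Non-English using xregexp');
  }
  if (digitsPattern.test(englishDigits)) {
    alert('English using xregexp');
  }
</script>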
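The XRegExp example predates it, but engines implementing ECMAScript 2018 or later also accept \p{Nd} natively when the regex has the u flag. A minimal sketch, assuming such an engine (for instance a current Node.js or browser); unlike the XRegExp example, it anchors the pattern with ^...$ so the whole string must consist of digits:

```javascript
// Requires ES2018 Unicode property escapes; \p{...} is only recognized with the u flag.
const englishDigits = "123123";
const nonEnglishDigits = "۱۲۳۱۲۳";

const nativeDigits = /^\p{Nd}+$/u; // decimal digits in any script
const asciiDigits = /^\d+$/u;      // \d stays ASCII-only, even with u

console.log(nativeDigits.test(englishDigits));    // true
console.log(nativeDigits.test(nonEnglishDigits)); // true
console.log(asciiDigits.test(nonEnglishDigits));  // false
```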
I used \p{Nd} instead of \p{N}, as it seems that \d is equivalent to \p{Nd} in non-ECMAScript regex engines. Thanks go to Shervin for pointing this out. See also this fiddle by Shervin.
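To make the \p{N} versus \p{Nd} distinction concrete, this sketch runs a few sample characters through both categories. It uses native ES2018 property escapes rather than XRegExp (an assumption: the engine must support the u flag with \p{...}):

```javascript
// \p{N} covers every Number category: Nd (decimal digits), Nl (letter numbers), No (other numbers).
// \p{Nd} covers decimal digits only, the category the answer above treats as equivalent to \d
// in non-ECMAScript engines.
const anyNumber = /^\p{N}$/u;
const decimalDigit = /^\p{Nd}$/u;

const samples = [
  ["7", "ASCII digit seven (Nd)"],
  ["\u06F7", "Extended Arabic-Indic digit seven (Nd)"],
  ["\u00BD", "Vulgar fraction one half (No)"],
  ["\u216B", "Roman numeral twelve (Nl)"],
];

for (const [ch, label] of samples) {
  console.log(label, anyNumber.test(ch), decimalDigit.test(ch));
}
// ASCII digit seven (Nd) true true
// Extended Arabic-Indic digit seven (Nd) true true
// Vulgar fraction one half (No) true false
// Roman numeral twelve (Nl) true false
```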
JavaScript does not support Unicode regex matching (and it is far from the only language where this is true). See http://www.regular-expressions.info/unicode.html
In the Mozilla documentation (https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp) you will find that \d "Matches a digit character in the basic Latin alphabet. Equivalent to [0-9]."
\d is equivalent to [0-9], according to MDN.
From MDN: \d "Matches a digit character in the basic Latin alphabet. Equivalent to [0-9]."
\d matches a digit character and is equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number." (From MDN.)
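The MDN example can be reproduced directly. This sketch also tries a Persian digit in the same position (an illustrative input, not part of the MDN text) to show that it is not matched:

```javascript
// Both patterns find the first ASCII digit in the sentence from the MDN example.
const sentence = "B2 is the suite number.";

const viaEscape = /\d/.exec(sentence);
const viaRange = /[0-9]/.exec(sentence);

console.log(viaEscape[0], viaEscape.index); // 2 1
console.log(viaRange[0], viaRange.index);   // 2 1

// A Persian digit in the same position is not matched at all.
console.log(/\d/.exec("B\u06F2 is the suite number.")); // null
```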
Yes, it is normal and correct that \d matches only the ASCII digits 0 to 9. The authoritative reference is the ECMAScript standard. It is not particularly easy reading, but clause 15.10.2.12 (CharacterClassEscape) of the ECMAScript 5 edition specifies that \d denotes “the ten-element set of characters containing the characters 0 through 9 inclusive”.
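If non-ASCII digits must be accepted without relying on \p{...}, one possible workaround (not taken from the answers above) is to list the relevant code point ranges explicitly and normalize them to ASCII before applying \d. A sketch, assuming only Arabic-Indic and Extended Arabic-Indic (Persian) digits need handling; toAsciiDigits is a hypothetical helper name:

```javascript
// \d is ASCII-only by specification, so non-ASCII digits have to be listed explicitly.
// Assumption: only these two blocks matter for the input at hand.
//   U+0660..U+0669  ARABIC-INDIC DIGIT ZERO..NINE
//   U+06F0..U+06F9  EXTENDED ARABIC-INDIC DIGIT ZERO..NINE (used for Persian)
const easternDigit = /[\u0660-\u0669\u06F0-\u06F9]/g;

// Hypothetical helper: replace each such digit with its ASCII counterpart.
function toAsciiDigits(text) {
  return text.replace(easternDigit, (ch) => {
    const code = ch.charCodeAt(0);
    const base = code <= 0x0669 ? 0x0660 : 0x06f0;
    return String(code - base);
  });
}

const persian = "۱۲۳۴۵۶۷۸۹۰";
const normalized = toAsciiDigits(persian);

console.log(normalized);               // 1234567890
console.log(/^\d+$/.test(persian));    // false
console.log(/^\d+$/.test(normalized)); // true
```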