
Should I use \d or [0-9] to match digits in a Perl regex?

Tags: regex, perl

Having read a number of questions and answers over the past few weeks, I have seen the use of \d in Perl regular expressions described as incorrect. In later versions of Perl, \d is not the same as [0-9]: \d matches any Unicode character that has the digit attribute, whereas [0-9] matches only the characters '0', '1', '2', ..., '9'.
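A minimal sketch of the difference, assuming a Perl recent enough to support the /a modifier (5.14 or later). The test character and the variable names are only illustrative choices:

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit that is not ASCII.
my $arabic_three = "\x{0663}";

# By default, \d matches any Unicode decimal digit in a character string.
my $d_matches = $arabic_three =~ /\A\d\z/ ? 1 : 0;

# [0-9] matches only the ten ASCII digits.
my $range_matches = $arabic_three =~ /\A[0-9]\z/ ? 1 : 0;

# The /a modifier restricts \d (and \w, \s) to their ASCII meanings.
my $d_ascii_matches = $arabic_three =~ /\A\d\z/a ? 1 : 0;

print "\\d: $d_matches, [0-9]: $range_matches, \\d with /a: $d_ascii_matches\n";
```

On such a Perl this should print `\d: 1, [0-9]: 0, \d with /a: 0`, so /a is one way to keep the short \d notation while matching ASCII digits only.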

I appreciate that in some contexts [0-9] will be the correct thing to use, and in others \d will be. I was wondering which people feel is the correct default to use?

Personally I find the \d notation very succinct and expressive, whereas in comparison [0-9] is somewhat cumbersome. But I have little experience of doing multi-language code, or rather code for languages that do not fit into the ASCII character range, and therefore may be being naive.

I notice

    $ find /System/Library/Perl/5.8.8/ -name \*pm | xargs grep '\\d' | wc -l
    298
    $ find /System/Library/Perl/5.8.8/ -name \*pm | xargs grep '\[0-9\]' | wc -l
    26
Beano asked May 20 '09 23:05



1 Answer

It seems to me very dangerous to use \d. It is a poor design decision in the language, since in most cases you want [0-9]; Huffman coding would dictate that the shorter \d be reserved for the common case, ASCII digits.

Most of the previous posters have already highlighted why you should use [0-9], so let me give you a bit more data:

  • If I read the Unicode charts correctly, '۷۰' is a number (70 in Extended Arabic-Indic digits; don't take my word for it).

  • Try this:

    $ perl -le '$one = chr 0xFF11; print "$one + 1 = ", $one+1;'
    １ + 1 = 1
  • Here is a partial list of valid digits (which may or may not show up properly in your browser, depending on the fonts you use). In each row, only the first character is interpreted as a number when doing arithmetic in Perl, as shown above:

     ZERO:  0٠۰߀०০੦૦୦௦౦೦൦๐໐0
     ONE:   1١۱߁१১੧૧୧௧౧೧൧๑໑1
     TWO:   2٢۲߂२২੨૨୨௨౨೨൨๒໒2
     THREE: 3٣۳߃३৩੩૩୩௩౩೩൩๓໓3
     FOUR:  4٤۴߄४৪੪૪୪௪౪೪൪๔໔4
     FIVE:  5٥۵߅५৫੫૫୫௫౫೫൫๕໕5
     SIX:   6٦۶߆६৬੬૬୬௬౬೬൬๖໖6
     SEVEN: 7٧۷߇७৭੭૭୭௭౭೭൭๗໗7
     EIGHT: 8٨۸߈८৮੮૮୮௮౮೮൮๘໘8
     NINE:  9٩۹߉९৯੯૯୯௯౯೯൯๙໙9
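To make the mismatch between matching and arithmetic concrete, here is a rough sketch rather than a definitive recipe. It assumes Perl 5.14 or later, where Unicode::UCD provides num(); the variable names and the test string are my own illustrative choices:

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# "70" written with Extended Arabic-Indic digits (U+06F7, U+06F0).
my $input = "\x{06F7}\x{06F0}";

# \d+ accepts the string as "all digits"; [0-9]+ rejects it.
my $passes_d     = $input =~ /\A\d+\z/    ? 1 : 0;
my $passes_range = $input =~ /\A[0-9]+\z/ ? 1 : 0;

# Perl's string-to-number conversion only understands ASCII digits,
# so the string numifies to 0 and the sum is 1, not 71.
my $naive_sum;
{
    no warnings 'numeric';    # silence the "isn't numeric" warning
    $naive_sum = $input + 1;
}

# num() returns the numeric value of a run of same-script digits,
# or undef if the string is not a safe, valid number.
my $value = num($input);

print "\\d+ accepts: $passes_d, [0-9]+ accepts: $passes_range\n";
print "naive sum: $naive_sum, num() value: $value\n";
```

So validating with \d+ and then doing arithmetic silently gives the wrong answer; either validate with [0-9]+ or convert explicitly with something like num() before calculating.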

Are you still not convinced?

mirod answered Sep 21 '22 04:09