Should I use \d or [0-9] to match digits in a Perl regex?

Tags: regex, perl

Having read a number of questions and answers over the past few weeks, I have seen the use of \d in Perl regular expressions criticized as incorrect. In later versions of Perl, \d is not the same as [0-9]: \d matches any Unicode character that has the digit property, whereas [0-9] matches only the characters '0', '1', '2', ..., '9'.
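To make the difference concrete, here is a minimal sketch. The helper names and the three sample code points are chosen purely for illustration. It tests a "three" from three different scripts against both patterns:

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub matches_backslash_d { return $_[0] =~ /\A\d\z/    ? 1 : 0 }
sub matches_ascii_range { return $_[0] =~ /\A[0-9]\z/ ? 1 : 0 }

# U+0033 DIGIT THREE, U+0663 ARABIC-INDIC DIGIT THREE, U+0E53 THAI DIGIT THREE
for my $code (0x0033, 0x0663, 0x0E53) {
    my $char = chr $code;
    printf "U+%04X  \\d: %d  [0-9]: %d\n",
        $code, matches_backslash_d($char), matches_ascii_range($char);
}
```

All three characters should match \d, while only U+0033 should match [0-9].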

I appreciate that in some contexts [0-9] will be the correct thing to use, and in others \d will be. Which do people feel is the correct default?

Personally, I find the \d notation very succinct and expressive, whereas [0-9] is somewhat cumbersome in comparison. But I have little experience writing multi-language code, or rather code for languages that do not fit into the ASCII character range, so I may be naive here.

I notice:

$ find /System/Library/Perl/5.8.8/ -name \*pm | xargs grep '\\d' | wc -l
298
$ find /System/Library/Perl/5.8.8/ -name \*pm | xargs grep '\[0-9\]' | wc -l
26

262

asked May 20 '09 23:05

Beano

1 Answer

It seems to me very dangerous to use \d. It is a poor design decision in the language, since in most cases you want [0-9]. Huffman coding (shortest notation for the most common case) would dictate that \d stand for the ASCII digits.

Most of the previous posters have already highlighted why you should use [0-9], so let me give you a bit more data:

If I read the Unicode charts correctly, '۷۰' is a number (70 in Indic digits; don't take my word for it).

Try this:

$ perl -le '$one = chr 0xFF11; print "$one + 1 = ", $one+1;'
１ + 1 = 1

Here is a partial list of valid numbers (which may or may not show up properly in your browser, depending on the fonts you use). In each row, only the first character is interpreted as a number when doing arithmetic in Perl, as shown above:

ZERO:  0٠۰߀०০੦૦୦௦౦೦൦๐໐０
ONE:   1١۱߁१১੧૧୧௧౧೧൧๑໑１
TWO:   2٢۲߂२২੨૨୨௨౨೨൨๒໒２
THREE: 3٣۳߃३৩੩૩୩௩౩೩൩๓໓３
FOUR:  4٤۴߄४৪੪૪୪௪౪೪൪๔໔４
FIVE:  5٥۵߅५৫੫૫୫௫౫೫൫๕໕５
SIX:   6٦۶߆६৬੬૬୬௬౬೬൬๖໖６
SEVEN: 7٧۷߇७৭੭૭୭௭౭೭൭๗໗７
EIGHT: 8٨۸߈८৮੮૮୮௮౮೮൮๘໘８
NINE:  9٩۹߉९৯੯૯୯௯౯೯൯๙໙９
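The practical risk is input that passes a \d check and is then used in arithmetic. Here is a rough sketch of that pitfall. The validator names are hypothetical, and the test string is the '۷۰' from earlier, written with its code points U+06F7 and U+06F0:

```perl
use strict;
use warnings;

# Hypothetical validators, named here only for illustration.
sub looks_numeric_unicode { return $_[0] =~ /\A\d+\z/    ? 1 : 0 }
sub looks_numeric_ascii   { return $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }

# Numify a string the way plain arithmetic would; Perl only understands ASCII digits here.
sub numeric_value {
    my ($input) = @_;
    no warnings 'numeric';   # silence "isn't numeric" for the non-ASCII case
    return $input + 0;
}

# "70" in Extended Arabic-Indic digits, then "70" in ASCII.
for my $input ("\x{06F7}\x{06F0}", "70") {
    my $code_points = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $input;
    printf "%-14s \\d+: %d  [0-9]+: %d  numeric value: %d\n",
        $code_points,
        looks_numeric_unicode($input),
        looks_numeric_ascii($input),
        numeric_value($input);
}
```

The first string should pass the \d+ check yet numify to 0, whereas the [0-9]+ check rejects it up front. If you prefer the look of \d, Perl 5.14 and later provide the /a modifier, under which \d is restricted to ASCII digits, so /\A\d+\z/a should behave like the [0-9] version.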

Are you still not convinced?

193

answered Sep 21 '22 04:09

mirod
