Isn't \d redundant in [\w\d]?

Tags: python, regex

I am reading a book and see tons of examples like this:

(?P<email>
[\w\d.+-]+ # username
@
([\w\d.]+\.)+ # domain name prefix
(com|org|edu) # limit the allowed top-level domains
)

Since \w means [a-zA-Z0-9_] and \d means [0-9], \d is a subset of \w.
So aren't those \d's redundant? Could someone please confirm that my understanding is correct? This is driving me nuts.

225

asked Nov 02 '15 19:11

hxin

1 Answer

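Yes, the \d is redundant, and a plain \w would work just as well: every character that \d matches is also matched by \w. This is what the Python 2 re documentation (https://docs.python.org/2/library/re.html) says about the two classes: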

\d

When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to the set [0-9]. With UNICODE, it will match whatever is classified as a decimal digit in the Unicode character properties database.

\w

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
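
If you would rather check this than take the documentation's word for it, here is a rough sketch. It assumes Python 3 (the quote above is from the Python 2 docs, where UNICODE was opt-in; in Python 3 it is the default for str patterns and re.ASCII restores the [a-zA-Z0-9_] behaviour). The names all_chars, same_class, find_email, and the sample addresses are made up for illustration and are not from the book.

```python
import re
import sys

# One string containing every code point, so each class can be tested in one pass.
all_chars = "".join(map(chr, range(sys.maxunicode + 1)))

# Compare [\w\d] against plain \w, both in the default (Unicode) mode and in ASCII mode.
same_class = {}
for label, flags in (("unicode", 0), ("ascii", re.ASCII)):
    with_d = re.findall(r"[\w\d]", all_chars, flags)
    without_d = re.findall(r"\w", all_chars, flags)
    same_class[label] = with_d == without_d
    print(label, "classes identical:", same_class[label])

# The pattern from the question, compiled in verbose mode so the comments are allowed.
ORIGINAL = r"""
(?P<email>
    [\w\d.+-]+        # username
    @
    ([\w\d.]+\.)+     # domain name prefix
    (com|org|edu)     # limit the allowed top-level domains
)
"""
# The same pattern with every redundant \d dropped from the character classes.
SIMPLIFIED = ORIGINAL.replace(r"\w\d", r"\w")


def find_email(pattern, text):
    """Return the text captured by the 'email' group, or None if nothing matches."""
    match = re.search(pattern, text, re.VERBOSE)
    return match.group("email") if match else None


# Hypothetical sample inputs; both patterns should agree on each of them.
samples = [
    "alice.smith+tag@mail.example.com",
    "bob_99@example.org",
    "user@example.net",
    "not an email",
]
results = {s: (find_email(ORIGINAL, s), find_email(SIMPLIFIED, s)) for s in samples}
for text, (old, new) in results.items():
    print(repr(text), "->", old, "| same result:", old == new)
```

The first loop compares the two classes over every code point, so it covers the Unicode case as well as the ASCII one. The second part only spot-checks a handful of strings, so treat it as an illustration and not as proof.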

149

answered Sep 29 '22 12:09

Russ Cox
