
Isn't \d redundant in [\w\d]?

Tags: python, regex

I am reading a book and see tons of examples like this:

(?P<email>
[\w\d.+-]+ # username
@
([\w\d.]+\.)+ # domain name prefix
(com|org|edu) # limit the allowed top-level domains
)

Since \w means [a-zA-Z0-9_] and \d means [0-9], \d is a subset of \w.
So aren't those "\d"s redundant? Could someone please confirm that my understanding is correct? This is driving me nuts.

hxin asked Nov 02 '15 19:11



1 Answer

Yes, the \d is redundant inside [\w\d], and plain \w would work just as well. See the Python 2 re documentation at https://docs.python.org/2/library/re.html, which says:

\d

When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to the set [0-9]. With UNICODE, it will match whatever is classified as a decimal digit in the Unicode character properties database.

\w

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
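If you want to convince yourself rather than take the docs on faith, here is a rough sketch of a check. It assumes Python 3, where str patterns are Unicode-aware by default and re.ASCII gives the [0-9] and [a-zA-Z0-9_] behavior quoted above. The helper names (digits_not_in_w, same_email, SAMPLES) are made up for this illustration and the sample addresses are arbitrary.

```python
import re
import sys


def digits_not_in_w(flags=0):
    """Return every character that the digit class matches but the word class does not."""
    digit = re.compile(r"\d", flags)
    word = re.compile(r"\w", flags)
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if digit.fullmatch(chr(cp)) and not word.fullmatch(chr(cp))
    ]


# The pattern from the question, compiled in verbose mode so the comments are allowed.
ORIGINAL = re.compile(r"""
    (?P<email>
    [\w\d.+-]+      # username
    @
    ([\w\d.]+\.)+   # domain name prefix
    (com|org|edu)   # limit the allowed top-level domains
    )
""", re.VERBOSE)

# The same pattern with every \d dropped from the character classes.
SIMPLIFIED = re.compile(r"""
    (?P<email>
    [\w.+-]+        # username
    @
    ([\w.]+\.)+     # domain name prefix
    (com|org|edu)   # limit the allowed top-level domains
    )
""", re.VERBOSE)

# Hypothetical sample inputs, including ones that should not match at all.
SAMPLES = [
    "user123@cs.mit.edu",
    "user.name+tag@mail.example.com",
    "bad@example.net",
    "no at sign here",
]


def same_email(text):
    """Return True if both patterns find the same email (or both find none)."""
    a = ORIGINAL.search(text)
    b = SIMPLIFIED.search(text)
    return (a.group("email") if a else None) == (b.group("email") if b else None)
```

Calling digits_not_in_w() and digits_not_in_w(re.ASCII) should come back empty if \d really is a subset of \w in both modes, and same_email should hold for each entry in SAMPLES. The full scan walks every code point, so expect it to take a moment.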

Russ Cox answered Sep 29 '22 12:09