I am reading a book and see tons of examples like this:
(?P<email>
[\w\d.+-]+ # username
@
([\w\d.]+\.)+ # domain name prefix
(com|org|edu) # limit the allowed top-level domains
)
Since \w means [a-zA-Z0-9_] and \d means [0-9], \d is a subset of \w.
So aren't those \d's redundant? Could someone please confirm that my understanding is correct? This is driving me nuts.
In regex, an uppercase metacharacter denotes the inverse of its lowercase counterpart: \w matches a word character and \W a non-word character; \d matches a digit and \D a non-digit.
Here \d matches a single digit (0-9), and \w matches a single word character (any lowercase letter, any uppercase letter, the underscore character, or any digit).
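A small sketch of these inverse pairs, assuming Python's re module (the sample string is made up for illustration). Each character of the input lands in exactly one class of each pair:

```python
import re

# Hypothetical sample text: letters, an underscore, digits, a space, punctuation.
sample = "ab_12 -!"

word_chars = re.findall(r"\w", sample)      # letters, digits, underscore
non_word_chars = re.findall(r"\W", sample)  # everything \w rejects
digits = re.findall(r"\d", sample)          # decimal digits only
non_digits = re.findall(r"\D", sample)      # everything \d rejects

print(word_chars)      # ['a', 'b', '_', '1', '2']
print(non_word_chars)  # [' ', '-', '!']
print(digits)          # ['1', '2']
print(non_digits)      # ['a', 'b', '_', ' ', '-', '!']
```

Note that the digits also show up in the \w result, which is the overlap the question is asking about.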
The regular expression \s is a predefined character class. It indicates a single whitespace character. Let's review the set of whitespace characters: [ \t\n\x0B\f\r]
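As a quick check, the six characters in that set can be tested against \s one at a time. This is a sketch assuming Python 3's re module, with re.ASCII used so that only the ASCII whitespace characters are considered; the input passed to re.split is made up:

```python
import re

# The whitespace set listed above: space, tab, newline, vertical tab (\x0b),
# form feed, carriage return.
whitespace = " \t\n\x0b\f\r"

# Every one of them should match \s as a single character.
all_match = all(re.fullmatch(r"\s", ch, re.ASCII) for ch in whitespace)

# A common use: split on runs of whitespace.
tokens = re.split(r"\s+", "a \t b\nc")

print(all_match)  # True
print(tokens)     # ['a', 'b', 'c']
```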
The match-zero-or-more operator (*): this operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. `*' represents this operator. For example, `o*' matches any string made up of zero or more `o's.
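To illustrate, here is a minimal sketch assuming Python's re module. The test strings are invented, and the `fo*` pattern is added only to show that `*` binds to the smallest preceding expression (the `o`), not to the whole `fo`:

```python
import re

# fullmatch requires the whole string to consist of zero or more 'o's.
results = {text: bool(re.fullmatch(r"o*", text)) for text in ["", "o", "ooo", "oxo"]}

# '*' is greedy: it consumes as many 'o's as it can before stopping.
prefix = re.match(r"o*", "ooox").group()

# In 'fo*' the star applies only to 'o', so a bare 'f' matches but 'fofo' does not.
f_alone = bool(re.fullmatch(r"fo*", "f"))
fofo = bool(re.fullmatch(r"fo*", "fofo"))

print(results)        # {'': True, 'o': True, 'ooo': True, 'oxo': False}
print(prefix)         # ooo
print(f_alone, fofo)  # True False
```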
Yes, this is redundant, and plain \w would work just as well. See https://docs.python.org/2/library/re.html:
\d
When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to the set [0-9]. With UNICODE, it will match whatever is classified as a decimal digit in the Unicode character properties database.
\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
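If you would rather check this empirically than take the docs on faith, something like the following sketch should do. It assumes Python 3, where re.ASCII roughly corresponds to the Python 2 default behaviour quoted above; matched_chars is a hypothetical helper, and the 0x2000 code point limit is an arbitrary cut-off to keep the run short:

```python
import re

def matched_chars(char_class, flags=0, limit=0x2000):
    """Return the set of single characters below `limit` that char_class matches."""
    pattern = re.compile(char_class, flags)
    return {chr(cp) for cp in range(limit) if pattern.fullmatch(chr(cp))}

# The username class from the question, with and without the \d.
ascii_same = matched_chars(r"[\w\d.+-]", re.ASCII) == matched_chars(r"[\w.+-]", re.ASCII)
unicode_same = matched_chars(r"[\w\d.+-]") == matched_chars(r"[\w.+-]")

# Every character matched by \d is also matched by \w.
digits_subset = matched_chars(r"\d") <= matched_chars(r"\w")

print(ascii_same, unicode_same, digits_subset)  # True True True
```

Both character classes accept exactly the same characters in either mode, so dropping the \d from the book's pattern does not change what it matches.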