
Why can't the underscore be matched by '\W'?

Tags:

python

regex

I know that _ cannot be matched by \W while any other punctuation can. As the docs state: \w is a set of alphanumeric characters and the underscore.
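The behaviour can be reproduced with a quick check. This is only a sketch, assuming Python 3 and the standard re module; the variable name `unmatched` is made up for illustration.

```python
import re
import string

# Collect every ASCII punctuation character that \W does NOT match.
unmatched = [ch for ch in string.punctuation if not re.fullmatch(r"\W", ch)]
print(unmatched)  # ['_']

# The underscore is matched by \w instead.
underscore_is_word_char = re.fullmatch(r"\w", "_") is not None
print(underscore_is_word_char)  # True
```

Of all the characters in string.punctuation, only the underscore slips past \W.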

At the same time:

[image]

I have always been confused by this but never actually bothered to question why.

Does it have to do with the special role that _ plays in Python?

minerals asked Mar 11 '16 13:03



1 Answer

Much of the regular expression syntax in Python's re module comes from Perl, which was in turn influenced by sed and awk. \w comes from there and has a long history.


In the original regex module (which was deprecated in Python 1.5), \w did not include _, as is evident from the Python 1.4 documentation:

\w

Matches any alphanumeric character; this is equivalent to the set [a-zA-Z0-9].
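To see how that older definition differs from today's, the set quoted above can be compared with the current \w. This is a rough sketch: it does not run the old regex module, it only emulates the quoted set [a-zA-Z0-9] with the modern re module, and re.ASCII is used so that \w is restricted to ASCII. The names `old_w`, `new_w` and `differing` are illustrative.

```python
import re

old_w = re.compile(r"[a-zA-Z0-9]")   # the set quoted from the Python 1.4 docs
new_w = re.compile(r"\w", re.ASCII)  # \w in the current re module, ASCII only

# ASCII characters on which the two definitions disagree.
differing = [
    chr(i)
    for i in range(128)
    if bool(old_w.fullmatch(chr(i))) != bool(new_w.fullmatch(chr(i)))
]
print(differing)  # ['_']
```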


P.S. Although it is not very convenient, you can match every non-\w character plus _ with the character class [\W_].
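A small illustration of the [\W_] class, assuming Python 3; the sample string and variable names are made up for the example.

```python
import re

text = "snake_case, kebab-case; dotted.name"

# \W alone leaves the underscore inside the token.
split_plain = re.split(r"\W+", text)
print(split_plain)  # ['snake_case', 'kebab', 'case', 'dotted', 'name']

# [\W_] treats the underscore as a separator too.
split_with_underscore = re.split(r"[\W_]+", text)
print(split_with_underscore)  # ['snake', 'case', 'kebab', 'case', 'dotted', 'name']
```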
