
In regex, what does \w* mean?

What does this regex mean?

^[\w*]$
TIMEX asked Oct 16 '09 08:10



3 Answers

Quick answer: ^[\w*]$ will match a string consisting of a single character, where that character is alphanumeric (a letter or digit), an underscore (_), or an asterisk (*).

Details:

  • The "\w" means "any word character", which usually means alphanumeric (letters, numbers, regardless of case) plus underscore (_).
  • The "^" anchors to the beginning of a string and the "$" anchors to the end of a string, which means that, in this case, the match must start at the beginning of the string and end at the end of the string.
  • The [] means a character class, which means "match any one character contained in the character class".
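The three points above can be checked with a short sketch using Python's re module. The sample strings are made up for illustration and are not from the question.

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r'^[\w*]$')

# Sample inputs chosen for illustration (not from the original question).
samples = ['a', '7', '_', '*', 'ab', '', '-']

# True where the whole string is exactly one word character or one asterisk.
results = {s: bool(pattern.search(s)) for s in samples}

for s, matched in results.items():
    print(repr(s), matched)
```

Only the single-character strings 'a', '7', '_' and '*' should come out as matches; 'ab' is too long, the empty string is too short, and '-' is not in the class.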

It is also worth mentioning that ordinary string quoting and escaping rules make regular expressions awkward to write, because every backslash has to be escaped with another backslash. Python therefore has a raw string notation, in which backslashes are passed through to the regex engine unchanged, and that is what an "r" at the beginning of a pattern string is for.
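As a rough illustration of the quoting point, the sketch below compares a raw string with an ordinary one. The variable names are hypothetical, and the '\b' case is an extra example that is not part of the question.

```python
import re

# Raw string: the backslash reaches the regex engine untouched.
raw_pattern = r'^[\w*]$'

# Ordinary string: the backslash is doubled so it survives string parsing.
escaped_pattern = '^[\\w*]$'

# Both spellings produce the same pattern text.
same_text = raw_pattern == escaped_pattern

# A case where omitting the r prefix silently changes the meaning:
# in an ordinary string '\b' is a backspace character, not a word boundary.
backspace = '\b'
word_boundary = r'\b'

print(same_text, len(backspace), len(word_boundary))
```

The raw form is usually preferred for patterns because what you type is what the regex engine sees.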

Note: Normally an asterisk (*) means "0 or more of the previous thing" but in the example above, it does not have that meaning, since the asterisk is inside of the character class, so it loses its "special-ness".
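To make the difference concrete, here is a small sketch comparing \w* (asterisk as a quantifier) with [\w*] (asterisk as a literal). The test strings are assumptions chosen for illustration.

```python
import re

# Outside a character class, * is a quantifier: zero or more word characters.
quantified = re.compile(r'^\w*$')

# Inside a character class, * is a literal asterisk.
literal = re.compile(r'^[\w*]$')

quantified_hello = bool(quantified.search('hello'))  # several word characters
quantified_empty = bool(quantified.search(''))       # zero word characters
quantified_star = bool(quantified.search('*'))       # '*' is not a word character

literal_hello = bool(literal.search('hello'))        # more than one character
literal_star = bool(literal.search('*'))             # a single literal asterisk

print(quantified_hello, quantified_empty, quantified_star)
print(literal_hello, literal_star)
```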

For more information on regular expressions in Python, the two official references are the re module documentation and the Regular Expression HOWTO.

Adam Batkin answered Oct 01 '22 02:10


As exhuma said, \w is any word-class character (alphanumeric as Jonathan clarifies).

However, because it is in square brackets, it will match:

  1. a single word character (alphanumeric or underscore) OR
  2. an asterisk (*)

So the whole regular expression matches:

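  • the beginning of a line (^), which means the beginning of the whole input unless the tool matches line by line or in multiline mode
  • followed by either a single word character or an asterisk
  • followed by the end of a line ($)

So, when each line is tested on its own, the following would match: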

blah
z <- matches this line
blah

or

blah
* <- matches this line
blah
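
Whether "line" or "string" applies depends on the tool and its flags. As a sketch in Python, where ^ and $ anchor to the whole string by default, the three-line sample only matches with re.MULTILINE or when each line is tested separately. The variable names here are hypothetical.

```python
import re

text = "blah\nz\nblah"

# Default mode: ^ and $ anchor to the start and end of the whole string.
default_match = re.search(r'^[\w*]$', text)

# re.MULTILINE: ^ and $ also match at the start and end of each line.
multiline_match = re.search(r'^[\w*]$', text, re.MULTILINE)

# Line-by-line matching, similar in spirit to a tool such as grep.
matching_lines = [line for line in text.splitlines()
                  if re.search(r'^[\w*]$', line)]

print(default_match, multiline_match, matching_lines)
```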

atomice answered Oct 01 '22 00:10


\w matches a single alphanumeric character or underscore, and \w* would match 0 or more of them. In your case, however, the * is inside the character class, so [\w*] matches any one character from [a-zA-Z0-9_*] (the * is interpreted literally).

See http://www.regular-expressions.info/reference.html

To quote:

\d, \w and \s --- Shorthand character classes matching digits, word characters, and whitespace. Can be used inside and outside character classes.
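
The quoted point, that the shorthand classes work both inside and outside square brackets, can be sketched in Python as follows. The input strings are made up for illustration.

```python
import re

# \d used on its own, outside a character class.
digits_outside = re.findall(r'\d', 'a1 b2_*')

# \w and \s combined with a literal asterisk inside one character class.
inside_class = re.findall(r'[\w\s*]', 'a-1 *')

# Negated class: any character that is neither a word character nor whitespace.
neither = re.findall(r'[^\w\s]', 'a-1 *')

print(digits_outside, inside_class, neither)
```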

Edit: corrected in response to a comment.

Jonathan Fingland answered Oct 01 '22 00:10