
Get a list of what was not matched by a regex?

Tags:

python

regex

I am splitting a string using "Python strings split with multiple separators":

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r'\w+', DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

I want to get a separate list of what's in between the matched words:

[", ", " - ", " ", " ", " ", " ", "!?"]

How do I do this?

jedierikb asked Dec 09 '25 21:12



1 Answer

print(re.findall(r'\W+', DATA))  # note, UPPER-case "W"

yields the list you are looking for:

[', ', ' - ', ' ', ' ', ' ', ' ', '!?']

I used \W+ rather than \w+. The upper-case \W is the negation of the \w character class you were using, so it matches exactly the runs of characters that \w+ skips.

   \w  Matches word characters, i.e., letters, digits, and underscores.
   \W  Matches non-word characters, i.e., the negated version of \w
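
As a quick check of that complement relationship, here is a minimal sketch in Python 3 syntax that reuses the DATA string from the question. The names words, separators, and rebuilt are only illustrative. It relies on this particular string starting with a word and ending with a separator, so the two lists pair up one-to-one; for text that starts with punctuation the pairing would be offset by one.

```python
import re

DATA = "Hey, you - what are you doing here!?"

# \w+ collects the runs of word characters, \W+ collects everything in between.
words = re.findall(r'\w+', DATA)
separators = re.findall(r'\W+', DATA)

print(words)       # ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
print(separators)  # [', ', ' - ', ' ', ' ', ' ', ' ', '!?']

# Assumption: the text starts with a word and ends with a separator,
# so each word can be glued to the separator that follows it.
rebuilt = ''.join(w + s for w, s in zip(words, separators))
print(rebuilt == DATA)  # True
```

Since every character is either a word character or not, the two lists together account for the whole string, which is why the rebuild succeeds.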

This Regular Expression Reference Sheet might be helpful in selecting the best character classes and metacharacters for your regular expression searches and matches. Also, see this tutorial for more information (especially the reference section toward the bottom of the page).
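
If you need the words and the separators from a single pass, one possible alternative (not what the answer above uses) is re.split with a capturing group: when the pattern contains parentheses, re.split also returns the matched separators. A minimal sketch in Python 3 syntax, with parts as an illustrative name:

```python
import re

DATA = "Hey, you - what are you doing here!?"

# The parentheses make re.split keep the separators in the result.
parts = re.split(r'(\W+)', DATA)

# Even positions hold the text between separators, odd positions the separators.
words = parts[0::2]
separators = parts[1::2]

print(separators)  # [', ', ' - ', ' ', ' ', ' ', ' ', '!?']
print(words)       # ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here', '']
```

Note the trailing empty string in words: re.split emits an empty string at either end when the text begins or ends with a separator, so filter those out if you only want real words.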

Levon answered Dec 12 '25 13:12



