
Python Not Extracting Expected Pattern

I'm new to regex and am trying to extract a list of items with re.findall, but I am not getting the expected result. Why does the pattern below also capture the first piece of each line, and what do I need to change to get the desired output?

import re
string = '''aaaa_1y345_xyz_orange_bar_1
aaaa_123a5542_xyz_orange_bar_1
bbbb_1z34512_abc_purple_bar_1'''

print(re.findall(r'_\w+_\w+_bar_\d+', string))

Current Output:

['_1y345_xyz_orange_bar_1', '_123a5542_xyz_orange_bar_1', '_1z34512_abc_purple_bar_1']

Desired Output:

['_xyz_orange_bar_1', '_xyz_orange_bar_1', '_abc_purple_bar_1']
MBasith asked Feb 18 '26 10:02


1 Answer

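The \w pattern matches letters, digits and the _ symbol, so each \w+ in your pattern can also run across underscores. That is why the match starts at the first _ in each line rather than at the _ before xyz or abc. Depending on the Python version and options used, the letters and digits \w matches may come from the whole Unicode range or just ASCII.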
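A quick way to check this behaviour is the sketch below. The sample strings 'a_1-' and 'na\u00efve_1' and the variable names are made up for illustration; the first line of the question's input is reused for the search.

```python
import re

# \w matches the underscore as well as letters and digits.
word_chars = re.findall(r'\w', 'a_1-')
print(word_chars)  # ['a', '_', '1']

# Because \w+ can cross underscores, the original pattern already
# succeeds at the first "_" of the line (index 4).
line = 'aaaa_1y345_xyz_orange_bar_1'
m = re.search(r'_\w+_\w+_bar_\d+', line)
print(m.group(), m.start())  # _1y345_xyz_orange_bar_1 4

# In Python 3, str patterns are Unicode-aware by default;
# re.ASCII restricts \w to [a-zA-Z0-9_].
unicode_words = re.findall(r'\w+', 'na\u00efve_1')
ascii_words = re.findall(r'\w+', 'na\u00efve_1', re.ASCII)
print(unicode_words)  # ['naïve_1']
print(ascii_words)    # ['na', 've_1']
```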

So, a straightforward way to fix the issue is to replace \w with [^\W_], which matches any word character except the underscore:

import re
string = '''aaaa_1y345_xyz_orange_bar_1
aaaa_123a5542_xyz_orange_bar_1
bbbb_1z34512_abc_purple_bar_1'''
print(re.findall(r'_[^\W_]+_[^\W_]+_bar_[0-9]+', string))
# => ['_xyz_orange_bar_1', '_xyz_orange_bar_1', '_abc_purple_bar_1']

See the Python demo.

Details:

  • _ - an underscore
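  • [^\W_]+ - 1 or more chars that are either digits or letters ([^ starts a negated character class, \W matches any non-word char, and _ is added so that the underscore is excluded too; the class therefore matches any word char other than _)
  • _[^\W_]+ - an underscore followed by 1 or more letters or digits, same as the two items above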
  • _bar_ - a literal substring _bar_
  • [0-9]+ - 1 or more ASCII digits.
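The same breakdown can be written straight into the pattern with re.VERBOSE, which ignores whitespace and allows # comments inside the regex. This is only a sketch of an alternative layout; the names text, pattern and matches are chosen here for illustration, and the pattern itself is unchanged.

```python
import re

text = '''aaaa_1y345_xyz_orange_bar_1
aaaa_123a5542_xyz_orange_bar_1
bbbb_1z34512_abc_purple_bar_1'''

pattern = re.compile(r'''
    _          # an underscore
    [^\W_]+    # 1 or more letters or digits (word chars except "_")
    _[^\W_]+   # another underscore plus 1 or more letters or digits
    _bar_      # the literal substring _bar_
    [0-9]+     # 1 or more ASCII digits
''', re.VERBOSE)

matches = pattern.findall(text)
print(matches)
# => ['_xyz_orange_bar_1', '_xyz_orange_bar_1', '_abc_purple_bar_1']
```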

See the regex demo.

Wiktor Stribiżew answered Feb 21 '26 15:02


