 

Using regex assertion in python

Tags:

python

regex

I am experimenting with regex. I have read up on assertions a bit and seen examples, but for some reason I cannot get this to work. I am trying to get the word that follows the pattern below, using a look-behind.

import re
s = '123abc456someword 0001abde19999anotherword'
re.findall(r'(?<=\d+[a-z]+\d+)[a-z]+', s, re.I)

The results should be someword and anotherword.

But I get the error: look-behind requires fixed-width pattern

Any help appreciated.

asked Mar 20 '23 01:03 by Jackson



2 Answers

Python's re module only supports fixed-width look-behind patterns. If you want to experiment with variable-width look-behinds, use the third-party regex module instead (install it with pip install regex):

>>> import regex
>>> s = '123abc456someword 0001abde19999anotherword'
>>> regex.findall(r'(?i)(?<=\d+[a-z]+\d+)[a-z]+', s)
['someword', 'anotherword']
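To see the limitation in isolation, here is a small sketch using only the standard re module: a variable-width look-behind such as the one in the question is rejected at compile time with re.error, while a fixed-width one like (?<=\d) (exactly one digit) compiles fine. The helper name compiles is made up for this illustration. Note that the fixed-width version is not a drop-in replacement: it matches every run of letters preceded by a single digit.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

s = '123abc456someword 0001abde19999anotherword'

# Variable width: \d+ and [a-z]+ can match any number of characters, so re refuses it.
print(compiles(r'(?<=\d+[a-z]+\d+)[a-z]+'))  # False

# Fixed width: the look-behind always spans exactly one character, so it is allowed.
print(compiles(r'(?<=\d)[a-z]+'))  # True

# It finds every letter run preceded by a digit, not just the trailing words.
print(re.findall(r'(?<=\d)[a-z]+', s, re.I))  # ['abc', 'someword', 'abde', 'anotherword']
```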

Or avoid look-behind altogether and use a capturing group:

>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)
['someword', 'anotherword']
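With a capturing group, re.findall returns only the group's contents, so the digits-letters-digits prefix is matched but discarded. If you also need to know where each word sits in the string, a sketch with re.finditer can read group 1 and its offsets from each match object (the variable names here are just illustrative):

```python
import re

s = '123abc456someword 0001abde19999anotherword'
pattern = re.compile(r'\d+[a-z]+\d+([a-z]+)', re.IGNORECASE)

for m in pattern.finditer(s):
    # group(1) is the word captured after the prefix;
    # span(1) is its (start, end) offset pair within s.
    print(m.group(1), m.span(1))
# someword (9, 17)
# anotherword (31, 42)
```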
answered Mar 27 '23 16:03 by hwnd



Convert the prefix into a non-capturing group and take the word from capturing group 1:

(?:\d+\w+\d+)(\w+\b)


If you are only interested in letters, replace \w with [a-z] in the pattern above. The \b asserts a word boundary at the end of the captured word.

Sample code:

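import re

# Raw string literal; the old ur'' prefix is a syntax error in Python 3
p = re.compile(r'(?:\d+\w+\d+)(\w+\b)', re.IGNORECASE)
test_str = "123abc456someword 0001abde19999anotherword"

# findall returns only the contents of capturing group 1
print(p.findall(test_str))  # ['someword', 'anotherword']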
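A sketch of the letters-only variant described above, with \w swapped for [a-z] (case-insensitive via re.IGNORECASE). The names p_word and p_letters and the extra test string "123abc456some9" are made up for illustration. On the question's input both patterns agree; they diverge once a digit appears inside the trailing word.

```python
import re

# The \w pattern from this answer, and the letters-only [a-z] variant.
p_word = re.compile(r'(?:\d+\w+\d+)(\w+\b)', re.IGNORECASE)
p_letters = re.compile(r'(?:\d+[a-z]+\d+)([a-z]+\b)', re.IGNORECASE)

test_str = "123abc456someword 0001abde19999anotherword"
print(p_letters.findall(test_str))  # ['someword', 'anotherword']

# With a digit inside the trailing word, \w+ captures "some9", while [a-z]+
# stops before the "9", where there is no word boundary, so nothing matches.
print(p_word.findall("123abc456some9"))     # ['some9']
print(p_letters.findall("123abc456some9"))  # []
```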
answered Mar 27 '23 15:03 by Braj
