
List of all words matching regular expression

Tags: python, regex

Let's assume I have a string: "Lorem ipsum dolor sit amet". I need a list of all words with a length greater than 3. Can I do this with regular expressions?

e.g.

pattern = re.compile(r'some pattern')
result = pattern.search('Lorem ipsum dolor sit amet').groups()

result should contain 'Lorem', 'ipsum', 'dolor' and 'amet'.

EDITED:

The words I mean can only contain letters and numbers.

asked Jan 04 '11 13:01 by szaman



1 Answer

>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolor sit? amet...')
['Lorem', 'ipsum', 'dolor', 'amet']
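One caveat with respect to the question's edit ("only letters and numbers"): `\w` also matches the underscore, so a token like `dolor_sit` would be returned as a single word. A minimal sketch of one way around that, assuming underscores should act as separators, is to use the character class `[^\W_]` (any word character except `_`). The sample text and the name `alnum_words` are made up for illustration.

```python
import re

# \w matches letters, digits AND the underscore.
word_chars = re.compile(r"\w{4,}")

# [^\W_] means "not a non-word character and not an underscore",
# which leaves letters and digits only.
alnum_words = re.compile(r"[^\W_]{4,}")

# Hypothetical sample text containing an underscore and digits.
text = "Lorem, ipsum! dolor_sit? amet... abc123"

with_underscore = word_chars.findall(text)
letters_digits_only = alnum_words.findall(text)

print(with_underscore)      # ['Lorem', 'ipsum', 'dolor_sit', 'amet', 'abc123']
print(letters_digits_only)  # ['Lorem', 'ipsum', 'dolor', 'amet', 'abc123']
```

In the second result `sit` is dropped because, once the underscore splits the token, it is only three characters long.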

Take note that in Python 3, where all strings are Unicode, this will also find words that use non-ASCII letters:

>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolör sit? amet...')
['Lorem', 'ipsum', 'dolör', 'amet']

In Python 2, you'd have to pass the re.UNICODE flag and use a unicode string to get the same result:

>>> myre = re.compile(r"\w{4,}", re.UNICODE)
>>> myre.findall(u'Lorem, ipsum! dolör sit? amet...')
[u'Lorem', u'ipsum', u'dol\xf6r', u'amet']
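Going the other way in Python 3: if only ASCII letters and digits should count as word characters, the `re.ASCII` flag restricts `\w` to `[a-zA-Z0-9_]`. The following is a small sketch of the difference, not a recommendation; the variable names are illustrative only.

```python
import re

text = "Lorem, ipsum! dolör sit? amet..."

# Default in Python 3: \w is Unicode-aware, so 'ö' is a word character.
unicode_words = re.compile(r"\w{4,}")

# With re.ASCII, \w only matches [a-zA-Z0-9_], so 'ö' ends the run.
ascii_words = re.compile(r"\w{4,}", re.ASCII)

unicode_result = unicode_words.findall(text)
ascii_result = ascii_words.findall(text)

print(unicode_result)  # ['Lorem', 'ipsum', 'dolör', 'amet']
print(ascii_result)    # ['Lorem', 'ipsum', 'amet']
```

Note that with `re.ASCII` the word `dolör` is not skipped as a whole; it is split at `ö` into `dol` and `r`, and both pieces happen to be shorter than four characters. A longer fragment on either side of a non-ASCII letter would still be returned.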
answered Sep 28 '22 04:09 by Tim Pietzcker
