
Python regex for finding all words in a string [duplicate]

Hello, I am new to regex and I'm starting out with Python. I'm stuck on extracting all the words from an English sentence. So far I have:

import re

shop="hello seattle what have you got"
regex = r'(\w*) '
list1=re.findall(regex,shop)
print(list1)

This gives the output:

['hello', 'seattle', 'what', 'have', 'you']

If I replace the regex with

regex = r'(\w*)\W*'

then the output is:

['hello', 'seattle', 'what', 'have', 'you', 'got', '']

whereas I want this output:

['hello', 'seattle', 'what', 'have', 'you', 'got']

Please point out where I am going wrong.

TNT asked May 31 '16 10:05



1 Answer

Use the word boundary \b together with \w+ (one or more word characters):

import re

shop="hello seattle what have you got"
regex = r'\b\w+\b'
list1=re.findall(regex,shop)
print(list1)

Output: ['hello', 'seattle', 'what', 'have', 'you', 'got']
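
To see where the patterns in the question go wrong, a minimal sketch (the variable names with_star and with_plus are just illustrative) can print each match together with its span. \w* is allowed to match zero characters, so after 'got' is consumed the pattern still matches the empty string at the end of the input, while the first pattern misses 'got' because it requires a trailing space:

```python
import re

shop = "hello seattle what have you got"

# Show every match of the question's second pattern with its position.
# The last match is zero-width: \w* and \W* both match nothing there.
for m in re.finditer(r'(\w*)\W*', shop):
    print(m.span(), repr(m.group(1)))

with_star = re.findall(r'(\w*)\W*', shop)  # ends with an empty string
with_plus = re.findall(r'\w+', shop)       # \w+ needs at least one character

print(with_star)
print(with_plus)
```

Switching the quantifier from * to + is what removes the trailing empty string.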

Or simply \w+ is enough, because + is greedy and already consumes each whole run of word characters:

import re

shop="hello seattle what have you got"
regex = r'\w+'
list1=re.findall(regex,shop)
print(list1)

Output: ['hello', 'seattle', 'what', 'have', 'you', 'got']
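
As a quick check that the two patterns agree even when punctuation is present, here is a small sketch. The sample sentence is hypothetical and not from the question:

```python
import re

# Hypothetical input with punctuation, used only for illustration.
text = "Hello, Seattle! What have you got?"

with_boundaries = re.findall(r'\b\w+\b', text)
plain = re.findall(r'\w+', text)

print(with_boundaries)
print(plain == with_boundaries)
```

Both calls should return the same list, since a greedy \w+ match always starts and ends at a word boundary.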
Pranav C Balan answered Oct 11 '22 02:10