
Extracting whole words based on substring matching in python

Tags:

python

regex

I am looking for a regular expression in Python. I have a long string of text and a list of substrings to match against it.

Example substrings: 'table', 'e furnish'

Example string:

'Today is a good day to do up the table furnishings. Lets go to the store.'

For 'table', I would like to extract 'table'. For 'e furnish', I would like to extract 'table furnishings'.

My current code is:

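import re

# Sample data from the question
line = 'Today is a good day to do up the table furnishings. Lets go to the store.'
checklist = ['table', 'e furnish']

for item in checklist:
    pattern = r"[\s](.*)" + item + r"([a-z]){0,2}[\s\.]"
    print(pattern)
    matchObj = re.search(pattern, line)
    if matchObj:
        print("matchObj.group() : ", matchObj.group())
    else:
        print("No match!!")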

but it does not pick up the whole words that contain the substrings. Each substring can be a single word or several words, and it may match an entire word or only part of one. For a multi-word substring, the extracted words must be adjacent, with no other word in between.
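Running the question's pattern directly against the example string shows two separate failure modes. First, `[\s](.*)` starts at the first whitespace in the line and the greedy `.*` swallows everything up to the substring. Second, `([a-z]){0,2}` allows at most two letters after the substring, so "furnishings" (four letters after "furnish") can never match. This is a small diagnostic sketch using only the pattern and data from the question:

```python
import re

line = 'Today is a good day to do up the table furnishings. Lets go to the store.'

# The question's pattern for 'table': the match begins at the first space
# after "Today" because (.*) greedily absorbs the words before "table".
m = re.search(r"[\s](.*)table([a-z]){0,2}[\s\.]", line)
print(repr(m.group()))  # ' is a good day to do up the table '

# The question's pattern for 'e furnish': {0,2} stops after "in", and the
# next character 'g' is neither whitespace nor '.', so the search fails.
m = re.search(r"[\s](.*)e furnish([a-z]){0,2}[\s\.]", line)
print(m)  # None
```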

Thank you for your help, everyone.

XJL asked Oct 30 '15 08:10





1 Answer

You could use \w* (any number of word characters) as a wildcard on either side of the substring:

\w*e furnish\w*

See demo at regex101
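Applied to the question's data, the pattern can be built once per substring. The sketch below is one way to do it, not the answerer's own code: the helper name `extract_words` is hypothetical, `re.escape` is an added safeguard so characters like `.` or `(` in a substring are matched literally, and `re.findall` is used so every occurrence is returned rather than only the first. Note that `\w` also matches digits and underscores, not just letters.

```python
import re

def extract_words(text, substrings):
    # Hypothetical helper: map each substring to the whole word(s) in
    # `text` that contain it, using the pattern \w*<substring>\w*.
    results = {}
    for item in substrings:
        # re.escape makes regex metacharacters in the substring literal.
        pattern = r"\w*" + re.escape(item) + r"\w*"
        # re.findall returns every non-overlapping match, left to right.
        results[item] = re.findall(pattern, text)
    return results

line = 'Today is a good day to do up the table furnishings. Lets go to the store.'
checklist = ['table', 'e furnish']
print(extract_words(line, checklist))
# {'table': ['table'], 'e furnish': ['table furnishings']}
```

Because the space inside 'e furnish' is matched literally, a multi-word substring only matches words that are directly adjacent, which is the requirement in the question. If only the first occurrence is needed, `re.search(pattern, text)` followed by a `None` check and `.group()` does the same job.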

bobble bubble answered Sep 18 '22 14:09
