I want to check whether a certain term is contained in a document. However, the word sometimes appears in several forms (plural, past tense, etc.).
'Hello Worlds'
'Hellos Worlds'
'Jello World'
'Hello Worlded'
How can I create a search term which will find all instances such as
'*ello* World*'
where star is a wild card that doesn't necessarily have to be included in the word.
I found documentation for an fnmatch module, but I can't see how that can help me search through a document.
Use regular expressions and just loop through the file:
import re

# Match lines of the form "<chars>ello<chars> World<chars>" (no extra words).
pattern = re.compile(r"^\S*ello\S*\sWorld\S*$")

with open('test.file.here', 'r') as f:
    for line in f:
        if pattern.match(line):
            print(line, end='')
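The pattern above only matches when the phrase makes up the whole line. If the phrase can sit anywhere in running text, searching with `\w*` in place of each wildcard may be closer to what the question asks. This is a minimal sketch, assuming words are runs of `\w` characters separated by whitespace; the helper name `find_variants` and the sample text are made up for illustration:

```python
import re

# \w* plays the role of the optional '*' wildcard inside a word.
PHRASE = re.compile(r"\b\w*ello\w*\s+World\w*")

def find_variants(text):
    """Return every substring of text that looks like '*ello* World*'."""
    return PHRASE.findall(text)

sample = "Say Hello Worlds, then Jello World. Not this: Hello there."
print(find_variants(sample))  # ['Hello Worlds', 'Jello World']
```

Because `findall` scans the whole string, the same helper can be applied to `f.read()` instead of looping line by line, as long as the document fits in memory.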
I would usually opt for a regular expression, but if for some reason you want to stick to the wildcard format, you can do this:
from fnmatch import fnmatch

pattern = '*ello* World*'

with open('sample.txt') as f:
    for line in f:
        # fnmatch tests the whole line against the pattern
        if fnmatch(line, pattern):
            print(line, end='')
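One thing to keep in mind with this approach is that `*` in `fnmatch` matches any run of characters, spaces included, so the pattern is looser than "one word, then World". The sketch below shows that on a few made-up lines. It uses `fnmatchcase`, which compares case-sensitively on every platform, whereas plain `fnmatch` normalizes case first on some operating systems:

```python
from fnmatch import fnmatchcase

pattern = '*ello* World*'

lines = [
    'Hello Worlds',
    'Jello World',
    'Hello big World',   # '*' also spans the space and the extra word
    'Goodbye World',     # no 'ello', so no match
]

# Keep only the lines that the wildcard pattern accepts.
matches = [line for line in lines if fnmatchcase(line, pattern)]
print(matches)  # ['Hello Worlds', 'Jello World', 'Hello big World']
```

If the extra word in the third line should not count as a match, a regular expression gives finer control over what each wildcard may cover.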
The * syntax you describe is known as globbing. It is meant for matching file and directory names rather than searching document text. Regular expressions, as others have noted, are the better tool here.
If you're doing anything complicated, regular expressions are the way to go. If you're not comfortable with those, I think for your specific question you could also use "in". For example:
x = 'hello world'
if 'ello' in x and 'world' in x:
    print('matches')
else:
    print('does not match')
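Note that `in` is case-sensitive and checks each fragment independently, so it says nothing about the order of the fragments or whether they are adjacent. A slightly more general sketch, assuming case-insensitive matching is acceptable (the helper name `contains_all` is only illustrative):

```python
def contains_all(text, fragments):
    """True if every fragment occurs somewhere in text, ignoring case."""
    lowered = text.lower()
    return all(fragment.lower() in lowered for fragment in fragments)

print(contains_all('Hello Worlds', ['ello', 'world']))    # True
print(contains_all('World of Jello', ['ello', 'world']))  # True: order is ignored
print(contains_all('Hello there', ['ello', 'world']))     # False
```

The second call shows the main limitation: both fragments are present, so the check passes even though the phrase itself is not.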