 

How can I create search terms with wildcards in Python?

I want to check whether a certain term is contained in a document. However, the word sometimes appears in several forms (plural, past tense, etc.):

'Hello Worlds'
'Hellos Worlds'
'Jello World'
'Hello Worlded'

How can I create a search term that will find all of these instances, such as

'*ello* World*'

where the star is a wildcard that can match any characters, including none at all?

I found documentation for an fnmatch module, but I can't see how that can help me search through a document.

coderman asked Apr 27 '11 19:04



4 Answers

Use regular expressions and just loop through the file:

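import re

# \S* matches zero or more non-whitespace characters on either side of each word.
pattern = re.compile(r"^\S*ello\S*\sWorld\S*$")

# match() anchors at the start of the line, so only lines that consist of the
# two-word phrase are printed.
with open('test.file.here', 'r') as f:
    for line in f:
        if pattern.match(line):
            print(line, end='')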
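
If the phrase can appear anywhere inside a longer line, a variation that drops the ^ and $ anchors and collects matches with findall may be more useful. This is only a rough sketch: the function name find_terms and the sample text are made up for illustration.

```python
import re

# \S* stands in for the '*' wildcard: zero or more non-whitespace characters,
# so a match never runs past the end of a word.
PATTERN = re.compile(r"\S*ello\S*\sWorld\S*")


def find_terms(text):
    """Return every substring of text that matches PATTERN (hypothetical helper)."""
    return PATTERN.findall(text)


sample = "He said Hello Worlds today\nNothing here\nHellos Worlded again"
print(find_terms(sample))
```

Because \S* also accepts punctuation, a match such as "Jello World." would keep its trailing period; swapping \S for \w is one way to trim that, if it matters.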
photoionized answered Oct 08 '22 17:10



I would usually opt for a regular expression, but if for some reason you want to stick to the wildcard format, you can do this:

from fnmatch import fnmatch

pattern = '*ello* World*'

with open('sample.txt') as f:
    for line in f:
        if fnmatch(line, pattern):
            print(line, end='')
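
Two details are worth keeping in mind here: fnmatch tests the whole string against the pattern, and on Windows it normalizes case first, whereas fnmatchcase always compares case-sensitively. A sketch of the same loop wrapped in a function, assuming the lines come from any iterable of strings; the name matching_lines is hypothetical.

```python
from fnmatch import fnmatchcase


def matching_lines(lines, pattern):
    """Yield lines that match a shell-style wildcard pattern as a whole (hypothetical helper)."""
    for line in lines:
        stripped = line.rstrip("\n")
        # fnmatchcase compares the entire string, case-sensitively on every platform.
        if fnmatchcase(stripped, pattern):
            yield stripped


doc = ["Hello Worlds\n", "Jello World\n", "Goodbye World\n", "hello world\n"]
print(list(matching_lines(doc, "*ello* World*")))
```

Unlike the regular expression approach, * here also matches whitespace, so a line such as "Hello big World" would be accepted as well.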
seddonym answered Oct 08 '22 17:10



The * syntax you describe is known as globbing. It is designed for matching file and directory names rather than for searching the text of a document. Regular expressions, as others have noted, are the answer.
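
One way to bridge the two is to translate the wildcard term into a regular expression. A rough sketch, assuming * should stand for any run of non-whitespace characters (including none); wildcard_to_regex is a made-up name, not a library function.

```python
import re


def wildcard_to_regex(term):
    """Compile a term such as '*ello* World*' into a regex (hypothetical helper)."""
    # Escape each literal piece so characters like '.' are not read as regex syntax.
    pieces = [re.escape(piece) for piece in term.split("*")]
    # Rejoin the pieces with \S*, i.e. zero or more non-whitespace characters per '*'.
    return re.compile(r"\S*".join(pieces))


regex = wildcard_to_regex("*ello* World*")
text = "Hello Worlds and Hellos Worlded but not Goodbye World"
print(regex.findall(text))
```

The compiled pattern can then be used with search or findall on a whole document, or line by line.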

Brian O'Dell answered Oct 08 '22 15:10



If you're doing anything complicated, regular expressions are the way to go. If you're not comfortable with those, I think for your specific question you could also use the "in" operator. For example:

x = 'hello world'
if 'ello' in x and 'world' in x:
    print('matches')
else:
    print('does not match')
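
Bear in mind that "in" is a plain substring test: it is case-sensitive and checks neither the order of the fragments nor whether they sit next to each other. A small sketch that lowercases both sides first, with contains_all as a hypothetical helper name:

```python
def contains_all(text, fragments):
    """Return True if every fragment occurs somewhere in text, ignoring case (hypothetical helper)."""
    lowered = text.lower()
    return all(fragment.lower() in lowered for fragment in fragments)


print(contains_all("Hello Worlds", ["ello", "world"]))    # True
print(contains_all("World of Jello", ["ello", "world"]))  # True: order is not checked
print(contains_all("Hello there", ["ello", "world"]))     # False
```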
Alex S answered Oct 08 '22 16:10
