
Prefix matching in Python

Tags:

python

I have a string like:

" This is such an nice artwork"

and a tag list: ["art", "paint"]

I want to write a function that accepts this string and the tag list as inputs and returns the word "artwork", because "artwork" starts with "art", which is in the tag list.

What is the most efficient way to do this, in terms of speed?

def prefix_match(string, taglist):
    # do something here
    return word_in_string

frazman asked May 23 '12 22:05



3 Answers

Try the following:

def prefix_match(sentence, taglist):
    taglist = tuple(taglist)
    for word in sentence.split():
        if word.startswith(taglist):
            return word

This works because str.startswith() can accept a tuple of prefixes as an argument.

Note that I renamed string to sentence so there isn't any ambiguity with the string module.
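
As a quick check, here is the same function restated so that it runs on its own, with two sample calls. The explicit `return None` and the second test sentence are additions for illustration; the original simply falls off the end of the function when nothing matches, which has the same effect.

```python
def prefix_match(sentence, taglist):
    # str.startswith accepts a tuple of prefixes, but not a list
    prefixes = tuple(taglist)
    for word in sentence.split():
        if word.startswith(prefixes):
            return word  # first matching word, in sentence order
    return None  # made explicit: no word starts with any tag

print(prefix_match(" This is such an nice artwork", ["art", "paint"]))  # artwork
print(prefix_match("nothing relevant here", ["art", "paint"]))  # None
```

Because the loop returns on the first hit, it stops scanning as soon as a match is found, which suits the case where only one word is wanted.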

Andrew Clark answered Oct 14 '22 12:10


Try this:

def prefix_match(s, taglist):
    words = s.split()
    return [w for t in taglist for w in words if w.startswith(t)]

s = "This is such an nice artwork"
taglist = ["art", "paint"]
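prefix_match(s, taglist)  # returns ['artwork']

This returns a list of every word in the string that starts with one of the tags. Matches are grouped in tag order, and a word that starts with several tags appears once per matching tag.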
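
If duplicates from overlapping tags are not wanted, one possible variant is to test each word once against a tuple of all tags. This is only a sketch: the name `prefix_match_all` and the overlapping tag list `["art", "ar"]` are made up for illustration and are not part of the answer above.

```python
def prefix_match_all(s, taglist):
    # One pass over the words; each word is tested once against all tags,
    # so it appears at most once per occurrence, in sentence order.
    prefixes = tuple(taglist)
    return [w for w in s.split() if w.startswith(prefixes)]

s = "This is such an nice artwork"

# Nested comprehension from the answer, for comparison
nested = [w for t in ["art", "ar"] for w in s.split() if w.startswith(t)]

print(nested)                                 # ['artwork', 'artwork']
print(prefix_match_all(s, ["art", "ar"]))     # ['artwork']
print(prefix_match_all(s, ["art", "paint"]))  # ['artwork']
```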

Óscar López answered Oct 14 '22 12:10


Here is a possible solution. I use a regex because it makes it easy to strip punctuation. I also use collections.Counter, which may help efficiency if your string has a lot of repeated words, since each distinct word is then checked only once per tag.

from collections import Counter
import re

tag_list = ["art", "paint"]

s = "This is such an nice artwork, very nice artwork. This is the best painting I've ever seen"

# \w+ matches runs of word characters, so punctuation is dropped
words = re.findall(r'\w+', s)

# map each distinct word to its number of occurrences
dicto = Counter(words)

def found(s, tag):
    return s.startswith(tag)

words_found = []

for tag in tag_list:
    # items() works on both Python 2 and 3; iteritems() is Python 2 only
    for k, v in dicto.items():
        if found(k, tag):
            words_found.append((k, v))

The last part can be done with a list comprehension:

words_found = [(k, v) for tag in tag_list for k, v in dicto.items() if found(k, tag)]

Result:

>>> words_found
[('artwork', 2), ('painting', 1)]
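
The same idea can be wrapped in a single function. The sketch below assumes Python 3.7 or later (where dictionaries keep insertion order), and the function name `count_prefix_matches` is hypothetical. It also swaps the per-tag loop for one `str.startswith` call with a tuple of tags, so each distinct word is tested only once.

```python
import re
from collections import Counter

def count_prefix_matches(text, tag_list):
    """Map each distinct word that starts with any tag to its count."""
    # \w+ keeps letters, digits and underscores, so punctuation is dropped
    counts = Counter(re.findall(r"\w+", text))
    prefixes = tuple(tag_list)  # str.startswith needs a tuple, not a list
    return {word: n for word, n in counts.items() if word.startswith(prefixes)}

s = "This is such an nice artwork, very nice artwork. This is the best painting I've ever seen"
print(count_prefix_matches(s, ["art", "paint"]))
# {'artwork': 2, 'painting': 1}
```

Matching is case-sensitive here, as in the code above; lowercasing both the text and the tags first would be one way to relax that.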

Akavall answered Oct 14 '22 12:10