I have a string like:
" This is such an nice artwork"
and I have a tag_list ["art","paint"]
Basically, I want to write a function that accepts this string and the tag list as inputs and returns the word "artwork", because "artwork" starts with "art", which is in the tag list.
How do I do this most efficiently? I want it to be efficient in terms of speed.
def prefix_match(string, taglist):
    # do something here
    return word_in_string  # placeholder for the matching word
A simple approach is to traverse the complete list and compare the given prefix against each string one by one, collecting every string that starts with it. An existing alternative is the trie data structure, which is available in Python through the pytrie package.
A prefix is the group of letters at the beginning of a word. For example, the word “unhappy” begins with the prefix “un”. Given a query string s and a list of all possible words, the task is to return all words that have s as a prefix.
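To make the trie idea concrete for this question, here is a minimal sketch using only the standard library rather than pytrie. It stores the tags in a nested-dict trie and walks each word through it, so each word is checked against all tags in one pass. The names END, build_trie, has_tag_prefix and trie_prefix_match are hypothetical, chosen for illustration only.

```python
END = None  # sentinel key marking the end of a stored tag (assumption of this sketch)

def build_trie(tags):
    """Store every tag in a nested-dict trie, one dict level per character."""
    root = {}
    for tag in tags:
        node = root
        for ch in tag:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def has_tag_prefix(word, trie):
    """Return True if any stored tag is a prefix of word."""
    node = trie
    for ch in word:
        if END in node:          # a complete tag ends here
            return True
        node = node.get(ch)
        if node is None:         # no tag continues with this character
            return False
    return END in node           # word itself is exactly a tag

def trie_prefix_match(sentence, taglist):
    """Return all whitespace-separated words that start with some tag."""
    trie = build_trie(taglist)
    return [w for w in sentence.split() if has_tag_prefix(w, trie)]

print(trie_prefix_match("This is such an nice artwork", ["art", "paint"]))
```

With only two tags this is unlikely to beat the built-in str.startswith; a trie is more likely to pay off when the tag list is large.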
Try the following:
def prefix_match(sentence, taglist):
    taglist = tuple(taglist)
    for word in sentence.split():
        if word.startswith(taglist):
            return word
This works because str.startswith() can accept a tuple of prefixes as an argument.
Note that I renamed string to sentence so there isn't any ambiguity with the string module.
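As a quick check of how this approach behaves, the sketch below restates the function under the hypothetical name first_prefix_match so that it runs on its own, and makes the no-match case explicit. It stops at the first matching word and returns None when nothing matches.

```python
def first_prefix_match(sentence, taglist):
    prefixes = tuple(taglist)  # str.startswith accepts a tuple, not a list
    for word in sentence.split():
        if word.startswith(prefixes):
            return word
    return None  # explicit: no word started with any tag

print(first_prefix_match("This is such an nice artwork", ["art", "paint"]))  # artwork
print(first_prefix_match("nothing relevant here", ["art", "paint"]))         # None
```

Because the sentence is split on whitespace only, trailing punctuation stays attached to the word, so an input containing "artwork," would return the word with its comma.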
Try this:
def prefix_match(s, taglist):
    words = s.split()
    return [w for t in taglist for w in words if w.startswith(t)]
s = "This is such an nice artwork"
taglist = ["art", "paint"]
prefix_match(s, taglist)
The above will return a list with all the words in the string that match a prefix in the list of tags.
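One property of the comprehension above is worth noting: the tags drive the outer loop, so the results come back grouped by tag rather than in sentence order, and a word that starts with two different tags is listed twice. If sentence order and one entry per occurrence are preferred, a possible variant is sketched below using the tuple form of str.startswith. The name prefix_matches_in_order is hypothetical.

```python
def prefix_matches_in_order(s, taglist):
    prefixes = tuple(taglist)  # one startswith call tests every tag
    # Iterate over the words only, so output follows the sentence order.
    return [w for w in s.split() if w.startswith(prefixes)]

print(prefix_matches_in_order("painting and artwork", ["art", "paint"]))
# ['painting', 'artwork']
```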
Here is a possible solution. I am using regex, because I can get rid of punctuation symbols easily this way. Also, I am using collections.Counter; this might add efficiency if your string has a lot of repeated words.
tag_list = ["art","paint"]
s = "This is such an nice artwork, very nice artwork. This is the best painting I've ever seen"
from collections import Counter
import re
words = re.findall(r'(\w+)', s)
dicto = Counter(words)
def found(s, tag):
    return s.startswith(tag)
words_found = []
for tag in tag_list:
    for k, v in dicto.items():  # .items() in Python 3 (was .iteritems() in Python 2)
        if found(k, tag):
            words_found.append((k, v))
The last part can be done with a list comprehension:
words_found = [(k, v) for tag in tag_list for k, v in dicto.items() if found(k, tag)]
Result:
>>> words_found
[('artwork', 2), ('painting', 1)]
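For reuse, the same idea can be wrapped in a single function. The following is only a sketch for Python 3, and the name count_prefix_matches is hypothetical. It counts each distinct word once with Counter, so the prefix test runs once per distinct word instead of once per occurrence.

```python
import re
from collections import Counter

def count_prefix_matches(text, tags):
    """Map each distinct word that starts with a tag to its occurrence count."""
    counts = Counter(re.findall(r"\w+", text))  # \w+ drops punctuation
    prefixes = tuple(tags)
    return {word: n for word, n in counts.items() if word.startswith(prefixes)}

text = "This is such an nice artwork, very nice artwork. This is the best painting I've ever seen"
print(count_prefix_matches(text, ["art", "paint"]))
```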