I have a string and a dictionary, and I need to replace every occurrence of a dictionary key in the text with its value.
text = 'I have a smartphone and a Smart TV'
dict = {
    'smartphone': 'toy',
    'smart tv': 'junk'
}
If the keys contained no spaces, I would split the text into words and look each one up in the dict, which looks like O(n). But now the keys can contain spaces, so things are more complicated. Please suggest a good way to do this, and note that the keys may not match the case of the text.
Update
I have thought of this solution, but it is not efficient: O(m*n) or more...
for k, v in dict.items():
    text = text.replace(k, v)  # or regex...
If the keywords in the text are not adjacent to each other (keyword, other word, keyword), we can do the following. It looks like O(n) to me >"<
def dict_replace(dictionary, text, strip_chars=None, replace_func=None):
    """
    Replace words or phrases in text with their values in dictionary.
    Arguments:
        dictionary: dict of key: value; keys should be lower case
        text: string to process
        strip_chars: characters to strip from each word before lookup
        replace_func: optional function that builds the final replacement.
                      Takes 2 params: the matched text and the value
    Return:
        string
    Example:
        my_dict = {
            "hello": "hallo",
            "hallo": "hello",  # Only one pass, don't worry
            "smart tv": "http://google.com?q=smart+tv"
        }
        dict_replace(my_dict, "hello google smart tv",
                     replace_func=lambda k, v: '[%s](%s)' % (k, v))
    """
    # First break each phrase in the dictionary into single words
    dictionary = dictionary.copy()
    # Iterate over a snapshot of the keys, since the loop adds entries
    for key in list(dictionary.keys()):
        if ' ' in key:
            key_parts = key.split()
            for part in key_parts:
                # Mark a word that is only part of a phrase with False
                if part not in dictionary:
                    dictionary[part] = False
    # Break the text into words and compare them one by one
    result = []
    words = text.split()
    words.append('')  # Sentinel that flushes a match at the end of the text
    last_match = ''  # Current run of matched words (lower case)
    original = ''  # The same run as it appears in the text
    for word in words:
        key_word = word.lower().strip(strip_chars) if \
            strip_chars is not None else word.lower()
        if key_word in dictionary:
            last_match = last_match + ' ' + key_word if \
                last_match != '' else key_word
            original = original + ' ' + word if \
                original != '' else word
        else:
            if last_match != '':
                # The run matches a whole key
                if last_match in dictionary and dictionary[last_match] is not False:
                    if replace_func is not None:
                        result.append(replace_func(original, dictionary[last_match]))
                    else:
                        result.append(dictionary[last_match])
                else:
                    # The run is not a key itself, so handle it word by word
                    match_parts = last_match.split(' ')
                    match_original = original.split(' ')
                    for i in range(len(match_parts)):
                        if match_parts[i] in dictionary and \
                                dictionary[match_parts[i]] is not False:
                            if replace_func is not None:
                                result.append(replace_func(match_original[i], dictionary[match_parts[i]]))
                            else:
                                result.append(dictionary[match_parts[i]])
                        else:
                            # Keep a word that is only part of a phrase
                            result.append(match_original[i])
            result.append(word)
            last_match = ''
            original = ''
    # Drop the sentinel so the result has no trailing space
    return ' '.join(result[:-1])
If your keys have no spaces:
output = [dct[i] if i in dct else i for i in text.split()]
' '.join(output)
You should use dct instead of dict so it doesn't shadow the built-in dict type.
This uses a list comprehension with a conditional expression (Python's ternary operator) to pick the replacement or keep the original word.
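Since the question says the keys may not match the case of the text, the same idea can lower-case each word before the lookup. This is only a minimal sketch: it assumes every key in dct is already lower case and contains no spaces, and the helper name replace_words is made up for illustration.

```python
def replace_words(text, dct):
    # Assumes every key in dct is lower case and has no spaces.
    # dict.get falls back to the original word when there is no match.
    return ' '.join(dct.get(word.lower(), word) for word in text.split())


print(replace_words('I have a Smartphone and a Smart TV', {'smartphone': 'toy'}))
```

Note that split() followed by ' '.join() collapses runs of whitespace into single spaces, which may or may not matter for your text.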
If your keys do have spaces, you are correct:
for k, v in dct.items():
    text = text.replace(k, v)
And yes, the time complexity will be O(m*n), as you have to scan the whole string once for each key in dct. Note also that str.replace is case-sensitive, so 'Smart TV' in the text will not match the key 'smart tv'.
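One possible alternative, which is not what either snippet above does, is to combine all keys into a single case-insensitive regular expression and replace in one call to re.sub. The sketch below assumes the keys in dct are lower case and that word boundaries (\b) are the right way to delimit a match; the name regex_dict_replace is hypothetical.

```python
import re


def regex_dict_replace(text, dct):
    # Assumes all keys in dct are lower case.
    if not dct:
        return text
    # Longest keys first, so 'smart tv' is tried before a shorter key like 'smart'.
    keys = sorted(dct, key=len, reverse=True)
    # re.escape guards against keys containing regex metacharacters;
    # \b keeps 'smartphone' from matching inside 'smartphones'.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b',
        re.IGNORECASE,
    )
    # The callback lower-cases the matched text to look up its replacement.
    return pattern.sub(lambda m: dct[m.group(0).lower()], text)


text = 'I have a smartphone and a Smart TV'
dct = {'smartphone': 'toy', 'smart tv': 'junk'}
print(regex_dict_replace(text, dct))  # I have a toy and a junk
```

Because the text is substituted in a single pass, a replacement is never itself replaced again, so a pair like 'hello' -> 'hallo' and 'hallo' -> 'hello' swaps cleanly. A key such as 'smart tv' only matches when exactly one space separates the words in the text.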