Python dictionary replacement with space in key

I have a string and a dictionary, and I have to replace every occurrence of a dictionary key in the text with its value.

text = 'I have a smartphone and a Smart TV'
dict = {
    'smartphone': 'toy',
    'smart tv': 'junk'
}

If the keys contained no spaces, I would split the text into words and look each one up in the dict, which looks like O(n). But now the keys can contain spaces, so things are more complicated. Please suggest a good way to do this, and note that the keys may not match the case of the text.

Update

I have thought of this solution, but it is not efficient: O(m*n) or more...

for k, v in dict.items():
    text = text.replace(k, v)  # or regex...
James asked Oct 19 '22 16:10


2 Answers

If the keywords in the text never appear directly next to each other (as in "keyword other-keyword"), we can do this. It looks like O(n) to me.

def dict_replace(dictionary, text, strip_chars=None, replace_func=None):
    """
        Replace words or phrases in text with their values in dictionary.

        Arguments:
            dictionary: dict of key: value pairs; keys should be lower case
            text: string to process
            strip_chars: characters to strip from each word before lookup
            replace_func: optional function that builds the final replacement.
                          Called with 2 params: the matched text and the value

        Return:
            string

        Example:
            my_dict = {
                "hello": "hallo",
                "hallo": "hello",    # Only one pass, don't worry
                "smart tv": "http://google.com?q=smart+tv"
            }
            dict_replace(my_dict, "hello google smart tv",
                         replace_func=lambda k,v: '[%s](%s)'%(k,v))
    """

    # First break word phrase in dictionary into single word
    dictionary = dictionary.copy()
    # Iterate over a snapshot of the keys, because the loop adds new keys
    for key in list(dictionary):
        if ' ' in key:
            key_parts = key.split()
            for part in key_parts:
                # Mark single word with False
                if part not in dictionary:
                    dictionary[part] = False

    # Break text into words and compare one by one
    result = []
    words = text.split()
    words.append('')
    last_match = ''     # Last keyword (lower) match
    original = ''       # Last match in original
    for word in words:
        key_word = word.lower().strip(strip_chars) if \
                   strip_chars is not None else word.lower()
        if key_word in dictionary:
            last_match = last_match + ' ' + key_word if \
                         last_match != '' else key_word
            original = original + ' ' + word if \
                       original != '' else word
        else:
            if last_match != '':
                # If match whole word
                if last_match in dictionary and dictionary[last_match] is not False:
                    if replace_func is not None:
                        result.append(replace_func(original, dictionary[last_match]))
                    else:
                        result.append(dictionary[last_match])
                else:
                    # Only match partial of keyword
                    match_parts = last_match.split(' ')
                    match_original = original.split(' ')
                    for i in range(len(match_parts)):
                        if match_parts[i] in dictionary and \
                           dictionary[match_parts[i]] is not False:
                            if replace_func is not None:
                                result.append(replace_func(match_original[i], dictionary[match_parts[i]]))
                            else:
                                result.append(dictionary[match_parts[i]])
                        else:
                            # Not a keyword by itself: keep the original word
                            result.append(match_original[i])
            result.append(word)
            last_match = ''
            original = ''

    # Drop the empty sentinel appended to words before joining
    return ' '.join(result[:-1])
James answered Oct 21 '22 06:10


If your keys have no spaces:

output = [dct[i] if i in dct else i for i in text.split()]

' '.join(output)

You should use dct instead of dict so the name doesn't shadow the built-in dict().

This uses a list comprehension with a conditional expression (Python's ternary operator) to swap in the value for each word that is a key and keep every other word unchanged.
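Since the question says the text may differ in case from the keys, a small variation lowercases each word before the lookup. This is only a sketch, and it assumes the keys in dct are already lower case and contain no spaces:

```python
text = 'I have a Smartphone and a laptop'
dct = {'smartphone': 'toy'}  # assumed: keys are lower case, no spaces

# dict.get(key, default) returns the word unchanged when it is not a key
output = [dct.get(word.lower(), word) for word in text.split()]
result = ' '.join(output)
print(result)  # I have a toy and a laptop
```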

If your keys do have spaces, you are correct:

for k, v in dct.items():
    text = text.replace(k, v)

And yes, the time complexity will be O(m*n), because the whole string has to be scanned once for each key in dct.
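If a single scan over the text is preferred, one option is to compile all keys into one regular expression and let re.sub look up each match. The following is only a sketch: the helper name replace_keys is made up for illustration, and it assumes dct is non-empty and that its keys are lower case and start and end with word characters, so that \b behaves as expected. The regex engine still tries the alternatives at each position, so this is not a strict O(n) guarantee.

```python
import re

def replace_keys(text, dct):
    """Replace every key of dct found in text, ignoring case (hypothetical helper)."""
    # Longer keys go first so that 'smart tv' wins over a shorter key like 'smart'
    keys = sorted(dct, key=len, reverse=True)
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b',
        re.IGNORECASE,
    )
    # Assumed: keys are lower case, so each match is looked up by its lower-cased form
    return pattern.sub(lambda m: dct[m.group(0).lower()], text)

text = 'I have a smartphone and a Smart TV'
dct = {'smartphone': 'toy', 'smart tv': 'junk'}
print(replace_keys(text, dct))  # I have a toy and a junk
```

Unlike the word-by-word approach, this keeps the original whitespace and punctuation around each match.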

mindink answered Oct 21 '22 06:10