I am trying to use a dictionary to replace strings in a pandas column: wherever a dictionary key appears, it should be replaced with the corresponding value. However, each cell in the column contains a sentence. Therefore, I must first tokenize the sentences and detect whether a word in the sentence matches a key in my dictionary, then replace that word with the corresponding value.
However, the result that I keep getting is None. Is there a better, more Pythonic way to approach this problem?
Here is my MCVE for the moment. In the comments, I specified where the issue is happening.
import pandas as pd
data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big','The cat from the UK is small','The rock was found in LA.']
}
ids = {'Id':['NYC','LA','UK'],
      'City':['New York City','Los Angeles','United Kingdom']}
df = pd.DataFrame(data)
ids = pd.DataFrame(ids)
def col2dict(ids):
    data = ids[['Id', 'City']]
    idDict = data.set_index('Id').to_dict()['City']
    return idDict
def replaceIds(data,idDict):
    ids = idDict.keys()
    types = idDict.values()
    data['commentTest'] = data['Comment']
    words = data['commentTest'].apply(lambda x: x.split())
    for (i,word) in enumerate(words):
        #Here we can see that the words appear
        print word
        print ids
        if word in ids:
        #Here we can see that they are not being recognized. What happened?
            print ids
            print word
            words[i] = idDict[word]
            data['commentTest'] = ' '.apply(lambda x: ''.join(x))
    return data
idDict = col2dict(ids)
results = replaceIds(df, idDict)
Results:
None
I am using Python 2.7, and when I print the dict, the strings are shown with the u' prefix of Unicode strings.
My expected outcome is:
  Categories                       Comment  Type                               commentTest
0     animal      The NYC tree is very big  tree        The New York City tree is very big
1      plant  The cat from the UK is small   dog  The cat from the United Kingdom is small
2     object     The rock was found in LA.  rock        The rock was found in Los Angeles.
You can replace a string in a pandas DataFrame column by using replace(), str.replace(), or a lambda function.
You can replace a substring in a pandas DataFrame column with the DataFrame.replace() method. By default, this method matches only the exact string (the whole cell value) and replaces it with the specified value. Pass regex=True to replace substrings.
You can use df.replace({"Courses": dict}), where "Courses" is the column name, to remap values in a pandas DataFrame using a dictionary. With regex=True, the dictionary keys are treated as regular expressions, so the same call also performs regex substitutions.
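The difference between the default exact match and regex=True can be seen in a minimal sketch. The Series `s` and the dictionary `mapping` below are illustrative names that do not come from the question, and the code is written for Python 3:

```python
import pandas as pd

# Illustrative data: one sentence containing the key, one cell equal to the key
s = pd.Series(["The NYC tree is very big", "NYC"])
mapping = {"NYC": "New York City"}

# Default: a cell is replaced only when its whole value equals a key
exact = s.replace(mapping)

# regex=True: keys are treated as patterns and matched inside longer strings
partial = s.replace(mapping, regex=True)

print(exact.tolist())
print(partial.tolist())
```

With the default call, only the second cell changes; with regex=True, the key is also expanded inside the sentence.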
You can create a dictionary and then use replace:
ids = {'Id':['NYC','LA','UK'],
      'City':['New York City','Los Angeles','United Kingdom']}
ids = dict(zip(ids['Id'], ids['City']))
print (ids)
{'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City'}
df['commentTest'] = df['Comment'].replace(ids, regex=True)
print (df)
  Categories                       Comment  Type  \
0     animal      The NYC tree is very big  tree   
1      plant  The cat from the UK is small   dog   
2     object     The rock was found in LA.  rock   
                                commentTest  
0        The New York City tree is very big  
1  The cat from the United Kingdom is small  
2        The rock was found in Los Angeles.  
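One caveat with replace(..., regex=True) is that the keys match anywhere in the text, so a key such as LA would also be replaced inside a longer word like PLANT. If only whole words should be replaced, as the question describes, one possible variation is to build a single pattern with word boundaries. This is a sketch under that assumption, not the answer's original method. The names `id_map`, `pattern`, and `expand_ids`, and the fourth sample row, are hypothetical, and the code is written for Python 3:

```python
import re
import pandas as pd

df = pd.DataFrame({"Comment": ["The NYC tree is very big",
                               "The cat from the UK is small",
                               "The rock was found in LA.",
                               "A PLANT from LAOS"]})  # extra row, added for illustration
id_map = {"NYC": "New York City", "LA": "Los Angeles", "UK": "United Kingdom"}

# Try longer keys first so a long key wins over a shorter key sharing its prefix
keys = sorted(id_map, key=len, reverse=True)

# \b restricts matches to whole words; re.escape guards against special characters
pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

def expand_ids(text):
    """Replace every whole-word key in text with its value from id_map."""
    return pattern.sub(lambda m: id_map[m.group(1)], text)

df["commentTest"] = df["Comment"].apply(expand_ids)
print(df["commentTest"].tolist())
```

Here "LA." is still expanded because the period is not a word character, while "PLANT" and "LAOS" are left unchanged.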