Use dictionary to replace a string within a string in Pandas columns

Tags: python, dictionary, replace, pandas, dataframe

I am trying to use a dictionary to replace strings in a pandas column: wherever a key appears in the text, it should be replaced with its value. However, each cell in the column contains a full sentence. Therefore, I must first tokenize the sentence, detect whether a word in it corresponds to a key in my dictionary, and then replace that word with the corresponding value.

However, the result I keep getting is None. Is there a better, more Pythonic way to approach this problem?

Here is my MCVE for the moment. The comments in the code mark where the issue occurs.

import pandas as pd

data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big','The cat from the UK is small','The rock was found in LA.']}

ids = {'Id':['NYC','LA','UK'],
      'City':['New York City','Los Angeles','United Kingdom']}


df = pd.DataFrame(data)
ids = pd.DataFrame(ids)

def col2dict(ids):
    data = ids[['Id', 'City']]
    idDict = data.set_index('Id').to_dict()['City']
    return idDict

def replaceIds(data,idDict):
    ids = idDict.keys()
    types = idDict.values()
    data['commentTest'] = data['Comment']
    words = data['commentTest'].apply(lambda x: x.split())
    for (i,word) in enumerate(words):
        #Here we can see that the words appear
        print word
        print ids
        if word in ids:
        #Here we can see that they are not being recognized. What happened?
            print ids
            print word
            words[i] = idDict[word]
            data['commentTest'] = ' '.apply(lambda x: ''.join(x))
    return data

idDict = col2dict(ids)
results = replaceIds(df, idDict)

Results:

None

I am using Python 2.7, and when I print the dict, the strings are shown with the u'' Unicode prefix.

My expected outcome is:

owwoow14

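1 Answer

You can create a dictionary that maps each Id to its City and then pass it to replace with regex=True, so that the keys are matched inside the sentences: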

ids = {'Id':['NYC','LA','UK'],
      'City':['New York City','Los Angeles','United Kingdom']}

ids = dict(zip(ids['Id'], ids['City']))
print (ids)
{'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City'}

df['commentTest'] = df['Comment'].replace(ids, regex=True)
print (df)
  Categories                       Comment  Type  \
0     animal      The NYC tree is very big  tree   
1      plant  The cat from the UK is small   dog   
2     object     The rock was found in LA.  rock   

                                commentTest  
0        The New York City tree is very big  
1  The cat from the United Kingdom is small  
2        The rock was found in Los Angeles.
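
One caveat with regex=True is that each key is treated as a regular-expression pattern and is matched anywhere in the text, so a short key such as LA would also be replaced inside a longer word. If only whole words should be replaced, which seems to be what the tokenizing in the question was aiming for, a rough sketch using the standard re module might look like the following. The names id_to_name and expand_ids are made up for illustration and do not come from the question or from pandas.

```python
import re

# Mapping taken from the question's example data.
id_to_name = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

# Build one alternation of all keys, longest first, wrapped in word boundaries.
# re.escape guards against keys that contain regex metacharacters.
keys = sorted(id_to_name, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

def expand_ids(text):
    """Replace every whole-word key in `text` with its dictionary value."""
    return pattern.sub(lambda m: id_to_name[m.group(0)], text)

print(expand_ids('The rock was found in LA.'))    # key followed by punctuation
print(expand_ids('The PLAN was made in the UK'))  # 'LA' inside 'PLAN' is left alone
```

Applied per cell, this would be something like df['commentTest'] = df['Comment'].apply(expand_ids). As for the loop in the question, words is a Series of token lists, so each item in the loop is a whole list rather than a single word, and the membership test against the dictionary keys never finds a match.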

161

answered Sep 18 '22 16:09

jezrael
