
Overlapping keys in dictionary when using .replace() method on pandas dataframe

Tags:

python

pandas

I want to replace some values in a column of a dataframe using a dictionary that maps the old codes to the new codes.

di = dict( { "myVar": {11:0, 204:11} } )
mydata.replace( to_replace = di, inplace = True )

But some of the new codes and old codes overlap. When I use the .replace method of the dataframe, I get the error 'Replacement not allowed with overlapping keys and values'.

My current workaround is to replace the offending keys manually and then apply the dictionary to the remaining non-overlapping cases.

mydata.loc[ mydata.myVar == 11, "myVar" ] = 0 
di = dict( { "myVar": {204:11} } )
mydata.replace( to_replace = di, inplace = True )

Is there a more compact way to do this?

Nirvan asked Feb 23 '17 20:02



1 Answer

I found an answer here that uses the .map method on a series in conjunction with a dictionary. Here is an example with a recoding dictionary whose keys and values overlap.

>>> import pandas as pd
>>> df = pd.DataFrame( [1,2,3,4,1], columns = ['Var'] )
>>> df
   Var
0    1
1    2
2    3
3    4
4    1
>>> recode = {1:2, 2:3, 3:1, 4:3}  # old code -> new code; keys and values overlap
>>> df.Var.map( recode )
0    2
1    3
2    1
3    3
4    2
Name: Var, dtype: int64
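To see why overlapping keys and values need care, here is a minimal sketch comparing map with a naive loop that applies one old/new pair at a time. The names s, sequential and simultaneous are made up for illustration; the data and dictionary are the ones from the example above.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 1], name="Var")
recode = {1: 2, 2: 3, 3: 1, 4: 3}

# Naive approach: apply each pair in turn, so later pairs also hit
# values that an earlier pair has already rewritten.
sequential = s.copy()
for old, new in recode.items():
    sequential[sequential == old] = new

# map looks up every original value exactly once, so pairs cannot interfere.
simultaneous = s.map(recode)

print(sequential.tolist())    # [1, 1, 1, 3, 1]
print(simultaneous.tolist())  # [2, 3, 1, 3, 2]
```

The loop sends 1 to 2, then 2 to 3, then 3 back to 1, so every original 1, 2 and 3 ends up as 1. The map result matches the intended recoding.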

UPDATE:

With map, every value in the original series must have an entry in the dictionary. Any value that is missing from the dictionary is mapped to NaN, and an integer column is upcast to float64 as a result.

>>> df = pd.DataFrame( [1,2,3,4,1], columns = ['Var'] )
>>> recode = {1:2, 2:3, 3:1}  # no entry for 4
>>> df.Var.map( recode )
0    2.0
1    3.0
2    1.0
3    NaN
4    2.0
Name: Var, dtype: float64
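If the values without an entry should be left as they are, one possible approach is to pass map a function that falls back to the original value. This is only a sketch: the sample data below is invented, and the column name myVar and the codes 11 and 204 are taken from the question.

```python
import pandas as pd

# Invented sample data; 5 has no entry in the recoding dictionary.
mydata = pd.DataFrame({"myVar": [11, 204, 5, 11]})
recode = {11: 0, 204: 11}

# Look each value up once; keep the value itself when it has no entry.
mydata["myVar"] = mydata["myVar"].map(lambda v: recode.get(v, v))

print(mydata["myVar"].tolist())  # [0, 11, 5, 0]
```

An alternative is mydata["myVar"].map(recode).fillna(mydata["myVar"]), but that passes through NaN and so returns floats unless the result is cast back to an integer type.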
Nirvan answered Oct 04 '22 14:10