Replace values in pandas dataframe column with different replacement dict based on condition

Tags: python, pandas

I have a dataframe where I want to replace the values in one column, but the dict that defines the replacement depends on the value in another column. A sample dataframe looks like this:

   Map me strings        date
0       1   test1  2020-01-01
1       2   test2  2020-02-10
2       3   test3  2020-01-01
3       4   test2  2020-03-15

I have a dictionary that looks like this:

map_dict = {'2020-01-01': {1: 4, 2: 3, 3: 1, 4: 2},
            '2020-02-10': {1: 3, 2: 4, 3: 1, 4: 2},
            '2020-03-15': {1: 3, 2: 2, 3: 1, 4: 4}}

I want the mapping applied to Map me to differ depending on the row's date.

In this example, the expected output would be:

   Map me strings        date
0       4   test1  2020-01-01
1       4   test2  2020-02-10
2       1   test3  2020-01-01
3       4   test2  2020-03-15

I have a massive dataframe (100M+ rows) so I really want to avoid any looping solutions if at all possible.

I have tried to find a way to use either map or replace, but without success.

Fredrik Nilsson asked Nov 19 '20 09:11



1 Answer

Use DataFrame.join with a MultiIndex Series created by the DataFrame constructor and DataFrame.stack. The mapped values are matched on the (Map me, date) pair and land in a new column named new:

# Build a Series indexed by (old value, date), then left-join it on both key columns
df = df.join(pd.DataFrame(map_dict).stack().rename('new'), on=['Map me','date'])
print(df)
   Map me strings        date  new
0       1   test1  2020-01-01    4
1       2   test2  2020-02-10    4
2       3   test3  2020-01-01    1
3       4   test2  2020-03-15    4
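
For reference, here is a self-contained sketch of the same join on the sample data from the question. It assumes the goal is to overwrite Map me rather than keep a separate new column, and that any (value, date) pair missing from map_dict should keep its original value. The names lookup and out are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Map me": [1, 2, 3, 4],
    "strings": ["test1", "test2", "test3", "test2"],
    "date": ["2020-01-01", "2020-02-10", "2020-01-01", "2020-03-15"],
})

map_dict = {"2020-01-01": {1: 4, 2: 3, 3: 1, 4: 2},
            "2020-02-10": {1: 3, 2: 4, 3: 1, 4: 2},
            "2020-03-15": {1: 3, 2: 2, 3: 1, 4: 4}}

# Outer keys (dates) become columns and inner keys (old values) become the index;
# stack() turns that into a Series with a MultiIndex of (old value, date).
lookup = pd.DataFrame(map_dict).stack().rename("new")

# Left join on both key columns, with no Python-level loop over rows
out = df.join(lookup, on=["Map me", "date"])

# Assumption: pairs absent from map_dict come back as NaN, so fall back to
# the original value there, then restore the original integer dtype.
out["Map me"] = out["new"].fillna(out["Map me"]).astype(df["Map me"].dtype)
out = out.drop(columns="new")

print(out)
```

On the sample data, Map me ends up as 4, 4, 1, 4, which matches the expected output in the question.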
jezrael answered Oct 05 '22 22:10