Using a dictionary to replace column values on given index numbers on a pandas dataframe

Tags: python, pandas

Consider the following dataframe:

import numpy as np
import pandas as pd
df_test = pd.DataFrame({'a': [1, 2, 8], 'b': [np.nan, np.nan, 5], 'c': [np.nan, np.nan, 4]})
df_test.index = ['one', 'two', 'three']

which gives

      a  b   c
one   1 NaN NaN
two   2 NaN NaN
three 8  5   4

I have a dictionary of row replacements for columns b and c. For example:

{ 'one': [3.1, 2.2], 'two' : [8.8, 4.4] }

where 3.1 and 8.8 replace the values in column b and 2.2 and 4.4 replace the values in column c, so that the result is

      a  b   c
one   1 3.1 2.2
two   2 8.8 4.4
three 8  5   4

I know how to make these changes with a for loop:

index_list = ['one', 'two']
value_list_b = [3.1, 8.8]
value_list_c = [2.2, 4.4]
for i in range(len(index_list)):
    # a boolean mask on the index picks the row; .loc assigns into it
    df_test.loc[df_test.index == index_list[i], 'b'] = value_list_b[i]
    df_test.loc[df_test.index == index_list[i], 'c'] = value_list_c[i]

but I'm sure there's a nicer and quicker way to use the dictionary!

I guess it can be done with the DataFrame.replace method, but I couldn't figure it out.

Thanks for the help,

cd

cd98 asked Jul 15 '13 22:07


1 Answer

You are looking for pandas.DataFrame.update. The only twist in your case is that you specify the updates as a dictionary of rows, whereas a DataFrame is usually built from a dictionary of columns. The orient keyword can handle that.

In [24]: import pandas as pd

In [25]: df_test
Out[25]: 
       a   b   c
one    1 NaN NaN
two    2 NaN NaN
three  8   5   4

In [26]: row_replacements = { 'one': [3.1, 2.2], 'two' : [8.8, 4.4] }

In [27]: df_update = pd.DataFrame.from_dict(row_replacements, orient='index')

In [28]: df_update.columns = ['b', 'c']

In [29]: df_update
Out[29]: 
        b     c
one   3.1   2.2
two   8.8   4.4

In [30]: df_test.update(df_update)

In [31]: df_test
Out[31]: 
       a    b    c
one    1  3.1  2.2
two    2  8.8  4.4
three  8  5.0  4.0

pandas.DataFrame.from_dict is an alternative DataFrame constructor that provides the orient keyword, which is not available if you just call DataFrame(...). When I wrote this, from_dict did not accept the column names ['b', 'c'], so I set them in a separate step. Newer pandas versions (0.23 and later, as far as I know) accept a columns argument when orient='index'.
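
For anyone who wants to run the whole thing as a script, here is a minimal sketch of the same steps. It assumes a pandas version in which from_dict accepts a columns argument together with orient='index' (0.23 or later, to my knowledge); on older versions, set df_update.columns in a separate step instead.

```python
import numpy as np
import pandas as pd

# Same frame as in the question
df_test = pd.DataFrame(
    {'a': [1, 2, 8], 'b': [np.nan, np.nan, 5], 'c': [np.nan, np.nan, 4]},
    index=['one', 'two', 'three'],
)

# Replacements keyed by row label; each list holds the new values for b and c
row_replacements = {'one': [3.1, 2.2], 'two': [8.8, 4.4]}

# Assumption: this pandas version supports `columns` with orient='index'
df_update = pd.DataFrame.from_dict(
    row_replacements, orient='index', columns=['b', 'c']
)

# update() works in place and aligns on index and column labels,
# so rows and columns missing from df_update are left untouched
df_test.update(df_update)

print(df_test)
```

Note that update returns None and changes df_test directly, so there is nothing to assign back.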

Dan Allan answered Sep 23 '22 03:09