 

Updating columns based on conditions in pandas 0.16

Tags:

python

pandas

I am trying to update one column based on a condition on another column:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6, 4), columns=list('abcd'))
df[df.b > 0].d = 1

Why doesn't this work? Without the condition, it works.

desmond asked Mar 15 '23 23:03



1 Answer

When I do this with pandas v0.16.1, I get a warning telling me what's happening:

df=pd.DataFrame(np.random.randn(6,4),columns=list('abcd'))
df[df.b>0].d=1
/home/me/.local/lib/python2.7/site-packages/pandas/core/generic.py:1974: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

df[df.b > 0] returns a copy of the matching rows, not a view of the original dataframe, so assigning to .d on that result changes only the temporary copy and leaves df untouched. Following the suggestion in the warning, if I do:

df.loc[df.b > 0, 'd'] = 1

I get the desired results:

df
Out[10]: 
          a         b         c         d
0 -0.127010  0.252527 -0.857680  1.000000
1  0.348888  0.780728 -0.710778  1.000000
2  0.840746 -0.456552  0.414482 -1.326191
3  0.864530  0.365728 -0.540530  1.000000
4  1.954639 -0.919998 -0.446927  1.949182
5 -0.928344 -0.145271  0.089434 -0.569934
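
To make the difference concrete, here is a minimal sketch that runs both forms side by side and checks the result. The fixed seed (RandomState(0)), the names mask and original_d, and the warning suppression are my own additions for illustration, not part of the original question; the exact warning text can differ between pandas versions.

```python
import warnings

import numpy as np
import pandas as pd

# Fixed seed so the run is repeatable (an assumption; the question used unseeded data).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(6, 4), columns=list('abcd'))

mask = df.b > 0               # rows where column b is positive
original_d = df['d'].copy()   # snapshot of d for comparison

# Chained form: df[mask] builds a new DataFrame, and the assignment lands on it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the copy-related warning for this demo
    df[mask].d = 1

# The original frame is unchanged after the chained assignment.
chained_changed_df = not df['d'].equals(original_d)

# Single .loc call: row selection and column assignment happen on df itself.
df.loc[mask, 'd'] = 1

print("chained assignment changed df:", chained_changed_df)
print("rows with b > 0 now have d == 1:", bool((df.loc[mask, 'd'] == 1).all()))
```

The point of using .loc is that the row condition and the column name go into one indexing operation, so there is no intermediate object for the assignment to get lost in.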
Marius answered Mar 24 '23 09:03
