
Replace an entry in a pandas DataFrame using a conditional statement

I'd like to change the value of an entry in a Dataframe given a condition. For instance:

d = pandas.read_csv('output.az.txt', names = varname)
d['uld'] = (d.trade - d.plg25)*(d.final - d.price25)

if d['uld'] > 0:
   d['uld'] = 1
else:
   d['uld'] = 0

I'm not understanding why the above doesn't work. Thank you for your help.

James Eaves asked Dec 20 '22 06:12


1 Answer

Use np.where to set your data based on a simple boolean criterion:

In [3]:

df = pd.DataFrame({'uld':np.random.randn(10)})
df
Out[3]:
        uld
0  0.939662
1 -0.009132
2 -0.209096
3 -0.502926
4  0.587249
5  0.375806
6 -0.140995
7  0.002854
8 -0.875326
9  0.148876
In [4]:

df['uld'] = np.where(df['uld'] > 0, 1, 0)
df
Out[4]:
   uld
0    1
1    0
2    0
3    0
4    1
5    1
6    0
7    1
8    0
9    1

As for why what you did failed:

In [7]:

if df['uld'] > 0:
   df['uld'] = 1
else:
   df['uld'] = 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-ec7d7aaa1c28> in <module>()
----> 1 if df['uld'] > 0:
      2    df['uld'] = 1
      3 else:
      4    df['uld'] = 0

C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    696         raise ValueError("The truth value of a {0} is ambiguous. "
    697                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 698                          .format(self.__class__.__name__))
    699 
    700     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The error arises because df['uld'] > 0 produces a boolean Series with one True/False value per row, while an if statement needs a single truth value. With multiple values to compare, the result is ambiguous, hence the error.

In this situation the any() and all() methods recommended in the message are not what you want, because they collapse the whole Series into a single boolean. You want to mask your df and set values only where the condition is met. There is an explanation of this on the pandas site: http://pandas.pydata.org/pandas-docs/dev/gotchas.html and a related question here: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
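To make the ambiguity concrete, here is a minimal sketch using a small made-up Series (the values and the variable names s, mask and result are illustrative, not taken from the question). It shows that the comparison yields one boolean per row, that any() and all() each reduce it to a single answer, and that assigning through the mask is what applies the condition row by row:

```python
import pandas as pd

s = pd.Series([0.9, -0.2, 0.4])   # hypothetical sample values
mask = s > 0                      # element-wise comparison: a boolean Series

any_positive = bool(mask.any())   # True: at least one element is positive
all_positive = bool(mask.all())   # False: not every element is positive

try:
    if mask:                      # asks for one truth value from three
        pass
except ValueError as exc:
    error_message = str(exc)      # "The truth value of a Series is ambiguous..."

# Element-wise assignment through the mask instead of a Python `if`
result = pd.Series(0, index=s.index)
result.loc[mask] = 1
```

Here result holds 1 for the first and third rows and 0 for the second, which is the per-row behaviour the if/else in the question was meant to achieve.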

np.where takes a boolean condition as its first parameter. Where the condition is True it returns the value from the second parameter; where it is False it returns the value from the third, which is the behaviour you want.
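As a small illustration of the three parameters, the sketch below uses a hand-written column instead of random data so the result is predictable (the values and the names flags and labels are assumptions for the example). The second and third parameters need not be numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"uld": [0.5, -1.2, 0.0, 3.1]})  # hypothetical values

# condition, value where True, value where False
flags = np.where(df["uld"] > 0, 1, 0)

# The replacement values can be of another type, e.g. strings
labels = np.where(df["uld"] > 0, "up", "down")
```

Note that 0.0 is not greater than 0, so the third row falls into the "False" branch in both cases. np.where returns a NumPy array, which can be assigned straight back to a DataFrame column.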

UPDATE

Having looked at this again, you can also convert the boolean Series to int by casting with astype:

In [23]:
df['uld'] = (df['uld'] > 0).astype(int)
df

Out[23]:
   uld
0    1
1    0
2    0
3    0
4    1
5    1
6    0
7    1
8    0
9    1
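Applied to the columns named in the question, the same idea might look like the sketch below. The column names trade, plg25, final and price25 come from the question, but the numbers are invented for illustration since the contents of output.az.txt are not shown:

```python
import pandas as pd

# Hypothetical stand-in for the data read from the CSV in the question
d = pd.DataFrame({
    "trade":   [10.0, 8.0, 12.0],
    "plg25":   [9.0, 9.0, 12.0],
    "final":   [5.0, 6.0, 7.0],
    "price25": [4.0, 5.0, 6.0],
})

# Same product as in the question, computed row by row
product = (d["trade"] - d["plg25"]) * (d["final"] - d["price25"])

# 1 where the product is positive, 0 otherwise
d["uld"] = (product > 0).astype(int)
```

With these made-up rows the products are 1.0, -1.0 and 0.0, so uld ends up as 1, 0, 0.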
EdChum answered Dec 21 '22 23:12