New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame

Question

I've got a DataFrame which contains stock values.

It looks like this:

>>>Data Open High Low Close Volume Adj Close Date                                                       
2013-07-08  76.91  77.81  76.85  77.04  5106200  77.04

When I try to make a conditional new column with the following if statement:

Data['Test'] =Data['Close'] if Data['Close'] > Data['Open'] else Data['Open']

I get the following error:

Traceback (most recent call last):
  File "<pyshell#116>", line 1, in <module>
    Data[1]['Test'] =Data[1]['Close'] if Data[1]['Close'] > Data[1]['Open'] else Data[1]['Open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I then used a.all() :

Data[1]['Test'] =Data[1]['Close'] if all(Data[1]['Close'] > Data[1]['Open']) else Data[1]['Open']

The result was that the entire ['Open'] Column was selected. I didn't get the condition that I wanted, which is to select every time the biggest value between the ['Open'] and ['Close'] columns.

Any help is appreciated.

Thanks.

DSM · Accepted Answer

From a DataFrame like:

>>> df
         Date   Open   High    Low  Close   Volume  Adj Close
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04

The simplest thing I can think of would be:

>>> df["Test"] = df[["Open", "Close"]].max(axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

df.ix[:,["Open", "Close"]].max(axis=1) might be a little faster, but I don't think it's as nice to look at.

Alternatively, you could use .apply on the rows:

>>> df["Test"] = df.apply(lambda row: max(row["Open"], row["Close"]), axis=1)
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

Or fall back to numpy:

>>> df["Test"] = np.maximum(df["Open"], df["Close"])
>>> df
         Date   Open   High    Low  Close   Volume  Adj Close   Test
0  2013-07-08  76.91  77.81  76.85  77.04  5106200      77.04  77.04
1  2013-07-00  77.04  79.81  71.81  72.87  1920834      77.04  77.04
2  2013-07-10  72.87  99.81  64.23  93.23  2934843      77.04  93.23

The basic problem is that if/else doesn't play nicely with arrays, because if (something) always coerces the something into a single bool. It's not equivalent to "for every element in the array something, if the condition holds" or anything like that.

New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame

Tags:

python

pandas

python-3.3

Uninvited Guest

1 Answers

DSM

Recent Activity

Donate For Us

New column based on conditional selection from the values of 2 other columns in a Pandas DataFrame

Tags:

python

pandas

python-3.3

Uninvited Guest

1 Answers

DSM

Related questions

Recent Activity

Donate For Us