I've got a DataFrame
which contains stock values.
It looks like this:
>>>Data Open High Low Close Volume Adj Close Date
2013-07-08 76.91 77.81 76.85 77.04 5106200 77.04
When I try to make a conditional new column with the following if statement:
Data['Test'] =Data['Close'] if Data['Close'] > Data['Open'] else Data['Open']
I get the following error:
Traceback (most recent call last):
File "<pyshell#116>", line 1, in <module>
Data[1]['Test'] =Data[1]['Close'] if Data[1]['Close'] > Data[1]['Open'] else Data[1]['Open']
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I then used a.all()
:
Data[1]['Test'] =Data[1]['Close'] if all(Data[1]['Close'] > Data[1]['Open']) else Data[1]['Open']
The result was that the entire ['Open']
Column was selected. I didn't get the condition that I wanted, which is to select every time the biggest value between the ['Open']
and ['Close']
columns.
Any help is appreciated.
Thanks.
From a DataFrame like:
>>> df
Date Open High Low Close Volume Adj Close
0 2013-07-08 76.91 77.81 76.85 77.04 5106200 77.04
1 2013-07-00 77.04 79.81 71.81 72.87 1920834 77.04
2 2013-07-10 72.87 99.81 64.23 93.23 2934843 77.04
The simplest thing I can think of would be:
>>> df["Test"] = df[["Open", "Close"]].max(axis=1)
>>> df
Date Open High Low Close Volume Adj Close Test
0 2013-07-08 76.91 77.81 76.85 77.04 5106200 77.04 77.04
1 2013-07-00 77.04 79.81 71.81 72.87 1920834 77.04 77.04
2 2013-07-10 72.87 99.81 64.23 93.23 2934843 77.04 93.23
df.ix[:,["Open", "Close"]].max(axis=1)
might be a little faster, but I don't think it's as nice to look at.
Alternatively, you could use .apply
on the rows:
>>> df["Test"] = df.apply(lambda row: max(row["Open"], row["Close"]), axis=1)
>>> df
Date Open High Low Close Volume Adj Close Test
0 2013-07-08 76.91 77.81 76.85 77.04 5106200 77.04 77.04
1 2013-07-00 77.04 79.81 71.81 72.87 1920834 77.04 77.04
2 2013-07-10 72.87 99.81 64.23 93.23 2934843 77.04 93.23
Or fall back to numpy:
>>> df["Test"] = np.maximum(df["Open"], df["Close"])
>>> df
Date Open High Low Close Volume Adj Close Test
0 2013-07-08 76.91 77.81 76.85 77.04 5106200 77.04 77.04
1 2013-07-00 77.04 79.81 71.81 72.87 1920834 77.04 77.04
2 2013-07-10 72.87 99.81 64.23 93.23 2934843 77.04 93.23
The basic problem is that if/else
doesn't play nicely with arrays, because if (something)
always coerces the something
into a single bool
. It's not equivalent to "for every element in the array something, if the condition holds" or anything like that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With