How can I use a ternary operator in the lambda function within the apply function of a pandas dataframe?
First of all, here is the R/plyr code, which produces exactly what I want:
ddply(mtcars, .(cyl), summarise, sum(ifelse(carb==4,1,0))/sum(ifelse(carb %in% c(4,1),1,0)))
In the above function, I can use the ifelse function, R's (vectorized) ternary operator, to compute the resulting dataframe.
However, when I try to do the same in Python/pandas with the following code
mtcars.groupby(["cyl"]).apply(lambda x: sum(1 if x["carb"] == 4 else 0) / sum(1 if x["carb"] in (4, 1) else 0))
the following error occurs:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
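The error comes from the conditional expression itself: `1 if cond else 0` calls `bool(cond)`, and inside a grouped apply `x["carb"] == 4` is a whole boolean Series, not a single value. The same happens with `x["carb"] in (4, 1)`, because tuple membership compares each item with `==` and then takes the truth value of the result. A minimal reproduction, assuming a small toy frame (the same one used in the answer below):

```python
import pandas as pd

# Toy data (assumed for illustration)
mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})

group = mtcars[mtcars["cyl"] == 6]   # what apply would pass for the cyl == 6 group
mask = group["carb"] == 4            # elementwise comparison: a boolean Series
print(mask.tolist())                 # [False, False, True]

errors = []
attempts = [
    ("ternary", lambda: 1 if mask else 0),                        # bool(Series)
    ("membership", lambda: 1 if group["carb"] in (4, 1) else 0),  # 4 == Series, then bool
]
for label, attempt in attempts:
    try:
        attempt()
    except ValueError as err:
        # pandas refuses to reduce a multi-element Series to one truth value
        errors.append(label)
        print(label, "->", err)
```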
So how can I compute the same dataframe as in R/plyr?
For reference, if I use the ternary operator without grouping, such as
mtcars.apply(lambda x: sum(1 if x["carb"] == 4 else 0) / sum(1 if x["carb"] in (4, 1) else 0), axis=1)
I do get a resulting dataframe for some reason (but it's not what I want).
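With `axis=1`, `apply` hands the lambda one row at a time, so `row["carb"]` is a single scalar and the conditional expression is well defined. The result has one value per row rather than one per `cyl` group, which is why it is not the desired output. A sketch on an assumed toy frame; the row-wise flags can still be grouped and summed afterwards:

```python
import pandas as pd

# Toy data (assumed for illustration)
mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})

# Row-wise: each `row` is a Series for one row, so row["carb"] is a scalar
flags = mtcars.apply(lambda row: 1 if row["carb"] == 4 else 0, axis=1)
print(flags.tolist())  # [1, 0, 0, 0, 1, 0]

# One value per row; grouping afterwards gives per-cyl counts of carb == 4
counts = flags.groupby(mtcars["cyl"]).sum()
print(counts)
```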
Thanks.
[Update]
Sorry, the original example is not a good one for demonstrating a ternary operator, since it only uses 1 and 0, which can be replaced by a boolean (binary) value. Here is the updated R/plyr code:
ddply(mtcars, .(cyl), summarise, sum(ifelse(carb==4,6,3))/sum(ifelse(carb %in% c(4,1),8,4)))
Is it feasible to use the ternary operator in this situation?
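For illustration, a plain Python conditional expression does work if it is applied to one element at a time, here via `Series.map` inside the groupby, so every condition is a scalar. A sketch assuming the toy frame used in the answers below; the helper name `ratio` is hypothetical, and element-by-element mapping is slower than the vectorized approaches on large data:

```python
import pandas as pd

# Toy data (assumed for illustration)
mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})

def ratio(carb):
    # `carb` is the "carb" Series of one cyl group; map applies the
    # conditional expression to each scalar element, like R's ifelse
    num = carb.map(lambda c: 6 if c == 4 else 3).sum()
    den = carb.map(lambda c: 8 if c in (4, 1) else 4).sum()
    return num / den

result = mtcars.groupby("cyl")["carb"].apply(ratio)
print(result)
```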
I think your code could be translated to this:
mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum((x == 4).astype(float)) / sum(x.isin((4, 1))))
Toy example:
>>> mtcars = pd.DataFrame({'cyl':[8,8,6,6,6,4], 'carb':[4,3,1,5,4,1]})
>>> mtcars
   carb  cyl
0     4    8
1     3    8
2     1    6
3     5    6
4     4    6
5     1    4
>>> mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum((x == 4).astype(float)) / sum(x.isin((4, 1))))
cyl
4    0.0
6    0.5
8    1.0
dtype: float64
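The same ratio can also be computed without `apply`: build the boolean masks over the whole column, then group them by `cyl` and sum them (each `True` counts as 1). A sketch on the toy frame above:

```python
import pandas as pd

mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})

# Boolean masks over the whole column; grouped sums count the True values
num = (mtcars["carb"] == 4).groupby(mtcars["cyl"]).sum()
den = mtcars["carb"].isin((4, 1)).groupby(mtcars["cyl"]).sum()

ratio = num / den  # both Series share the cyl index, so they align
print(ratio)
```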
Update
In more complex cases, you can use the numpy.where() function:
>>> import numpy as np
>>> mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum(np.where(x == 4, 6, 3).astype(float)) / sum(np.where(x.isin((4, 1)), 8, 4)))
cyl
4    0.375
6    0.600
8    0.750
dtype: float64
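`np.where(cond, a, b)` works elementwise, picking `a` where `cond` is True and `b` elsewhere, so it gives the same result as applying `a if c else b` to each element, which is what R's `ifelse` does. A small check on the `carb` values of the `cyl == 6` toy group (assumed), reproducing that group's 0.6:

```python
import numpy as np
import pandas as pd

carb = pd.Series([1, 5, 4])  # "carb" values of the cyl == 6 toy group

# Vectorized choice vs. a conditional expression applied element by element
num_vec = np.where(carb == 4, 6, 3)
num_elem = [6 if c == 4 else 3 for c in carb]

den_vec = np.where(carb.isin((4, 1)), 8, 4)
den_elem = [8 if c in (4, 1) else 4 for c in carb]

print(num_vec.tolist() == num_elem, den_vec.tolist() == den_elem)  # True True
print(num_vec.sum() / den_vec.sum())  # 12 / 20
```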
It looks to me like x['carb'] is array-like (a pandas Series, which behaves like a numpy array here). In this case, x['carb'] == 4 returns a boolean array: True where the values equal 4, False otherwise. This is a very handy feature of numpy, but it can be annoying in situations like these (because it is natural to expect the == operator to return a single boolean result).
The trick is to call .all() on the result:
(x['carb'] == 4).all()
That will return True only if all the elements in (x['carb'] == 4) are True.
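Because `.all()` collapses the boolean Series to a single value, the conditional expression becomes valid, but the condition is then evaluated once per group rather than once per row. A sketch using a hypothetical frame in which every `cyl == 8` row has `carb == 4`:

```python
import pandas as pd

# Hypothetical data: all cyl == 8 rows have carb == 4, the cyl == 6 rows do not
df = pd.DataFrame({"cyl": [8, 8, 6, 6], "carb": [4, 4, 1, 4]})

# One scalar per group: 1 if every carb in the group equals 4, else 0
per_group = df.groupby("cyl")["carb"].apply(lambda x: 1 if (x == 4).all() else 0)
print(per_group.tolist())  # [0, 1] for cyl 6 and cyl 8
```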