How can I use a ternary operator in a lambda function passed to apply on a pandas DataFrame?
First, here is the R/plyr code that produces exactly the result I want:
ddply(mtcars, .(cyl), summarise, sum(ifelse(carb==4,1,0))/sum(ifelse(carb %in% c(4,1),1,0)))
In the code above, I can use the ifelse function, R's vectorized counterpart of a ternary operator, to compute the resulting data frame.
However, when I try to do the same in Python/pandas with the following code
mtcars.groupby(["cyl"]).apply(lambda x: sum(1 if x["carb"] == 4 else 0) / sum(1 if x["carb"] in (4, 1) else 0))
I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
So how can I compute the same data frame as in R/plyr?
For your information, if I use the ternary operator without grouping, such as
mtcars.apply(lambda x: sum(1 if x["carb"] == 4 else 0) / sum(1 if x["carb"] in (4, 1) else 0), axis=1)
I do get a result for some reason (but it is not what I want).
Thanks.
[Update]
Sorry, the original example was not a good illustration of the ternary operator, since it uses 1 and 0, which can be treated as booleans. The updated R/plyr code is the following:
ddply(mtcars, .(cyl), summarise, sum(ifelse(carb==4,6,3))/sum(ifelse(carb %in% c(4,1),8,4)))
Is it feasible to use the ternary operator in this situation?
I think your code could be translated to this:
mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum((x == 4).astype(float)) / sum(x.isin((4, 1))))
Toy example:
>>> mtcars = pd.DataFrame({'cyl':[8,8,6,6,6,4], 'carb':[4,3,1,5,4,1]})
>>> mtcars
   carb  cyl
0     4    8
1     3    8
2     1    6
3     5    6
4     4    6
5     1    4
>>> mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum((x == 4).astype(float)) / sum(x.isin((4, 1))))
cyl
4      0.0
6      0.5
8      1.0
dtype: float64
Update
In more complex cases, you can use the numpy.where() function:
>>> import numpy as np
>>> mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum(np.where(x == 4,6,3).astype(float)) / sum(np.where(x.isin((4,1)),8,4)))
cyl
4      0.375
6      0.600
8      0.750
dtype: float64
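For reference, the np.where approach above can be written as a self-contained script. This is only a sketch: it reuses the toy data from this answer, and the helper name weighted_ratio is made up for illustration, not something pandas or numpy provides.

```python
import numpy as np
import pandas as pd

# Same toy data as in the example above.
mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})


def weighted_ratio(carb):
    """Hypothetical helper: element-wise if/else via np.where, then a ratio of sums."""
    # np.where(condition, a, b) picks a where condition is True and b elsewhere,
    # which mirrors R's ifelse(condition, a, b).
    numerator = np.where(carb == 4, 6, 3).sum()
    denominator = np.where(carb.isin([4, 1]), 8, 4).sum()
    return numerator / denominator


# Apply the helper to the carb column of each cyl group.
result = mtcars.groupby("cyl")["carb"].apply(weighted_ratio)
print(result)
```

Moving the logic into a named function keeps the groupby line short and makes the two conditions easier to change independently.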
It looks to me like x['carb'] is array-like (a pandas Series, which behaves like a numpy array here). In this case, x['carb'] == 4 returns a boolean array: True where the values equal 4, False otherwise. This is a very handy feature of numpy, but it can be annoying in situations like these (because it is natural to expect the == operator to return a single boolean result).
The trick is to call .all() on the result:
(x['carb'] == 4).all()
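That will return True only if all the elements in (x['carb'] == 4) are True. Note that this removes the error but yields one boolean for the whole group, not a per-element result.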
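To make the difference concrete, here is a small sketch using the values of the cyl == 6 group from the toy example in the other answer. The variable names are made up for illustration.

```python
import pandas as pd

carb = pd.Series([1, 5, 4])  # carb values of one group (assumed toy data)

mask = carb == 4  # element-wise comparison, gives [False, False, True]

# Python's conditional expression needs a single bool, so passing it a
# whole boolean Series raises ValueError.
try:
    1 if mask else 0
    raised = False
except ValueError:
    raised = True

all_equal = bool(mask.all())   # True only if every value equals 4
any_equal = bool(mask.any())   # True if at least one value equals 4
count_equal = int(mask.sum())  # number of values equal to 4

print(raised, all_equal, any_equal, count_equal)
```

Which reduction fits depends on the question being asked: .all() and .any() collapse the group to one boolean, while .sum() counts matches, which is what the R code with ifelse(carb==4,1,0) computes.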