
Use ternary operator in apply function in pandas dataframe, without grouping columns

Tags: python, pandas

How can I use a ternary operator in a lambda function passed to the apply function of a pandas DataFrame?

First of all, here is the R/plyr code that produces exactly what I want:

ddply(mtcars, .(cyl), summarise, sum(ifelse(carb==4,1,0))/sum(ifelse(carb %in% c(4,1),1,0)))

In the code above, I can use the ifelse function, R's equivalent of a ternary operator, to compute the resulting dataframe.

However, when I try to do the same in Python/pandas with the following code:

mtcars.groupby(["cyl"]).apply(lambda x: sum(1 if x["carb"] == 4 else 0) / sum(1 if x["carb"] in (4, 1) else 0))

I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So how can I compute the same dataframe as in R/plyr?

For your information, if I use the ternary operator without grouping the columns, such as:

mtcars.apply(lambda x: sum(1 if x["carb"] == 4 else 0) / sum(1 if x["carb"] in (4, 1) else 0), axis=1)

I can get a resulting dataframe for some reason (but it's not what I wanted to do).

Thanks.

[Update]

Sorry, the original example was not a good illustration of the ternary operator, since it uses 1 and 0, which can be treated as booleans. The updated R/plyr code is the following:

ddply(mtcars, .(cyl), summarise, sum(ifelse(carb==4,6,3))/sum(ifelse(carb %in% c(4,1),8,4)))

Is it feasible to use the ternary operator in this situation?

Blaszard asked Nov 15 '13 04:11


2 Answers

I think your code could be translated to this:

mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum((x == 4).astype(float)) / sum(x.isin((4, 1))))

Toy example:

>>> import pandas as pd
>>> mtcars = pd.DataFrame({'cyl':[8,8,6,6,6,4], 'carb':[4,3,1,5,4,1]})
>>> mtcars
   carb  cyl
0     4    8
1     3    8
2     1    6
3     5    6
4     4    6
5     1    4
>>> mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum((x == 4).astype(float)) / sum(x.isin((4, 1))))
cyl
4      0.0
6      0.5
8      1.0
dtype: float64
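
The same calculation can also be written without a lambda. This is only a sketch on the toy data above, and the helper name carb4_share is made up for illustration. It relies on the comparison being element-wise, so summing the boolean mask counts the matching rows:

```python
import pandas as pd

# Toy data copied from the example above.
mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})


def carb4_share(carb):
    """Count of carb == 4 divided by the count of carb in {4, 1} for one group.

    Hypothetical helper, not part of pandas.
    """
    is_four = carb == 4                 # element-wise comparison -> boolean Series
    is_four_or_one = carb.isin([4, 1])  # element-wise membership test
    # Summing a boolean Series counts its True values.
    return is_four.sum() / is_four_or_one.sum()


result = mtcars.groupby("cyl")["carb"].apply(carb4_share)
print(result)
```

A group with no carb value of 4 or 1 would make the denominator zero, which the toy data does not exercise.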

update

In a more complex case, you can use the numpy.where() function:

>>> import numpy as np
>>> mtcars.groupby(["cyl"])['carb'].apply(lambda x: sum(np.where(x == 4,6,3).astype(float)) / sum(np.where(x.isin((4,1)),8,4)))
cyl
4      0.375
6      0.600
8      0.750
dtype: float64
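
If apply turns out to be slow on larger data, one possible alternative (a sketch on the same toy data, not benchmarked here) is to build the two weight columns with np.where first and then sum them per group. The variable names num, den and ratio are just for illustration:

```python
import numpy as np
import pandas as pd

# Toy data copied from the example above.
mtcars = pd.DataFrame({"cyl": [8, 8, 6, 6, 6, 4], "carb": [4, 3, 1, 5, 4, 1]})

# np.where(condition, a, b) picks a where the condition is True, b elsewhere,
# which is the element-wise counterpart of R's ifelse.
num = pd.Series(np.where(mtcars["carb"] == 4, 6, 3), index=mtcars.index)
den = pd.Series(np.where(mtcars["carb"].isin([4, 1]), 8, 4), index=mtcars.index)

# Sum the weights within each cyl group, then divide the two per-group totals.
ratio = num.groupby(mtcars["cyl"]).sum() / den.groupby(mtcars["cyl"]).sum()
print(ratio)
```

On the toy data this should give the same three values as the apply version above.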
Roman Pekar answered Oct 27 '22 11:10


It looks to me like x['carb'] is a numpy array (or subclass). In this case, x['carb'] == 4 returns a boolean array: True where the values equal 4, False otherwise. This is a very handy feature of numpy, but it can be annoying in situations like these (because it is natural to expect the == operator to return a single boolean result).

The trick is to call .all() on the result:

(x['carb'] == 4).all()

That will return True only if all the elements in (x['carb'] == 4) are True.
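
To make the difference concrete, here is a small sketch using a made-up group of three carb values. It shows where the error comes from and what .all(), .any() and .sum() each return. Note that .all() collapses the group to one boolean, so if a count of matches is wanted (as in the ratio from the question), summing the mask may be closer to the intent:

```python
import pandas as pd

# One hypothetical group of carb values, for illustration only.
carb = pd.Series([1, 5, 4])
mask = carb == 4  # element-wise result: False, False, True

# Using the whole mask as an if-condition is what triggers the error.
try:
    1 if mask else 0
    raised = False
except ValueError:
    raised = True

all_four = bool(mask.all())  # True only if every element equals 4
any_four = bool(mask.any())  # True if at least one element equals 4
n_four = int(mask.sum())     # number of elements equal to 4

print(raised, all_four, any_four, n_four)
```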

mgilson answered Oct 27 '22 10:10