
Pandas groupby select last row or second to last row based on value (0 or 1) in another column

Tags: python, pandas

I have a dataframe of customers. Each customer has several observations and variables, and some are no longer customers.

My question is the following:

I want to group by customer and select the last row if the customer is still a customer, and the second to last row if the customer is not a customer anymore. I have a column named churned that is 1 if the customer has churned (no longer a customer), and 0 if it is still a customer, for all observations of the customer.

I know I can select the last row (-1) or second to last row (-2) as such:
df = df.groupby(['CustomerID'],as_index=False).nth(-1).reset_index()

This also seems to work, although the order is different:
df = df.groupby(['CustomerID']).apply(lambda x: x.iloc[-1])

I have been trying to use an if-else expression in a lambda function like this:
df = df.groupby(['CustomerID'],as_index=False).apply(lambda x: x.iloc[-2] if x['churned']==1 else x.iloc[-1]).reset_index()

But I get ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there any way I can group by CustomerID, condition on the column churned and get only the last row of the customers where churned==0, and the second to last row if churned==1?

Preben asked Sep 11 '25 01:09


1 Answer

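You need to split the dataframe in two, current customers (churned == 0) and churned customers (churned == 1), and select rows from each part separately. For the current customers, use the last() function; for the churned customers, nth(-2) from your question picks the second to last row in the same way: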

df.groupby(['CustomerID'],as_index=False).last()
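A minimal sketch of the split approach end to end, on made-up sample data (the value column and all the numbers are hypothetical; CustomerID and churned are the column names from the question). It assumes churned is constant within each customer, as the question states. To avoid depending on how nth() returns its index in a given pandas version, this sketch uses groupby(...).tail(1) for the last row and cumcount(ascending=False) to locate the second to last row, which is a swapped-in technique rather than the last() call above:

```python
import pandas as pd

# Hypothetical sample data; rows are assumed to be in chronological order.
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 2, 3, 3],
    "churned":    [0, 0, 0, 1, 1, 1, 0, 0],
    "value":      [10, 11, 12, 20, 21, 22, 30, 31],
})

# churned is the same on every row of a customer, so a boolean mask
# splits the frame into current and churned customers.
active = df[df["churned"] == 0]
churned = df[df["churned"] == 1]

# Last row per current customer.
last_active = active.groupby("CustomerID").tail(1)

# Position of each row counted from the end of its group:
# 0 is the last row, 1 is the second to last row.
pos_from_end = churned.groupby("CustomerID").cumcount(ascending=False)
second_last_churned = churned[pos_from_end == 1]

# Recombine and restore the original row order.
result = (
    pd.concat([last_active, second_last_churned])
    .sort_index()
    .reset_index(drop=True)
)
print(result)
```

A churned customer with only one row has no second to last row and is dropped by this sketch. As for the ValueError in the question: inside apply, x['churned'] == 1 is a Series with one boolean per row of the group, and a Python if cannot reduce that to a single True or False. Reducing it to a scalar first, for example with x['churned'].iloc[0] == 1, avoids the error.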
gtomer answered Sep 12 '25 13:09