
Python Pandas groupby and iloc

Tags: python, pandas

I have a dataframe that contains a customer column, a date column, and a size column, like this:

|Customer|Date|Size|
|--------|----|----|
|Cust1   |20/10/2021|4|
|Cust1   |21/10/2021|5|
|Cust1   |22/10/2021|6|
|Cust1   |23/10/2021|6|
|Cust2   |20/10/2021|4|
|Cust2   |21/10/2021|5|
|Cust2   |22/10/2021|6|
|Cust2   |23/10/2021|6|
|Cust3   |20/10/2021|4|
|Cust3   |21/10/2021|5|
|Cust3   |22/10/2021|6|
|Cust3   |23/10/2021|6|

I need to keep every nth date for each customer and drop the rest. In this example, every 2nd:

|Customer|Date|Size|
|--------|----|----|
|Cust1   |20/10/2021|4|
|Cust1   |22/10/2021|6|
|Cust2   |20/10/2021|4|
|Cust2   |22/10/2021|6|
|Cust3   |20/10/2021|4|
|Cust3   |22/10/2021|6|

Sorry for the bad format, but table formatting doesn't work for the 2nd table.

In reality it's every 10th day, starting from the most recent one. I tried groupby with iloc, but it isn't working:

df_10 = df.iloc[::10, :]

AttributeError: 'DataFrameGroupBy' object has no attribute 'iloc'

I don't insist on using groupby at all, to be honest, but I can't find a working solution so far.

Thank you

Yavor I asked Oct 19 '25 03:10


1 Answer

You can use:

df.loc[df.groupby('Customer').cumcount().mod(2).eq(0)]

output:

   Customer        Date  Size
0     Cust1  20/10/2021     4
2     Cust1  22/10/2021     6
4     Cust2  20/10/2021     4
6     Cust2  22/10/2021     6
8     Cust3  20/10/2021     4
10    Cust3  22/10/2021     6

explanation:

df.groupby('Customer').cumcount() numbers the rows within each group ([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]), mod(2) takes each count modulo 2 to give [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1], and eq(0) turns that into a boolean mask that is True where the remainder is 0. df.loc then keeps only those rows.
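To see those intermediate values for yourself, here is a small sketch that rebuilds the example frame from the question and runs each step separately. The variable names (count, remainder, mask, result) are only illustrative:

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "Customer": ["Cust1"] * 4 + ["Cust2"] * 4 + ["Cust3"] * 4,
    "Date": ["20/10/2021", "21/10/2021", "22/10/2021", "23/10/2021"] * 3,
    "Size": [4, 5, 6, 6] * 3,
})

# Step 1: position of each row inside its Customer group (0-based)
count = df.groupby("Customer").cumcount()

# Step 2: remainder of that position when divided by 2
remainder = count.mod(2)

# Step 3: boolean mask, True where the remainder is 0
mask = remainder.eq(0)

# Step 4: keep only the rows where the mask is True
result = df.loc[mask]
print(result)
```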

To take every Nth row starting from the Kth one (the first being 0, so K must be less than N), use .mod(N).eq(K)
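The question also asks to count from the most recent date. One possible way to do that, as a sketch rather than the only approach, is to pass ascending=False to cumcount so rows are numbered backwards from the end of each group. The helper name every_nth_per_group and its from_end parameter are made up for this example, and it assumes the rows are sorted by date within each customer (the dates are parsed as day/month/year first so the sort is chronological):

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "Customer": ["Cust1"] * 4 + ["Cust2"] * 4 + ["Cust3"] * 4,
    "Date": ["20/10/2021", "21/10/2021", "22/10/2021", "23/10/2021"] * 3,
    "Size": [4, 5, 6, 6] * 3,
})


def every_nth_per_group(frame, key, n, k=0, from_end=False):
    """Hypothetical helper: keep rows at positions k, k+n, k+2n, ... in each group.

    With from_end=True, position 0 is the last row of each group.
    """
    # ascending=False numbers rows from (group length - 1) down to 0
    pos = frame.groupby(key).cumcount(ascending=not from_end)
    return frame.loc[pos.mod(n).eq(k)]


# Every 2nd row counted from the first row of each customer
from_first = every_nth_per_group(df, "Customer", n=2)

# Assumption: sort chronologically so the last row per customer is the most recent
df_sorted = df.assign(
    Date=pd.to_datetime(df["Date"], format="%d/%m/%Y")
).sort_values(["Customer", "Date"])

# Every 2nd row counted backwards from the most recent date
from_latest = every_nth_per_group(df_sorted, "Customer", n=2, from_end=True)
print(from_latest)
```

For the real data, n=10 in place of n=2 would keep every 10th row per customer, counted back from the latest date.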

mozway answered Oct 21 '25 02:10


