I have a dataframe contains data column, customer column and size like this:
Customer | Date | Size |
---|---|---|
Cust1 | 20/10/2021 | 4 |
Cust1 | 21/10/2021 | 5 |
Cust1 | 22/10/2021 | 6 |
Cust1 | 23/10/2021 | 6 |
Cust2 | 20/10/2021 | 4 |
Cust2 | 21/10/2021 | 5 |
Cust2 | 22/10/2021 | 6 |
Cust2 | 23/10/2021 | 6 |
Cust3 | 20/10/2021 | 4 |
Cust3 | 21/10/2021 | 5 |
Cust3 | 22/10/2021 | 6 |
Cust3 | 23/10/2021 | 6 |
I need to extract every nth date from a customer and delete the rest. In this example - every 2nd:
|Customer|Date|Size|
|--------|----|----|
|Cust1 |20/10/2021|4|
|Cust1 |22/10/2021|6|
|Cust2 |20/10/2021|4|
|Cust2 |22/10/2021|6|
|Cust3 |20/10/2021|4|
|Cust3 |22/10/2021|6|
Sorry for the bad format, but table formatting doesn't work for the 2nd table.
In reality it's every 10th and day, starting from the most recent one. Tried with group and iloc but isn't working:
df_10 = df.iloc[::10, :]
AttributeError: 'DataFrameGroupBy' object has no attribute 'iloc'
I don't insist to use groupby at all to be honest, but can't find working solution so far.
Thank you
You can use:
df.loc[df.groupby('Customer').cumcount().mod(2).eq(0)]
output:
Customer Date Size
0 Cust1 20/10/2021 4
2 Cust1 22/10/2021 6
4 Cust2 20/10/2021 4
6 Cust2 22/10/2021 6
8 Cust3 20/10/2021 4
10 Cust3 22/10/2021 6
explanation:
df.groupby('Customer').cumcount()
creates a count per group ([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
), then mod(2)
takes the modulo to give [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
, and eq(0)
enables to select the 0 values.
If you want to take every N rows, starting from the K's one (first being 0): .mod(N).eq(K)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With