I have a dataframe with an id
column and a quantity
column, which can be 0 or 1.
import pandas as pd
df = pd.DataFrame([
{'id': 'thing 1', 'date': '2016-01-01', 'quantity': 0 },
{'id': 'thing 1', 'date': '2016-02-01', 'quantity': 0 },
{'id': 'thing 1', 'date': '2016-09-01', 'quantity': 1 },
{'id': 'thing 1', 'date': '2016-10-01', 'quantity': 1 },
{'id': 'thing 2', 'date': '2017-01-01', 'quantity': 1 },
{'id': 'thing 2', 'date': '2017-02-01', 'quantity': 1 },
{'id': 'thing 2', 'date': '2017-02-11', 'quantity': 1 },
{'id': 'thing 3', 'date': '2017-09-01', 'quantity': 0 },
{'id': 'thing 3', 'date': '2017-10-01', 'quantity': 0 },
])
df.date = pd.to_datetime(df.date, format="%Y-%m-%d")
df
If for a certain id
I have both 0 and 1 values, I want to return only the 1s. If I have only 1s, I want to return all of them. If I have only 0s, I want to return all of them.
The way I do it is to apply a function to each group and then reset the index:
def drop_that(dff):
q = len(dff[dff['quantity']==1])
if q >0:
return dff[dff['quantity']==1]
else:
return dff
dfg = df.groupby('id', as_index=False).apply(drop_that)
dfg.reset_index(drop=True)
However, I implemented this just by brute-force googling and I really do not know if this is a good Pandas practice or if there are alternative methods that would be more performant.
Any advice would really be appreciated.
You can try:
# find the number of unique quantity for each thing
s = df.groupby('id')['quantity'].transform('nunique')
df[s.eq(1) # things with only 1 quantity value (either 0 or 1)
| df['quantity'].eq(1) # or quantity==1 when there are 2 values
]
Output:
id date quantity
2 thing 1 2016-09-01 1
3 thing 1 2016-10-01 1
4 thing 2 2017-01-01 1
5 thing 2 2017-02-01 1
6 thing 2 2017-02-11 1
7 thing 3 2017-09-01 0
8 thing 3 2017-10-01 0
Based on your logic, try transform
with max
, if max eq to original value we should keep,
#logic : only have 0 or 1 max will be 0 or 1 ,
# if both have 0 and 1, max should be 1 we should keep all value eq to 1
out = df[df.quantity.eq(df.groupby('id')['quantity'].transform('max'))]
Out[89]:
id date quantity
2 thing 1 2016-09-01 1
3 thing 1 2016-10-01 1
4 thing 2 2017-01-01 1
5 thing 2 2017-02-01 1
6 thing 2 2017-02-11 1
7 thing 3 2017-09-01 0
8 thing 3 2017-10-01 0
Another solution, that might be closer to natural language:
(
df
.groupby("id")
.apply(lambda x: x if x.quantity.unique().size == 1
else x.query("quantity == 1"))
.reset_index(drop=True)
)
Output:
# id date quantity
# 0 thing 1 2016-09-01 1
# 1 thing 1 2016-10-01 1
# 2 thing 2 2017-01-01 1
# 3 thing 2 2017-02-01 1
# 4 thing 2 2017-02-11 1
# 5 thing 3 2017-09-01 0
# 6 thing 3 2017-10-01 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With