A B C
0 2002-01-16 2002-02-28 Jack
1 2002-01-16 2002-01-30 Helen
2 2002-01-16 2002-02-28 Peter
3 2002-01-16 2002-01-30 Jud
4 2002-04-27 2002-04-30 Nick
5 2002-04-27 2002-05-25 Wendy
6 2002-04-27 2002-04-30 Bryan
7 2002-04-27 2002-05-25 Sarah
I want to select the rows, which A date are closer in time to B date, per each A group.
Output should be:
A B C
1 2002-01-16 2002-01-30 Helen
3 2002-01-16 2002-01-30 Jud
4 2002-04-27 2002-04-30 Nick
6 2002-04-27 2002-04-30 Bryan
Use:
df = df[df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min())]
print (df)
A B C
1 2002-01-16 2002-01-30 Helen
3 2002-01-16 2002-01-30 Jud
4 2002-04-27 2002-04-30 Nick
6 2002-04-27 2002-04-30 Bryan
Details:
print (df['B'].sub(df['A']))
0 43 days
1 14 days
2 43 days
3 14 days
4 3 days
5 28 days
6 3 days
7 28 days
dtype: timedelta64[ns]
print (df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min()))
0 False
1 True
2 False
3 True
4 True
5 False
6 True
7 False
dtype: bool
This is one way.
# convert columns to datetime
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_datetime)
# calculate absolute difference
df['Diff'] = (df['B'] - df['A']).abs()
# filter for difference equal to mapped minimum
res = df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]
Result:
A B C Diff
1 2002-01-16 2002-01-30 Helen 14 days
3 2002-01-16 2002-01-30 Jud 14 days
4 2002-04-27 2002-04-30 Nick 3 days
6 2002-04-27 2002-04-30 Bryan 3 days
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With