Select rows by minimum absolute difference between dates, by group

Question

            A           B      C
0  2002-01-16  2002-02-28   Jack
1  2002-01-16  2002-01-30  Helen
2  2002-01-16  2002-02-28  Peter
3  2002-01-16  2002-01-30    Jud
4  2002-04-27  2002-04-30   Nick
5  2002-04-27  2002-05-25  Wendy
6  2002-04-27  2002-04-30  Bryan
7  2002-04-27  2002-05-25  Sarah

I want to select the rows, which A date are closer in time to B date, per each A group.

Output should be:

            A           B      C
1  2002-01-16  2002-01-30  Helen
3  2002-01-16  2002-01-30    Jud
4  2002-04-27  2002-04-30   Nick
6  2002-04-27  2002-04-30  Bryan

jezrael · Accepted Answer

Use:

df = df[df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min())]
print (df)
           A          B      C
1 2002-01-16 2002-01-30  Helen
3 2002-01-16 2002-01-30    Jud
4 2002-04-27 2002-04-30   Nick
6 2002-04-27 2002-04-30  Bryan

Details:

print (df['B'].sub(df['A']))

0   43 days
1   14 days
2   43 days
3   14 days
4    3 days
5   28 days
6    3 days
7   28 days
dtype: timedelta64[ns]

print (df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min()))
0    False
1     True
2    False
3     True
4     True
5    False
6     True
7    False
dtype: bool

jpp · Answer

This is one way.

# convert columns to datetime
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_datetime)

# calculate absolute difference
df['Diff'] = (df['B'] - df['A']).abs()

# filter for difference equal to mapped minimum
res = df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]

Result:

           A          B      C    Diff
1 2002-01-16 2002-01-30  Helen 14 days
3 2002-01-16 2002-01-30    Jud 14 days
4 2002-04-27 2002-04-30   Nick  3 days
6 2002-04-27 2002-04-30  Bryan  3 days

Select rows by minimum absolute difference between dates, by group

Tags:

python

datetime

pandas

dataframe

Tie_24

2 Answers

jezrael

jpp

Recent Activity

Donate For Us

Select rows by minimum absolute difference between dates, by group

Tags:

python

datetime

pandas

dataframe

Tie_24

2 Answers

jezrael

jpp

Related questions

Recent Activity

Donate For Us