Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge dataframes on nearest datetime / timestamp

Tags:

python

pandas

I have two data frames as follows:

A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/22/2014","07/02/2014","01/01/2015","01/01/1991","08/02/1999"]})

B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"], "value": ["3","5","1","7","8"] })

Which look like the following:

>>> A
  ID       date
0  A 2014-06-22
1  A 2014-07-02
2  C 2015-01-01
3  B 1991-01-01
4  B 1999-08-02

>>> B
  ID       date value
0  A 2015-02-15     3
1  A 2014-06-30     5
2  C 1999-07-02     1
3  B 1990-10-05     7
4  B 2014-06-24     8

I want to merge A with the values of B using the nearest date. In this example, none of the dates match, but it could the the case that some do.

The output should be something like this:

>>> C
  ID        date value
0  A  06/22/2014     8
1  A  07/02/2014     5
2  C  01/01/2015     3
3  B  01/01/1991     7
4  B  08/02/1999     1

It seems to me that there should be a native function in pandas that would allow this.

Note: as similar question has been asked here pandas.merge: match the nearest time stamp >= the series of timestamps

like image 649
dleal Avatar asked Aug 08 '16 15:08

dleal


People also ask

How do I merge two Dataframes based on date column?

groupby(), sum() — Using a groupby() on dates column and sum() on newCases column returns a series object with length 269. This will group all duplicate dates as one and add up their respective cases. Series to Data-Frame — Groupby() function returns a series object.

Is merge or join faster pandas?

The Fastest Ways As it turns out, join always tends to perform well, and merge will perform almost exactly the same given the syntax is optimal.

Can you merge multiple Dataframes at once?

We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner , left , right , outer , cross .


1 Answers

You can use reindex with method='nearest' and then merge:

A['date'] = pd.to_datetime(A.date)
B['date'] = pd.to_datetime(B.date)
A.sort_values('date', inplace=True)
B.sort_values('date', inplace=True)

B1 = B.set_index('date').reindex(A.set_index('date').index, method='nearest').reset_index()
print (B1)

print (pd.merge(A,B1, on='date'))
  ID_x       date ID_y value
0    B 1991-01-01    B     7
1    B 1999-08-02    C     1
2    A 2014-06-22    B     8
3    A 2014-07-02    A     5
4    C 2015-01-01    A     3

You can also add parameter suffixes:

print (pd.merge(A,B1, on='date', suffixes=('_', '')))
  ID_       date ID value
0   B 1991-01-01  B     7
1   B 1999-08-02  C     1
2   A 2014-06-22  B     8
3   A 2014-07-02  A     5
4   C 2015-01-01  A     3
like image 164
jezrael Avatar answered Sep 21 '22 16:09

jezrael