Merge dataframes on nearest datetime / timestamp

Tags: python, pandas

I have two data frames as follows:

import pandas as pd

A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/22/2014","07/02/2014","01/01/2015","01/01/1991","08/02/1999"]})

B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"], "value": ["3","5","1","7","8"] })

After converting the date column to datetime, they look like the following:

>>> A
  ID       date
0  A 2014-06-22
1  A 2014-07-02
2  C 2015-01-01
3  B 1991-01-01
4  B 1999-08-02

>>> B
  ID       date value
0  A 2015-02-15     3
1  A 2014-06-30     5
2  C 1999-07-02     1
3  B 1990-10-05     7
4  B 2014-06-24     8

I want to merge A with the values of B using the nearest date. In this example none of the dates match, but it could be the case that some do.

The output should be something like this:

>>> C
  ID        date value
0  A  06/22/2014     8
1  A  07/02/2014     5
2  C  01/01/2015     3
3  B  01/01/1991     7
4  B  08/02/1999     1

It seems to me that there should be a native function in pandas that would allow this.

Note: a similar question has been asked here: pandas.merge: match the nearest time stamp >= the series of timestamps

asked Aug 08 '16 15:08

dleal

1 Answer

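You can use reindex with method='nearest' to align B to the dates in A, and then merge. Note that this matches on date only, regardless of ID:

# Parse the date strings so the nearest-date lookup works on real timestamps
A['date'] = pd.to_datetime(A.date)
B['date'] = pd.to_datetime(B.date)
# reindex with method='nearest' needs a sorted (monotonic) index
A.sort_values('date', inplace=True)
B.sort_values('date', inplace=True)

# For each date in A, pick the row of B with the nearest date
B1 = B.set_index('date').reindex(A.set_index('date').index, method='nearest').reset_index()
print(B1)

print(pd.merge(A, B1, on='date'))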
  ID_x       date ID_y value
0    B 1991-01-01    B     7
1    B 1999-08-02    C     1
2    A 2014-06-22    B     8
3    A 2014-07-02    A     5
4    C 2015-01-01    A     3

You can also pass the suffixes parameter to control how the overlapping ID columns are named:

print(pd.merge(A, B1, on='date', suffixes=('_', '')))
  ID_       date ID value
0   B 1991-01-01  B     7
1   B 1999-08-02  C     1
2   A 2014-06-22  B     8
3   A 2014-07-02  A     5
4   C 2015-01-01  A     3
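
As an alternative that is not part of the answer above, newer pandas versions provide pd.merge_asof, which accepts direction='nearest' and may be closer to the "native function" the question asks for. The following is a minimal sketch, assuming a pandas version that supports the direction parameter and the same sample data as in the question; the names A_sorted, B_sorted and C are only illustrative.

```python
import pandas as pd

A = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["06/22/2014", "07/02/2014", "01/01/2015", "01/01/1991", "08/02/1999"],
})
B = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "value": ["3", "5", "1", "7", "8"],
})

# Parse the month/day/year strings into real timestamps.
A["date"] = pd.to_datetime(A["date"], format="%m/%d/%Y")
B["date"] = pd.to_datetime(B["date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by the key column.
A_sorted = A.sort_values("date")
B_sorted = B.sort_values("date")

# Keep only date and value from B so that A's ID column is preserved as-is;
# direction="nearest" picks the closest date in either direction.
C = pd.merge_asof(
    A_sorted,
    B_sorted[["date", "value"]],
    on="date",
    direction="nearest",
)
print(C)
```

This matches on date alone, as in the expected output of the question. If the nearest date should instead be searched only among rows with the same ID, merge_asof also takes by="ID" (in which case B's ID column has to be kept in the right-hand frame).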

answered Sep 21 '22 16:09

jezrael
