Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merging two pandas dataframes on nearest time stamp

Tags:

python

pandas

I have two daframes df1 and df2

df1 is

time                  status
2/2/2015 8.00 am      on time
2/2/2015 9.00 am      canceled
2/2/2015 10.30 am     on time
2/2/2015 12.45 pm     on time

df2 is

 w_time                 temp
 2/2/2015 8.00 am      45
 2/2/2015 8.50 am      46
 2/2/2015 9.40 am      47
 2/2/2015 10.15 am     47
 2/2/2015 10.35 am     48
 2/2/2015 12.00 pm     48
 2/2/2015 1.00 pm      49

Now i want merge two data frames in such way that the second time stamp is always closer or equal to the first timestamp

the result should be

time              status     w_time              temp

2/2/2015 8.00 am  on time    2/2/2015 8.00 am     45

2/2/2015 9.00 am  canceled   2/2/2015 8.50 am     46

2/2/2015 10.30 am   on time    2/2/2015 10.35 am   48
2/2/2015 12.45 pm   on time    2/2/2015 1.00 pm    49
like image 929
Srirekha Srirekha Avatar asked Nov 03 '15 05:11

Srirekha Srirekha


People also ask

Can you merge Dataframes of different lengths pandas?

It can be done using the merge() method. Below are some examples that depict how to merge data frames of different lengths using the above method: Example 1: Below is a program to merge two student data frames of different lengths.

How do I merge two Dataframes based on date column?

groupby(), sum() — Using a groupby() on dates column and sum() on newCases column returns a series object with length 269. This will group all duplicate dates as one and add up their respective cases. Series to Data-Frame — Groupby() function returns a series object.


1 Answers

First ensure that the date columns are datetime64 columns.

df1['time'] = pd.to_datetime(df1['time'].str.replace(".", ":"))
df2['w_time'] = pd.to_datetime(df2['w_time'].str.replace(".", ":"))

If you set these as DatetimeIndexs can then use reindex with the 'nearest' method:

In [11]: df1 = df1.set_index("time")

In [12]: df2 = df2.set_index("w_time", drop=False)

In [13]: df1
Out[13]:
                       status
time
2015-02-02 08:00:00   on time
2015-02-02 09:00:00  canceled
2015-02-02 10:30:00   on time
2015-02-02 12:45:00   on time

In [14]: df2
Out[14]:
                     temp              w_time
w_time
2015-02-02 08:00:00    45 2015-02-02 08:00:00
2015-02-02 08:50:00    46 2015-02-02 08:50:00
2015-02-02 09:40:00    47 2015-02-02 09:40:00
2015-02-02 10:15:00    47 2015-02-02 10:15:00
2015-02-02 10:35:00    48 2015-02-02 10:35:00
2015-02-02 12:00:00    48 2015-02-02 12:00:00
2015-02-02 13:00:00    49 2015-02-02 13:00:00

With the following:

In [15]: df2.reindex(df1.index, method='nearest')
Out[15]:
                     temp              w_time
time
2015-02-02 08:00:00    45 2015-02-02 08:00:00
2015-02-02 09:00:00    46 2015-02-02 08:50:00
2015-02-02 10:30:00    48 2015-02-02 10:35:00
2015-02-02 12:45:00    49 2015-02-02 13:00:00

Then add these columns/join back to df1.

like image 178
Andy Hayden Avatar answered Sep 27 '22 22:09

Andy Hayden