Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining Two Different Dataframes on Timestamp

Say I have two dataframes:

df1:                          df2:
+-------------------+----+    +-------------------+-----+
|  Timestamp        |data|    |  Timestamp        |stuff|
+-------------------+----+    +-------------------+-----+
|2019/04/02 11:00:01| 111|    |2019/04/02 11:00:14|  101|
|2019/04/02 11:00:15| 222|    |2019/04/02 11:00:15|  202|
|2019/04/02 11:00:29| 333|    |2019/04/02 11:00:16|  303|
|2019/04/02 11:00:30| 444|    |2019/04/02 11:00:30|  404|
+-------------------+----+    |2019/04/02 11:00:31|  505|
                              +-------------------+-----+

Without looping through every row of df2, I am trying to join the two dataframes based on the timestamp. So for every row in df2, it will "add" data from df1 that was at that particular time. In this example, the resulting dataframe would be:

Adding df1 data to df2:
+-------------------+-----+----+
|  Timestamp        |stuff|data|
+-------------------+-----+----+
|2019/04/02 11:00:14|  101| 111|
|2019/04/02 11:00:15|  202| 222|
|2019/04/02 11:00:16|  303| 222|
|2019/04/02 11:00:30|  404| 444|
|2019/04/02 11:00:31|  505|None|
+-------------------+-----+----+

Looping through each row of df2 then comparing to each df1 is very inefficient. Is there another way?

like image 214
Andrew Drake Avatar asked Apr 18 '19 13:04

Andrew Drake


Video Answer


2 Answers

Use merge_asof:

df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])

df = pd.merge_asof(df2, df1, on='Timestamp')
print (df)
            Timestamp  stuff  data
0 2019-04-02 11:00:14    101   111
1 2019-04-02 11:00:15    202   222
2 2019-04-02 11:00:16    303   222
3 2019-04-02 11:00:30    404   444

Also is possible change order df1 with df2 and add parameter direction='forward':

df = pd.merge_asof(df1, df2, on='Timestamp', direction='forward')
print (df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111  101.0
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  404.0
3 2019-04-02 11:00:30   444  404.0
4 2019-04-02 11:00:31   505    NaN

#default direction='backward'
df = pd.merge_asof(df1, df2, on='Timestamp')
print (df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111    NaN
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  303.0
3 2019-04-02 11:00:30   444  404.0
4 2019-04-02 11:00:31   505  404.0
like image 177
jezrael Avatar answered Oct 19 '22 06:10

jezrael


import pandas as pd
pd.merge(df1, df2, left_on=['Timestamp'], right_on=['Timestamp'])
like image 1
Francesco Dassisis Avatar answered Oct 19 '22 07:10

Francesco Dassisis