Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge 3 dataframes with different timesteps (10min and 15min and 30min) using pandas

The goal is to merge three different dataframes having different timesteps (10min, 15min and 30min. The code must recognize what timestep to consider firstly and identify the next available next timestep. in This example 2019/04/02 10:40:00 does not exist in the dataframes dataset. Therefore the next timestep to consider after 2019/04/02 10:30:00 would be 2019/04/02 10:45:00.

df1:

Timestamp data1
2019/04/02 10:00:00 1
2019/04/02 10:10:00 1
2019/04/02 10:20:00 1
2019/04/02 10:30:00 1

df2:

Timestamp data2
2019/04/02 10:00:00 2
2019/04/02 10:15:00 22
2019/04/02 10:30:00 222
2019/04/02 10:45:00 2222
2019/04/02 11:00:00 22222

df3:

Timestamp data3
2019/04/02 10:00:00 3
2019/04/02 10:30:00 33
2019/04/02 11:00:00 333
2019/04/02 11:30:00 3333

desired result:

Timestamp data1 data2 data3
2019/04/02 10:00:00 1 2 3
2019/04/02 10:10:00 1 NaN NaN
2019/04/02 10:15:00 NaN 22 NaN
2019/04/02 10:20:00 1 NaN NaN
2019/04/02 10:30:00 1 222 33
2019/04/02 10:45:00 NaN 2222 NaN
2019/04/02 11:00:00 NaN 22222 333
2019/04/02 11:30:00 NaN NaN 3333

I used the python concat function and the merge function but did not deliver he desired result.

like image 619
JASS Avatar asked Oct 17 '25 20:10

JASS


1 Answers

# Convert Timestamp columns to datetime
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])

# Sort the DataFrames based on Timestamp
df1 = df1.sort_values('Timestamp')
df2 = df2.sort_values('Timestamp')
df3 = df3.sort_values('Timestamp')

# Merge using merge_asof
result = pd.merge_asof(df1, df2, on='Timestamp', direction='nearest')
result = pd.merge_asof(result, df3, on='Timestamp', direction='nearest')

EDIT: IF YOU WANT TO KEEP NAN VALUES

# Convert Timestamp columns to datetime
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])

# Merge using merge with how='outer'
result = pd.merge(df1, df2, on='Timestamp', how='outer')
result = pd.merge(result, df3, on='Timestamp', how='outer')

# Sort the result based on Timestamp
result = result.sort_values('Timestamp')
like image 79
Cem Koçak Avatar answered Oct 19 '25 12:10

Cem Koçak



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!