Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: merge_asof-like solutions for merging two multi-indexed DataFrames?

I have two dataframes, df1 and df2 say, which are both multi-indexed.

At the first index level, both dataframes share the same keys (i.e. df1.index.get_level_values(0) and df2.index.get_level_values(0) contain the same elements). Those keys are unordered strings, such as ['foo','bar','baz'].

At the second index level, both dataframes have timestamps which are ordered, but unequally spaced.

My question is as follows. I would like to merge df1and df2 in such a way that, for each key at level 1, the values of df2 should be inserted into df1 without changing the order of df1.

I tried using pd.merge, pd.merge_asof and pd.MultiIndex.searchsorted. From the descriptions of those methods, it seems like one of them should do the trick for me, but I cannot figure out how. Ideally, I would like to find a solution that avoids looping over the keys in index.get_level_values(0), since my dataframes can get large.

A few failed attempts for illustration:

df_merged = pd.merge(left=df1.reset_index(), right=df2.reset_index(),
                     left_on=[['some_keys', 'timestamps_df1']], right_on=[['some_keys', 'timestamps_df2']],
                     suffixes=('', '_2')
                     ) # after sorting
# FAILED

df2.index.searchsorted(df1, side='right') # after sorting
# FAILED

Any help is greatly appreciated!

like image 321
A. Debugne Avatar asked Apr 03 '18 23:04

A. Debugne


1 Answers

Base on your description , here is the solution from merge_asof

df_merged = pd.merge_asof(left=df1.reset_index(), right=df2.reset_index(),
                     left_on=['timestamps_df1'], right_on=['timestamps_df2'],by='some_keys',
                     suffixes=('', '_2')
                     )
like image 60
BENY Avatar answered Oct 23 '22 06:10

BENY