Is there any way to merge on a single level of a MultiIndex without resetting the index?
I have a "static" table of time-invariant values, indexed by an ObjectID, and I have a "dynamic" table of time-varying fields, indexed by ObjectID+Date. I'd like to join these tables together.
Right now, the best I can think of is:
dynamic.reset_index().merge(static, left_on=['ObjectID'], right_index=True)
However, the dynamic table is very big, and I don't want to have to muck around with its index in order to combine the values.
To drop multiple levels from a multi-level column index, call columns.droplevel() repeatedly. MultiIndex.from_tuples() can be used to create such a column index.
Merging DataFrames by index: when both DataFrames carry matching IDs in their index, pass left_index=True and right_index=True to merge on the indices. By default the DataFrames are merged with an inner join.
Yes, since pandas 0.14.0, it is possible to merge a singly-indexed DataFrame with a level of a multi-indexed DataFrame using .join.
df1.join(df2, how='inner') # how='outer' keeps all records from both data frames
The pandas 0.14 docs describe this as equivalent to, but more memory-efficient and faster than:
merge(df1.reset_index(), df2.reset_index(), on=['index1'], how='inner' ).set_index(['index1','index2'])
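As a minimal sketch of the two routes above, applied to the question's setup: the table contents, the column names Category and Value, and the string dates are made up for illustration. The only real requirement is that the index of the singly-indexed frame has the same name as one level of the MultiIndex.

```python
import pandas as pd

# Hypothetical "static" table: one row per ObjectID.
static = pd.DataFrame(
    {"Category": ["a", "b"]},
    index=pd.Index([1, 2], name="ObjectID"),
)

# Hypothetical "dynamic" table: indexed by ObjectID + Date.
dynamic = pd.DataFrame(
    {"Value": [10.0, 11.0, 20.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, "2014-01-01"), (1, "2014-01-02"), (2, "2014-01-01")],
        names=["ObjectID", "Date"],
    ),
)

# Join on the shared level name; dynamic's index is not reset.
joined = dynamic.join(static, how="inner")

# The reset_index/merge route, for comparison.
via_merge = pd.merge(
    dynamic.reset_index(), static.reset_index(), on="ObjectID", how="inner"
).set_index(["ObjectID", "Date"])

print(joined)
```

Both routes should give the same rows here; the .join version simply avoids rebuilding the large table's index.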
The docs also mention that .join cannot be used to merge two multi-indexed DataFrames on a single level, and from the GitHub tracker discussion for the previous issue, it seems this might not be a priority to implement:
so I merged in the single join, see #6363; along with some docs on how to do a multi-multi join. That's fairly complicated to actually implement. and IMHO not worth the effort as it really doesn't change the memory usage/speed that much at all.
However, there is a GitHub conversation regarding this, where there has been some recent development: https://github.com/pydata/pandas/issues/6360. It is also possible to achieve this by resetting the indices, as mentioned earlier and as described in the docs.
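A rough sketch of that reset-the-indices route for two multi-indexed frames that share one level. The frames, level names (ObjectID, Date, Source) and values are hypothetical, chosen only to illustrate the pattern:

```python
import pandas as pd

# Hypothetical frames sharing only the "ObjectID" level.
left_df = pd.DataFrame(
    {"Value": [10.0, 11.0, 20.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, "d1"), (1, "d2"), (2, "d1")], names=["ObjectID", "Date"]
    ),
)
right_df = pd.DataFrame(
    {"Weight": [0.5, 0.7, 0.9]},
    index=pd.MultiIndex.from_tuples(
        [(1, "s1"), (2, "s2"), (3, "s3")], names=["ObjectID", "Source"]
    ),
)

# Turn the index levels into columns, merge on the shared key,
# then rebuild a three-level index from the result.
result = pd.merge(
    left_df.reset_index(), right_df.reset_index(), on="ObjectID", how="inner"
).set_index(["ObjectID", "Date", "Source"])

print(result)
```

This copies both frames when resetting their indices, which is the cost the question was hoping to avoid, so it is best kept for cases where no direct join is available.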
It is now possible to merge multiindexed data frames with each other. As per the release notes:
index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), ('K1', 'X2')],
                                       names=['key', 'X'])
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                    index=index_left)
index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), ('K2', 'Y2'), ('K2', 'Y3')],
                                        names=['key', 'Y'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']},
                     index=index_right)
left.join(right)
Out:
            A   B   C   D
key X  Y
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]
I get around this by reindexing the DataFrame being merged in so that it has the full MultiIndex, which makes a left join possible.
# Create the left data frame
import pandas as pd
# Note: the `labels=` argument was renamed to `codes=` in pandas 0.24
idx = pd.MultiIndex(levels=[['a', 'b'], ['c', 'd']],
                    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
                    names=['lvl1', 'lvl2'])
df = pd.DataFrame([1, 2, 3, 4], index=idx, columns=['data'])
# Create the factor to join to the left data frame
newFactor = pd.DataFrame(['fact:' + str(x) for x in df.index.levels[0]],
                         index=df.index.levels[0],
                         columns=['newFactor'])
Do the join on the sub-index by reindexing the newFactor DataFrame so that it carries the index of the left data frame:
df.join(newFactor.reindex(df.index,level=0))
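To make the reindex step more concrete, here is a small self-contained sketch. It builds the same shape of data with MultiIndex.from_product (a substitution for the explicit levels/codes constructor) and uses the variable names new_factor and expanded, which are illustrative only.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b"], ["c", "d"]], names=["lvl1", "lvl2"])
df = pd.DataFrame({"data": [1, 2, 3, 4]}, index=idx)

# One row per value of the first level.
new_factor = pd.DataFrame(
    {"newFactor": ["fact:a", "fact:b"]},
    index=pd.Index(["a", "b"], name="lvl1"),
)

# reindex(..., level=0) broadcasts each lvl1 row to every
# (lvl1, lvl2) pair present in df.index.
expanded = new_factor.reindex(df.index, level=0)

# Both frames now share the same MultiIndex, so a plain join lines them up.
result = df.join(expanded)

print(result)
```

The intermediate expanded frame has as many rows as df, so for a very large left frame this trades memory for the convenience of an ordinary index-on-index join.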