I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and _'ItemId'). There are multiple instances of each value of '_ItemId', like this:
_SomeOtherLabel
_Date _ItemId
2014-10-05 6588921 AA
6592520 AB
6836143 BA
2014-10-11 6588921 CA
6592520 CB
6836143 DA
I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:
_ItemId _Cat
0 6588921 6_1
1 6592520 6_1
2 6836143 7_1
I want to recover the values in the column '_Cat' from df2 and merge them into df1 for the appropriate values of '_ItemId'. This is almost (I think?) a standard many-to-one merge, except that the appropriate key for the left df is one of MultiIndex levels. I tried this:
df1['_cat']=pd.merge(df1,df2,left_index=True, right_on='ItemId')
but I get the error
"ValueError: len(right_on) must equal the number of levels in the index of "left"
which I suppose makes sense since my (left) index is actually made of two keys. How do I select the one index level that I need? Or is there a better approach to this merge?
Thanks
To revert the index of the dataframe from multi-index to a single index using the Pandas inbuilt function reset_index (). Syntax: DataFrame.reset_index (level=None, drop=False, inplace=False, col_level=0, col_fill=”) Returns: (Data Frame or None) DataFrame with the new index or None if inplace=True. Reverting the Multi-index using the above way i.
As we know the multi-indexes form a hierarchy of indexes, that’s why these are also known as hierarchical indexes. In this Dataframe, ‘region’ is the level (0) index or the main index and the ‘state’ is the level (1) index and ‘individuals’ is the level (2) index.
Yes, since pandas 0.14.0, it is now possible to merge a singly-indexed DataFrame with a level of a multi-indexed DataFrame using .join. The 0.14 pandas docs describes this as equivalent but more memory efficient and faster than: merge (df1.reset_index (), df2.reset_index (), on= ['index1'], how='inner' ).set_index ( ['index1','index2'])
If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels. Use the index from the right DataFrame as the join key.
I could think of 2 ways of doing this.
use set_index()
and join()
:
>>> df1.join(df2.set_index('_ItemId'))
_SomeOtherLabel _Cat
_Date _ItemId
2014-10-05 6588921 AA 6_1
6592520 AB 6_1
6836143 BA 7_1
2014-10-11 6588921 CA 6_1
6592520 CB 6_1
6836143 DA 7_1
or use reset_index()
, merge()
and then set new multiindex
I think first approach should be faster, but not sure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With