Merge on single level of MultiIndex

Tags:

pandas

Is there any way to merge on a single level of a MultiIndex without resetting the index?

I have a "static" table of time-invariant values, indexed by an ObjectID, and I have a "dynamic" table of time-varying fields, indexed by ObjectID+Date. I'd like to join these tables together.

Right now, the best I can think of is:

dynamic.reset_index().merge(static, left_on=['ObjectID'], right_index=True)

However, the dynamic table is very big, and I don't want to have to muck around with its index in order to combine the values.

375

asked May 20 '13 13:05

2 Answers

Yes, since pandas 0.14.0, it is now possible to merge a singly-indexed DataFrame with a level of a multi-indexed DataFrame using .join.

df1.join(df2, how='inner') # how='outer' keeps all records from both data frames

The 0.14 pandas docs describes this as equivalent but more memory efficient and faster than:

merge(df1.reset_index(),       df2.reset_index(),       on=['index1'],       how='inner'      ).set_index(['index1','index2'])

The docs also mention that .join can not be used to merge two multiindexed DataFrames on a single level and from the GitHub tracker discussion for the previous issue, it seems like this might not of priority to implement:

so I merged in the single join, see #6363; along with some docs on how to do a multi-multi join. That's fairly complicated to actually implement. and IMHO not worth the effort as it really doesn't change the memory usage/speed that much at all.

However, there is a GitHub conversation regarding this, where there has been some recent development https://github.com/pydata/pandas/issues/6360. It is also possible achieve this by resetting the indices as mentioned earlier and described in the docs as well.

Update for pandas >= 0.24.0

It is now possible to merge multiindexed data frames with each other. As per the release notes:

index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),                                         ('K1', 'X2')],                                         names=['key', 'X'])  left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],                      'B': ['B0', 'B1', 'B2']}, index=index_left)  index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),                                         ('K2', 'Y2'), ('K2', 'Y3')],                                         names=['key', 'Y'])  right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],                       'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)  left.join(right)

Out:

            A   B   C   D key X  Y                  K0  X0 Y0  A0  B0  C0  D0     X1 Y0  A1  B1  C0  D0 K1  X2 Y1  A2  B2  C1  D1  [3 rows x 4 columns]

answered Oct 03 '22 23:10

joelostblom

I get around this by reindexing the dataframe merging to have the full multiindex so that a left join is possible.

# Create the left data frame import pandas as pd idx = pd.MultiIndex(levels=[['a','b'],['c','d']],labels=[[0,0,1,1],[0,1,0,1]], names=['lvl1','lvl2']) df = pd.DataFrame([1,2,3,4],index=idx,columns=['data'])  #Create the factor to join to the data 'left data frame' newFactor = pd.DataFrame(['fact:'+str(x) for x in df.index.levels[0]], index=df.index.levels[0], columns=['newFactor'])

Do the join on the subindex by reindexing the newFactor dataframe to contain the index of the left data frame

df.join(newFactor.reindex(df.index,level=0))

answered Oct 04 '22 01:10

closedloop

Related questions
                            
                                How do I convert a tuple of tuples to a one-dimensional list using list comprehension? [duplicate]
                            
                                What is the pythonic way to calculate dot product?
                            
                                How do I convert a hex triplet to an RGB tuple and back?
                            
                                How can I make multiple empty lists in python?
                            
                                WebSocket server in Python: 'module' object has no attribute 'AF_INET'
                            
                                How can I format a list to print each element on a separate line in python? [duplicate]
                            
                                Django post_save preventing recursion without overriding model save()
                            
                                Python: Read data from Highcharts after setExtreme
                            
                                Pulling data from Neo4j using PySpark
                            
                                Keras difference between generator and sequence
                            
                                Managing helper death
                            
                                Pyparsing : white spaces sometimes matter... sometimes don't
                            
                                Pip (python) differences between `--install-option='--prefix'` and `--root` and `--target`
                            
                                Python 3: Catching warnings during multiprocessing
                            
                                Python: Multiple packages in one repository or one package per repository?
                            
                                Python embeddable zip
                            
                                Getting Spark, Python, and MongoDB to work together
                            
                                Installer and Updater for a python desktop application
                            
                                Unsupervised pre-training for convolutional neural network in theano
                            
                                Simple, hassle-free, zero-boilerplate serialization in Scala/Java similar to Python's Pickle?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Merge on single level of MultiIndex

Tags:

python

pandas

Johann Hibschman

People also ask

2 Answers

Update for pandas >= 0.24.0

joelostblom

closedloop

Recent Activity

Donate For Us