Using fillna with two multi-index dataframes throws InvalidIndexError

I have two dataframes like this:

import pandas as pd
import numpy as np


df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

          prop1 prop2
key1 key2            
A    1        x     m
B    6        y     n
A    7        z     b
     5        u     b
C    9        y     b
     8        n     a
A    7        b     s

           prop1  prop2
key1 key2              
A    1       NaN    NaN
B    5       NaN    NaN
C    9       NaN    NaN
     8       NaN    NaN
A    7       NaN    NaN
D    8       NaN    NaN
     7       NaN    NaN

and would now like to use df1 to fill df2 using

df2.fillna(df1)

However, I get:

site-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level, errors, try_cast)
   8694                 other._get_axis(i).equals(ax) for i, ax in enumerate(self.axes)
   8695             ):
-> 8696                 raise InvalidIndexError
   8697
   8698         # slice me out of the other

InvalidIndexError:

I have used this approach successfully in the past, so I do not understand why it fails here. Any ideas how to make it work?

EDIT

Here is an example which is very similar and works perfectly fine:

filler1 = pd.DataFrame({
    'key': list('AAABCCDD'),
    'prop1': list('xyzuyasj'),
    'prop2': list('mnbbbqwo')
})

tobefilled1 = pd.DataFrame({
    'key': list('AAABBCACDF'),
    'keep_me': ['stuff'] * 10,
    'prop1': [np.nan] * 10,
    'prop2': [np.nan] * 10,
    
})

filler1['g'] = filler1.groupby('key').cumcount()
tobefilled1['g'] = tobefilled1.groupby('key').cumcount()

filler1 = filler1.set_index(['key', 'g'])
tobefilled1 = tobefilled1.set_index(['key', 'g'])

print(tobefilled1.fillna(filler1))

prints

      keep_me prop1 prop2
key g                    
A   0   stuff     x     m
    1   stuff     y     n
    2   stuff     z     b
B   0   stuff     u     b
    1   stuff   NaN   NaN
C   0   stuff     y     b
A   3   stuff   NaN   NaN
C   1   stuff     a     q
D   0   stuff     s     w
F   0   stuff   NaN   NaN

asked Jul 08 '20 07:07

Cleb

2 Answers

The problem here is the duplicate index defined in df1:

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

Note: the pair key1=A, key2=7 appears twice, so the index of df1 is not unique.
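A quick way to confirm this without reading the data by eye is to ask the index itself. The following is a minimal sketch using df1 exactly as defined in the question; the variable name dupes is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

# False means at least one (key1, key2) pair occurs more than once
print(df1.index.is_unique)

# keep=False marks every member of a duplicated group, not only the repeats
dupes = df1[df1.index.duplicated(keep=False)]
print(dupes)
```

Both rows for key1=A, key2=7 show up in dupes, which makes it easy to decide which one should survive.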

Let's change that second A7 to A9:

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675989'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

The index of df1 is now unique, so try df2.fillna again:

df2.fillna(df1)

Output:

          prop1 prop2
key1 key2            
A    1        x     m
B    5      NaN   NaN
C    9        y     b
     8        n     a
A    7        z     b
D    8      NaN   NaN
     7      NaN   NaN
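Editing a key by hand only makes sense for a toy example. With real data, one option is to drop the repeated index entries programmatically before filling. This is a sketch that assumes keeping the first row of each duplicated (key1, key2) pair is acceptable; that choice is an assumption and not something the question specifies. The names df1_unique and filled are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

# Keep only the first row per (key1, key2) pair; the second ('A', '7')
# row is discarded (assumption: the first occurrence is the one to keep)
df1_unique = df1[~df1.index.duplicated(keep='first')]

# With a unique filler index, fillna can align the two frames
filled = df2.fillna(df1_unique)
print(filled)
```

Passing keep='last' instead would keep the second ('A', '7') row, so the filled values for that key would come from it.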

I got a hint of this when I tried the reindex_like method, first with the unique index:

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675989'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])
print(df1.reindex_like(df2))

Output:

          prop1 prop2
key1 key2            
A    1        x     m
B    5      NaN   NaN
C    9        y     b
     8        n     a
A    7        z     b
D    8      NaN   NaN
     7      NaN   NaN

Now, let's revert to the original dataframes in the post:

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])
print(df1.reindex_like(df2))

This raises a ValueError:

ValueError: cannot handle a non-unique multi-index!

Another workaround is to create a unique index by adding another index level with cumcount:

df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

df1 = df1.set_index(df1.groupby(df1.index).cumcount(), append=True)
df2 = df2.set_index(df2.groupby(df2.index).cumcount(), append=True)

df2.fillna(df1)

Output:

            prop1 prop2
key1 key2              
A    1    0     x     m
B    5    0   NaN   NaN
C    9    0     y     b
     8    0     n     a
A    7    0     z     b
D    8    0   NaN   NaN
     7    0   NaN   NaN

Then you can drop index level 2:

df2.fillna(df1).reset_index(level=2, drop=True)

Output:

          prop1 prop2
key1 key2            
A    1        x     m
B    5      NaN   NaN
C    9        y     b
     8        n     a
A    7        z     b
D    8      NaN   NaN
     7      NaN   NaN

However, I think pandas should give a clearer error message when fillna is used with a non-unique MultiIndex, as it does for reindex_like.
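Until then, a small guard in your own code can produce a more helpful message. The helper below, fillna_from, is hypothetical and not part of pandas; it is a sketch that checks the filler's index before calling fillna and names the offending entries.

```python
import numpy as np
import pandas as pd


def fillna_from(target, filler):
    """Hypothetical helper: fill NaNs in target from filler, failing early
    with a descriptive message if filler's index has duplicate entries."""
    if not filler.index.is_unique:
        # Collect each duplicated index entry once for the error message
        dupes = filler.index[filler.index.duplicated()].unique().tolist()
        raise ValueError(
            f"filler index is not unique; duplicated entries: {dupes}"
        )
    return target.fillna(filler)


df1 = pd.DataFrame({
    'key1': list('ABAACCA'),
    'key2': list('1675987'),
    'prop1': list('xyzuynb'),
    'prop2': list('mnbbbas')
}).set_index(['key1', 'key2'])

df2 = pd.DataFrame({
    'key1': list('ABCCADD'),
    'key2': list('1598787'),
    'prop1': [np.nan] * 7,
    'prop2': [np.nan] * 7
}).set_index(['key1', 'key2'])

try:
    fillna_from(df2, df1)
except ValueError as exc:
    message = str(exc)
    print(message)
```

With the original df1, the message lists ('A', '7') as the duplicated entry, which points directly at the row to fix.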

answered Sep 20 '22 01:09

Scott Boston

The problem here is that some index values do not match. An alternative that works for me is DataFrame.combine_first:

df = df2.combine_first(df1)
print(df)
          prop1 prop2
key1 key2            
A    1        x     m
     5        u     b
     7        z     b
     7        b     s
B    5      NaN   NaN
     6        y     n
C    8        n     a
     9        y     b
D    7      NaN   NaN
     8      NaN   NaN

answered Sep 21 '22 01:09

jezrael

Tags: python, pandas, dataframe, multi-index, fillna
