
Finding NaN Values in Pandas MultiIndex

Tags: python, pandas

I'm trying to find the difference between two Pandas MultiIndex objects of different shapes. I've used:

df1.index.difference(df2)

and receive

TypeError: '<' not supported between instances of 'float' and 'str'

My indices are str and datetime, but I suspect there are NaNs hidden there (the floats). Hence my question:

What's the best way to find the NaNs somewhere in the MultiIndex? How does one iterate through the levels and names? Can I use something like isna()?

Josh Friedlander, asked Jan 07 '19 15:01



2 Answers

Many functions are not implemented for MultiIndex.

You need to convert the MultiIndex to a DataFrame with MultiIndex.to_frame first:

import numpy as np
import pandas as pd

# W-B's sample: the first entry has NaN in level 0
idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])

print(idx.to_frame())
         0  1
NaN 1  NaN  1
1   1  1.0  1
    2  1.0  2

print(idx.to_frame().isnull())
           0      1
NaN 1   True  False
1   1  False  False
    2  False  False

Or use the DataFrame constructor:

print(pd.DataFrame(idx.tolist()))
     0  1
0  NaN  1
1  1.0  1
2  1.0  2

The conversion is needed because isna/isnull is not implemented for MultiIndex itself:

print(pd.isnull(idx))

NotImplementedError: isna is not defined for MultiIndex
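
The question also asks how to iterate through the levels and names. A minimal sketch, assuming the same sample index as above: although isna is not defined for the MultiIndex as a whole, each level can be pulled out as a flat Index with get_level_values, and a flat Index does support isna. The dictionary name nan_per_level is only illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])

# Walk the levels by position; idx.names holds the matching names
# (None for unnamed levels, as in this sample).
nan_per_level = {}
for pos, name in enumerate(idx.names):
    values = idx.get_level_values(pos)   # flat Index for this level
    nan_per_level[pos] = values.isna()   # boolean NumPy array, one flag per entry
    print(pos, name, nan_per_level[pos].any())
```

This avoids building an intermediate DataFrame, at the cost of handling one level at a time.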

EDIT:

To select the rows that contain at least one NaN, use any with boolean indexing:

df = idx.to_frame()
print(df[df.isna().any(axis=1)])
        0  1
NaN 1 NaN  1

It is also possible to filter the MultiIndex directly, but then you need to add MultiIndex.remove_unused_levels:

print(idx[idx.to_frame().isna().any(axis=1)].remove_unused_levels())
MultiIndex(levels=[[], [1]],
           labels=[[-1], [0]])
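
The same mask can be inverted to keep only the entries without NaN. This is a sketch on the sample index, not a confirmed fix for the TypeError in the question; whether dropping those entries is appropriate before calling difference depends on the data. The names has_nan and clean are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])

# True where any level of the entry is NaN; to_numpy() drops the index so
# the mask is applied purely by position.
has_nan = idx.to_frame().isna().any(axis=1).to_numpy()

clean = idx[~has_nan]  # entries with no NaN in any level
print(clean.tolist())
```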
jezrael, answered Oct 09 '22 17:10


We can use reset_index, then isna:

import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])
df = pd.DataFrame([1, 2, 3], index=idx)
df.reset_index().filter(like='level_').isna()
Out[304]: 
   level_0  level_1
0     True    False
1    False    False
2    False    False
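
Note that filter(like='level_') relies on the default column names that reset_index gives to unnamed levels. A minimal sketch for the named case, assuming hypothetical level names "key" and "num" that are not in the original: select the columns through df.index.names instead.

```python
import numpy as np
import pandas as pd

# "key" and "num" are made-up level names for illustration.
idx = pd.MultiIndex.from_tuples(
    [(np.nan, 1), (1, 1), (1, 2)], names=["key", "num"]
)
df = pd.DataFrame([1, 2, 3], index=idx)

# After reset_index the levels become columns named after the index levels.
level_cols = list(df.index.names)
na_mask = df.reset_index()[level_cols].isna()
print(na_mask)
```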
BENY, answered Oct 09 '22 15:10