
Pandas: IndexingError: Unalignable boolean Series provided as indexer

Tags: python, pandas

I'm trying to run what I think is simple code to eliminate any columns with all NaNs, but can't get this to work (axis = 1 works just fine when eliminating rows):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

df = df[df.notnull().any(axis = 0)]

print(df)

Full error:

raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Expected output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
elPastor asked Jul 27 '17 13:07


3 Answers

You need loc, because you are filtering by columns:

print (df.notnull().any(axis = 0))
a     True
b     True
c     True
d    False
dtype: bool

df = df.loc[:, df.notnull().any(axis = 0)]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or filter the column names first and then select with []:

print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')

df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or use dropna with the parameter how='all' to remove every column that contains only NaNs:

print (df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
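As a quick sanity check, the three approaches above can be compared side by side. This is only a sketch that rebuilds the DataFrame from the question; the variable names (mask, via_loc, via_columns, via_dropna) are made up for illustration:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# One boolean per column: True if the column has at least one non-NaN value.
mask = df.notnull().any(axis=0)

via_loc = df.loc[:, mask]                  # boolean mask on the column axis
via_columns = df[df.columns[mask]]         # select by the surviving column labels
via_dropna = df.dropna(axis=1, how='all')  # drop columns that are entirely NaN

print(list(via_loc.columns))
print(via_loc.equals(via_columns), via_loc.equals(via_dropna))
```

On this data all three should keep the same columns, so which one to use is mostly a matter of readability.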
jezrael answered Nov 20 '22 08:11


You can use dropna with axis=1 and thresh=1:

In[19]:
df.dropna(axis=1, thresh=1)

Out[19]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

This drops any column that doesn't have at least 1 non-NaN value, which means every all-NaN column gets dropped.
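To see what thresh counts, here is a small sketch using the DataFrame from the question. The names non_nan_counts, kept_1 and kept_3 are only illustrative, and thresh=3 is an extra value picked here to show the other extreme:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# Number of non-NaN values in each column; thresh is compared against this.
non_nan_counts = df.notnull().sum(axis=0)
print(non_nan_counts)

# Keep columns with at least 1 non-NaN value (drops only the all-NaN column).
kept_1 = df.dropna(axis=1, thresh=1)

# Keep columns with at least 3 non-NaN values (no column here qualifies).
kept_3 = df.dropna(axis=1, thresh=3)

print(list(kept_1.columns))
print(list(kept_3.columns))
```

With thresh=1 the result matches how='all'; larger values make the filter stricter.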

What you tried failed because the boolean mask:

In[20]:
df.notnull().any(axis = 0)

Out[20]: 
a     True
b     True
c     True
d    False
dtype: bool

cannot be aligned with the row index, which is what plain [] indexing uses by default for a boolean Series; this mask is indexed by the column labels instead.
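The mismatch can be made visible by comparing the two indexes directly. The following is a sketch on the question's data; it catches a generic Exception and records only the class name, since the module that exposes IndexingError has moved between pandas versions:

```python
import warnings

import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

mask = df.notnull().any(axis=0)

print(list(mask.index))  # column labels
print(list(df.index))    # row labels

# df[mask] tries to align the mask with the row labels and fails.
error_name = None
with warnings.catch_warnings():
    # Some pandas versions also warn that the key will be reindexed.
    warnings.simplefilter("ignore")
    try:
        df[mask]
    except Exception as exc:
        error_name = type(exc).__name__
print(error_name)

# Passing the same mask on the column axis works, because the labels match.
by_columns = df.loc[:, mask]
print(list(by_columns.columns))
```

The mask itself is fine; it just has to be applied to the axis whose labels it shares.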

EdChum answered Nov 20 '22 08:11


I came here because I tried to filter on the first two letters like this:

filtered = df[(df.Name[0:2] != 'xx')]

The fix was:

filtered = df[(df.Name.str[0:2] != 'xx')]
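The difference is that df.Name[0:2] slices the first two rows of the column, so the resulting mask is shorter than the DataFrame, while df.Name.str[0:2] slices the first two characters of every value. A small sketch, using a made-up Name column (the values below are hypothetical, not from the original post):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'Name': ['xxalpha', 'beta', 'xxgamma', 'delta']})

# Slices rows: a Series holding only the first two entries.
row_slice = df.Name[0:2]
print(list(row_slice))

# Slices characters: the first two letters of every entry.
prefix = df.Name.str[0:2]
print(list(prefix))

# The character-based mask has one value per row, so it aligns with df.
filtered = df[prefix != 'xx']
print(list(filtered.Name))
```

Because the row slice yields a mask that covers only two of the rows, it cannot be aligned with the full DataFrame, which is what leads to the same IndexingError.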
kztd answered Nov 20 '22 08:11