
Python error: cannot do a non empty take from an empty axes

I have a pandas DataFrame with more than 400 thousand rows, and I want to calculate the interquartile range for each row, but my code produces the following error:

cannot do a non empty take from an empty axes

My code:

def calIQR(x):
    x=x.dropna()
    return (np.percentile(x,75),np.percentile(x,25))

df["count"]=df.iloc[:,2:64].apply(calIQR,axis=1)

I am running Python 2.7.13

I searched online but still had no idea why this error occurred.

Columns 2 to 64 of the dataset basically look like this: [image: dataset]

Each row contains some NaN values, but I am sure that no row is all NaN.

ELI asked Jul 17 '17 08:07


1 Answer

I think the problem is that at least one row has NaN in all of the selected columns (positions 2 to 63), so x.dropna() returns an empty Series and np.percentile is then called on empty data.
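One way to check this on the real data is to count the rows that are NaN across the whole selected block. This is only a minimal sketch: the small frame and its column names (id, x, y) are hypothetical stand-ins, and with the real data the slice would be 2:64.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real data
df = pd.DataFrame({
    "id": [1, 2, 3],
    "x": [0.5, np.nan, 0.1],
    "y": [0.7, np.nan, np.nan],
})

# Select the value columns by position (2:64 with the real data)
block = df.iloc[:, 1:3]

# True for every row whose selected columns are all NaN
all_nan = block.isnull().all(axis=1)

n_all_nan = int(all_nan.sum())
bad_rows = df.index[all_nan].tolist()
print(n_all_nan, bad_rows)
```

If the count is above zero, those are the rows for which dropna leaves nothing to compute a percentile from.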

So drop the all-NaN rows with dropna(how='all') after iloc (they end up as NaN in the new column):

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((5,5)))
df.loc[3, [3,4]] = np.nan  # row 3: some NaNs
df.loc[2] = np.nan         # row 2: all NaNs
print (df)
         0         1         2         3         4
0  0.543405  0.278369  0.424518  0.844776  0.004719
1  0.121569  0.670749  0.825853  0.136707  0.575093
2       NaN       NaN       NaN       NaN       NaN
3  0.978624  0.811683  0.171941       NaN       NaN
4  0.431704  0.940030  0.817649  0.336112  0.175410

def calIQR(x):
    x = x.dropna()
    return (np.percentile(x,75),np.percentile(x,25))

df["count"]=df.iloc[:,2:4].dropna(how='all').apply(calIQR,axis=1)
print (df)
          0         1         2         3         4  \
0  0.543405  0.278369  0.424518  0.844776  0.004719   
1  0.121569  0.670749  0.825853  0.136707  0.575093   
2       NaN       NaN       NaN       NaN       NaN   
3  0.978624  0.811683  0.171941       NaN       NaN   
4  0.431704  0.940030  0.817649  0.336112  0.175410   

                              count  
0  (0.739711496927, 0.529582226142)  
1    (0.65356621375, 0.30899313104)  
2                               NaN  
3  (0.171941012733, 0.171941012733)  
4  (0.697265021613, 0.456496307285)  

Or use Series.quantile, which skips NaNs itself and returns NaN for a row with no values:

def calIQR(x):
    # quantile ignores NaNs, so no dropna is needed
    return (x.quantile(.75),x.quantile(.25))

#with real data change 2:4 to 2:64
df["count"]=df.iloc[:,2:4].apply(calIQR,axis=1)
print (df)
          0         1         2         3         4  \
0  0.543405  0.278369  0.424518  0.844776  0.004719   
1  0.121569  0.670749  0.825853  0.136707  0.575093   
2       NaN       NaN       NaN       NaN       NaN   
3  0.978624  0.811683  0.171941       NaN       NaN   
4  0.431704  0.940030  0.817649  0.336112  0.175410   

                                       count  
0   (0.7397114969272109, 0.5295822261418257)  
1    (0.653566213750024, 0.3089931310399766)  
2                                 (nan, nan)  
3   (0.1719410127325942, 0.1719410127325942)  
4  (0.6972650216127702, 0.45649630728485585)  
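With several hundred thousand rows, a row-wise apply can be slow. As a possible alternative, DataFrame.quantile accepts axis=1 and a list of quantiles, so both quartiles can be computed for every row in one call. The following is only a sketch on a small hypothetical frame (columns a, b, c, and the names q25, q75 and iqr, are made up for illustration); with the real data the slice would be 2:64.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 1 is all NaN, row 2 has one NaN
df = pd.DataFrame({
    "a": [1.0, np.nan, 2.0],
    "b": [3.0, np.nan, np.nan],
    "c": [5.0, np.nan, 4.0],
})

# Select the value columns by position (2:64 with the real data)
block = df.iloc[:, 0:3]

# One row per quantile, one column per original row; transpose so
# the result lines up with df's index
q = block.quantile([0.25, 0.75], axis=1).T
q.columns = ["q25", "q75"]

# Interquartile range per row; stays NaN where a row has no values
iqr = q["q75"] - q["q25"]
print(iqr)
```

This keeps the two quartiles as separate float columns instead of tuples, which tends to be easier to work with afterwards.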
jezrael answered Nov 08 '22 17:11