Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select all non-NaN columns and non-NaN last column using pandas?

Forgive me if the title a little bit confusing.

Assuming I have test.h5. Below is the result of reading this file using df.read_hdf('test.h5', 'testdata')

     0     1     2     3     4     5    6
0   123   444   111   321   NaN   NaN  NaN
1   12    234   113   67    21    32   900
3   212   112   543   321   45    NaN  NaN

I want to select the last non-Nan column. My expected result is like this

0   321
1   900
2   45

Also I want to select all column except the last non-NaN column. My expected result perhaps is like this. It might can be in numpy array but I have not foud any solution yet.

      0     1     2     3     4     5    6
0    123   444   111   
1    12    234   113   67    21    32  
3    212   112   543   321  

I searched online and found df.iloc[:, :-1] for reading all column but the last one and df.iloc[:, -1] for reading the last column.

My current result using these 2 command is like this: 1. for reading all column except the last one

       0     1     2     3     4     5    
0     123   444   111   321   NaN   NaN  
1     12    234   113   67    21    32   
3     212   112   543   321   45    NaN  

2.for reading the last column

0   NaN
1   900
2   Nan

My question is, is there any command or query used in pandas to address these condition?

Thank you for any help and suggestion.

like image 278
Fang Avatar asked Oct 02 '17 04:10

Fang


People also ask

How do I select a column without NaN?

It is simple use pd. isnull with sum i.e df. loc[:,pd. isnull(df).

How do you exclude NaN values from DataFrame?

By using dropna() method you can drop rows with NaN (Not a Number) and None values from pandas DataFrame. Note that by default it returns the copy of the DataFrame after removing rows. If you wanted to remove from the existing DataFrame, you should use inplace=True .


1 Answers

You can use sorted to satisfy your condition i.e

ndf = df.apply(lambda x : sorted(x,key=pd.notnull),1)

This will give

     0      1      2      3      4      5      6
0   NaN    NaN    NaN  123.0  444.0  111.0  321.0
1  12.0  234.0  113.0   67.0   21.0   32.0  900.0
3   NaN    NaN  212.0  112.0  543.0  321.0   45.0

Now you can select the last column i.e

ndf.iloc[:,-1]
0    321.0
1    900.0
3     45.0
Name: 6, dtype: float64
ndf.iloc[:,:-1].apply(lambda x : sorted(x,key=pd.isnull),1)
      0      1      2      3     4     5
0  123.0  444.0  111.0    NaN   NaN   NaN
1   12.0  234.0  113.0   67.0  21.0  32.0
3  212.0  112.0  543.0  321.0   NaN   NaN
like image 66
Bharath Avatar answered Nov 14 '22 23:11

Bharath