 

Drop all rows that have all NA values after the last row that is not NA

          0     1     2     3        4  
0        2.0  None  None  None  21041.0  
1        1.0  None  None  None   3003.0  
2        2.0  None  None  None   1210.0  
3        NaN  None  None  None      NaN  
4        2    None  None  None      NaN 
5        NaN  None  None  None      NaN
6        NaN  None  None  None      NaN  

I want to drop rows 5 and 6 but keep row 3, even though all of its values are NaN.

I know of:

df.dropna(axis = 0, how = 'all', inplace = True)

This deletes row 3 as well, so I guess I need to combine it with some other operation.
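To make the problem reproducible, here is a minimal sketch that rebuilds the sample frame from the table above and confirms that dropna(how='all') removes row 3 along with rows 5 and 6. It assumes the None cells are real Python None objects rather than the string 'None'.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumption: the "None" cells
# are Python None objects, not the text 'None').
df = pd.DataFrame({
    0: [2.0, 1.0, 2.0, np.nan, 2.0, np.nan, np.nan],
    1: [None] * 7,
    2: [None] * 7,
    3: [None] * 7,
    4: [21041.0, 3003.0, 1210.0, np.nan, np.nan, np.nan, np.nan],
})

# how='all' drops every all-NA row wherever it sits, so row 3 is lost too.
dropped = df.dropna(axis=0, how='all')
print(dropped.index.tolist())
```

The remaining index is [0, 1, 2, 4]: dropna has no notion of "trailing", which is why the answers below first locate the last non-empty row and then slice.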

asked Jun 26 '21 08:06 by Borut Flis


4 Answers

You can get the index label of the last row that has at least one non-NaN value and slice the DataFrame up to and including that row:

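import numpy as np

df = df.replace('None', np.nan)              # turn the text 'None' into real NaN
ids = df[df.notnull().any(axis=1)].index     # labels of rows with at least one value
last_id = ids[-1]                            # label of the last such row

res = df.loc[:last_id, :]                    # .loc slicing includes the end label

print(res)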

Output:

     0   1   2   3        4
0  2.0 NaN NaN NaN  21041.0
1  1.0 NaN NaN NaN   3003.0
2  2.0 NaN NaN NaN   1210.0
3  NaN NaN NaN NaN      NaN
4  2.0 NaN NaN NaN      NaN
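One edge case worth hedging: if every row is NaN, ids is empty and ids[-1] raises IndexError. Below is a sketch of a guarded wrapper around the same mask-and-slice idea; the helper name trim_trailing_na and the choice to return an empty frame in the all-NaN case are assumptions for illustration. It also assumes unique index labels, since .loc label slicing can fail on duplicated, unsorted labels.

```python
import numpy as np
import pandas as pd


def trim_trailing_na(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop only the all-NA rows after the last non-empty row."""
    ids = frame[frame.notnull().any(axis=1)].index
    if len(ids) == 0:
        # Every row is NA, so ids[-1] would raise IndexError; return an empty frame.
        return frame.iloc[0:0]
    # .loc slicing is label-based and includes the end label.
    return frame.loc[:ids[-1], :]


df = pd.DataFrame({
    0: [2.0, 1.0, 2.0, np.nan, 2.0, np.nan, np.nan],
    4: [21041.0, 3003.0, 1210.0, np.nan, np.nan, np.nan, np.nan],
})
trimmed = trim_trailing_na(df)
print(trimmed.index.tolist())

all_na = pd.DataFrame({0: [np.nan, np.nan]})
print(len(trim_trailing_na(all_na)))
```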
answered Nov 15 '22 02:11 by IoaTzimas


Use df.last_valid_index() together with df.loc:

Optional step, in case the None values shown here are actually the text 'None':

df = df.replace(['None'], [None])   # replace text 'None' with None

Main code:

df.loc[:df.last_valid_index()]

Result:

     0     1     2     3        4
0  2.0  None  None  None  21041.0
1  1.0  None  None  None   3003.0
2  2.0  None  None  None   1210.0
3  NaN  None  None  None      NaN
4  2.0  None  None  None      NaN
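A small sketch of how this behaves, using an assumed sample with a string index: on a DataFrame, last_valid_index() returns the label of the last row with at least one non-NA value, so .loc works with any index type. A caveat: on an all-NA frame it returns None, and df.loc[:None] is an open slice that keeps every row, so you may want to check for None first.

```python
import numpy as np
import pandas as pd

# Assumed sample with a non-default (string) index.
df = pd.DataFrame(
    {0: [2.0, np.nan, 2.0, np.nan], 4: [21041.0, np.nan, np.nan, np.nan]},
    index=["a", "b", "c", "d"],
)

last = df.last_valid_index()     # label of the last row with any non-NA value
res = df.loc[:last]              # label slice, end label included
print(last, res.index.tolist())

# Caveat: an all-NA frame gives None, and .loc[:None] keeps all rows.
all_na = pd.DataFrame({0: [np.nan, np.nan]})
print(all_na.last_valid_index())
print(len(all_na.loc[:all_na.last_valid_index()]))
```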
answered Nov 15 '22 04:11 by SeaBean


I am not sure which columns must be non-NaN, so I combine column 0 and column 4:

df['combine'] = df['0'].notna() | df['4'].notna()   # True if column 0 or column 4 has a value
    0   4       combine
0   2.0 21041.0 True
1   1.0 3003.0  True
2   2.0 1210.0  True
3   NaN NaN     False
4   2.0 NaN     True
5   NaN NaN     False
6   NaN NaN     False

Then I get the index label of the last True row and slice up to it:

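df_temp = df[df['combine']]          # keep only rows flagged True
last_true = df_temp.index[-1]        # label of the last flagged row
df.iloc[:last_true + 1]              # label used as a position: assumes the default 0..n-1 index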

Result:

    0   4       combine
0   2.0 21041.0 True
1   1.0 3003.0  True
2   2.0 1210.0  True
3   NaN NaN     False
4   2.0 NaN     True
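If you would rather not pick columns by hand, one alternative (a different technique from the answer above, shown as a sketch on an assumed sample) is to flag rows that have a value in any column, then take a cumulative sum from the bottom up. A row is kept when it, or any row below it, has a value, so only the trailing all-NA rows get a count of zero. This works with any index type because it never converts labels to positions.

```python
import numpy as np
import pandas as pd

# Assumed sample mirroring the question's data.
df = pd.DataFrame({
    0: [2.0, 1.0, 2.0, np.nan, 2.0, np.nan, np.nan],
    1: [None] * 7,
    4: [21041.0, 3003.0, 1210.0, np.nan, np.nan, np.nan, np.nan],
})

# True for rows with at least one non-NA value in any column.
has_value = df.notna().any(axis=1)

# Reverse, count rows-with-a-value cumulatively, reverse back:
# trailing all-NA rows end up with a count of 0.
keep = has_value.iloc[::-1].cumsum().iloc[::-1] > 0

res = df[keep]
print(res.index.tolist())
```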
answered Nov 15 '22 02:11 by nay


Very simple. Just retrieve the index label of the last row that does not contain only NaN values, then use iloc to slice the DataFrame up to that row. You have to add one to the found index because the stop bound of an iloc slice is exclusive and you want to include that row. Note that this uses the label as a position, so it assumes the default 0..n-1 index.

df.iloc[:df.dropna(axis=0, how='all').index[-1]+1]
>>
     0   1       2       3       4
0   2.0 None    None    None    21041.0
1   1.0 None    None    None    3003.0
2   2.0 None    None    None    1210.0
3   NaN None    None    None    NaN
4   2.0 None    None    None    NaN
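If the index is not the default 0..n-1 range, "label + 1" no longer works. A sketch, on an assumed sample with string labels, that converts the label to a position with Index.get_loc before slicing (this assumes the labels are unique, so get_loc returns a single integer):

```python
import numpy as np
import pandas as pd

# Assumed sample with a string index, where "label + 1" would fail.
df = pd.DataFrame(
    {0: [2.0, np.nan, 2.0, np.nan, np.nan],
     4: [21041.0, np.nan, np.nan, np.nan, np.nan]},
    index=["r0", "r1", "r2", "r3", "r4"],
)

last_label = df.dropna(axis=0, how='all').index[-1]   # last row that is not all NaN
last_pos = df.index.get_loc(last_label)               # label -> integer position
res = df.iloc[:last_pos + 1]                          # +1 because the iloc stop is exclusive
print(res.index.tolist())
```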
answered Nov 15 '22 04:11 by chatax