Pandas: Remove NaN only at beginning and end of dataframe

I've got a pandas DataFrame that looks like this:

       sum
1948   NaN
1949   NaN
1950     5
1951     3
1952   NaN
1953     4
1954     8
1955   NaN

and I would like to cut off the NaNs at the beginning and at the end ONLY, i.e. only the rows from 1950 to 1954 should remain, including the NaN in 1952. I already tried .isnull() and dropna(), but couldn't find a proper solution. Can anyone help?

user3017048 asked Jul 20 '15 06:07




3 Answers

Use the built-in first_valid_index and last_valid_index methods, which are designed specifically for this, and slice your df with the results:

In [5]:
first_idx = df.first_valid_index()
last_idx = df.last_valid_index()
print(first_idx, last_idx)
df.loc[first_idx:last_idx]

1950 1954
Out[5]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
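For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's table, and `trim_nan_edges` is a hypothetical helper name, not part of pandas. The extra branch is there because both index methods return None when the frame contains no valid value, and slicing with `None:None` would then return every row instead of none.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (years as the index).
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)


def trim_nan_edges(frame):
    """Drop leading/trailing NaN rows, keep NaNs in between (hypothetical helper)."""
    first = frame.first_valid_index()
    last = frame.last_valid_index()
    if first is None:
        # No valid value anywhere: return an empty frame with the same columns.
        return frame.iloc[0:0]
    # .loc slicing is label-based and includes both endpoints.
    return frame.loc[first:last]


trimmed = trim_nan_edges(df)
print(trimmed)
```

Note that the label slice assumes the index is sorted (or at least unique), as it is in the question.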
EdChum answered Sep 23 '22 01:09


Here is one way to do it.

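import pandas as pd

# your data
# ==============================
df

       sum
1948  NaN
1949  NaN
1950    5
1951    3
1952  NaN
1953    4
1954    8
1955  NaN

# processing
# ===============================
# df.ffill() / df.bfill() are equivalent to the older fillna(method='ffill') / fillna(method='bfill')
# forward-fill leaves only the leading NaNs unfilled, so dropna() removes exactly those rows
idx = df.ffill().dropna().index
# back-fill on the remaining rows leaves only the trailing NaNs unfilled
res_idx = df.loc[idx].bfill().dropna().index
df.loc[res_idx]

       sum
1950    5
1951    3
1952  NaN
1953    4
1954    8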
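The same fill-then-check idea can also be written as a single boolean mask, which may make it easier to see why it works. This is a variation on the answer above, not its exact code: a forward-fill cannot reach the leading NaNs and a back-fill cannot reach the trailing ones, so a row is kept only if both fills produce a value there. The variable names are made up for the example and the data is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (years as the index).
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

s = df["sum"]

# True from the first valid value onwards (leading NaNs stay NaN after ffill).
not_leading = s.ffill().notna()
# True up to the last valid value (trailing NaNs stay NaN after bfill).
not_trailing = s.bfill().notna()

# Keep rows that are neither leading nor trailing NaNs; the original values
# (including the inner NaN) are untouched because only the mask uses the fills.
res = df[not_leading & not_trailing]
print(res)
```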
Jianxun Li answered Sep 25 '22 01:09


Here is an approach with NumPy:

import numpy as np
import pandas as pd

x    = np.logical_not(pd.isnull(df))
mask = np.logical_and(np.cumsum(x)!=0, np.cumsum(x[::-1])[::-1]!=0)

In [313]: df.loc[mask['sum'].tolist()]

Out[313]:
      sum
1950    5
1951    3
1952  NaN
1953    4
1954    8
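A sketch of the same cumulative-sum trick on a plain boolean array, in case the DataFrame-level version is hard to follow. It assumes a single column named `sum` as in the question; the variable names are made up for the example. A running count of valid values is zero only before the first valid entry, and the same count taken from the end is zero only after the last one.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (years as the index).
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)

# Boolean array: True where the value is not NaN.
valid = df["sum"].notna().to_numpy()

# At least one valid value seen at or before this position.
seen_before = np.cumsum(valid) > 0
# At least one valid value at or after this position (count from the end, flip back).
seen_after = np.cumsum(valid[::-1])[::-1] > 0

mask = seen_before & seen_after
res = df[mask]
print(res)
```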
Colonel Beauvel answered Sep 24 '22 01:09