
Remove leading NaN in pandas

How can I remove the leading NaNs from a pandas Series?

pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

I want to remove only the first three NaNs from the series above, so the result should be:

pd.Series([1, 2, np.nan, 3])
Meh asked Jul 17 '15 07:07


3 Answers

Here is a method that uses only pandas:

In [103]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
first_valid = s[s.notnull()].index[0]
s.iloc[first_valid:]

Out[103]:
3     1
4     2
5   NaN
6     3
dtype: float64

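So we filter the series using notnull and take the first index label of the result, which is the first valid index. Then we use iloc to slice the series from there. Note that iloc expects an ordinal position, so passing it a label only works here because the default integer index makes labels and positions coincide.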

EDIT

As @ajcr has pointed out, it is better to use the built-in method first_valid_index, because it does not build the temporary filtered series that I use as a mask in the answer above. In addition, loc slices by index label, whereas iloc slices by ordinal position, so loc also works in the general case where the index is not an Int64Index:

In [104]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
s.loc[s.first_valid_index():]

Out[104]:
3     1
4     2
5   NaN
6     3
dtype: float64
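
To illustrate why the label-based slice matters, here is a minimal sketch with a string index, where the label could not be passed to iloc at all. The helper name drop_leading_nan and the choice to return an empty series for all-NaN input are assumptions made for illustration, not part of the answer above. first_valid_index returns None when there is no valid value, and s.loc[None:] would then return the whole series unchanged, which is why the sketch handles that case explicitly.

```python
import numpy as np
import pandas as pd


def drop_leading_nan(s: pd.Series) -> pd.Series:
    """Return s without its leading NaN values (hypothetical helper)."""
    first = s.first_valid_index()  # label of the first non-NaN value, or None
    if first is None:
        # Assumption: an all-NaN (or empty) series should give an empty result.
        return s.iloc[0:0]
    return s.loc[first:]  # label-based slice, inclusive of `first`


s = pd.Series([np.nan, np.nan, 1.0, np.nan, 2.0], index=list("abcde"))
trimmed = drop_leading_nan(s)
print(trimmed)  # keeps labels c, d, e; the NaN at d is not a leading one
```

Slicing by label with loc assumes that the first valid label is unique in the index, or that the index is sorted; with a duplicated label in an unsorted index the slice can raise a KeyError.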
EdChum answered Oct 16 '22 19:10


Find first non-nan index

To find the index label of the first non-NaN item:

s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

nans = s.isna()  # boolean mask, True where the value is NaN

first_non_nan = nans[~nans].index[0]  # label of the first non-NaN value

Output

s[first_non_nan:]
Out[44]:
3     1
4     2
5   NaN
6     3
dtype: float64
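
The same lookup can be sketched without filtering the mask first, using idxmax on a boolean series, which returns the label of the first True. The variable names valid, first_label and trimmed are illustrative only. One caveat of this variant: if the series holds no valid value at all, idxmax returns the first label, so the whole series is kept.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

valid = s.notna()              # True where the value is not NaN
first_label = valid.idxmax()   # label of the first True in the mask
trimmed = s.loc[first_label:]  # label-based slice from that label onward

print(first_label)
print(trimmed)
```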
bakkal answered Oct 16 '22 17:10


Here are two more approaches, with A as the input series.

Approach #1: With slicing -

A[np.where(~np.isnan(A))[0][0]:] 

Approach #2: With masking -

A[np.maximum.accumulate(~np.isnan(A))]

Sample run -

In [219]: A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

In [220]: A
Out[220]: 
0   NaN
1   NaN
2   NaN
3     1
4     2
5   NaN
6     3
dtype: float64

In [221]: A[np.where(~np.isnan(A))[0][0]:]       # Approach 1
Out[221]: 
3     1
4     2
5   NaN
6     3
dtype: float64

In [222]: A[np.maximum.accumulate(~np.isnan(A))]  # Approach 2
Out[222]: 
3     1
4     2
5   NaN
6     3
dtype: float64
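
To make the masking approach easier to follow, the sketch below prints the intermediate arrays. It is a variation rather than the exact code above: it uses A.notna().to_numpy() in place of ~np.isnan(A) so that the accumulation runs on a plain NumPy boolean array, and the names valid, mask and trimmed are illustrative.

```python
import numpy as np
import pandas as pd

A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

# True where the value is not NaN.
valid = A.notna().to_numpy()

# Running maximum of booleans acts as a running "or":
# once the first True is seen, every later position stays True.
mask = np.maximum.accumulate(valid)

trimmed = A[mask]  # boolean indexing keeps everything from the first valid value

print(valid)
print(mask)
print(trimmed)
```

The two approaches differ on an all-NaN series: the mask is all False and selects nothing, so masking returns an empty series, whereas the slicing approach raises an IndexError because np.where finds no position to index.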
Divakar answered Oct 16 '22 17:10