How can I remove leading NaNs in pandas?
pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
I want to remove only the first 3 NaNs from the series above, so the result should be:
pd.Series([1, 2, np.nan, 3])
The dropna() method drops entries with NaN (Not a Number) or None values from a pandas Series or DataFrame. By default it returns a copy with those entries removed. If you want to modify the existing object instead, pass inplace=True.
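For the series in the question, a quick check shows why plain dropna() is not enough on its own: it drops every NaN, including the one between 2 and 3. A minimal sketch (the variable name dropped is only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

# dropna() removes every NaN, not only the leading run,
# so the NaN at label 5 disappears as well
dropped = s.dropna()
print(dropped.tolist())  # [1.0, 2.0, 3.0]
```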
Here is another method using pandas methods only:
In [103]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
first_valid = s[s.notnull()].index[0]
s.iloc[first_valid:]
Out[103]:
3 1
4 2
5 NaN
6 3
dtype: float64
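So we filter the series using notnull to get the first valid index, then use iloc to slice the series. Note that index[0] is a label while iloc expects a position, so this only works when the index is the default 0..n-1 range.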
EDIT
As @ajcr has pointed out, it is better to use the built-in method first_valid_index, as this does not build the temporary masked series used in the answer above. Additionally, loc slices by index label rather than by ordinal position as iloc does, so it also works in the general case where the index is not an Int64Index:
In [104]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
s.loc[s.first_valid_index():]
Out[104]:
3 1
4 2
5 NaN
6 3
dtype: float64
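One edge case worth keeping in mind: first_valid_index() returns None when the series has no valid value at all, and s.loc[None:] then returns the whole series unchanged. A minimal sketch of a wrapper that handles this, assuming unique index labels (strip_leading_nans is a hypothetical helper name, not a pandas function):

```python
import numpy as np
import pandas as pd

def strip_leading_nans(s):
    """Return s without its leading run of NaN values (hypothetical helper)."""
    first = s.first_valid_index()
    if first is None:
        # No valid value at all: every entry is a leading NaN
        return s.iloc[0:0]
    # Label-based slice, so it also works for non-integer indexes
    return s.loc[first:]

# A string-labelled index, where positional slicing by label would not work
labelled = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0], index=list("abcde"))
trimmed = strip_leading_nans(labelled)

# An all-NaN series comes back empty
all_nan = strip_leading_nans(pd.Series([np.nan, np.nan]))
```

Here trimmed keeps the labels c, d and e (including the interior NaN at d), while all_nan has length 0.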
Find first non-nan index
To find the index label of the first non-NaN item:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
nans = s.isna()  # vectorised; same result as s.apply(np.isnan) for float data
first_non_nan = nans[~nans].index[0]  # label of the first non-NaN entry
Output
s[first_non_nan:]
Out[44]:
3 1
4 2
5 NaN
6 3
dtype: float64
Two more approaches could be suggested here, assuming A as the input series.
Approach #1: With slicing -
A[np.where(~np.isnan(A))[0][0]:]
Approach #2: With masking -
A[np.maximum.accumulate(~np.isnan(A))]
Sample run -
In [219]: A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
In [220]: A
Out[220]:
0 NaN
1 NaN
2 NaN
3 1
4 2
5 NaN
6 3
dtype: float64
In [221]: A[np.where(~np.isnan(A))[0][0]:] # Approach 1
Out[221]:
3 1
4 2
5 NaN
6 3
dtype: float64
In [222]: A[np.maximum.accumulate(~np.isnan(A))] # Approach 2
Out[222]:
3 1
4 2
5 NaN
6 3
dtype: float64
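The masking idea in Approach #2 can also be written with pandas methods only, using notna() and cummax() in place of the NumPy calls. This is a sketch of an equivalent formulation, not part of the original answer; keep and result are illustrative names:

```python
import numpy as np
import pandas as pd

A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

# notna() marks valid entries; cummax() keeps the mask True from the
# first valid entry onwards, so only the leading NaNs are masked out
keep = A.notna().cummax()
result = A[keep]
```

Because the mask is applied by boolean indexing rather than by slicing, this does not depend on the index type, and an all-NaN series simply yields an empty result.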