Remove leading NaN in pandas

Tags:

python

pandas

numpy

How can I remove leading NaNs from a pandas Series?


pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

I want to remove only the first 3 NaNs (the leading ones) from the series above, so the result should be:


pd.Series([1, 2, np.nan, 3])

909

asked Jul 17 '15 07:07

Meh

3 Answers

Here is another method using pandas methods only:


In [103]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
first_valid = s[s.notnull()].index[0]
s.iloc[first_valid:]

Out[103]:
3     1
4     2
5   NaN
6     3
dtype: float64

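So we filter the series using notnull to get the first valid index, then use iloc to slice the series from there. Note that this passes an index label to iloc, so it is only correct when the index is the default 0..n-1 integer range.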

EDIT

As @ajcr has pointed out, it is better to use the built-in method first_valid_index, as this avoids building the temporary masked series I used in the answer above. Additionally, loc slices by index label, whereas iloc slices by ordinal position, so loc works in the general case where the index is not an Int64Index:


In [104]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
s.loc[s.first_valid_index():]

Out[104]:
3     1
4     2
5   NaN
6     3
dtype: float64
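One case the snippet above does not cover is a series with no valid values at all: first_valid_index then returns None, and slicing with loc from None keeps everything. A minimal sketch of a guard for that, assuming a hypothetical helper name drop_leading_nan (not part of pandas) and a string index to show that the label-based slice does not depend on integer labels:

```python
import numpy as np
import pandas as pd


def drop_leading_nan(s: pd.Series) -> pd.Series:
    """Hypothetical helper: drop the NaNs that precede the first valid value."""
    first = s.first_valid_index()  # label of the first non-NaN value, or None
    if first is None:
        # Every value is NaN (or the series is empty): nothing valid to keep.
        return s.iloc[0:0]
    return s.loc[first:]  # label-based slice, includes the label `first`


# A string index, where passing a label to iloc would not work.
s = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0], index=list("abcde"))
trimmed = drop_leading_nan(s)
print(trimmed)

# All-NaN input: the guard returns an empty series instead of the original.
all_nan = pd.Series([np.nan, np.nan])
print(len(drop_leading_nan(all_nan)))
```

The interior NaN at label "d" is kept, since only the leading run is removed.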

134

answered Oct 16 '22 19:10

EdChum

Find first non-nan index

To find the index label of the first non-NaN item:


s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

nans = s.isna()  # boolean mask, True where the value is NaN

first_non_nan = nans[~nans].index[0]  # label of the first non-NaN entry

Output


s.loc[first_non_nan:]  # first_non_nan is an index label, so slice by label
Out[44]:
3     1
4     2
5   NaN
6     3
dtype: float64
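If the ordinal position is wanted rather than the index label, one possible variant is to take argmax of the boolean mask and slice with iloc. This is only a sketch, and the index values 10..70 are made up to show that position and label can differ. Because argmax returns 0 when the mask has no True entries, the all-NaN case is checked explicitly:

```python
import numpy as np
import pandas as pd

# Integer labels that deliberately do not match the ordinal positions.
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3],
              index=[10, 20, 30, 40, 50, 60, 70])

valid = s.notna().to_numpy()  # plain NumPy boolean array

if valid.any():
    pos = int(valid.argmax())  # position of the first True in the mask
    result = s.iloc[pos:]      # positional slice, independent of the labels
else:
    pos = None
    result = s.iloc[0:0]       # no valid values: return an empty series

print(pos)
print(result)
```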

answered Oct 16 '22 17:10

bakkal

Here are two more approaches, assuming A is the input series.

Approach #1: With slicing -


A[np.where(~np.isnan(A))[0][0]:]

Approach #2: With masking -


A[np.maximum.accumulate(~np.isnan(A))]

Sample run -


In [219]: A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

In [220]: A
Out[220]: 
0   NaN
1   NaN
2   NaN
3     1
4     2
5   NaN
6     3
dtype: float64

In [221]: A[np.where(~np.isnan(A))[0][0]:]       # Approach 1
Out[221]: 
3     1
4     2
5   NaN
6     3
dtype: float64

In [222]: A[np.maximum.accumulate(~np.isnan(A))]  # Approach 2
Out[222]: 
3     1
4     2
5   NaN
6     3
dtype: float64
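To see why the masking approach works, it may help to look at the intermediate mask: a running maximum over the "is valid" flags stays False until the first valid value and is True from then on. The sketch below is a variation on Approach 2, not the code from the answer; it uses notna() and to_numpy() as assumptions of mine so the accumulate step runs on a plain boolean NumPy array:

```python
import numpy as np
import pandas as pd

A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

valid = A.notna().to_numpy()         # [F, F, F, T, T, F, T]
keep = np.maximum.accumulate(valid)  # [F, F, F, T, T, T, T]
result = A[keep]                     # boolean mask: drops only the leading NaNs

# With no valid values the mask is all False, so the result is empty.
all_nan = pd.Series([np.nan, np.nan])
empty = all_nan[np.maximum.accumulate(all_nan.notna().to_numpy())]

print(keep.tolist())
print(result)
print(len(empty))
```

Unlike the slicing approach, this one needs no special handling for an all-NaN input, since np.where(...)[0][0] would raise an IndexError there.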

answered Oct 16 '22 17:10

Divakar
