
How can I get the index of next non-NaN number with series in pandas?

I am looping over a pandas Series. When I hit a NaN, is it possible to get the index of the next non-NaN value immediately? I don't want to skip the NaNs, because I want to interpolate over them.

For example, I have a Series a with these elements:

5, 6, 5, NaN, NaN, NaN, 7, 8, 9, NaN, NaN, NaN, 10, 10

The indexes run from 0 to 13. While iterating over the Series, I would like to know the index of the next NaN and the index of the next non-NaN value. So, starting from the beginning, can I immediately find out that the first NaN is at index 3? Then, when I jump to a[3], I need the index of the next non-NaN number, which is 6 in this case.

Thank you so much.

xxx222 asked Feb 09 '16 04:02



1 Answer

You could use the isnull method to find the indices that hold NaN values, and then at each step compare your current index with the next one in that list:

In [48]: s.index[s.isnull()]
Out[48]: Int64Index([3, 4, 5, 9, 10, 11], dtype='int64')
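
One way to do that comparison, as a rough sketch rather than a definitive implementation: keep the sorted positions of the NaN and non-NaN entries and look up the next one with np.searchsorted. This assumes the default 0..n-1 integer index from the question; the helper name next_position is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])

# Sorted integer positions of the NaN and the non-NaN entries.
nan_pos = np.flatnonzero(s.isnull().to_numpy())
valid_pos = np.flatnonzero(s.notnull().to_numpy())


def next_position(sorted_positions, current):
    """Return the first entry of sorted_positions that is >= current, or None."""
    i = np.searchsorted(sorted_positions, current, side="left")
    return int(sorted_positions[i]) if i < len(sorted_positions) else None


first_nan = next_position(nan_pos, 0)                # scanning from the start
next_valid = next_position(valid_pos, first_nan)     # scanning from that NaN
next_nan_after = next_position(nan_pos, next_valid)  # start of the following gap
```

Each lookup is a binary search, so the loop does not have to rescan the Series every time it meets a NaN.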

You could also use first_valid_index to find the index of the first non-NaN value, e.g.:

In [49]: s[4:]
Out[49]:
4    NaN
5    NaN
6      7
7      8
8      9
9    NaN
10   NaN
11   NaN
12    10
13    10
dtype: float64

In [50]: s[4:].first_valid_index()
Out[50]: 6
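
The slice-then-first_valid_index step above could be wrapped in a small helper. This is only a sketch: the name next_valid_label is hypothetical, and it uses iloc so that the slice is positional whatever the index type is.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])


def next_valid_label(series, start_pos):
    """Label of the first non-NaN value at or after integer position start_pos.

    Returns None when everything from start_pos onward is NaN (or empty).
    """
    # iloc slices by position regardless of the index type.
    return series.iloc[start_pos:].first_valid_index()


label_from_4 = next_valid_label(s, 4)       # first gap -> label 6
label_from_9 = next_valid_label(s, 9)       # second gap -> label 12
label_past_end = next_valid_label(s, 14)    # nothing left -> None
```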

EDIT

If you want an integer position instead, you could use get_loc on the pandas index. Note that the result is the position within the slice b, not within s:

b = s[4:]

In [156]: b
Out[156]:
4    NaN
5    NaN
6      7
7      8
8      9
9    NaN
10   NaN
11   NaN
12    10
13    10
dtype: float64

In [157]: b.first_valid_index()
Out[157]: 6

In [158]: b.index.get_loc(b.first_valid_index())
Out[158]: 2
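
To make the difference between the position inside the slice and the position inside the full Series concrete, here is a small sketch with a letter index. It assumes the labels are unique, so that get_loc returns a single integer.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10],
              index=list("abcdefghijklmn"))

start = 4
b = s.iloc[start:]                   # positional slice starting at 'e'
label = b.first_valid_index()        # label of the first non-NaN value in b
offset = b.index.get_loc(label)      # position within the slice b
absolute = s.index.get_loc(label)    # position within the full Series s
```

Here absolute equals start + offset, so either form can be used to jump ahead in the original Series.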

EDIT2

You could use get_indexer to get the integer positions of all NaNs and of all valid values, even when the index is not numeric:

import string
import numpy as np
import pandas as pd
s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9, np.nan, np.nan, np.nan, 10, 10], index=list(string.ascii_letters[:14]))  # labels 'a' to 'n'

In [216]: s
Out[216]:
a     5
b     6
c     5
d   NaN
e   NaN
f   NaN
g     7
h     8
i     9
j   NaN
k   NaN
l   NaN
m    10
n    10
dtype: float64

valid_indx = s.index.get_indexer(s.index[~s.isnull()])
nan_indx = s.index.get_indexer(s.index[s.isnull()])

In [220]: valid_indx
Out[220]: array([ 0,  1,  2,  6,  7,  8, 12, 13])

In [221]: nan_indx
Out[221]: array([ 3,  4,  5,  9, 10, 11])    

Or the simplest way is np.where, which returns the integer positions directly:

In [222]: np.where(s.isnull())
Out[222]: (array([ 3,  4,  5,  9, 10, 11], dtype=int32),)

In [223]: np.where(~s.isnull())
Out[223]: (array([ 0,  1,  2,  6,  7,  8, 12, 13], dtype=int32),)
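
Since the goal in the question is interpolation, the np.where positions could be grouped into runs of consecutive NaNs, as in the sketch below (the names breaks and runs are just for illustration). If plain linear interpolation is all that is needed, Series.interpolate fills the gaps without an explicit loop.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])

nan_pos = np.where(s.isnull())[0]

# Split the NaN positions wherever consecutive positions are not adjacent,
# giving one (first, last) pair per run of NaNs.
runs = []
if len(nan_pos):
    breaks = np.where(np.diff(nan_pos) > 1)[0] + 1
    runs = [(int(r[0]), int(r[-1])) for r in np.split(nan_pos, breaks)]

# Built-in alternative: linear interpolation across every gap.
filled = s.interpolate()
```

For an interior run (first, last), the nearest valid neighbours are at positions first - 1 and last + 1, which is what a hand-written interpolation would need.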
Anton Protopopov answered Oct 03 '22 22:10