In pandas, I am looping over a Series. When I hit a NaN, is it possible to know the index of the next non-NaN value right away? I don't want to skip the NaNs, because I want to interpolate over them.
For example, I have a Series a with elements
5, 6, 5, NaN, NaN, NaN, 7, 8, 9, NaN, NaN, NaN, 10, 10
Their indexes run from 0 to 13. While iterating over the Series, I would love to know the index of the next NaN and the index of the next non-NaN. So, from the beginning, can I instantly know that the index of the first NaN is 3? Then, when I am at a NaN such as a[4], I need to know the index of the next non-NaN number, which is 6 in this case.
Thank you so much.
The count() function returns the number of non-NA/null observations in the Series. If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a smaller Series.
A pandas Series is a one-dimensional ndarray with axis labels. The labels need not be unique but must be of a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.
To access an element of a Series, refer to it by its index using the index operator [ ]. The index can be an integer position or a label. To access multiple elements, use a slice.
You could use the isnull method to find the indices at which you have NaN values, and then at each step compare your current index with the next one:
In [48]: s.index[s.isnull()]
Out[48]: Int64Index([3, 4, 5, 9, 10, 11], dtype='int64')
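One way to read "compare your index with the next" is to group the NaN positions into consecutive runs, so each gap is known by its first and last position. This is only a sketch under that reading; the names nan_pos, breaks and runs are made up for illustration, and it assumes the default integer index from the question.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])

# Integer positions of every NaN in the Series.
nan_pos = np.flatnonzero(s.isnull().to_numpy())

# A new run starts wherever the step from the previous NaN position exceeds 1.
breaks = np.flatnonzero(np.diff(nan_pos) > 1) + 1

# (first, last) position of each consecutive run of NaNs.
runs = [(int(r[0]), int(r[-1])) for r in np.split(nan_pos, breaks) if len(r)]
print(runs)
```

For the example Series this gives two runs, one covering positions 3 to 5 and one covering 9 to 11.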
You could also use first_valid_index to find the first non-NaN value, e.g.:
In [49]: s[4:]
Out[49]:
4 NaN
5 NaN
6 7
7 8
8 9
9 NaN
10 NaN
11 NaN
12 10
13 10
dtype: float64
In [50]: s[4:].first_valid_index()
Out[50]: 6
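Putting this into the loop from the question might look like the sketch below. The helper next_valid_label is hypothetical (it is not a pandas function); it just wraps the slice plus first_valid_index call shown above, using iloc so the slice is always positional.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])

def next_valid_label(series, pos):
    """Return the label of the first non-NaN value at or after position pos.

    Hypothetical helper; returns None when only NaNs remain.
    """
    return series.iloc[pos:].first_valid_index()

# Walk the Series and, at each NaN, record where the gap ends.
gap_ends = {}
for pos, value in enumerate(s):
    if pd.isna(value):
        gap_ends[pos] = next_valid_label(s, pos)

print(gap_ends)
```

Each lookup rescans the rest of the Series, so this is fine for short data but can get slow on long Series with many NaNs.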
EDIT
If you want an integer position instead of a label, you could use get_loc on the pandas index:
b = s[4:]
In [156]: b
Out[156]:
4 NaN
5 NaN
6 7
7 8
8 9
9 NaN
10 NaN
11 NaN
12 10
13 10
dtype: float64
In [157]: b.first_valid_index()
Out[157]: 6
In [158]: b.index.get_loc(b.first_valid_index())
Out[158]: 2
EDIT2
You could use get_indexer to get all positions where you have NaNs and where you have valid values:
import string
import numpy as np
import pandas as pd
values = [5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9, np.nan, np.nan, np.nan, 10, 10]
s = pd.Series(values, index=list(string.ascii_letters[:len(values)]))
In [216]: s
Out[216]:
a 5
b 6
c 5
d NaN
e NaN
f NaN
g 7
h 8
i 9
j NaN
k NaN
l NaN
m 10
n 10
dtype: float64
valid_indx = s.index.get_indexer(s.index[~s.isnull()])
nan_indx = s.index.get_indexer(s.index[s.isnull()])
In [220]: valid_indx
Out[220]: array([ 0, 1, 2, 6, 7, 8, 12, 13])
In [221]: nan_indx
Out[221]: array([ 3, 4, 5, 9, 10, 11])
Or the simplest way would be with np.where:
In [222]: np.where(s.isnull())
Out[222]: (array([ 3, 4, 5, 9, 10, 11], dtype=int32),)
In [223]: np.where(~s.isnull())
Out[223]: (array([ 0, 1, 2, 6, 7, 8, 12, 13], dtype=int32),)
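Once both position arrays are available, the next non-NaN position for every NaN can be looked up in one step instead of inside the loop. The sketch below uses np.searchsorted for that; the names valid_pos, nan_pos, slot and next_valid are illustrative, and it assumes the Series contains at least one non-NaN value.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])

mask = s.isnull().to_numpy()
valid_pos = np.flatnonzero(~mask)  # positions holding real values
nan_pos = np.flatnonzero(mask)     # positions holding NaN

# For each NaN position, find the slot of the first valid position to its right.
slot = np.searchsorted(valid_pos, nan_pos)

# slot == len(valid_pos) means no valid value follows; mark those with -1.
# Assumes valid_pos is not empty.
has_next = slot < len(valid_pos)
safe_slot = np.minimum(slot, len(valid_pos) - 1)
next_valid = np.where(has_next, valid_pos[safe_slot], -1)

print(dict(zip(nan_pos.tolist(), next_valid.tolist())))
```

If the only goal is to fill the gaps, s.interpolate() performs linear interpolation directly, without needing the positions at all.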