I have a pandas dataframe "df", a sample of which is below:
   time    x
0     1    1
1     2  NaN
2     3    3
3     4  NaN
4     5    8
5     6    7
6     7    5
7     8  NaN
The real frame is much bigger. I am trying to find the longest stretch of non-NaN values in the "x" series and print out the starting and ending index of that stretch. Is this possible?
Thank You
Here's a vectorized approach with NumPy tools -
import numpy as np

a = df.x.values  # Extract the relevant column from the dataframe as an array
m = np.concatenate(([True], np.isnan(a), [True]))  # NaN mask, padded with True at both ends
ss = np.flatnonzero(m[1:] != m[:-1]).reshape(-1, 2)  # Start-stop limits of each non-NaN run
start, stop = ss[(ss[:, 1] - ss[:, 0]).argmax()]  # Limits of the longest run
Sample run -
In [474]: a
Out[474]:
array([ 1., nan, 3., nan, nan, nan, nan, 8., 7., 5., 2.,
5., nan, nan])
In [475]: start, stop
Out[475]: (7, 12)
The intervals are set such that the difference between each start and stop gives the length of that interval, so stop is one past the last element of the run. If by ending index you mean the index of the last non-NaN element, subtract one from stop.
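To make the start/stop convention concrete, here is a minimal sketch that wraps the steps above in a function and applies it to the sample frame from the question. The function name `longest_non_nan_run` and the `None` return for an all-NaN column are my own choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def longest_non_nan_run(values):
    """Return (start, stop) positions of the longest NaN-free run.

    stop is exclusive, so stop - start is the run length.
    Returns None if there are no non-NaN values (an assumption of this sketch).
    """
    a = np.asarray(values, dtype=float)
    # Pad the NaN mask with True so every run has both a start and a stop edge
    m = np.concatenate(([True], np.isnan(a), [True]))
    ss = np.flatnonzero(m[1:] != m[:-1]).reshape(-1, 2)
    if len(ss) == 0:
        return None
    start, stop = ss[(ss[:, 1] - ss[:, 0]).argmax()]
    return int(start), int(stop)

# Sample frame from the question
df = pd.DataFrame({
    "time": range(1, 9),
    "x": [1, np.nan, 3, np.nan, 8, 7, 5, np.nan],
})

start, stop = longest_non_nan_run(df["x"])
first_label = df.index[start]      # index label of the first value in the run
last_label = df.index[stop - 1]    # subtract one to get the last non-NaN value
print(first_label, last_label)
```

For the sample frame, the longest run covers the values 8, 7, 5. Going through `df.index` rather than using the positions directly should also work when the frame does not have a default integer index.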