Pandas - Find longest stretch without NaN values

I have a pandas dataframe "df", a sample of which is below:

   time  x
0  1     1
1  2     NaN
2  3     3
3  4     NaN
4  5     8
5  6     7
6  7     5
7  8     NaN

The real frame is much bigger. I am trying to find the longest stretch of non-NaN values in the "x" series, and print out the starting and ending index of that stretch. Is this possible?

Thank You

Jeff Saltfist asked Jan 05 '17 20:01


1 Answer

Here's a vectorized approach with NumPy tools -

import numpy as np

a = df.x.values  # Extract the relevant column from the dataframe as an array
m = np.concatenate(( [True], np.isnan(a), [True] ))  # NaN mask, padded with True at both ends
ss = np.flatnonzero(m[1:] != m[:-1]).reshape(-1,2)   # Start-stop limits of each non-NaN run
start,stop = ss[(ss[:,1] - ss[:,0]).argmax()]  # Limits of the longest run

Sample run -

In [474]: a
Out[474]: 
array([  1.,  nan,   3.,  nan,  nan,  nan,  nan,   8.,   7.,   5.,   2.,
         5.,  nan,  nan])

In [475]: start, stop
Out[475]: (7, 12)

Each row of ss holds the start and stop of one run of non-NaN values, with stop set one past the last element of the run, so stop - start gives the length of that run. So, if by ending index you mean the index of the last non-NaN element, subtract one from stop.
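To tie this back to the frame in the question, here is a minimal sketch that wraps the same steps in a function and maps the positions back to index labels. The function name longest_non_nan_run and the None return for an all-NaN column are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def longest_non_nan_run(values):
    """Return (start, stop) of the longest run of non-NaN values.

    start is the position of the first value in the run and stop is one
    past the last, so stop - start is the run length. Returns None when
    the input holds no non-NaN values (an assumed convention).
    """
    a = np.asarray(values, dtype=float)
    # Pad the NaN mask with True on both sides so every run has two edges.
    m = np.concatenate(([True], np.isnan(a), [True]))
    ss = np.flatnonzero(m[1:] != m[:-1]).reshape(-1, 2)
    if len(ss) == 0:  # empty input or all NaN
        return None
    start, stop = ss[(ss[:, 1] - ss[:, 0]).argmax()]
    return int(start), int(stop)


# Sample frame from the question.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7, 8],
    "x": [1, np.nan, 3, np.nan, 8, 7, 5, np.nan],
})

start, stop = longest_non_nan_run(df["x"].to_numpy())
first_label = df.index[start]     # label of the first row in the run
last_label = df.index[stop - 1]   # label of the last row in the run (inclusive)
print(first_label, last_label)
```

With a default integer index the labels equal the positions, but going through df.index keeps this working when the frame has a different index. If several runs tie for the longest, argmax picks the first one.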

Divakar answered Oct 18 '22 20:10