Pandas: run length of NaN holes

Tags: python, pandas

I have hundreds of time series objects, each with hundreds of thousands of entries. Some percentage of the entries are missing (NaN). For my application it matters whether those are single, scattered NaNs or long sequences of NaNs.

Therefore I would like a function that gives me the run length of each contiguous sequence of NaN. I can do

myseries.isnull()

to get a boolean series, and I can apply a moving median or moving average to get an idea of the size of the data holes. However, it would be nice to have an efficient way of getting a list of hole lengths for a series.

I.e., it would be nice to have a myfunc so that

a = pd.Series([1, 2, 3, np.nan, 4, np.nan, np.nan, np.nan, 5, np.nan, np.nan])
myfunc(a.isnull())
==> Series([1, 3, 2])

(because there are 1, 3 and 2 NaNs, respectively)

From that, I can make histograms of hole lengths, and of the logical AND or OR of isnull across multiple series (which might be substitutes for each other), and other nice things.

I would also like to get ideas of other ways to quantify the "clumpiness" of the data holes.


asked May 31 '13 12:05

Bjarke Ebert

1 Answer

import itertools

import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, np.nan, 4, np.nan, np.nan, np.nan, 5, np.nan, np.nan])
# Group consecutive values by whether they are NaN (k is True for NaN groups),
# then keep the length of each NaN group.
len_holes = [len(list(g)) for k, g in itertools.groupby(a, np.isnan) if k]
print(len_holes)

results in

[1, 3, 2]
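
Since the question mentions hundreds of series with hundreds of thousands of entries each, a vectorized variant may be worth trying, because itertools.groupby loops over every element in Python. The following is only a sketch under that assumption, not a benchmarked solution; the helper name nan_run_lengths is made up for illustration. It labels each hole with the running count of non-NaN values seen so far, then sums the NaN flags per label.

```python
import numpy as np
import pandas as pd


def nan_run_lengths(series):
    """Return the length of each contiguous NaN run (hypothetical helper)."""
    is_nan = series.isnull()
    # The cumulative count of non-NaN values stays constant across a hole,
    # so all NaNs in the same hole share one group id.
    run_id = (~is_nan).cumsum()
    # Summing the NaN flags per group id gives the hole length;
    # groups without any NaN sum to 0 and are dropped.
    lengths = is_nan.astype(int).groupby(run_id).sum()
    return lengths[lengths > 0].tolist()


a = pd.Series([1, 2, 3, np.nan, 4, np.nan, np.nan, np.nan, 5, np.nan, np.nan])
print(nan_run_lengths(a))  # [1, 3, 2]
```

The result is a plain list, so it can be passed to pd.Series(...) for a histogram, and the same helper can be called on the AND or OR of several isnull() masks by adapting it to accept a boolean series directly.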


answered Oct 22 '22 09:10

Wouter Overmeire
