How to group consecutive NaN values from a Pandas Series in a set of slices?

I want to merge consecutive NaN values into slices. Is there a simple way of doing this with numpy or pandas?

import numpy as np
import pandas as pd

l = [
    (996, np.nan), (997, np.nan), (998, np.nan),
    (999, -47.3), (1000, -72.5), (1100, -97.7),
    (1200, np.nan), (1201, np.nan), (1205, -97.8),
    (1300, np.nan), (1302, np.nan), (1305, -97.9),
    (1400, np.nan), (1405, -97.10), (1408, np.nan)
]
# Integer keys become the index; NaN marks the values to be grouped
l = pd.Series(dict(l))

Expected result:

[
    (slice(996, 999, None), array([nan, nan, nan])),
    (999, -47.3),
    (1000, -72.5),
    (1100, -97.7),
    (slice(1200, 1202, None), array([nan, nan])),
    (1205, -97.8),
    (slice(1300, 1301, None), array([nan])),
    (slice(1302, 1303, None), array([nan])),
    (1305, -97.9),
    (slice(1400, 1401, None), array([nan])),
    (1405, -97.1),
    (slice(1408, 1409, None), array([nan]))
]

A two-dimensional numpy array would be fine as well, instead of a list of tuples.

Update 2019/05/31: I have just realised that if I use a dictionary instead of a Pandas Series, the algorithm is much more efficient.
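The update does not show the dictionary-based code, so the following is only a rough sketch of what such a version might look like. It assumes integer keys, float values, and that keys should be processed in sorted order. The name `group_nan_dict` is hypothetical, and it returns plain lists of NaN rather than numpy arrays to stay in pure Python.

```python
import math


def group_nan_dict(data):
    """Group NaN values stored at consecutive integer keys into (slice, list) pairs."""
    result = []
    start = prev = None  # first and last key of the NaN run being built

    def flush():
        # Emit the pending NaN run, if there is one
        if start is not None:
            result.append((slice(start, prev + 1), [math.nan] * (prev + 1 - start)))

    for key in sorted(data):
        val = data[key]
        if math.isnan(val):
            if start is not None and key == prev + 1:
                prev = key           # the run continues
            else:
                flush()              # a gap in the keys closes the previous run
                start = prev = key
        else:
            flush()
            start = prev = None
            result.append((key, val))
    flush()
    return result
```

Called on `dict(l)` built from the list above, this should yield one `(slice, list)` pair per NaN run and one `(key, value)` pair for every other entry.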

asked May 23 '19 10:05

ChesuCR

1 Answer

What you want is full of corner cases: NaN equality, the first element of each pair being either a slice or a single value, and the second being either a np.array or a single value.
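To illustrate the NaN-equality corner case: NaN never compares equal to itself, so a plain `==` cannot detect that two neighbouring values are both NaN. A minimal sketch, where `same_value` is a hypothetical helper name used only for illustration:

```python
import numpy as np


def same_value(x, y):
    """Treat two NaNs as equal, otherwise fall back to ordinary equality."""
    return bool(x == y or (np.isnan(x) and np.isnan(y)))


print(np.nan == np.nan)            # False: NaN is never equal to NaN
print(same_value(np.nan, np.nan))  # True
print(same_value(1.0, np.nan))     # False
```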

For such complex requirements, I would just rely on a plain, non-vectorized Python approach:

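import numpy as np

def trans(ser):
    def build(last, cur, val):
        # Emit one output item for the run of labels [last, cur) holding val
        if cur == last + 1:
            if np.isnan(val):
                return (slice(last, cur), np.array([np.nan]))
            else:
                return (last, val)
        else:
            return (slice(last, cur), np.array([val] * (cur - last)))
    last = ser.iloc[0]                # value of the current run
    old = last_index = ser.index[0]   # previous label / first label of the run
    result = []
    for i in ser.index[1:]:
        val = ser.loc[i]
        # Close the run when the value changes (two NaNs count as equal)
        # or when the labels are not consecutive
        if ((val != last) and not (np.isnan(val) and np.isnan(last))) \
           or i != old + 1:
            result.append(build(last_index, old + 1, last))
            last_index = i
            last = val
        old = i
    result.append(build(last_index, old + 1, last))
    return result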

It gives something close to the expected result:

[(slice(996, 999, None), array([nan, nan, nan])),
 (999, -47.3),
 (1000, -72.5),
 (1100, -97.7),
 (slice(1200, 1202, None), array([nan, nan])),
 (1205, -97.8),
 (slice(1300, 1301, None), array([nan])),
 (slice(1302, 1303, None), array([nan])),
 (1305, -97.9),
 (slice(1400, 1401, None), array([nan])),
 (1405, -97.1),
 (slice(1408, 1409, None), array([nan]))]
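Since the question asks about numpy or pandas, here is an alternative sketch that finds the runs with array operations and a `groupby`. It is not the method above, only one possible variant, and it assumes a non-empty or empty Series with an integer index and float values. The name `group_nan_runs` is hypothetical. Unlike `trans`, it only merges NaN runs and never merges equal non-NaN values at consecutive labels.

```python
import numpy as np
import pandas as pd


def group_nan_runs(ser):
    """Collapse NaN values at consecutive integer labels into (slice, array) pairs."""
    if len(ser) == 0:
        return []
    idx = ser.index.to_numpy()
    isna = ser.isna().to_numpy()
    # An entry continues the previous run only if both values are NaN
    # and the two labels are adjacent integers.
    continues = np.zeros(len(idx), dtype=bool)
    continues[1:] = isna[1:] & isna[:-1] & (np.diff(idx) == 1)
    # Every entry that does not continue a run starts a new group
    run_id = np.cumsum(~continues)
    result = []
    for _, run in ser.groupby(run_id):
        first = int(run.index[0])
        if run.isna().all():
            last = int(run.index[-1])
            result.append((slice(first, last + 1), run.to_numpy()))
        else:
            # Non-NaN groups always hold exactly one value
            result.append((first, float(run.iloc[0])))
    return result
```

The Python loop now runs once per group instead of once per element, which may help when NaN runs are long. Whether it is actually faster than the plain loop on real data would need to be measured.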

answered Oct 04 '22 16:10

Serge Ballesta

Tags:

python

python-3.x

pandas

nan

numpy
