 

How to split a pandas time-series by NAN values

I have a pandas TimeSeries which looks like this:

2007-02-06 15:00:00    0.780
2007-02-06 16:00:00    0.125
2007-02-06 17:00:00    0.875
2007-02-06 18:00:00      NaN
2007-02-06 19:00:00    0.565
2007-02-06 20:00:00    0.875
2007-02-06 21:00:00    0.910
2007-02-06 22:00:00    0.780
2007-02-06 23:00:00      NaN
2007-02-07 00:00:00      NaN
2007-02-07 01:00:00    0.780
2007-02-07 02:00:00    0.580
2007-02-07 03:00:00    0.880
2007-02-07 04:00:00    0.791
2007-02-07 05:00:00      NaN   

I would like to split the pandas TimeSeries every time one or more consecutive NaN values occur. The goal is to end up with separate events.

Event1:
2007-02-06 15:00:00    0.780
2007-02-06 16:00:00    0.125
2007-02-06 17:00:00    0.875

Event2:
2007-02-06 19:00:00    0.565
2007-02-06 20:00:00    0.875
2007-02-06 21:00:00    0.910
2007-02-06 22:00:00    0.780

I could loop through every row, but is there also a smarter way of doing this?
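For reference, one non-looping sketch (not the accepted answer below; the names event_id and events are just made up for illustration) is to label each run of consecutive non-NaN values with the cumulative NaN count and group on that label:

import pandas as pd
import numpy as np

# Sample data from the question (first 8 hours)
ts = pd.Series(
    [0.780, 0.125, 0.875, np.nan, 0.565, 0.875, 0.910, 0.780],
    index=pd.date_range('2007-02-06 15:00', periods=8, freq='H'),
)

# The cumulative NaN count increases at every gap, so it stays constant
# within each run of consecutive non-NaN values
event_id = ts.isna().cumsum()

# Keep only the non-NaN rows and group them by their run label
mask = ts.notna()
events = [group for _, group in ts[mask].groupby(event_id[mask])]

for i, event in enumerate(events, start=1):
    print(f"Event{i}:")
    print(event)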

asked Jan 28 '14 by MonteCarlo



1 Answer

For anyone looking for a non-deprecated (pandas>=0.25.0) version of bloudermilk's answer, after a bit of digging in the pandas sparse source code, I came up with the following. I tried to keep it as similar as possible to their answer so you can compare:

Given some data:

import pandas as pd
import numpy as np

# 10 days at per-second resolution, starting at midnight Jan 1st, 2011
rng = pd.date_range('1/1/2011', periods=10 * 24 * 60 * 60, freq='S')

# NaN data interspersed with 3 blocks of non-NaN data
dense_ts = pd.Series(np.nan, index=rng, dtype=np.float64)
dense_ts[500:510] = np.random.randn(10)
dense_ts[12000:12015] = np.random.randn(15)
dense_ts[20000:20050] = np.random.randn(50)

Which looks like:

2011-01-01 00:00:00   NaN
2011-01-01 00:00:01   NaN
2011-01-01 00:00:02   NaN
2011-01-01 00:00:03   NaN
2011-01-01 00:00:04   NaN
                       ..
2011-01-10 23:59:55   NaN
2011-01-10 23:59:56   NaN
2011-01-10 23:59:57   NaN
2011-01-10 23:59:58   NaN
2011-01-10 23:59:59   NaN
Freq: S, Length: 864000, dtype: float64

We can find the blocks efficiently and easily:

# Convert to sparse, then query the sparse index to find block locations
# (the way of converting to sparse changed in pandas>=0.25.0)
sparse_ts = dense_ts.astype(pd.SparseDtype('float'))

# In this version of pandas we go through .values.sp_index.to_block_index()
# to get the block start positions and block lengths
block_index = sparse_ts.values.sp_index.to_block_index()
block_locs = zip(block_index.blocs, block_index.blengths)

# Map the sparse blocks back to the dense timeseries.
# iloc slicing is end-exclusive, so start + length keeps the whole block.
blocks = [
    dense_ts.iloc[start : (start + length)]
    for (start, length) in block_locs
]

Voila

> blocks
[2011-01-01 00:08:20    0.092338
 2011-01-01 00:08:21    1.196703
 2011-01-01 00:08:22    0.936586
 2011-01-01 00:08:23   -0.354768
 2011-01-01 00:08:24   -0.209642
 2011-01-01 00:08:25   -0.750103
 2011-01-01 00:08:26    1.344343
 2011-01-01 00:08:27    1.446148
 2011-01-01 00:08:28    1.174443
 Freq: S, dtype: float64,
 2011-01-01 03:20:00    1.327026
 2011-01-01 03:20:01   -0.431162
 2011-01-01 03:20:02   -0.461407
 2011-01-01 03:20:03   -1.330671
 2011-01-01 03:20:04   -0.892480
 2011-01-01 03:20:05   -0.323433
 2011-01-01 03:20:06    2.520965
 2011-01-01 03:20:07    0.140757
 2011-01-01 03:20:08   -1.688278
 2011-01-01 03:20:09    0.856346
 2011-01-01 03:20:10    0.013968
 2011-01-01 03:20:11    0.204514
 2011-01-01 03:20:12    0.287756
 2011-01-01 03:20:13   -0.727400
 Freq: S, dtype: float64,
 2011-01-01 05:33:20   -1.409744
 2011-01-01 05:33:21    0.338251
 2011-01-01 05:33:22    0.215555
 2011-01-01 05:33:23   -0.309874
 2011-01-01 05:33:24    0.753737
 2011-01-01 05:33:25   -0.349966
 2011-01-01 05:33:26    0.074758
 2011-01-01 05:33:27   -1.574485
 2011-01-01 05:33:28    0.595844
 2011-01-01 05:33:29   -0.670004
 2011-01-01 05:33:30    1.655479
 2011-01-01 05:33:31   -0.362853
 2011-01-01 05:33:32    0.167355
 2011-01-01 05:33:33    0.703780
 2011-01-01 05:33:34    2.633756
 2011-01-01 05:33:35    1.898891
 2011-01-01 05:33:36   -1.129365
 2011-01-01 05:33:37   -0.765057
 2011-01-01 05:33:38    0.279869
 2011-01-01 05:33:39    1.388705
 2011-01-01 05:33:40   -1.420761
 2011-01-01 05:33:41    0.455692
 2011-01-01 05:33:42    0.367106
 2011-01-01 05:33:43    0.856598
 2011-01-01 05:33:44    1.920748
 2011-01-01 05:33:45    0.648581
 2011-01-01 05:33:46   -0.606784
 2011-01-01 05:33:47   -0.246285
 2011-01-01 05:33:48   -0.040520
 2011-01-01 05:33:49    1.422764
 2011-01-01 05:33:50   -1.686568
 2011-01-01 05:33:51    1.282430
 2011-01-01 05:33:52    1.358482
 2011-01-01 05:33:53   -0.998765
 2011-01-01 05:33:54   -0.009527
 2011-01-01 05:33:55    0.647671
 2011-01-01 05:33:56   -1.098435
 2011-01-01 05:33:57   -0.638245
 2011-01-01 05:33:58   -1.820668
 2011-01-01 05:33:59    0.768250
 2011-01-01 05:34:00   -1.029975
 2011-01-01 05:34:01   -0.744205
 2011-01-01 05:34:02    1.627130
 2011-01-01 05:34:03    2.058689
 2011-01-01 05:34:04   -1.194971
 2011-01-01 05:34:05    1.293214
 2011-01-01 05:34:06    0.029523
 2011-01-01 05:34:07   -0.405785
 2011-01-01 05:34:08    0.837123
 Freq: S, dtype: float64]
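As a quick sanity check (just a sketch, assuming the blocks together should cover every non-NaN value), you can compare the recovered blocks against dense_ts directly:

# Every non-NaN value should land in exactly one block...
assert sum(len(block) for block in blocks) == dense_ts.notna().sum()

# ...and none of the recovered blocks should contain a NaN
assert not any(block.isna().any() for block in blocks)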
answered Oct 09 '22 by thesofakillers