 

How to split a pandas time-series by NAN values

I have a pandas TimeSeries which looks like this:

2007-02-06 15:00:00    0.780
2007-02-06 16:00:00    0.125
2007-02-06 17:00:00    0.875
2007-02-06 18:00:00      NaN
2007-02-06 19:00:00    0.565
2007-02-06 20:00:00    0.875
2007-02-06 21:00:00    0.910
2007-02-06 22:00:00    0.780
2007-02-06 23:00:00      NaN
2007-02-07 00:00:00      NaN
2007-02-07 01:00:00    0.780
2007-02-07 02:00:00    0.580
2007-02-07 03:00:00    0.880
2007-02-07 04:00:00    0.791
2007-02-07 05:00:00      NaN   

I would like to split the pandas TimeSeries every time one or more consecutive NaN values occur. The goal is to end up with separate events.

Event1:
2007-02-06 15:00:00    0.780
2007-02-06 16:00:00    0.125
2007-02-06 17:00:00    0.875

Event2:
2007-02-06 19:00:00    0.565
2007-02-06 20:00:00    0.875
2007-02-06 21:00:00    0.910
2007-02-06 22:00:00    0.780

I could loop through every row, but is there also a smarter way of doing this?
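For reference, one non-looping sketch (not the accepted answer below; the names event_id and events are just made up for illustration) is to label each run of consecutive non-NaN values with the cumulative NaN count and group on that label:

import pandas as pd
import numpy as np

# Sample data from the question (first 8 hours)
ts = pd.Series(
    [0.780, 0.125, 0.875, np.nan, 0.565, 0.875, 0.910, 0.780],
    index=pd.date_range('2007-02-06 15:00', periods=8, freq='H'),
)

# The cumulative NaN count increases at every gap, so it stays constant
# within each run of consecutive non-NaN values
event_id = ts.isna().cumsum()

# Keep only the non-NaN rows and group them by their run label
mask = ts.notna()
events = [group for _, group in ts[mask].groupby(event_id[mask])]

for i, event in enumerate(events, start=1):
    print(f"Event{i}:")
    print(event)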

asked Jan 28 '14 by MonteCarlo



1 Answer

For anyone looking for a non-deprecated (pandas>=0.25.0) version of bloudermilk's answer, after a bit of digging in the pandas sparse source code, I came up with the following. I tried to keep it as similar as possible to their answer so you can compare:

Given some data:

import pandas as pd
import numpy as np

# 10 days at per-second resolution, starting at midnight Jan 1st, 2011
rng = pd.date_range('1/1/2011', periods=10 * 24 * 60 * 60, freq='S')

# NaN data interspersed with 3 blocks of non-NaN data
dense_ts = pd.Series(np.nan, index=rng, dtype=np.float64)
dense_ts[500:510] = np.random.randn(10)
dense_ts[12000:12015] = np.random.randn(15)
dense_ts[20000:20050] = np.random.randn(50)

Which looks like:

2011-01-01 00:00:00   NaN
2011-01-01 00:00:01   NaN
2011-01-01 00:00:02   NaN
2011-01-01 00:00:03   NaN
2011-01-01 00:00:04   NaN
                       ..
2011-01-10 23:59:55   NaN
2011-01-10 23:59:56   NaN
2011-01-10 23:59:57   NaN
2011-01-10 23:59:58   NaN
2011-01-10 23:59:59   NaN
Freq: S, Length: 864000, dtype: float64

We can find the blocks efficiently and easily:

# Convert to sparse, then query the sparse index to find block locations
# (the way of converting to sparse changed in pandas>=0.25.0)
sparse_ts = dense_ts.astype(pd.SparseDtype('float'))

# In this version of pandas we go through .values.sp_index.to_block_index()
# to get the block start positions and block lengths
block_index = sparse_ts.values.sp_index.to_block_index()
block_locs = zip(block_index.blocs, block_index.blengths)

# Map the sparse blocks back to the dense timeseries.
# iloc slicing is end-exclusive, so start + length keeps the whole block.
blocks = [
    dense_ts.iloc[start : (start + length)]
    for (start, length) in block_locs
]

Voila

> blocks
[2011-01-01 00:08:20    0.092338
 2011-01-01 00:08:21    1.196703
 2011-01-01 00:08:22    0.936586
 2011-01-01 00:08:23   -0.354768
 2011-01-01 00:08:24   -0.209642
 2011-01-01 00:08:25   -0.750103
 2011-01-01 00:08:26    1.344343
 2011-01-01 00:08:27    1.446148
 2011-01-01 00:08:28    1.174443
 Freq: S, dtype: float64,
 2011-01-01 03:20:00    1.327026
 2011-01-01 03:20:01   -0.431162
 2011-01-01 03:20:02   -0.461407
 2011-01-01 03:20:03   -1.330671
 2011-01-01 03:20:04   -0.892480
 2011-01-01 03:20:05   -0.323433
 2011-01-01 03:20:06    2.520965
 2011-01-01 03:20:07    0.140757
 2011-01-01 03:20:08   -1.688278
 2011-01-01 03:20:09    0.856346
 2011-01-01 03:20:10    0.013968
 2011-01-01 03:20:11    0.204514
 2011-01-01 03:20:12    0.287756
 2011-01-01 03:20:13   -0.727400
 Freq: S, dtype: float64,
 2011-01-01 05:33:20   -1.409744
 2011-01-01 05:33:21    0.338251
 2011-01-01 05:33:22    0.215555
 2011-01-01 05:33:23   -0.309874
 2011-01-01 05:33:24    0.753737
 2011-01-01 05:33:25   -0.349966
 2011-01-01 05:33:26    0.074758
 2011-01-01 05:33:27   -1.574485
 2011-01-01 05:33:28    0.595844
 2011-01-01 05:33:29   -0.670004
 2011-01-01 05:33:30    1.655479
 2011-01-01 05:33:31   -0.362853
 2011-01-01 05:33:32    0.167355
 2011-01-01 05:33:33    0.703780
 2011-01-01 05:33:34    2.633756
 2011-01-01 05:33:35    1.898891
 2011-01-01 05:33:36   -1.129365
 2011-01-01 05:33:37   -0.765057
 2011-01-01 05:33:38    0.279869
 2011-01-01 05:33:39    1.388705
 2011-01-01 05:33:40   -1.420761
 2011-01-01 05:33:41    0.455692
 2011-01-01 05:33:42    0.367106
 2011-01-01 05:33:43    0.856598
 2011-01-01 05:33:44    1.920748
 2011-01-01 05:33:45    0.648581
 2011-01-01 05:33:46   -0.606784
 2011-01-01 05:33:47   -0.246285
 2011-01-01 05:33:48   -0.040520
 2011-01-01 05:33:49    1.422764
 2011-01-01 05:33:50   -1.686568
 2011-01-01 05:33:51    1.282430
 2011-01-01 05:33:52    1.358482
 2011-01-01 05:33:53   -0.998765
 2011-01-01 05:33:54   -0.009527
 2011-01-01 05:33:55    0.647671
 2011-01-01 05:33:56   -1.098435
 2011-01-01 05:33:57   -0.638245
 2011-01-01 05:33:58   -1.820668
 2011-01-01 05:33:59    0.768250
 2011-01-01 05:34:00   -1.029975
 2011-01-01 05:34:01   -0.744205
 2011-01-01 05:34:02    1.627130
 2011-01-01 05:34:03    2.058689
 2011-01-01 05:34:04   -1.194971
 2011-01-01 05:34:05    1.293214
 2011-01-01 05:34:06    0.029523
 2011-01-01 05:34:07   -0.405785
 2011-01-01 05:34:08    0.837123
 Freq: S, dtype: float64]
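As a quick sanity check (just a sketch, assuming the blocks together should cover every non-NaN value), you can compare the recovered blocks against dense_ts directly:

# Every non-NaN value should land in exactly one block...
assert sum(len(block) for block in blocks) == dense_ts.notna().sum()

# ...and none of the recovered blocks should contain a NaN
assert not any(block.isna().any() for block in blocks)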
answered Oct 09 '22 by thesofakillers