Duration of Value in a Pandas DataFrame

Tags: python, pandas

I have the following DataFrame:

            f_1    f_2    f_3
00:00:00  False  False  False
00:05:22   True  False  False
00:06:40   True  False  False
00:06:41  False  False  False
00:06:42  False  False  False
00:06:43  False  False  False
00:06:44  False  False  False
00:06:46  False  False  False
00:06:58  False  False  False

and I want to compute the total duration for which each Series was True. In this example, the only Series that was True for any length of time is f_1. Currently, I use the following code:

import pandas

result = pandas.Timedelta(0)

# Walk each column and add the gap that follows every row whose value was True
for _, series in falsePositives.items():
    previousTime = None
    previousValue = None
    for currentTime, currentValue in series.items():
        if previousValue:
            result += (currentTime - previousTime)
        previousTime = currentTime
        previousValue = currentValue

print(result.total_seconds())

Is there a better solution? I suspect pandas already has a method that does this or something similar.

major4x asked Jan 20 '26 03:01



1 Answer

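I think you can create a Series from the index with to_series, take the difference between consecutive timestamps with diff, and move each difference up one row with shift(-1), so that every row holds the time until the next row. Then convert it to seconds with dt.total_seconds.

Finally, multiply the boolean DataFrame by this Series with mul, so that only the rows that are True keep their duration, and take the sum of each column: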

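import pandas as pd

# If necessary, convert the index to Timedelta
df.index = pd.to_timedelta(df.index)

# Seconds from each row to the next one (NaN for the last row)
s = df.index.to_series().diff().shift(-1).dt.total_seconds()
# Multiply row-wise: a duration is kept only where the value is True
df1 = df.mul(s, axis=0)
print(df1)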
           f_1  f_2  f_3
00:00:00   0.0  0.0  0.0
00:05:22  78.0  0.0  0.0
00:06:40   1.0  0.0  0.0
00:06:41   0.0  0.0  0.0
00:06:42   0.0  0.0  0.0
00:06:43   0.0  0.0  0.0
00:06:44   0.0  0.0  0.0
00:06:46   0.0  0.0  0.0
00:06:58   NaN  NaN  NaN

# sum skips the NaN in the last row by default
print(df1.sum())
f_1    79.0
f_2     0.0
f_3     0.0
dtype: float64
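
To compare this directly with the loop in the question, which accumulates one total over all columns, here is a minimal self-contained sketch. It rebuilds the example frame by hand and assumes, as the loop does, that each value holds until the next timestamp and that the index is sorted and unique. The names gaps, per_column and total_seconds are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question; the index holds elapsed time.
times = ["00:00:00", "00:05:22", "00:06:40", "00:06:41", "00:06:42",
         "00:06:43", "00:06:44", "00:06:46", "00:06:58"]
df = pd.DataFrame(
    {
        "f_1": [False, True, True, False, False, False, False, False, False],
        "f_2": [False] * 9,
        "f_3": [False] * 9,
    },
    index=pd.to_timedelta(times),
)

# Seconds from each row to the next one; the last row has no successor (NaN).
gaps = df.index.to_series().diff().shift(-1).dt.total_seconds()

# Keep a gap only where the flag was True at the start of that gap,
# then add the gaps up per column (sum skips NaN by default).
per_column = df.mul(gaps, axis=0).sum()

# Grand total over all columns, comparable to the loop's result.
total_seconds = per_column.sum()

print(per_column)
print(total_seconds)
```

For this data, both the f_1 entry and the grand total come out to 79.0 seconds, the same figure as in the output above.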
jezrael answered Jan 23 '26 20:01
