Selecting time-window in a dataframe

Question

I have a dataframe, df, which looks like this:

                     HeartRate_smooth
2018-01-01 00:07:00  58.000000
2018-01-01 00:13:00  59.333333
2018-01-01 00:14:00  57.333333
2018-01-01 00:20:00  59.333333
2018-01-01 00:21:00  59.333333
2018-01-01 00:22:00  57.333333
2018-01-01 00:34:00  59.666667
2018-01-01 00:36:00  58.666667
2018-01-01 00:37:00  57.666667
2018-01-01 00:38:00  55.000000
2018-01-01 00:39:00  58.333333
2018-01-01 01:03:00  57.666667
2018-01-01 01:08:00  59.666667
2018-01-01 01:09:00  56.333333
2018-01-01 01:10:00  54.666667
2018-01-01 01:32:00  59.666667
2018-01-01 01:33:00  57.000000
2018-01-01 01:34:00  54.333333
2018-01-01 01:56:00  56.000000
2018-01-01 01:57:00  58.000000
2018-01-01 01:58:00  59.000000
2018-01-01 02:03:00  59.666667
2018-01-01 02:07:00  58.666667
2018-01-01 03:00:00  59.666667
2018-01-01 03:09:00  59.333333
2018-01-01 03:10:00  58.333333
2018-01-01 03:31:00  58.666667
2018-01-01 10:46:00  59.666667
2018-01-01 12:40:00  58.333333
2018-01-01 14:42:00  59.000000

This dataframe is a collection of the timepoints at which the patient's heart rate is below a threshold. I am assuming that these points occur either when the patient is at rest or asleep. I am trying to find a way to identify the period during which the patient is asleep. I assume the patient is asleep when data is present for more than an hour with less than 30 minutes between consecutive rows.

In the given dataframe, I can assume that the patient is asleep from 00:07 to 02:07, because the gap between consecutive rows is less than 30 minutes throughout that period. The row that comes after 02:07 has a time difference of more than 30 minutes, so I assume that the patient has woken.

Please note that I will be looping through data for multiple patients, and the period during which each patient is asleep will differ. It may not always begin at the first entry in the dataframe.

My questions are:
1. How would I identify the period during which the patient is asleep and split the current dataframe into two, where one dataframe stores the data for when the patient is asleep and the other stores the data for when the patient is awake?
2. This is not necessary, but if possible, how can I print out when and for how long the patient is asleep?

Sample data output based on sample dataframe provided:
Asleep_df:

                     HeartRate_smooth
2018-01-01 00:07:00  58.000000
2018-01-01 00:13:00  59.333333
2018-01-01 00:14:00  57.333333
2018-01-01 00:20:00  59.333333
2018-01-01 00:21:00  59.333333
2018-01-01 00:22:00  57.333333
2018-01-01 00:34:00  59.666667
2018-01-01 00:36:00  58.666667
2018-01-01 00:37:00  57.666667
2018-01-01 00:38:00  55.000000
2018-01-01 00:39:00  58.333333
2018-01-01 01:03:00  57.666667
2018-01-01 01:08:00  59.666667
2018-01-01 01:09:00  56.333333
2018-01-01 01:10:00  54.666667
2018-01-01 01:32:00  59.666667
2018-01-01 01:33:00  57.000000
2018-01-01 01:34:00  54.333333
2018-01-01 01:56:00  56.000000
2018-01-01 01:57:00  58.000000
2018-01-01 01:58:00  59.000000
2018-01-01 02:03:00  59.666667
2018-01-01 02:07:00  58.666667

Awake_df:

                     HeartRate_smooth
2018-01-01 03:00:00  59.666667
2018-01-01 03:09:00  59.333333
2018-01-01 03:10:00  58.333333
2018-01-01 03:31:00  58.666667
2018-01-01 10:46:00  59.666667
2018-01-01 12:40:00  58.333333
2018-01-01 14:42:00  59.000000

"Patient was asleep from 00:07 to 02:07 for 2 hours and 0 minutes"

Quang Hoang · Accepted Answer

I find it easier to work with the timestamps as a regular column rather than as the index:

df.reset_index(inplace=True)

# df now has a timestamp column named 'index'

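# a gap of 30 minutes or more from the previous row starts a new block;
# cumsum turns those flags into a running block number
# (total_seconds() counts whole days too, unlike .dt.seconds)
df['block'] = df['index'].diff().dt.total_seconds().ge(30*60).cumsum()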

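# all sleep chunks: keep blocks with more than one row
# (the variable is called awake_df, but it holds the sleep chunks)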
awake_df = (df.set_index('index')
              .groupby('block')[['HeartRate_smooth']]
              .apply(lambda x: x if len(x) > 1 else None)
           )

Output awake_df:

+--------+----------------------+-------------------+
|        |                      | HeartRate_smooth  |
+--------+----------------------+-------------------+
| block  | index                |                   |
+--------+----------------------+-------------------+    
| 0      | 2018-01-01 00:07:00  | 58.000000         |
|        | 2018-01-01 00:13:00  | 59.333333         |
|        | 2018-01-01 00:14:00  | 57.333333         |
|        | 2018-01-01 00:20:00  | 59.333333         |
|        | 2018-01-01 00:21:00  | 59.333333         |
|        | 2018-01-01 00:22:00  | 57.333333         |
|        | 2018-01-01 00:34:00  | 59.666667         |
|        | 2018-01-01 00:36:00  | 58.666667         |
|        | 2018-01-01 00:37:00  | 57.666667         |
|        | 2018-01-01 00:38:00  | 55.000000         |
|        | 2018-01-01 00:39:00  | 58.333333         |
|        | 2018-01-01 01:03:00  | 57.666667         |
|        | 2018-01-01 01:08:00  | 59.666667         |
|        | 2018-01-01 01:09:00  | 56.333333         |
|        | 2018-01-01 01:10:00  | 54.666667         |
|        | 2018-01-01 01:32:00  | 59.666667         |
|        | 2018-01-01 01:33:00  | 57.000000         |
|        | 2018-01-01 01:34:00  | 54.333333         |
|        | 2018-01-01 01:56:00  | 56.000000         |
|        | 2018-01-01 01:57:00  | 58.000000         |
|        | 2018-01-01 01:58:00  | 59.000000         |
|        | 2018-01-01 02:03:00  | 59.666667         |
|        | 2018-01-01 02:07:00  | 58.666667         |
| 1      | 2018-01-01 03:00:00  | 59.666667         |
|        | 2018-01-01 03:09:00  | 59.333333         |
|        | 2018-01-01 03:10:00  | 58.333333         |
|        | 2018-01-01 03:31:00  | 58.666667         |
+--------+----------------------+-------------------+
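The block numbers in the table come from flagging every gap of 30 minutes or more and then taking a running sum of those flags. Here is a minimal sketch of that step on four of the timestamps above; the variable names (`times`, `gap`, `new_block`, `block`) are only illustrative and do not appear in the code earlier in this answer:

```python
import pandas as pd

# Four timestamps taken from the sample data, straddling the long gap.
times = pd.Series(pd.to_datetime([
    "2018-01-01 02:03:00",
    "2018-01-01 02:07:00",
    "2018-01-01 03:00:00",
    "2018-01-01 03:09:00",
]))

gap = times.diff()                            # NaT for the first row
new_block = gap >= pd.Timedelta(minutes=30)   # NaT compares as False
block = new_block.cumsum()                    # number of long gaps seen so far

print(block.tolist())  # [0, 0, 1, 1]
```

Each row is labelled with the number of long gaps that precede it, so rows separated only by short gaps share a label.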

Note that there are two sleeping chunks because your data has a 53-minute gap between 02:07 and 03:00. To get the duration of each chunk:

(awake_df.reset_index(level=1)
         .groupby('block')['index']
         .apply(lambda x: x.max()-x.min())
)

gives:

block
0     02:00:00
1     00:31:00
Name: index, dtype: timedelta64[ns]
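The length filter used earlier keeps any block with more than one row, so the short second block is also reported. If the rule from the question is applied literally (a block counts as sleep only when it spans more than an hour), the split into two dataframes and the printed message could look roughly like the sketch below. It assumes the original frame with a sorted, unique DatetimeIndex (before `reset_index`); the function `split_sleep` and its parameters `max_gap` and `min_duration` are hypothetical names introduced here for illustration.

```python
import pandas as pd

def split_sleep(df, max_gap="30min", min_duration="1h"):
    """Split df (sorted, unique DatetimeIndex) into (asleep, awake, periods)."""
    times = df.index.to_series()
    # Start a new block wherever the gap to the previous row reaches max_gap.
    block = (times.diff() >= pd.Timedelta(max_gap)).cumsum()
    # First and last timestamp of every block, plus the span between them.
    periods = times.groupby(block).agg(start="min", end="max")
    periods["duration"] = periods["end"] - periods["start"]
    # Keep only the blocks that span more than min_duration.
    periods = periods[periods["duration"] > pd.Timedelta(min_duration)]
    asleep = block.isin(periods.index).to_numpy()
    return df[asleep], df[~asleep], periods

# Sample data from the question.
clock = ["00:07", "00:13", "00:14", "00:20", "00:21", "00:22", "00:34", "00:36",
         "00:37", "00:38", "00:39", "01:03", "01:08", "01:09", "01:10", "01:32",
         "01:33", "01:34", "01:56", "01:57", "01:58", "02:03", "02:07", "03:00",
         "03:09", "03:10", "03:31", "10:46", "12:40", "14:42"]
rates = [58.000000, 59.333333, 57.333333, 59.333333, 59.333333, 57.333333,
         59.666667, 58.666667, 57.666667, 55.000000, 58.333333, 57.666667,
         59.666667, 56.333333, 54.666667, 59.666667, 57.000000, 54.333333,
         56.000000, 58.000000, 59.000000, 59.666667, 58.666667, 59.666667,
         59.333333, 58.333333, 58.666667, 59.666667, 58.333333, 59.000000]
df = pd.DataFrame({"HeartRate_smooth": rates},
                  index=pd.to_datetime(["2018-01-01 " + t for t in clock]))

asleep_df, awake_df, periods = split_sleep(df)
for p in periods.itertuples():
    minutes = int(p.duration.total_seconds() // 60)
    print(f"Patient was asleep from {p.start:%H:%M} to {p.end:%H:%M} "
          f"for {minutes // 60} hours and {minutes % 60} minutes")
# Patient was asleep from 00:07 to 02:07 for 2 hours and 0 minutes
```

Because the function takes a single frame and returns the two halves plus a small summary of the sleep periods, it can be called once per patient inside a loop. A patient with several qualifying blocks would get one printed line per block.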

Tags: python, pandas, dataframe, time-series

user11371534