I would like to count the amount of sessions started at each 15 minutes intervals for businessdays within a large dataset.
My data looks like:
df <-
Start_datetime End_datetime Duration Volume
2016-04-01 06:20:55 2016-04-01 14:41:22 08:20:27 8.360
2016-04-01 08:22:27 2016-04-01 08:22:40 00:00:13 0.000
2016-04-01 08:38:53 2016-04-01 09:31:58 00:53:05 12.570
2016-04-01 09:33:57 2016-04-01 12:37:43 03:03:46 7.320
2016-04-01 10:05:03 2016-04-01 16:41:16 06:36:13 9.520
2016-04-01 12:07:57 2016-04-02 22:22:32 34:14:35 7.230
2016-04-01 16:56:55 2016-04-02 10:40:17 17:43:22 5.300
2016-04-01 17:29:18 2016-04-01 19:50:29 02:21:11 7.020
2016-04-01 17:42:39 2016-04-01 19:45:38 02:02:59 2.430
2016-04-01 17:47:57 2016-04-01 20:26:35 02:38:38 8.090
2016-04-01 22:00:15 2016-04-04 08:22:21 58:22:06 4.710
2016-04-02 01:12:38 2016-04-02 09:49:00 08:36:22 3.150
2016-04-02 01:32:00 2016-04-02 12:49:47 11:17:47 5.760
2016-04-02 07:28:48 2016-04-04 06:58:56 47:30:08 0.000
2016-04-02 07:55:18 2016-04-05 07:55:15 71:59:57 0.240
I would like to count all the starting sessions per 15minutes starting, where:
For business days
Time PTU Count
00:00:00 - 00:15:00 1 10 #(where count is the amount of sessions started between 00:00:00 and 00:15:00)
00:15:00 - 00:30:00 2 6
00:30:00 - 00:45:00 3 5
00:45:00 - 01:00:00 3 3
And so on and the same data for the weekend.
I have tried the cut function:
df$PTU <- table (cut(df$Start_datetime, breaks="15 minutes"))
data.frame(PTU)
EDIT: When I run this i receive the following error:
Error in cut.default(df$Start_datetime, breaks = "15 minutes") :'x' must be numeric
And some functions with lubridate, but I can't seem to make it work. My final goal is to create a table like the following, but then with 15 minutes interval.
Here is a sort of complete process from datetime "strings" to the format you want. The start is a string vector:
Start_time <-
c("2016-04-01 06:20:55", "2016-04-01 08:22:27", "2016-04-01 08:38:53",
"2016-04-01 09:33:57", "2016-04-01 10:05:03", "2016-04-01 12:07:57",
"2016-04-01 16:56:55", "2016-04-01 17:29:18", "2016-04-01 17:42:39",
"2016-04-01 17:47:57", "2016-04-01 22:00:15", "2016-04-02 01:12:38",
"2016-04-02 01:32:00", "2016-04-02 07:28:48", "2016-04-02 07:55:18"
)
df <- data.frame(Start_time)
And this is an actual processing
## We will use two packages
library(lubridate)
library(data.table)
# convert df to data.table, parse the datetime string
setDT(df)[, Start_time := ymd_hms(Start_time)]
# floor time by 15 min to assign the appropriate slot (new variable Start_time_slot)
df[, Start_time_slot := floor_date(Start_time, "15 min")]
# aggregate by wday and time in a date
start_time_data_frame <- df[, .N, by = .(wday(Start_time_slot), format(Start_time_slot, format="%H:%M:%S") )]
# output looks like this
start_time_data_frame
## wday time N
## 1: 6 06:15:00 1
## 2: 6 08:15:00 1
## 3: 6 08:30:00 1
## 4: 6 09:30:00 1
## 5: 6 10:00:00 1
## 6: 6 12:00:00 1
## 7: 6 16:45:00 1
## 8: 6 17:15:00 1
## 9: 6 17:30:00 1
## 10: 6 17:45:00 1
## 11: 6 22:00:00 1
## 12: 7 01:00:00 1
## 13: 7 01:30:00 1
## 14: 7 07:15:00 1
## 15: 7 07:45:00 1
There's two things you have to keep in mind when using cut
on datetimes:
POSIXt
class. I'm quite sure yours isn't, or R wouldn't be using cut.default
but cut.POSIXt
as a method."15 minutes"
should be "15 min"
. See ?cut.POSIXt
So this works:
Start_datetime <- as.POSIXct(
c("2016-04-01 06:20:55",
"2016-04-01 06:22:12",
"2016-04-01 05:30:12")
)
table(cut(Start_datetime, breaks = "15 min"))
# 2016-04-01 05:30:00 2016-04-01 05:45:00 2016-04-01 06:00:00 2016-04-01 06:15:00
# 1 0 0 2
Note that the output gives you the start of the 15 minute interval as names of the table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With