I've been racking my brain at the following issue.
Let's say I have transcript data from speakers. Every row is a particular utterance from a speaker (given by speaker_id) and when they started talking, Timestamp. Sometimes consecutive rows will contain utterances from the same speaker because they continue speaking after a pause.
I'd like to figure out the total amount of time the speakers in the dataset have spoken, which requires that I compute the difference between the start of a timestamp and the beginning of the first timestamp of the next speaker. How would I do this?
Here's a sample of the dataset
file = read.table(text = "speaker_id,Timestamp
5,2022-03-30 03:00:00
5,2022-03-30 03:00:24
3,2022-03-30 03:00:52
3,2022-03-30 03:00:56
3,2022-03-30 03:00:58
5,2022-03-30 03:01:25
5,2022-03-30 03:02:15
3,2022-03-30 03:03:14
5,2022-03-30 03:03:36
3,2022-03-30 03:04:26
3,2022-03-30 03:06:02
3,2022-03-30 03:06:10
5,2022-03-30 03:06:28
5,2022-03-30 03:07:28
3,2022-03-30 03:08:56
5,2022-03-30 03:09:11
5,2022-03-30 03:10:02
5,2022-03-30 03:10:56
3,2022-03-30 03:11:53
5,2022-03-30 03:12:20", header = T, sep = ",")
Any ideas?
Here is an approach using {dplyr}:
dplyr::leadlibrary(dplyr)
file %>%
mutate(
Timestamp = as.POSIXct(Timestamp),
duration = lead(Timestamp) - Timestamp) %>%
group_by(speaker_id) %>%
summarize(total_duration = sum(duration, na.rm = TRUE))
One option could be:
file %>%
mutate(Timestamp2 = lead(Timestamp, default = last(Timestamp))) %>%
group_by(speaker_id,
rleid = cumsum(speaker_id != lag(speaker_id, default = first(speaker_id)))) %>%
mutate(Timestamp_diff = last(ymd_hms(Timestamp2)) - first(ymd_hms(Timestamp)))
speaker_id Timestamp Timestamp2 rleid Timestamp_diff
<int> <chr> <chr> <int> <drtn>
1 5 2022-03-30 03:00:00 2022-03-30 03:00:24 0 52 secs
2 5 2022-03-30 03:00:24 2022-03-30 03:00:52 0 52 secs
3 3 2022-03-30 03:00:52 2022-03-30 03:00:56 1 33 secs
4 3 2022-03-30 03:00:56 2022-03-30 03:00:58 1 33 secs
5 3 2022-03-30 03:00:58 2022-03-30 03:01:25 1 33 secs
6 5 2022-03-30 03:01:25 2022-03-30 03:02:15 2 109 secs
7 5 2022-03-30 03:02:15 2022-03-30 03:03:14 2 109 secs
8 3 2022-03-30 03:03:14 2022-03-30 03:03:36 3 22 secs
9 5 2022-03-30 03:03:36 2022-03-30 03:04:26 4 50 secs
10 3 2022-03-30 03:04:26 2022-03-30 03:06:02 5 122 secs
11 3 2022-03-30 03:06:02 2022-03-30 03:06:10 5 122 secs
12 3 2022-03-30 03:06:10 2022-03-30 03:06:28 5 122 secs
13 5 2022-03-30 03:06:28 2022-03-30 03:07:28 6 148 secs
14 5 2022-03-30 03:07:28 2022-03-30 03:08:56 6 148 secs
15 3 2022-03-30 03:08:56 2022-03-30 03:09:11 7 15 secs
16 5 2022-03-30 03:09:11 2022-03-30 03:10:02 8 162 secs
17 5 2022-03-30 03:10:02 2022-03-30 03:10:56 8 162 secs
18 5 2022-03-30 03:10:56 2022-03-30 03:11:53 8 162 secs
19 3 2022-03-30 03:11:53 2022-03-30 03:12:20 9 27 secs
20 5 2022-03-30 03:12:20 2022-03-30 03:12:20 10 0 secs
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With