I have a dataframe of ids and timestamps. I'd like to calculate the difference between each sequential timestamp for an individual id.
My dataframe looks like this:
id time
Alpha 1
Alpha 4
Alpha 7
Beta 5
Beta 10
I'm trying to add a column like time.difference below:
id time time.difference
Alpha 1 NA
Alpha 4 3
Alpha 7 3
Beta 5 NA
Beta 10 5
Is there a clean way to do this using dplyr? (or tidyr or something else that's easier to read than vanilla R?)
diff() in base R returns the differences between consecutive elements of a vector, so its result is one element shorter than the input (the length of the column – 1). To store it as a column it has to be padded, for example with a leading NA, and applied separately within each id.
Equivalently, the difference for a row is that row's value minus the previous value in the same group. The previous value comes from lag() in dplyr or shift() in data.table; both return NA for the first row of a group.
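For comparison, here is a rough base R sketch of the diff() approach. It rebuilds the example data from the question (the name dat is just an assumption) and uses ave() to apply diff() within each id, padding with NA so the result lines up with the rows. It is one way to do it, not necessarily the tidiest.

```r
# Example data from the question; `dat` is an assumed name.
dat <- data.frame(
  id   = c("Alpha", "Alpha", "Alpha", "Beta", "Beta"),
  time = c(1, 4, 7, 5, 10)
)

# diff() on the whole column ignores the id boundary and is one element short.
whole_column <- diff(dat$time)

# ave() applies the function to `time` separately for each id.
# c(NA, diff(x)) pads the front so each group keeps its original length.
dat$time.difference <- ave(
  dat$time, dat$id,
  FUN = function(x) c(NA, diff(x))
)

print(dat)
```

Note that whole_column contains a -2 where Alpha ends and Beta begins, which is why the grouping step matters.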
dplyr is a package for making tabular data wrangling easier by using a limited set of functions that can be combined to extract and summarize insights from your data. It pairs nicely with tidyr which enables you to swiftly convert between different data formats (long vs. wide) for plotting and analysis.
Like this:
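library(dplyr)

dat %>%
  group_by(id) %>%                             # work within each id
  mutate(time.difference = time - lag(time))   # lag() is the previous row's time; NA for the first row of a group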
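If you want to try this end to end, here is a minimal reproducible sketch. It assumes dplyr is installed and that the data frame is called dat with the values from the question; the arrange() step is my addition, in case the rows are not already sorted by time within each id.

```r
library(dplyr)

# Example data from the question; `dat` is an assumed name.
dat <- data.frame(
  id   = c("Alpha", "Alpha", "Alpha", "Beta", "Beta"),
  time = c(1, 4, 7, 5, 10)
)

result <- dat %>%
  arrange(id, time) %>%                          # make sure times are in order within each id
  group_by(id) %>%
  mutate(time.difference = time - lag(time)) %>% # first row of each id has no previous value, so NA
  ungroup()

print(result)
```

The ungroup() at the end just keeps the grouping from carrying over into later steps.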
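Using data.table:
library(data.table)

# shift() is data.table's own lag, so dplyr is not needed here.
# It returns the previous time within each id (NA for the first row); := adds the column by reference.
setDT(dat)[, time.difference := time - shift(time, 1L), by = id]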