Calculating the difference between consecutive rows by group using dplyr?

Tags: r, dplyr

I have a dataframe of ids and timestamps. I'd like to calculate the difference between each sequential timestamp for an individual id.

My dataframe looks like this:

id  time
Alpha   1
Alpha   4
Alpha   7
Beta    5
Beta    10

I'm trying to add a column like time.difference below:

id  time    time.difference
Alpha   1   NA
Alpha   4   3
Alpha   7   3
Beta    5   NA
Beta    10  5

Is there a clean way to do this using dplyr? (or tidyr or something else that's easier to read than vanilla R?)

Thalecress asked Jul 11 '15 22:07



2 Answers

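Group by id, then subtract the previous row's time using lag():

library(dplyr)

dat %>% 
  group_by(id) %>% 
  mutate(time.difference = time - lag(time))  # lag() gives NA for the first row of each group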
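
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data frame is rebuilt from the table in the question, and the `ungroup()` at the end is my own addition so that later steps are not accidentally grouped. It assumes the rows are already sorted by time within each id; if they might not be, adding `arrange(id, time)` before `group_by()` would be one way to handle that.

```r
library(dplyr)

# Sample data reconstructed from the question
dat <- data.frame(
  id   = c("Alpha", "Alpha", "Alpha", "Beta", "Beta"),
  time = c(1, 4, 7, 5, 10)
)

result <- dat %>%
  group_by(id) %>%
  # dplyr::lag() shifts time down by one row within each group,
  # so the first row of every id has no predecessor and gets NA
  mutate(time.difference = time - dplyr::lag(time)) %>%
  ungroup()

print(result)
# time.difference is NA, 3, 3, NA, 5
```

Calling `dplyr::lag()` explicitly avoids picking up `stats::lag()` by mistake if dplyr is not attached.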
bergant answered Sep 21 '22 21:09


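Using data.table (lag() here comes from dplyr, which is why both packages are loaded):

library(data.table)
library(dplyr)  # provides lag()
# setDT() converts dat to a data.table by reference; := adds the column within each id
setDT(dat)[, time.difference := time - lag(time, 1L), by = id]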
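
If you would rather not load dplyr just for `lag()`, data.table has its own `shift()` function that should do the same job. The following is a sketch of that variant, not the answer's original code; the name `dt` and the sample data (copied from the question) are only for illustration.

```r
library(data.table)

# Sample data reconstructed from the question
dt <- data.table(
  id   = c("Alpha", "Alpha", "Alpha", "Beta", "Beta"),
  time = c(1, 4, 7, 5, 10)
)

# shift() defaults to a lag of one row, filling the gap with NA;
# by = id restarts the shift for each id
dt[, time.difference := time - shift(time), by = id]

print(dt)
# time.difference is NA, 3, 3, NA, 5
```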
Veerendra Gadekar answered Sep 25 '22 21:09