Calculate difference between values in consecutive rows by group

Tags:

r

This is my df (a data.frame):

    group value
    1     10
    1     20
    1     25
    2     5
    2     10
    2     15

I need to calculate the difference between values in consecutive rows, by group.

So I need this result:

    group value diff
    1     10    NA # because there is no previous value
    1     20    10 # value[2] - value[1]
    1     25    5  # value[3] - value[2]
    2     5     NA # because the group changes
    2     10    5  # value[5] - value[4]
    2     15    5  # value[6] - value[5]

I can handle this with ddply, but it takes too much time because my df has a lot of groups (over 1,000,000).

Are there any more efficient approaches to this problem?

522

asked Feb 13 '13 04:02

kmangyo

2 Answers

The package data.table can do this fairly quickly, using the shift function.

    require(data.table)
    df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
    #setDT(df) #if df is already a data frame

    df[ , diff := value - shift(value), by = group]
    #   group value diff
    #1:     1    10   NA
    #2:     1    20   10
    #3:     1    25    5
    #4:     2     5   NA
    #5:     2    10    5
    #6:     2    15    5
    setDF(df) #if you want to convert back to old data.frame syntax

Or use the lag function in dplyr:

    library(dplyr)

    df %>%
        group_by(group) %>%
        mutate(Diff = value - lag(value))
    #   group value  Diff
    #   <int> <int> <int>
    # 1     1    10    NA
    # 2     1    20    10
    # 3     1    25     5
    # 4     2     5    NA
    # 5     2    10     5
    # 6     2    15     5

For alternatives pre-data.table::shift and pre-dplyr::lag, see edits.
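
By default, both `shift(value)` and `lag(value)` move each element down one position and put `NA` first, so `value - shift(value)` is "current minus previous". As a rough base R illustration of that idea for a single group (the helper `lag1` is hypothetical and not part of either package):

```r
# Hypothetical helper mimicking a one-step lag, for illustration only
lag1 <- function(x) c(NA, head(x, -1))

value <- c(10, 20, 25)   # the values of group 1 from the question
lag1(value)              # NA 10 20
value - lag1(value)      # NA 10  5
```

Grouping (`by = group` or `group_by(group)`) applies the same step separately to each group, which is why the first row of every group gets `NA`.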

128

answered Oct 17 '22 12:10

Blue Magister

You can use the base function ave() for this:

    df <- data.frame(group=rep(c(1,2),each=3),value=c(10,20,25,5,10,15))
    df$diff <- ave(df$value, factor(df$group), FUN=function(x) c(NA,diff(x)))

which returns

      group value diff
    1     1    10   NA
    2     1    20   10
    3     1    25    5
    4     2     5   NA
    5     2    10    5
    6     2    15    5
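
A short sketch of why this works, assuming the rows within each group are already in the order you want: `diff(x)` returns one element fewer than `x`, so `c(NA, diff(x))` pads the front, and `ave()` writes each group's result back into that group's original row positions. The interleaved data frame `df2` below is made up for illustration.

```r
# Made-up example: same values as the question, but with the groups interleaved
df2 <- data.frame(group = c(1, 2, 1, 2, 1, 2),
                  value = c(10, 5, 20, 10, 25, 15))

# Differences are taken within each group, then placed back in the original rows
df2$diff <- ave(df2$value, df2$group, FUN = function(x) c(NA, diff(x)))
df2$diff  # NA NA 10 5 5 5
```

So the groups do not need to be contiguous, but the values inside each group are differenced in the order in which they appear.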

answered Oct 17 '22 12:10

MrFlick
