Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subtraction on different rows and columns and separated by group

Tags:

dataframe

r

I really hate to ask two questions in a row but this is something that I can’t wrap my head around. So let’s say I have a data frame, as follows:

   df
   Row# User    Morning     Evening     Measure Date
   1    1          NA          NA          2/18/11
   2    1          50          115         2/19/11
   3    1          85          128         2/20/11
   4    1          62          NA          2/25/11
   5    1          48          100.8        3/8/11
   6    1          19          71          3/9/11
   7    1          25          98          3/10/11
   8    1          NA          105         3/11/11
   9    2          48          105         2/18/11
   10   2          28          203         2/19/11
   11   2          35          80.99        2/21/11
   12   2          91          78.25        2/22/11

Is it possible in R to take the difference between the previous consecutive day (and only the previous day, not the previous result) evening value of 1 row and the morning value of a different row for each user group? So my desired results would be this.

   df
   Row# User    Morning     Evening     Date        Difference
   1       1      NA          NA        2/18/11        NA
   2       1      50          115       2/19/11        NA
   3       1      85          129       2/20/11        30
   4       1      62          NA        2/25/11        NA
   5       1      48          100.8     3/8/11         NA
   6       1      19          71        3/9/11         81.8
   7       1      25          98        3/10/11        46
   8       1      10          105       3/11/11        88
   9       2      48          105       2/18/11        NA
   10      2      28          203       2/19/11        77
   11      2      35          80.99     2/21/11        NA
   12      2      91          78.25     2/22/11        -10.01

All I want this to do is to take the morning value and subtract it from the evening value of the previous consecutive day for each user group. As you can see, some parts of my data frame contain NA values in the morning and evening columns, in addition, not all of the dates are in consecutive order for each different user, so naturally, NA should be assigned.

I've tried searching google but there wasn't much information on being able to apply functions to different rows for each group of rows on different columns (if that makes any sense).

My attempts include many variations of this.

df$Difference<-ave((df$Morning,df$Evening),
                    df$User,
                    FUN=function(x){
                        c('NA',diff(df$Evening-df$Morning)),na.rm=T
                   })

Again, any help would be greatly appreciated. Thanks.

like image 905
rj2700 Avatar asked Feb 26 '13 12:02

rj2700


1 Answers

Note: The input data you show and the output data are not the same. There is a NA which is replaced by 10 in output and the last date is 2/14/11 in input and 2/22/11 in output.

I've assumed the output to be the original data to create this answer to match your result.

df$Diff <- c(NA, head(df$Evening, -1) - tail(df$Morning, -1))
df$Diff[which(c(0, diff(as.Date(as.character(df$Measure_Date), 
                 format="%m/%d/%Y"))) != 1)] <- NA

> df

#    Row User Morning Evening Measure_Date   Diff
# 1    1    1      NA      NA      2/18/11     NA
# 2    2    1      50  115.00      2/19/11     NA
# 3    3    1      85  128.00      2/20/11  30.00
# 4    4    1      62      NA      2/25/11     NA
# 5    5    1      48  100.80       3/8/11     NA
# 6    6    1      19   71.00       3/9/11  81.80
# 7    7    1      25   98.00      3/10/11  46.00
# 8    8    1      10  105.00      3/11/11  88.00
# 9    9    2      48  105.00      2/18/11     NA
# 10  10    2      28  203.00      2/19/11  77.00
# 11  11    2      35   80.99      2/21/11     NA
# 12  12    2      91   78.25      2/22/11 -10.01

@user1342086's edit (that got rejected, but was right indeed):

df$Diff[which(diff(df$User) != 0)] <- NA

seems to take care of the grouping by "User".

like image 95
Arun Avatar answered Sep 29 '22 02:09

Arun