I am working with some data in R. My dataframe DF looks like this (I have added the dput() version at the end):
    ID S.2014.01.01 S.2014.01.02 S.2014.01.03 S.2014.01.04
1  001            1           10            5           74
2  002            2           15            6           75
3  003            3           23            7           76
4  004            4           31            8           77
5  005            5           39            9           78
6  006            6           47           10           79
7  007            7           55           11           80
8  008            8           63           12           81
9  009            9           71           13           82
10 010           10           79           14           83
DF contains an ID variable and many columns holding daily values (this example includes only 4 of them; the real dataframe has more than 100 columns in this style). My goal is to compute the difference between each pair of consecutive columns. For example, I would like to compute the difference between S.2014.01.02 and S.2014.01.01 and save the result in a new variable named D.2014.01.02. The same process applies to the following columns: the next case would be S.2014.01.03 minus S.2014.01.02, saved in a new column named D.2014.01.03.
Given the number of columns in my real dataframe, I have tried several approaches. Computing the differences one by one would work but is not practical. I have also tried the mutate_each() function from the dplyr package, but I don't know how to make it take pairs of columns and create new ones. I have tried the lag() function from the same package as well, but it doesn't work. I used lag() because I may need not only differences between adjacent columns, but also differences between columns that are two or three positions apart. I would like to get a dataframe like this:
    ID S.2014.01.01 S.2014.01.02 S.2014.01.03 S.2014.01.04 D.2014.01.02 D.2014.01.03 D.2014.01.04
1  001            1           10            5           74            9           -5           69
2  002            2           15            6           75           13           -9           69
3  003            3           23            7           76           20          -16           69
4  004            4           31            8           77           27          -23           69
5  005            5           39            9           78           34          -30           69
6  006            6           47           10           79           41          -37           69
7  007            7           55           11           80           48          -44           69
8  008            8           63           12           81           55          -51           69
9  009            9           71           13           82           62          -58           69
10 010           10           79           14           83           69          -65           69
In this dataframe the new variables start with D and hold the differences between pairs of columns. Any advice for the adjacent-column case would be fantastic, and a version that takes the difference every 2 or 3 columns would be marvelous. The dput() version of DF is:
DF<-structure(list(ID = c("001", "002", "003", "004", "005", "006",
"007", "008", "009", "010"), S.2014.01.01 = c(1, 2, 3, 4, 5,
6, 7, 8, 9, 10), S.2014.01.02 = c(10, 15, 23, 31, 39, 47, 55,
63, 71, 79), S.2014.01.03 = c(5, 6, 7, 8, 9, 10, 11, 12, 13,
14), S.2014.01.04 = c(74, 75, 76, 77, 78, 79, 80, 81, 82, 83)), .Names = c("ID",
"S.2014.01.01", "S.2014.01.02", "S.2014.01.03", "S.2014.01.04"
), row.names = c(NA, -10L), class = "data.frame")
Thanks for your help!
There is no need to transpose or use any vectorisation functions.
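# Subtract each day column from the next one (columns 3:5 minus columns 2:4)
DF <- cbind(DF, DF[, 3:5] - DF[, 2:4])
# The new columns inherit the S.* names of columns 3:5; rename them to D.*
names(DF)[6:8] <- sub("^S", "D", names(DF)[6:8])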
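The question also asks about differences between columns that are two or three positions apart, and the real data has more than 100 columns, so hard-coded indices may not scale well. Below is a minimal sketch of one way to generalise the same data-frame subtraction. The helper name add_col_diffs, its arguments, and the small DF_small example are hypothetical and not from the original post. Naming each new column after the later of the two columns is also an assumption, chosen to match the D.2014.01.02 convention in the question.

```r
# Hypothetical helper: lagged differences between the "S." columns of a data frame.
# k = 1 gives adjacent-column differences; k = 2 or 3 compares columns further apart.
add_col_diffs <- function(df, k = 1, prefix = "S", new_prefix = "D") {
  # Pick the value columns by name instead of by hard-coded position
  s_cols <- grep(paste0("^", prefix, "\\."), names(df), value = TRUE)
  stopifnot(k >= 1, length(s_cols) > k)

  later   <- s_cols[(k + 1):length(s_cols)]   # columns being subtracted from
  earlier <- s_cols[1:(length(s_cols) - k)]   # columns k positions before them

  # Element-wise subtraction of two equally sized data frames
  diffs <- df[later] - df[earlier]

  # Assumption: each difference is named after the later column, S.* -> D.*
  names(diffs) <- sub(paste0("^", prefix), new_prefix, later)
  cbind(df, diffs)
}

# Small illustrative data frame (first two rows of the question's DF)
DF_small <- data.frame(
  ID = c("001", "002"),
  S.2014.01.01 = c(1, 2),
  S.2014.01.02 = c(10, 15),
  S.2014.01.03 = c(5, 6),
  S.2014.01.04 = c(74, 75),
  stringsAsFactors = FALSE
)

out1 <- add_col_diffs(DF_small, k = 1)  # adds D.2014.01.02, D.2014.01.03, D.2014.01.04
out2 <- add_col_diffs(DF_small, k = 2)  # adds D.2014.01.03, D.2014.01.04
```

With k = 1 this should reproduce the result of the two lines above. The sketch assumes the S.* columns are already in date order, since it pairs them by position.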