Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Merging multiple columns into one by group (twice in the same dataframe)

Tags:

r

So I have a dataframe that looks like this:

      Day HL.Average D.Average LL.Average noHKB.Average    HL.SD      D.SD    LL.SD noHKB.SD
1 0.00000       8760      8900      10000          8030 2337.844  924.2742 1120.785 1592.646
2 1.90625      13300     11900      12100          3860 1016.291 2308.2661 3581.763 1031.057
3 3.00000      14500      7320      12300          1750 2945.098 1308.0389 4338.897 1793.583
4 4.00000      16200      9160      15100          2710 1006.893  514.2177 4362.261 2691.648

And I'm trying to organize it into this:

(The numbers in this table is a little different because I made this table in excel instead...)

Day Group   Average SD
0   HL  8.76E+03    2337.843612
2   HL  1.33E+04    1016.291341
3   HL  1.45E+04    2945.098059
4   HL  1.62E+04    1006.892976
0   LL  1.00E+04    1120.785009
2   LL  1.21E+04    3581.762692
3   LL  1.23E+04    4338.897375
4   LL  1.51E+04    4362.260616
0   D   8.90E+03    924.2741667
2   D   1.19E+04    2308.266118
3   D   7.32E+03    1308.038881
4   D   9.16E+03    514.2176747

I thought about using stack() and then merge(), but the problem is that my original "xx.Average" and "xx.SD" headings complicates the resulting ind from stack...

Does anyone know of a command/way to easily organize the data?

Thanks so much!

like image 263
Fenrir Avatar asked Aug 04 '15 00:08

Fenrir


2 Answers

Here is my clumsy solution. It works but it requires some package (dplyr as well for the pipe operator %>% but that is just a personal choice.). It has the melt function from the reshape2 package, the separate from tidyr to separate columns (for example LL.Average into LL and Average) and the dcast from reshape2 to spread with a formula. Note that I did not changed the noHKB but this might be done with an easy step, it is just a personal choice where to do it.

library(reshape2)
library(dplyr)
library(tidyr)

df %>% melt(id.vars = "Day") %>%
separate(variable, c("Group", "Measure"), sep = "\\.") %>%
dcast(...~ Measure) -> df1
df1

       Day Group Average        SD
1  0.00000     D    8900  924.2742
2  0.00000    HL    8760 2337.8440
3  0.00000    LL   10000 1120.7850
4  0.00000 noHKB    8030 1592.6460
5  1.90625     D   11900 2308.2661
6  1.90625    HL   13300 1016.2910
7  1.90625    LL   12100 3581.7630
8  1.90625 noHKB    3860 1031.0570
9  3.00000     D    7320 1308.0389
10 3.00000    HL   14500 2945.0980
11 3.00000    LL   12300 4338.8970
12 3.00000 noHKB    1750 1793.5830
13 4.00000     D    9160  514.2177
14 4.00000    HL   16200 1006.8930
15 4.00000    LL   15100 4362.2610
16 4.00000 noHKB    2710 2691.6480
like image 129
SabDeM Avatar answered Oct 23 '22 00:10

SabDeM


@ananda-mahto's splitstackshape package is designed for this type of problem!

Unfortunately it expects the variable names to be prefixes, rather than postfixes, so we will first need to reverse the variable names

library(splitstackshape)
# reverse variable names:
names(df) <- sapply(strsplit(names(df), "\\."), function(x) paste(rev(x), collapse="."))
# Reshape the data frame:
newdf <- merged.stack(df, id.vars=c("Day"), var.stubs=c("Average", "SD"), sep=".")
setnames(newdf, ".time_1", "Group")
newdf
##         Day   Group Average        SD
##  1: 0.00000       D    8900  924.2742
##  2: 0.00000      HL    8760 2337.8440
##  3: 0.00000      LL   10000 1120.7850
##  4: 0.00000   noHKB    8030 1592.6460
##  5: 1.90625       D   11900 2308.2661
##  6: 1.90625      HL   13300 1016.2910
##  7: 1.90625      LL   12100 3581.7630
##  8: 1.90625   noHKB    3860 1031.0570
##  9: 3.00000       D    7320 1308.0389
## 10: 3.00000      HL   14500 2945.0980
## 11: 3.00000      LL   12300 4338.8970
## 12: 3.00000   noHKB    1750 1793.5830
## 13: 4.00000       D    9160  514.2177
## 14: 4.00000      HL   16200 1006.8930
## 15: 4.00000      LL   15100 4362.2610
## 16: 4.00000   noHKB    2710 2691.6480
like image 34
Scott Ritchie Avatar answered Oct 22 '22 23:10

Scott Ritchie