Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Automatically stack every nth column of a dataframe

I have a date frame called DF with, say, three variables that repeat each other cyclically:

      A      B      C      A      B      C 
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8

I want to stack the first A column on the second A column (and on the third, and fourth and so on, if they exist), and do the same with the other variables, and then save the result as new objects (as vectors, for example). So what I want to obtain is

V_A <- c(a1,a2,a3,a4,a5,a6,a7,a8)
V_B <- c(b1,b2,b3,b4,b5,b6,b7,b8)
V_C <- c(c1,c2,c3,c4,c5,c6,c7,c8)

While it's very easy to do it manually, like this

V_A <- DF[,seq(1, ncol(DF), 3]
V_A <- stack(DF)
V_B <- DF[,seq(2, ncol(DF), 3]
V_B <- stack(DF)
V_C <- DF[,seq(3, ncol(DF), 3]
V_C <- stack(DF)

what I'm looking for is a code that does this automatically, so that it will work for data frames with every number of variables without having to write ad-hoc codes every time. To sum up, the code should: 1) select every nth column in the data frame 2) stack this columns 3) save the result in new objects automatically created

I feel there must be a way to do this but I haven't succeeded so far. Thanks very much in advance.

EDIT Let's say I am in a slightly different situation, in which the columns repeat but not with exactly the same name, and I still want to do the same thing. So I have:

     A1      B1      C1      A2      B2      C2 
1    a11     b11     c11     a25     b25     c25
2    a12     b12     c12     a26     b26     c26
3    a13     b13     c13     a27     b27     c27
4    a14     b14     c14     a28     b28     c28

and I want:

V_A <- c(a11,a12,a13,a14,a25,a26,a27,a28)
V_B <- c(b11,b12,b13,b14,b25,b26,b27,b28)
V_C <- c(c11,c12,c13,c14,c25,c26,c27,c28)

How can I do it?

like image 564
Hanx70 Avatar asked Oct 20 '17 01:10

Hanx70


3 Answers

Here are some alternatives. No packages are used.

1) aperm Create a 3d array a, permute the dimensions and reshape into a matrix m and then convert that to a data frame. This one only works if all values are of the same type. (2) and (3) do not have this limitation.

k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))

a <- array(as.matrix(DF), c(nr, k, nc/k))
m <- matrix(aperm(a, c(1, 3, 2)),, k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8

If we are in the situation given in the question's EDIT then replace unames with the following where DF2 is DF with the revised names as per Note at end:

unames <- unique(sub("\\d*$", "", names(DF2)))

2) lapply This generalizes the code in the question. unames is defined above:

L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8

With the input shown in the question's EDIT it could be done like this where DF2 is given reproducibly in the Note at the end.

names0 <- sub("\\d*$", "", names(DF2))   # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

3) reshape nc and unames are from above. varying is a list with k components such as that the ith component contains the index vector c(i, i+k, ...). It seems that reshape does not like duplicated names so we have given it setNames(DF, 1:nc) as the input. This solution does have the advantage of also generating the index vectors time and id which relate the output to the input data.

varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)

giving:

    time  A  B  C id
1.1    1 a1 b1 c1  1
2.1    1 a2 b2 c2  2
3.1    1 a3 b3 c3  3
4.1    1 a4 b4 c4  4
1.2    2 a5 b5 c5  1
2.2    2 a6 b6 c6  2
3.2    2 a7 b7 c7  3
4.2    2 a8 b8 c8  4

With the input shown in the question's EDIT it actually simplifies. We no longer need to use setNames(DF, 1:nc) but can just use the data frame as is as input. Also, we can use varying=TRUE (also see @thelatemail's comment) instead of calculating a complex argument for varying. The input DF2 is as shown in the Note at the end and names0 is as in (2) above.

reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))

Note:

Lines <- "      A      B      C      A      B      C 
1    a1     b1     c1     a5     b5     c5
2    a2     b2     c2     a6     b6     c6
3    a3     b3     c3     a7     b7     c7
4    a4     b4     c4     a8     b8     c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)

DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input

Upate: A number of simplifications. Also added DF2 in Note at end and discuss in each alternative how to modify the code to deal with it. (A general method might be just to reduce DF2 to DF as I discussed in the comments below.)

like image 72
G. Grothendieck Avatar answered Oct 21 '22 16:10

G. Grothendieck


stack and unstack

NEWDF=stack(DF)
NEWDF$ind=as.character(NEWDF$ind)
NEWDF$ind=rep(NEWDF$ind[1:(dim(NEWDF)[1]/2)],2)
unstack(NEWDF)
   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
like image 34
BENY Avatar answered Oct 21 '22 15:10

BENY


We can split column-wise then row bind (using G. Grothendieck's example data, DF and DF2):

library(data.table) #rbindlist for binding columns with different names.

k <- 3
nr <- nrow(DF)
nc <- ncol(DF)

rbindlist(split.default(DF, rep(1:(nc/k), each = k)))
#     A  B  C
# 1: a1 b1 c1
# 2: a2 b2 c2
# 3: a3 b3 c3
# 4: a4 b4 c4
# 5: a5 b5 c5
# 6: a6 b6 c6
# 7: a7 b7 c7
# 8: a8 b8 c8

rbindlist(split.default(DF2, rep(1:(nc/k), each = k)), use.names = FALSE)
#    A1 B1 C1
# 1: a1 b1 c1
# 2: a2 b2 c2
# 3: a3 b3 c3
# 4: a4 b4 c4
# 5: a5 b5 c5
# 6: a6 b6 c6
# 7: a7 b7 c7
# 8: a8 b8 c8
like image 1
zx8754 Avatar answered Oct 21 '22 16:10

zx8754