I have a date frame called DF with, say, three variables that repeat each other cyclically:
A B C A B C
1 a1 b1 c1 a5 b5 c5
2 a2 b2 c2 a6 b6 c6
3 a3 b3 c3 a7 b7 c7
4 a4 b4 c4 a8 b8 c8
I want to stack the first A column on the second A column (and on the third, and fourth and so on, if they exist), and do the same with the other variables, and then save the result as new objects (as vectors, for example). So what I want to obtain is
V_A <- c(a1,a2,a3,a4,a5,a6,a7,a8)
V_B <- c(b1,b2,b3,b4,b5,b6,b7,b8)
V_C <- c(c1,c2,c3,c4,c5,c6,c7,c8)
While it's very easy to do it manually, like this
V_A <- DF[,seq(1, ncol(DF), 3]
V_A <- stack(DF)
V_B <- DF[,seq(2, ncol(DF), 3]
V_B <- stack(DF)
V_C <- DF[,seq(3, ncol(DF), 3]
V_C <- stack(DF)
what I'm looking for is a code that does this automatically, so that it will work for data frames with every number of variables without having to write ad-hoc codes every time. To sum up, the code should: 1) select every nth column in the data frame 2) stack this columns 3) save the result in new objects automatically created
I feel there must be a way to do this but I haven't succeeded so far. Thanks very much in advance.
EDIT Let's say I am in a slightly different situation, in which the columns repeat but not with exactly the same name, and I still want to do the same thing. So I have:
A1 B1 C1 A2 B2 C2
1 a11 b11 c11 a25 b25 c25
2 a12 b12 c12 a26 b26 c26
3 a13 b13 c13 a27 b27 c27
4 a14 b14 c14 a28 b28 c28
and I want:
V_A <- c(a11,a12,a13,a14,a25,a26,a27,a28)
V_B <- c(b11,b12,b13,b14,b25,b26,b27,b28)
V_C <- c(c11,c12,c13,c14,c25,c26,c27,c28)
How can I do it?
Here are some alternatives. No packages are used.
1) aperm Create a 3d array a
, permute the dimensions and reshape into a matrix m
and then convert that to a data frame. This one only works if all values are of the same type. (2) and (3) do not have this limitation.
k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))
a <- array(as.matrix(DF), c(nr, k, nc/k))
m <- matrix(aperm(a, c(1, 3, 2)),, k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)
giving:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
If we are in the situation given in the question's EDIT then replace unames
with the following where DF2 is DF with the revised names as per Note at end:
unames <- unique(sub("\\d*$", "", names(DF2)))
2) lapply This generalizes the code in the question. unames
is defined above:
L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
giving:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
With the input shown in the question's EDIT it could be done like this where DF2
is given reproducibly in the Note at the end.
names0 <- sub("\\d*$", "", names(DF2)) # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
3) reshape nc
and unames
are from above. varying
is a list with k
components such as that the ith component contains the index vector c(i, i+k, ...)
. It seems that reshape
does not like duplicated names so we have given it setNames(DF, 1:nc)
as the input. This solution does have the advantage of also generating the index vectors time
and id
which relate the output to the input data.
varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)
giving:
time A B C id
1.1 1 a1 b1 c1 1
2.1 1 a2 b2 c2 2
3.1 1 a3 b3 c3 3
4.1 1 a4 b4 c4 4
1.2 2 a5 b5 c5 1
2.2 2 a6 b6 c6 2
3.2 2 a7 b7 c7 3
4.2 2 a8 b8 c8 4
With the input shown in the question's EDIT it actually simplifies. We no longer need to use setNames(DF, 1:nc)
but can just use the data frame as is as input. Also, we can use varying=TRUE
(also see @thelatemail's comment) instead of calculating a complex argument for varying
. The input DF2
is as shown in the Note at the end and names0
is as in (2) above.
reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))
Note:
Lines <- " A B C A B C
1 a1 b1 c1 a5 b5 c5
2 a2 b2 c2 a6 b6 c6
3 a3 b3 c3 a7 b7 c7
4 a4 b4 c4 a8 b8 c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)
DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input
Upate: A number of simplifications. Also added DF2
in Note at end and discuss in each alternative how to modify the code to deal with it. (A general method might be just to reduce DF2 to DF as I discussed in the comments below.)
stack
and unstack
NEWDF=stack(DF)
NEWDF$ind=as.character(NEWDF$ind)
NEWDF$ind=rep(NEWDF$ind[1:(dim(NEWDF)[1]/2)],2)
unstack(NEWDF)
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
We can split column-wise then row bind (using G. Grothendieck's example data, DF and DF2):
library(data.table) #rbindlist for binding columns with different names.
k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
rbindlist(split.default(DF, rep(1:(nc/k), each = k)))
# A B C
# 1: a1 b1 c1
# 2: a2 b2 c2
# 3: a3 b3 c3
# 4: a4 b4 c4
# 5: a5 b5 c5
# 6: a6 b6 c6
# 7: a7 b7 c7
# 8: a8 b8 c8
rbindlist(split.default(DF2, rep(1:(nc/k), each = k)), use.names = FALSE)
# A1 B1 C1
# 1: a1 b1 c1
# 2: a2 b2 c2
# 3: a3 b3 c3
# 4: a4 b4 c4
# 5: a5 b5 c5
# 6: a6 b6 c6
# 7: a7 b7 c7
# 8: a8 b8 c8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With