I have a quite large dataframe structured like this:
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
I need to create a dataframe like this:
id xsource xvalue yvalue zvalue v
1 x1 2 10 200 2.5
1 x2 4 20 150 2.5
1 x3 5 15 170 2.5
2 x1 3 25 300 4.2
2 x2 7 35 350 4.2
2 x3 6 40 400 4.2
I'm quite sure I have to do it with the reshape package, but I'm not able to get what I want.
Could you help me?
Thanks
You can use the following basic syntax to convert a pandas DataFrame from wide format to long format: df = pd.melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]). Here col1 is the column used as the identifier, and col2, col3, etc. are the columns that get unpivoted.
reshape2 is an R package written by Hadley Wickham that makes it easy to transform data between wide and long formats.
Here's the reshape() solution.
The key bit is that the varying= argument can take a list of vectors of column names in the wide format that correspond to single variables in the long format. In this case, columns "x1", "x2", "x3" in the original data frame are sent to one column in the long data frame, columns "y1", "y2", "y3" go into a second column, and so on.
# Read in the original data, x, from Andrie's answer
res <- reshape(x, direction = "long", idvar = "id",
               varying = list(c("x1", "x2", "x3"),
                              c("y1", "y2", "y3"),
                              c("z1", "z2", "z3")),
               v.names = c("xvalue", "yvalue", "zvalue"),
               timevar = "xsource", times = c("x1", "x2", "x3"))
# id v xsource xvalue yvalue zvalue
# 1.x1 1 2.5 x1 2 10 200
# 2.x1 2 4.2 x1 3 25 300
# 1.x2 1 2.5 x2 4 20 150
# 2.x2 2 4.2 x2 7 35 350
# 1.x3 1 2.5 x3 5 15 170
# 2.x3 2 4.2 x3 6 40 400
Finally, a couple of purely cosmetic steps are needed to get the results looking exactly as shown in your question:
res <- res[order(res$id, res$xsource), c(1,3,4,5,6,2)]
row.names(res) <- NULL
res
# id xsource xvalue yvalue zvalue v
# 1 1 x1 2 10 200 2.5
# 2 1 x2 4 20 150 2.5
# 3 1 x3 5 15 170 2.5
# 4 2 x1 3 25 300 4.2
# 5 2 x2 7 35 350 4.2
# 6 2 x3 6 40 400 4.2
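Since the question mentions a quite large data frame, typing every column name by hand may not be practical. Here is a minimal sketch of building the varying= list programmatically. It assumes the columns follow the prefix-plus-number naming shown in the question; the helper names prefixes and reps are hypothetical and only for illustration.

```r
# Sample data from the question
x <- read.table(text = "
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
", header = TRUE)

# Assumption: every measured column is named <prefix><rep>
prefixes <- c("x", "y", "z")
reps <- 1:3

# One character vector of wide column names per long variable
varying <- lapply(prefixes, function(p) paste0(p, reps))

res <- reshape(x, direction = "long", idvar = "id",
               varying = varying,
               v.names = paste0(prefixes, "value"),
               timevar = "xsource",
               times = paste0("x", reps))

# Cosmetic: sort rows and select columns by name rather than by position
res <- res[order(res$id, res$xsource),
           c("id", "xsource", "xvalue", "yvalue", "zvalue", "v")]
row.names(res) <- NULL
res
```

Selecting the columns by name instead of by position keeps the final step working even if the column order of the reshaped result differs.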
Here's one approach that uses reshape2 and is described in depth in my paper on tidy data.
Step 1: identify the variables that are already in columns. In this case: id and v. These are the variables we melt by:
library(reshape2)
xm <- melt(x, c("id", "v"))
Step 2: split up variables that are currently combined in one column. In this case that's source (the character part) and rep (the integer part). There are lots of ways to do this; I'm going to use string extraction with the stringr package:
library(stringr)
xm$source <- str_sub(xm$variable, 1, 1)
xm$rep <- str_sub(xm$variable, 2, 2)
xm$variable <- NULL
Step 3: rearrange the variables that are currently in the rows but that we want in columns:
dcast(xm, ... ~ source)
# id v rep x y z
# 1 1 2.5 1 2 10 200
# 2 1 2.5 2 4 20 150
# 3 1 2.5 3 5 15 170
# 4 2 4.2 1 3 25 300
# 5 2 4.2 2 7 35 350
# 6 2 4.2 3 6 40 400
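If reshape2 and stringr are not available, the same three steps can be sketched in base R. This is only an illustration under the assumption that every measured column name is one letter followed by one digit, as in the question. It swaps melt() for a hand-built long data frame, str_sub() for substr(), and dcast() for reshape(direction = "wide"); the names measure and wide are hypothetical.

```r
# Sample data from the question
x <- read.table(text = "
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
", header = TRUE)

# Step 1: "melt" by id and v, stacking all other columns into one
measure <- setdiff(names(x), c("id", "v"))
xm <- data.frame(
  id       = rep(x$id, times = length(measure)),
  v        = rep(x$v, times = length(measure)),
  variable = rep(measure, each = nrow(x)),
  value    = unlist(x[measure], use.names = FALSE)
)

# Step 2: split the combined name into its letter and its number
xm$source <- substr(xm$variable, 1, 1)
xm$rep <- as.integer(substr(xm$variable, 2, 2))
xm$variable <- NULL

# Step 3: spread source back out into one column per letter
wide <- reshape(xm, direction = "wide",
                idvar = c("id", "v", "rep"),
                timevar = "source", v.names = "value")
names(wide) <- sub("^value\\.", "", names(wide))
wide <- wide[order(wide$id, wide$rep), ]
rownames(wide) <- NULL
wide
```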
Somebody please prove me wrong, but I don't think it's easy to solve this problem using either the reshape package or the base reshape function.
However, it's easy enough using lapply and do.call:
Replicate the data:
x <- read.table(text="
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
", header=TRUE)
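Do the analysis:
# For each row, bind id, the source index 1:3, a 3 x 3 matrix of the
# x/y/z values, and v. unlist() keeps the matrix numeric instead of a list.
chunks <- lapply(seq_len(nrow(x)),
                 function(i) cbind(x[i, 1], 1:3, matrix(unlist(x[i, 2:10]), ncol = 3), x[i, 11]))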
res <- do.call(rbind, chunks)
colnames(res) <- c("id", "source", "x", "y", "z", "v")
res
id source x y z v
[1,] 1 1 2 10 200 2.5
[2,] 1 2 4 20 150 2.5
[3,] 1 3 5 15 170 2.5
[4,] 2 1 3 25 300 4.2
[5,] 2 2 7 35 350 4.2
[6,] 2 3 6 40 400 4.2
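The result above is a matrix with a numeric source column, whereas the question asks for a data frame with labels such as x1. A small sketch of that last conversion follows; it repeats the steps so it runs on its own, calls unlist() so the matrix is numeric, and the name res_df is hypothetical.

```r
# Sample data from the question
x <- read.table(text = "
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
", header = TRUE)

# Build one 3-row numeric chunk per input row, then stack the chunks
chunks <- lapply(seq_len(nrow(x)), function(i) {
  cbind(x[i, 1], 1:3, matrix(unlist(x[i, 2:10]), ncol = 3), x[i, 11])
})
res <- do.call(rbind, chunks)
colnames(res) <- c("id", "source", "x", "y", "z", "v")

# Convert to a data frame and use the labels from the question
res_df <- as.data.frame(res)
res_df$source <- paste0("x", res_df$source)
names(res_df) <- c("id", "xsource", "xvalue", "yvalue", "zvalue", "v")
res_df
```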