
How to reshape this dataframe with the reshape package [duplicate]

Tags: r, reshape

I have a quite large dataframe structured like this:

id    x1    x2    x3    y1    y2    y3    z1    z2    z3     v 
 1     2     4     5    10    20    15   200   150   170   2.5
 2     3     7     6    25    35    40   300   350   400   4.2

I need to create a dataframe like this:

id   xsource   xvalue   yvalue   zvalue       v 
 1        x1        2       10      200     2.5
 1        x2        4       20      150     2.5
 1        x3        5       15      170     2.5
 2        x1        3       25      300     4.2
 2        x2        7       35      350     4.2
 2        x3        6       40      400     4.2

I'm quite sure I have to do it with the reshape package, but I'm not able to get what I want.

Could you help me?

Thanks

corrado asked Jan 13 '12 15:01



3 Answers

Here's the reshape() solution.

The key bit is that the varying= argument can take a list of vectors of wide-format column names, where each vector corresponds to a single variable in the long format. In this case, columns "x1", "x2", "x3" of the original data frame are stacked into one column of the long data frame, columns "y1", "y2", "y3" go into a second column, and so on.

# Read in the original data, x, from Andrie's answer

res <- reshape(x, direction = "long", idvar = "id",
               varying = list(c("x1","x2", "x3"), 
                              c("y1", "y2", "y3"), 
                              c("z1", "z2", "z3")),
               v.names = c("xvalue", "yvalue", "zvalue"), 
               timevar = "xsource", times = c("x1", "x2", "x3"))
#      id   v xsource xvalue yvalue zvalue
# 1.x1  1 2.5      x1      2     10    200
# 2.x1  2 4.2      x1      3     25    300
# 1.x2  1 2.5      x2      4     20    150
# 2.x2  2 4.2      x2      7     35    350
# 1.x3  1 2.5      x3      5     15    170
# 2.x3  2 4.2      x3      6     40    400

Finally, a couple of purely cosmetic steps are needed to get the results looking exactly as shown in your question:

res <- res[order(res$id, res$xsource), c(1,3,4,5,6,2)]
row.names(res) <- NULL
res
#   id xsource xvalue yvalue zvalue   v
# 1  1      x1      2     10    200 2.5
# 2  1      x2      4     20    150 2.5
# 3  1      x3      5     15    170 2.5
# 4  2      x1      3     25    300 4.2
# 5  2      x2      7     35    350 4.2
# 6  2      x3      6     40    400 4.2
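As a side note, when the wide column names follow a letter-then-number pattern as they do here, base reshape() can work out the grouping on its own if varying= is given as a plain character vector together with sep = "". The following is only a sketch under that assumption; the names long and rep are chosen for illustration and do not come from the question.

```r
# Recreate the example data from the question
x <- read.table(text = "
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
", header = TRUE)

# With sep = "", reshape() splits each name in varying into a letter
# part ("x", "y", "z") and a numeric part (1, 2, 3) by itself
long <- reshape(x, direction = "long", idvar = "id",
                varying = names(x)[2:10], sep = "",
                timevar = "rep")

# Cosmetic: sort by id, then by repetition, and drop the row names
long <- long[order(long$id, long$rep), ]
row.names(long) <- NULL
long
```

Here the repetition lands in a numeric rep column instead of an xsource label, and the value columns are named x, y and z, so they would still need renaming to match the layout asked for in the question.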
Josh O'Brien answered Oct 22 '22 01:10


Here's one approach that uses reshape2 and is described in depth in my paper on tidy data.

Step 1: identify the variables that are already in columns. In this case: id and v. These are the variables we melt by:

library(reshape2)
xm <- melt(x, c("id", "v"))

Step 2: split up variables that are currently combined in one column. In this case that's source (the character part) and rep (the integer part).

There are lots of ways to do this; I'm going to use string extraction with the stringr package:

library(stringr)
xm$source <- str_sub(xm$variable, 1, 1)
xm$rep <- str_sub(xm$variable, 2, 2)
xm$variable <- NULL

Step 3: rearrange the variables that are currently in rows but that we want in columns:

dcast(xm, ... ~ source)

#   id   v rep x  y   z
# 1  1 2.5   1 2 10 200
# 2  1 2.5   2 4 20 150
# 3  1 2.5   3 5 15 170
# 4  2 4.2   1 3 25 300
# 5  2 4.2   2 7 35 350
# 6  2 4.2   3 6 40 400
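For convenience, the three steps can be run end to end and then relabelled to match the column layout requested in the question. This is a sketch, not part of the original answer: it swaps in base substr() for stringr's str_sub(), and the names wide and out are invented for illustration.

```r
library(reshape2)

# Recreate the example data from the question
x <- read.table(text = "
id x1 x2 x3 y1 y2 y3 z1 z2 z3 v
1 2 4 5 10 20 15 200 150 170 2.5
2 3 7 6 25 35 40 300 350 400 4.2
", header = TRUE)

# Step 1: melt by the variables that are already in columns
xm <- melt(x, id.vars = c("id", "v"))

# Step 2: split "x1" into source "x" and rep "1"
# (base substr() used here in place of stringr::str_sub())
xm$variable <- as.character(xm$variable)
xm$source <- substr(xm$variable, 1, 1)
xm$rep <- substr(xm$variable, 2, 2)
xm$variable <- NULL

# Step 3: spread source back out into columns x, y, z
wide <- dcast(xm, id + v + rep ~ source, value.var = "value")

# Relabel to the layout asked for in the question
out <- data.frame(id = wide$id,
                  xsource = paste0("x", wide$rep),
                  xvalue = wide$x,
                  yvalue = wide$y,
                  zvalue = wide$z,
                  v = wide$v)
out
```

Spelling out value.var = "value" is optional, since dcast() would otherwise guess the value column, but it avoids relying on that guess.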
hadley answered Oct 22 '22 01:10


Somebody please prove me wrong, but I don't think it's easy to solve this problem using either the reshape package or the base reshape function.

However, it's easy enough using lapply and do.call:

Replicate the data:

x <- read.table(text="
id    x1    x2    x3    y1    y2    y3    z1    z2    z3     v 
1     2     4     5    10    20    15   200   150   170   2.5
2     3     7     6    25    35    40   300   350   400   4.2
", header=TRUE)

Do the reshaping:

chunks <- lapply(1:nrow(x), 
    function(i)cbind(x[i, 1], 1:3, matrix(x[i, 2:10], ncol=3), x[i, 11]))
res <- do.call(rbind, chunks)
colnames(res) <- c("id", "source", "x", "y", "z", "v")
res

     id source x y  z   v  
[1,] 1  1      2 10 200 2.5
[2,] 1  2      4 20 150 2.5
[3,] 1  3      5 15 170 2.5
[4,] 2  1      3 25 300 4.2
[5,] 2  2      7 35 350 4.2
[6,] 2  3      6 40 400 4.2
Andrie answered Oct 22 '22 02:10