Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to "unmelt" data with reshape r

I have a data frame that I melted using the reshape package that I would like to "un melt".

here is a toy example of the melted data (real data frame is 500x100 or larger) :

variable<-c(rep("X1",3),rep("X2",3),rep("X3",3))
value<-c(rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3))
dat <-data.frame(variable,value)
dat
 variable     value
1       X1 0.5285376
2       X1 0.5285376
3       X1 0.5285376
4       X2 0.1694908
5       X2 0.1694908
6       X2 0.1694908
7       X3 0.7446906
8       X3 0.7446906
9       X3 0.7446906

Each variable (X1, X2,X3) has values estimated at 3 different times (which in this toy example happen to be the same, but this is never the case).

I would like to get it (back) in the form of :

     X1        X2        X3
1 0.5285376 0.1694908 0.7446906
2 0.5285376 0.1694908 0.7446906
3 0.5285376 0.1694908 0.7446906

Basically, I would like the variable column to be sorted on ID (X1, X2 etc) and become column headings. I have tried various permutations of cast, dcast, recast, etc.. and cant seem to get the data in the format that I want. It was easy enough to 'melt' data from the wide form to the longer form (e.g. the dat datset), but getting it back is proving difficult. Any ideas? I know this is relatively simple, but I am having a hard time conceptualizing how to do this in reshape or reshape2.

Thanks, LP

like image 763
LP_640 Avatar asked Sep 19 '14 13:09

LP_640


People also ask

What is melting and unmelting data in pandas?

In this case, the remaining data variables are treated as data values, and there are only two columns: variable and value. Unmelting, on the other hand, is performed on the data variables to return the values to their original format. After we’ve covered Melting and Unmelting data, let’s look at the Pandas functions that allow us to do the same.

How do I unmelt a Dataframe in Python?

The Exit of the Program. The pivot () function can be used to unmelt a DataFrame object and retrieve the original dataframe. The ‘index’ argument value of the pivot () function should be the same as the ‘id vars’ value. The value of ‘columns’ should be used as the name of the ‘variable’ column.

What is unmelting and how do I use it?

Pivoting, Unmelting or Reverse Melting is used to convert a column with multiple values into several columns of their own. Create a dataframe that contains the data on ID, Name, Marks and Sports of 6 students. Unmelting around the column Sports:

What is reshaping in pandas Dataframe?

Reshaping plays a crucial role in data analysis. Pandas provide function like melt and unmelt for reshaping. melt () is used to convert a wide dataframe into a longer form. This function can be used when there are requirements to consider a specific column as an identifier.


2 Answers

I typically do this by creating an id column and then using dcast:

> dat
  variable     value
1       X1 0.4299397
2       X1 0.4299397
3       X1 0.4299397
4       X2 0.2531551
5       X2 0.2531551
6       X2 0.2531551
7       X3 0.3972119
8       X3 0.3972119
9       X3 0.3972119
> dat$id <- rep(1:3,times = 3)
> dcast(data = dat,formula = id~variable,fun.aggregate = sum,value.var = "value")
  id        X1        X2        X3
1  1 0.4299397 0.2531551 0.3972119
2  2 0.4299397 0.2531551 0.3972119
3  3 0.4299397 0.2531551 0.3972119
like image 185
joran Avatar answered Sep 20 '22 17:09

joran


Depending on how robust you need this to be , the following will correctly cast for varying number of occurrences of variables (and in any order).

> variable<-c(rep("X1",5),rep("X2",4),rep("X3",3))
> value<-c(rep(rnorm(1,.5,.2),5),rep(rnorm(1,.5,.2),4),rep(rnorm(1,.5,.2),3))
> dat <-data.frame(variable,value)
> dat <- dat[order(rnorm(nrow(dat))),]
> dat
   variable     value
11       X3 1.0294454
8        X2 0.6147509
2        X1 0.3537012
7        X2 0.6147509
9        X2 0.6147509
5        X1 0.3537012
4        X1 0.3537012
12       X3 1.0294454
3        X1 0.3537012
1        X1 0.3537012
10       X3 1.0294454
6        X2 0.6147509
> dat$id = numeric(nrow(dat))
> for (i in 1:nrow(dat)){
+   dat_temp <- dat[1:i,]
+   dat[i,]$id <- nrow(dat_temp[dat_temp$variable == dat[i,]$variable,])
+ }
> cast(dat, id~variable, value = 'value')
  id        X1        X2       X3
1  1 0.3537012 0.6147509 1.029445
2  2 0.3537012 0.6147509 1.029445
3  3 0.3537012 0.6147509 1.029445
4  4 0.3537012 0.6147509       NA
5  5 0.3537012        NA       NA
like image 22
Leo Avatar answered Sep 19 '22 17:09

Leo