Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape multiple values at once

I have a long data set I would like to make wide and I'm curious if there is a way to do this all in one step using the reshape2 or tidyr packages in R.

The data frame df looks like this:

id  type    transactions    amount
20  income       20          100
20  expense      25          95
30  income       50          300
30  expense      45          250

I'd like to get to this:

id  income_transactions expense_transactions    income_amount   expense_amount
20       20                           25                 100             95
30       50                           45                 300             250

I know I can get part of the way there with reshape2 via for example:

dcast(df, id ~  type, value.var="transactions")

But is there a way to reshape the entire df in one shot addressing both the "transactions" and "amount" variables at once? And ideally with new more appropriate column names?

like image 807
Dirk Calloway Avatar asked Dec 02 '14 10:12

Dirk Calloway


People also ask

Can you reshape multiple variables Stata?

The reshape command can work on more than one variable at a time. In the example above, we just reshaped the age variable.

How do you reshape a data set?

You can reshape a stacked DataFrame back to its unstacked format with the unstack() function. By default, the innermost level is unstacked. In our example, it was a number. However, you can unstack a different level by passing a level number or name as a parameter to the unstack() method.

How do I convert long data to wide data in R?

To convert long data back into a wide format, we can use the cast function. There are many cast functions, but we will use the dcast function because it is used for data frames.


2 Answers

In "reshape2", you can use recast (though in my experience, this isn't a widely known function).

library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
#   id transactions_expense transactions_income amount_expense amount_income
# 1 20                   25                  20             95           100
# 2 30                   45                  50            250           300

You can also use base R's reshape:

reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
#   id transactions.income amount.income transactions.expense amount.expense
# 1 20                  20           100                   25             95
# 3 30                  50           300                   45            250

Or, you can melt and dcast, like this (here with "data.table"):

library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")), 
                 id ~ variable + type, value.var = "value")
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300

In later versions of dcast.data.table from "data.table" (1.9.8) you will be able to do this directly. If I understand correctly, what @Arun is trying to implement would be doing the reshaping without first having to melt the data, which is what happens presently with recast, which is essentially a wrapper for a melt + dcast sequence of operations.


And, for thoroughness, here's the tidyr approach:

library(dplyr)
library(tidyr)
mydf %>% 
  gather(var, val, transactions:amount) %>% 
  unite(var2, type, var) %>% 
  spread(var2, val)
#   id expense_amount expense_transactions income_amount income_transactions
# 1 20             95                   25           100                  20
# 2 30            250                   45           300                  50
like image 78
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 23 '22 08:10

A5C1D2H2I1M1N2O1R2T1


With data.table v1.9.6+, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast for more and also the examples section.

require(data.table) # v1.9.6+
dcast(dt, id ~ type, value.var=names(dt)[3:4])
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300
like image 6
Arun Avatar answered Oct 23 '22 10:10

Arun