I have a long data set that I would like to make wide, and I'm curious whether there is a way to do this in one step using the reshape2 or tidyr packages in R.
The data frame df looks like this:
id type transactions amount
20 income 20 100
20 expense 25 95
30 income 50 300
30 expense 45 250
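To make the examples reproducible, here is a minimal sketch that rebuilds the example data from the table above. The numbers are copied from the table; storing id and the counts as doubles is an assumption. The answers below refer to the same data as mydf, so the sketch creates both names.

```r
# Rebuild the long example data from the question
df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# The answers refer to the same data frame as `mydf`
mydf <- df
str(df)
```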
I'd like to get to this:
id income_transactions expense_transactions income_amount expense_amount
20 20 25 100 95
30 50 45 300 250
I know I can get part of the way there with reshape2, for example:
dcast(df, id ~ type, value.var="transactions")
But is there a way to reshape the entire df in one shot, handling both the "transactions" and "amount" variables at once? And, ideally, can the new columns get more appropriate names?
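For comparison, the one-variable-at-a-time route can be finished by hand: cast each value column separately, rename, and merge on id. This is a sketch of that multi-step workaround, not a one-step solution; the intermediate names tr, amt and wide are illustrative only.

```r
library(reshape2)

df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# One dcast per value column; each returns id plus one column per type
tr  <- dcast(df, id ~ type, value.var = "transactions")
amt <- dcast(df, id ~ type, value.var = "amount")

# Suffix the type columns (e.g. income_transactions) so they don't clash
names(tr)[-1]  <- paste0(names(tr)[-1], "_transactions")
names(amt)[-1] <- paste0(names(amt)[-1], "_amount")

# Join the two wide tables back together on id
wide <- merge(tr, amt, by = "id")
wide
```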
In "reshape2", you can use recast (though in my experience, this isn't a widely known function). Here the question's data frame is called mydf:
library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
# id transactions_expense transactions_income amount_expense amount_income
# 1 20 25 20 95 100
# 2 30 45 50 250 300
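A small variation on the same recast call: listing type before variable on the right-hand side of the formula should produce names in the type_variable order the question asks for (e.g. income_transactions). The column order follows dcast's sorting, so treat this as a sketch and check the result rather than relying on a particular order.

```r
library(reshape2)

df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# recast = melt (with these id.var columns) followed by dcast with this formula;
# putting type first yields names like income_transactions
wide <- recast(df, id ~ type + variable, id.var = c("id", "type"))
wide
```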
You can also use base R's reshape:
reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
# id transactions.income amount.income transactions.expense amount.expense
# 1 20 20 100 25 95
# 3 30 50 300 45 250
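Base reshape also has a sep argument that is used to build the column names when reshaping to wide format, so the dots can be swapped for underscores. A sketch, where resetting the row names is purely cosmetic:

```r
df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# sep joins each value column name with the timevar value: transactions_income, ...
wide <- reshape(df, direction = "wide", idvar = "id", timevar = "type", sep = "_")

# Drop the leftover row names (1, 3) carried over from the long data
rownames(wide) <- NULL
wide
```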
Or, you can melt and dcast, like this (here with "data.table"):
library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")),
id ~ variable + type, value.var = "value")
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300
Later versions of "data.table" (v1.9.6 onward, as the answer below notes) let dcast do this directly. If I understand correctly, what @Arun was implementing is reshaping without first having to melt the data. That melt step is exactly what recast does: it is essentially a wrapper around a melt + dcast sequence of operations.
And, for thoroughness, here's the tidyr approach:
library(dplyr)
library(tidyr)
mydf %>%
gather(var, val, transactions:amount) %>%
unite(var2, type, var) %>%
spread(var2, val)
# id expense_amount expense_transactions income_amount income_transactions
# 1 20 95 25 100 20
# 2 30 250 45 300 50
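In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). pivot_wider() accepts several columns in values_from, and names_glue sets the naming pattern, so the whole reshape fits in one call with the names the question asked for. A sketch, assuming tidyr 1.0.0 or later:

```r
library(tidyr)

df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

wide <- pivot_wider(
  df,
  id_cols     = id,
  names_from  = type,                      # one new column per type value...
  values_from = c(transactions, amount),   # ...for each of these value columns
  names_glue  = "{type}_{.value}"          # e.g. income_transactions
)
wide
```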
With data.table v1.9.6+, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast for more, including the examples section.
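library(data.table) # v1.9.6+
dt <- as.data.table(mydf) # convert the example data to a data.table
# cast both value columns (columns 3 and 4: transactions, amount) at once
dcast(dt, id ~ type, value.var = names(dt)[3:4])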
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300
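To illustrate the multiple fun.aggregate point, here is a sketch that applies two functions (sum and mean, chosen arbitrarily) to both value columns. With one row per id/type pair both functions just return the original value, so this only demonstrates the shape: when value.var is a character vector, every function is applied to every value column, giving 2 functions × 2 value columns × 2 types = 8 new columns.

```r
library(data.table)

dt <- data.table(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250)
)

# Character-vector value.var: each function in the list is applied to each column
wide <- dcast(dt, id ~ type,
              value.var     = c("transactions", "amount"),
              fun.aggregate = list(sum, mean))
names(wide)
```

Passing value.var as a list instead pairs the functions and value columns one-to-one; ?dcast has examples of both forms.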