Easy way to convert long to wide format with counts [duplicate]

Tags: r, reshape

I have the following data set:

sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
                          Case = c(1,1,1,1,2,2,3,3,3,4,5),
                          Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"))

sample.data

   Step Case Decision
1     1    1 Referred
2     2    1 Referred
3     3    1 Referred
4     4    1 Approved
5     1    2 Referred
6     2    2 Declined
7     1    3 Referred
8     2    3 Referred
9     3    3 Declined
10    1    4 Approved
11    1    5 Declined

Is it possible in R to translate this into a wide table format, with the decisions on the header, and the value of each cell being the count of the occurrence, for example:

Case    Referred    Approved    Declined
1          3           1            0
2          1           0            1
3          2           0            1
4          0           1            0
5          0           0            1

dGecko asked Dec 22 '15 14:12


2 Answers

The aggregation function (the fun.aggregate parameter) of the dcast function in the reshape2 package defaults to length (= count). The data.table package implements an improved version of the dcast function. So in your case this would be:

library('reshape2') # or library('data.table')
newdf <- dcast(sample.data, Case ~ Decision)

or with using the parameters explicitly:

newdf <- dcast(sample.data, Case ~ Decision,
               value.var = "Decision", fun.aggregate = length)

This gives the following dataframe:

> newdf
  Case Approved Declined Referred
1    1        1        0        3
2    2        0        1        1
3    3        0        1        2
4    4        1        0        0
5    5        0        1        0

If you don't specify an aggregation function, dcast warns you that it is using length as the default.
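For the data.table variant mentioned above, here is a minimal sketch. It assumes the data.table package is installed and converts the data frame with as.data.table() first, so that data.table's own dcast method is the one being called. The names sample.dt and counts.dt are only illustrative.

```r
library(data.table)  # assumption: the data.table package is installed

sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved",
               "Referred", "Declined", "Referred", "Referred",
               "Declined", "Approved", "Declined")
)

# Convert to a data.table so that data.table's dcast method is used
# (sample.dt is a hypothetical name for this sketch)
sample.dt <- as.data.table(sample.data)

# One row per Case, one column per Decision, each cell holding the count
counts.dt <- dcast(sample.dt, Case ~ Decision,
                   value.var = "Decision", fun.aggregate = length)
print(counts.dt)
```

Passing fun.aggregate = length explicitly avoids the default-aggregation notice and makes it clear that the cells hold counts.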

Jaap answered Oct 28 '22 06:10


You can accomplish this with a simple table() call. Setting the factor levels first controls the order in which the decisions appear as columns.

sample.data$Decision <- factor(x = sample.data$Decision,
                               levels = c("Referred","Approved","Declined"))

table(Case = sample.data$Case, sample.data$Decision)

Case Referred Approved Declined
   1        3        1        0
   2        1        0        1
   3        2        0        1
   4        0        1        0
   5        0        0        1
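
table() returns a table object rather than a data frame. If a data frame with an explicit Case column is needed, as in the layout requested in the question, one possible follow-up is sketched below. It is an illustration under that assumption rather than part of the approach above, and the names counts and wide are hypothetical.

```r
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved",
               "Referred", "Declined", "Referred", "Referred",
               "Declined", "Approved", "Declined")
)

# Fix the column order through the factor levels
sample.data$Decision <- factor(sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

# Cross-tabulate: rows are cases, columns are decisions
counts <- table(Case = sample.data$Case, Decision = sample.data$Decision)

# Turn the two-way table into a data frame, keeping its wide layout
wide <- as.data.frame.matrix(counts)

# Move the case identifiers from the row names into a regular column
wide <- cbind(Case = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```

as.data.frame.matrix() is used here because plain as.data.frame() on a table returns the long format (one row per Case and Decision pair with a Freq column), which is the opposite of what the question asks for.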

TARehman answered Oct 28 '22 04:10