I have the following data set:
sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
Case = c(1,1,1,1,2,2,3,3,3,4,5),
Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"))
sample.data
   Step Case Decision
1     1    1 Referred
2     2    1 Referred
3     3    1 Referred
4     4    1 Approved
5     1    2 Referred
6     2    2 Declined
7     1    3 Referred
8     2    3 Referred
9     3    3 Declined
10    1    4 Approved
11    1    5 Declined
Is it possible in R to reshape this into a wide table, with the decisions as the column headers and each cell holding the number of times that decision occurs for the case? For example:
Case Referred Approved Declined
   1        3        1        0
   2        1        0        1
   3        2        0        1
   4        0        1        0
   5        0        0        1
To summarize: to reshape a pandas DataFrame from long to wide, use pd.pivot(); to reshape it from wide to long, use pd.melt().
The easiest way to reshape data between these formats in R is to use two functions from the tidyr package: pivot_longer() reshapes a data frame from wide to long format, and pivot_wider() reshapes a data frame from long to wide format.
A wide format contains values that do not repeat in the first column. A long format contains values that do repeat in the first column. Notice that in the wide dataset, each value in the first column is unique.
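Applied to the question's data, the pivot_wider() route could look roughly like the sketch below. It assumes the dplyr and tidyr packages are installed; the name wide is only illustrative, and counting with count() before pivoting is one possible approach, not the only one.

```r
library(dplyr)
library(tidyr)

sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred", "Declined",
               "Referred", "Referred", "Declined", "Approved", "Declined")
)

# Count rows per Case/Decision pair (count() stores the result in column n),
# then spread the decisions into columns; missing combinations become 0.
wide <- sample.data %>%
  count(Case, Decision) %>%
  pivot_wider(names_from = Decision, values_from = n, values_fill = 0)

wide
```

The result is a tibble with one row per Case and one count column per decision.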
The aggregation parameter in the dcast function of the reshape2 package defaults to length (= count). The data.table package implements an improved version of the dcast function. So in your case this would be:
library('reshape2') # or library('data.table')
newdf <- dcast(sample.data, Case ~ Decision)
or, with the parameters specified explicitly:
newdf <- dcast(sample.data, Case ~ Decision,
               value.var = "Decision", fun.aggregate = length)
This gives the following dataframe:
> newdf
  Case Approved Declined Referred
1    1        1        0        3
2    2        0        1        1
3    3        0        1        2
4    4        1        0        0
5    5        0        1        0
If you don't specify an aggregation function, you get a warning telling you that dcast is using length as a default.
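For the data.table version of dcast mentioned above, a minimal sketch might look like this. It assumes the data.table package is installed and converts the data frame to a data.table first; the names dt and newdt are only illustrative.

```r
library(data.table)

sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred", "Declined",
               "Referred", "Referred", "Declined", "Approved", "Declined")
)

# data.table's dcast method works on data.table objects, so convert first.
dt <- as.data.table(sample.data)

# Passing fun.aggregate explicitly avoids relying on the default aggregation.
newdt <- dcast(dt, Case ~ Decision, value.var = "Decision", fun.aggregate = length)

newdt
```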
You can accomplish this with a simple table() call. You can set the factor levels to get the responses in the order you want.
sample.data$Decision <- factor(x = sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))
table(Case = sample.data$Case, sample.data$Decision)
Case Referred Approved Declined
   1        3        1        0
   2        1        0        1
   3        2        0        1
   4        0        1        0
   5        0        0        1
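Note that table() returns a table object rather than a data frame. If a wide data frame is needed, one possible follow-up, sketched here with base R only, is to convert the table with as.data.frame.matrix() and move the row names into a Case column. The names counts and wide are only illustrative.

```r
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred", "Declined",
               "Referred", "Referred", "Declined", "Approved", "Declined")
)

# Fix the column order by setting the factor levels explicitly.
sample.data$Decision <- factor(sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

counts <- table(Case = sample.data$Case, Decision = sample.data$Decision)

# Convert the two-way table to a data frame, keeping the wide layout.
wide <- as.data.frame.matrix(counts)

# The cases end up as row names; turn them into a regular column.
wide <- cbind(Case = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

wide
```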