I have a data frame of factors called questions
q1 q2 q3
A A B
C A A
A B C
That I want to reshape into
question answer freq
1 A 2
1 B 0
1 C 1
2 A 2
2 B 1
2 C 0
3 A 1
3 B 1
3 C 1
I feel like there should be a way to this with reshape2 or plyr, but I couldn't figure it out.
Instead, I did the following:
tbl <- data.frame()
for(i in 1:dim(questions)[2]){
subtable <- cbind(question = rep(i, 3),
as.data.frame(table(questions[i])))
tbl <- rbind(tbl, subtable)
}
Is there a cleaner approach to reshaping this table?
The easiest way to reshape data between these formats is to use the following two functions from the tidyr package in R: pivot_longer(): Reshapes a data frame from wide to long format. pivot_wider(): Reshapes a data frame from long to wide format.
We can use pivot() function to unmelt a DataFrame object and get the original dataframe. The pivot() function 'index' parameter value should be same as the 'id_vars' value. The 'columns' value should be passed as the name of the 'variable' column. The unmelted DataFrame values are the same as the original DataFrame.
Data Visualization using R Programming Data Reshaping in R is about changing the way data is organized into rows and columns. Most of the time data processing in R is done by taking the input data as a data frame.
Here's a base R approach that is similar in concept to what @akrun has posted. I haven't bothered with the cleaning up since that's mostly cosmetic and isn't related to the concept of the question.
The general approach would be:
data.frame(table(stack(mydf))
However, stack
won't work with factor
s, so if your data are factor
s and not character
s, you will have to convert using as.character
first, like this:
data.frame(table(stack(lapply(mydf, as.character))))
# values ind Freq
# 1 A q1 2
# 2 B q1 0
# 3 C q1 1
# 4 A q2 2
# 5 B q2 1
# 6 C q2 0
# 7 A q3 1
# 8 B q3 1
# 9 C q3 1
Moving away from "plyr" and "reshape2" and instead towards "dplyr" and "tidyr", you can try:
library(dplyr)
library(tidyr)
mydf %>%
gather(question, answer, everything()) %>% ## Get the data into a long form
group_by(question, answer) %>% ## Group by both question and answer columns
summarise(freq = n()) %>% ## Calculate the relevant frequency
right_join(expand(., question, answer)) ## Merge with all combinations of Qs and As
# Joining by: c("question", "answer")
# Source: local data frame [9 x 3]
# Groups: question
#
# question answer freq
# 1 q1 A 2
# 2 q1 B NA
# 3 q1 C 1
# 4 q2 A 2
# 5 q2 B 1
# 6 q2 C NA
# 7 q3 A 1
# 8 q3 B 1
# 9 q3 C 1
Yeah, it's a little tricky because of the zeros. After melting, instead of casting directly into the form you need, cast into wide form and then melt again. There's probably a just as easy was using base R and table
though.
d <- read.table(text="q1 q2 q3
A A B
C A A
A B C", header=TRUE, as.is=TRUE)
melt(dcast(melt(d, measure.vars=1:3), value ~ variable))
## Aggregation function missing: defaulting to length
## Using value as id variables
## value variable value
## 1 A q1 2
## 2 B q1 0
## 3 C q1 1
## 4 A q2 2
## 5 B q2 1
## 6 C q2 0
## 7 A q3 1
## 8 B q3 1
## 9 C q3 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With