I am working on a research project, and one of the tables is entered in a way that is not quite suitable for analysis yet, so I am trying to reorganize it. Currently, each row is a test-taker, and each column is a question they have answered wrongly, entered in ascending order. So, for the first row, entries may read "Q1" "Q3" "Q9" etc. for the first, second and third columns respectively. There are 25 questions in total.
My goal is to reorganize the data such that there is a column for every question. If a test-taker has answered the question correctly, the entry for the respective column reads 1, and 0 if otherwise.
There is a brute-force way that appears to work. It is possible to mutate each column separately and check for each value in each column. However, there are 25 questions, and all this typing seems extremely inefficient, so I suspect there must be a better way.
The brute-force code looks something like:
df %>%
mutate(Q3 == ifelse(col1 == "Q3" | col2 == "Q3" | col3 == "Q3", 0, 1))
Here, col1, col2, col3 are all columns that might contain Q3, which might be a question the test-taker got wrong. If any of them do, we enter 0. Else, we enter 1.
With 25 questions, the code becomes too long.
EDIT: A sample of the dataframe looks like this.
sample <- "ID Col1 Col2 Col3 Col4
1 100 Q1
2 101 Q3 Q4
3 102 Q2 Q3 Q4
4 103
5 104 Q4
6 105 Q1 Q2 Q3 Q4 "
The desired output is as follows:
sample <- "ID Q1 Q2 Q3 Q4
1 100 0 1 1 1
2 101 1 1 0 0
3 102 1 0 0 0
4 103 1 1 1 1
5 104 1 1 1 0
6 105 0 0 0 0 "
this was my solution- turning the data from wide to long and back again
s <- reshape2::melt(sample, id.vars = "ID", value.name = "Q")
s$variable <- 1
s <- subset(s, complete.cases(s))
s <- reshape(s, idvar = "ID", timevar = "Q", direction = "wide")
s <- apply(s, 2, function(x) ifelse(is.na(x), 0, x))
1) Assuming DF
is as shown reproducibly in the Note at the end use sapply
to create an matrix of indicators and then cbind
it to the ID
column. Finally make the names nicer. No packages are used.
ques <- function(i) paste0("Q", 1:25) %in% unlist(DF[i, -1])
DFout <- cbind(DF[1], +t(sapply(1:nrow(DF), ques)))
names(DFout)[-1] <- paste0("Q", names(DFout[-1]))
The first 5 columns are:
> DFout[1:5]
ID Q1 Q2 Q3 Q4
1 100 1 0 0 0
2 101 0 0 1 1
3 102 0 1 1 1
4 103 0 0 0 0
5 104 0 0 0 1
6 105 1 1 1 1
2) Another possibility is to convert the input to long form and then use xtabs
to create a table from it.
library(dplyr)
library(tidyr)
tab <- DF %>%
gather(key, Question, -ID) %>%
filter(nzchar(Question)) %>%
mutate(Question = factor(Question, paste0("Q", 1:25))) %>%
xtabs(~ ID + Question, .)
giving this table. We show the first 5 columns:
> tab[, 1:5]
Question
ID Q1 Q2 Q3 Q4
100 1 0 0 0
101 0 0 1 1
102 0 1 1 1
104 0 0 0 1
105 1 1 1 1
If it is important that the result be a data frame then add:
library(tibble)
tab %>%
as.data.frame.matrix %>%
rownames_to_column(var = "ID")
sample <- "rows ID Col1 Col2 Col3 Col4
1 100 Q1
2 101 Q3 Q4
3 102 Q2 Q3 Q4
4 103
5 104 Q4
6 105 Q1 Q2 Q3 Q4"
DF <- read.table(text = sample, header = TRUE, fill = TRUE, as.is = TRUE,
strip.white = TRUE)[-1]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With