Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to loop through columns, check if a particular value exists in any of the columns, mutate a new column and enter 1 if it exists, 0 if not?

I am working on a research project, and one of the tables is entered in a way that is not quite suitable for analysis yet, so I am trying to reorganize it. Currently, each row is a test-taker, and each column is a question they have answered wrongly, entered in ascending order. So, for the first row, entries may read "Q1" "Q3" "Q9" etc. for the first, second and third columns respectively. There are 25 questions in total.

My goal is to reorganize the data such that there is a column for every question. If a test-taker has answered the question correctly, the entry for the respective column reads 1, and 0 if otherwise.

There is a brute-force way that appears to work. It is possible to mutate each column separately and check for each value in each column. However, there are 25 questions, and all this typing seems extremely inefficient, so I suspect there must be a better way.

The brute-force code looks something like:

df %>%
  mutate(Q3 == ifelse(col1 == "Q3" | col2 == "Q3" | col3 == "Q3", 0, 1))

Here, col1, col2, col3 are all columns that might contain Q3, which might be a question the test-taker got wrong. If any of them do, we enter 0. Else, we enter 1.

With 25 questions, the code becomes too long.

EDIT: A sample of the dataframe looks like this.

sample <- "ID   Col1  Col2  Col3  Col4
1          100   Q1     
2          101   Q3    Q4
3          102   Q2    Q3    Q4   
4          103   
5          104   Q4
6          105   Q1    Q2    Q3    Q4 "

The desired output is as follows:

sample <- "ID    Q1    Q2    Q3    Q4
1          100   0     1     1     1
2          101   1     1     0     0
3          102   1     0     0     0   
4          103   1     1     1     1
5          104   1     1     1     0 
6          105   0     0     0     0 "
like image 901
Johe Houston Avatar asked Sep 18 '19 15:09

Johe Houston


2 Answers

this was my solution- turning the data from wide to long and back again

s <- reshape2::melt(sample, id.vars = "ID", value.name = "Q")
s$variable <- 1
s <- subset(s, complete.cases(s))
s <- reshape(s, idvar = "ID", timevar = "Q", direction = "wide")
s <- apply(s, 2, function(x) ifelse(is.na(x), 0, x))
like image 124
m070ch Avatar answered Sep 20 '22 15:09

m070ch


1) Assuming DF is as shown reproducibly in the Note at the end use sapply to create an matrix of indicators and then cbind it to the ID column. Finally make the names nicer. No packages are used.

ques <- function(i) paste0("Q", 1:25) %in% unlist(DF[i, -1])
DFout <- cbind(DF[1], +t(sapply(1:nrow(DF), ques)))
names(DFout)[-1] <- paste0("Q", names(DFout[-1]))

The first 5 columns are:

> DFout[1:5]

   ID Q1 Q2 Q3 Q4
1 100  1  0  0  0
2 101  0  0  1  1
3 102  0  1  1  1
4 103  0  0  0  0
5 104  0  0  0  1
6 105  1  1  1  1

2) Another possibility is to convert the input to long form and then use xtabs to create a table from it.

library(dplyr)
library(tidyr)

tab <- DF %>% 
  gather(key, Question, -ID) %>%
  filter(nzchar(Question)) %>%
  mutate(Question = factor(Question, paste0("Q", 1:25))) %>%
  xtabs(~ ID + Question, .)

giving this table. We show the first 5 columns:

> tab[, 1:5]

     Question
ID    Q1 Q2 Q3 Q4
  100  1  0  0  0
  101  0  0  1  1
  102  0  1  1  1
  104  0  0  0  1
  105  1  1  1  1

If it is important that the result be a data frame then add:

library(tibble)

tab %>% 
  as.data.frame.matrix %>% 
  rownames_to_column(var = "ID")    

Note

sample <- "rows ID   Col1  Col2  Col3  Col4
1          100   Q1     
2          101   Q3    Q4
3          102   Q2    Q3    Q4   
4          103   
5          104   Q4
6          105   Q1    Q2    Q3    Q4"
DF <- read.table(text = sample, header = TRUE, fill = TRUE, as.is = TRUE,
  strip.white = TRUE)[-1]
like image 36
G. Grothendieck Avatar answered Sep 21 '22 15:09

G. Grothendieck