
How to check for the existence of a certain value in a set of variables only when there is no NA?

I have a data frame with hundreds of variables, grouped into different factors ("Happy_", "Sad_", etc.), and I want to create a set of new variables indicating whether a participant gave a rating of 4 on any of the variables in a factor. However, if any of the variables in that factor is NA, the new variable should also be NA.

I have tried the following, but it didn't work:

library(tidyverse)
df <- data.frame(Subj = c("A", "B", "C", "D"),
                 Happy_1_Num = c(4,2,2,NA),
                 Happy_2_Num = c(4,2,2,1),
                 Happy_3_Num = c(1,NA,2,4),
                 Sad_1_Num = c(2,1,4,3),
                 Sad_2_Num = c(NA,1,2,4),
                 Sad_3_Num = c(4,2,2,1))

# Doesn't work
df <- df %>% mutate(Happy_Any4 = ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ is.na(.)), NA,
                                        ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ . == 4),1,0)),
                    Sad_Any4 = ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ is.na(.)), NA,
                                      ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ . == 4),1,0)))

I tried a workaround: first generate a set of variables indicating whether each factor has any NA, then check whether the participant gave any rating of 4. It works, but since I have many factors, I was wondering if there is a more elegant way of doing it.

# workaround
df <- df %>% mutate(
  NA_Happy = ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ is.na(.)), 1,0),
  NA_Sad = ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ is.na(.)), 1,0))

df <- df %>% mutate(
  Happy_Any4 = ifelse(NA_Happy == 1, NA,
                        ifelse(if_any(matches("^Happy_") & matches("_Num$"), ~ . == 4),1,0)),
  Sad_Any4 = ifelse(NA_Sad == 1, NA,
                        ifelse(if_any(matches("^Sad_") & matches("_Num$"), ~ . == 4),1,0)))

Emma Loke asked Dec 06 '25 02:12

1 Answer

Here is a base R option using split.default:

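tmp <- df[-1]  # drop the Subj column
# Split the columns by name prefix ("Happy", "Sad"), then flag rows that contain a 4
cbind(df, sapply(split.default(tmp, sub('_.*', '', names(tmp))),
                 function(x) as.integer(rowSums(x == 4) > 0)))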

#  Subj Happy_1_Num Happy_2_Num Happy_3_Num Sad_1_Num Sad_2_Num Sad_3_Num Happy Sad
#1    A           4           4           1         2        NA         4     1  NA
#2    B           2           2          NA         1         1         2    NA   0
#3    C           2           2           2         4         2         2     0   1
#4    D          NA           1           4         3         4         1    NA   1

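sub() keeps only the "Happy" or "Sad" part of each column name, split.default() splits the columns into groups based on that, and sapply() checks, for each group, whether any value in a row is 4. Because rowSums() does not remove NA by default, a row with an NA anywhere in the group comes out as NA, which is the behaviour you asked for.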
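
To make the intermediate steps visible, here is a small sketch on the sample data from the question. The names grp and happy_hits are only illustrative and are not part of the one-liner above.

```r
# Sample data copied from the question
df <- data.frame(Subj = c("A", "B", "C", "D"),
                 Happy_1_Num = c(4, 2, 2, NA),
                 Happy_2_Num = c(4, 2, 2, 1),
                 Happy_3_Num = c(1, NA, 2, 4),
                 Sad_1_Num = c(2, 1, 4, 3),
                 Sad_2_Num = c(NA, 1, 2, 4),
                 Sad_3_Num = c(4, 2, 2, 1))

tmp <- df[-1]

# Grouping key: everything from the first underscore onward is dropped
grp <- sub("_.*", "", names(tmp))
grp
# "Happy" "Happy" "Happy" "Sad" "Sad" "Sad"

# Count the 4s per row within one group; rowSums() has na.rm = FALSE by
# default, so any NA in the row makes the whole sum NA
happy_hits <- rowSums(tmp[grp == "Happy"] == 4)
happy_hits
# values: 2 NA 0 NA

# Turning the count into a 0/1 flag keeps the NAs
as.integer(happy_hits > 0)
# 1 NA 0 NA
```

Passing na.rm = TRUE to rowSums() would instead ignore the missing ratings, which is not what the question asks for.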


If you can afford to write out each factor manually, you can do:

library(dplyr)

df %>%
  mutate(Happy = as.integer(rowSums(select(., starts_with('Happy')) == 4) > 0), 
         Sad = as.integer(rowSums(select(., starts_with('Sad')) == 4) > 0))
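
If there are too many factors to write out, the same rowSums() idea can be wrapped in a helper and looped over a vector of prefixes. This is only a sketch: the helper any4(), the prefixes vector and the "_Any4" suffix (taken from the question's naming) are assumptions, and the pattern assumes every rating column looks like "<Factor>_<n>_Num".

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(Subj = c("A", "B", "C", "D"),
                 Happy_1_Num = c(4, 2, 2, NA),
                 Happy_2_Num = c(4, 2, 2, 1),
                 Happy_3_Num = c(1, NA, 2, 4),
                 Sad_1_Num = c(2, 1, 4, 3),
                 Sad_2_Num = c(NA, 1, 2, 4),
                 Sad_3_Num = c(4, 2, 2, 1))

# Hypothetical helper: 1 if any rating in the factor is 4, 0 if none is,
# NA if any rating in the factor is missing
any4 <- function(data, prefix) {
  cols <- select(data, matches(paste0("^", prefix, "_.*_Num$")))
  as.integer(rowSums(cols == 4) > 0)
}

# Assumed list of factor prefixes; extend it to cover all factors
prefixes <- c("Happy", "Sad")

for (p in prefixes) {
  df[[paste0(p, "_Any4")]] <- any4(df, p)
}

df[c("Subj", "Happy_Any4", "Sad_Any4")]
```

Anchoring the pattern on "_Num$" keeps the newly created "_Any4" columns out of the selection if the loop is run again.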

Ronak Shah answered Dec 08 '25 15:12