Creating one variable from a list of variables in R?

I have a set of variables in a data frame (over 100), and I would like to create an indicator variable that flags whether any of several text patterns is present in any of them. Below is an example with three variables. One solution I've found uses tidyr::unite() followed by dplyr::mutate(), but I'm interested in a solution that does not require uniting the variables.

c1<-c("T1", "X1", "T6", "R5")
c2<-c("R4", "C6", "C7", "X3")
c3<-c("C5", "C2", "X4", "T2")

df<-data.frame(c1, c2, c3)

  c1 c2 c3
1 T1 R4 C5
2 X1 C6 C2
3 T6 C7 X4
4 R5 X3 T2

code.vec<-c("T1", "T2", "T3", "T4") #Text patterns of interest
code_regex<-paste(code.vec, collapse="|")

library(dplyr)
library(tidyr)

new<-df %>% 
  unite(all_c, c1:c3, remove=FALSE) %>% 
  mutate(indicator=if_else(grepl(code_regex, all_c), 1, 0)) %>% 
  select(-all_c)

  c1 c2 c3 indicator
1 T1 R4 C5 1
2 X1 C6 C2 0
3 T6 C7 X4 0
4 R5 X3 T2 1

The example above produces the desired result, but I feel there should be a way of doing this in the tidyverse without having to unite the variables. SAS handles this very easily with an ARRAY statement and a DO loop, and I'm hoping R has a good way of handling it too.

The real data frame has many additional variables besides the "c" fields to search, so a solution that searches every column would require first subsetting the data frame to the variables I want to search, and then joining the result back to the other variables.

asked Apr 22 '19 14:04

patward5656

2 Answers

Using base R, we can use sapply() with grepl() to test every column for the pattern, and then assign 1 to rows that have at least one match.

df$indicator <- as.integer(rowSums(sapply(df, grepl, pattern = code_regex)) > 0)

df
#  c1 c2 c3 indicator
#1 T1 R4 C5         1
#2 X1 C6 C2         0
#3 T6 C7 X4         0
#4 R5 X3 T2         1
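
For readers coming from SAS, the same logic can also be written as an explicit loop over the columns, which is closer in spirit to an ARRAY statement with a DO loop. This is only a sketch of what the sapply() call does internally; `search_cols` and `found` are names introduced here for illustration, and the example data are rebuilt so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# search_cols is a hypothetical name for the columns to scan
search_cols <- c("c1", "c2", "c3")

# Start with "no match" for every row, then OR in each column's matches
found <- rep(FALSE, nrow(df))
for (col in search_cols) {
  found <- found | grepl(code_regex, df[[col]])
}

df$indicator <- as.integer(found)
df$indicator
#[1] 1 0 0 1
```

Because only the columns named in `search_cols` are read, any other columns in the data frame are left alone.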

If the data frame has other columns and we only want to search the columns whose names start with "c", we can use grep to select them.

cols <- grep("^c", names(df))
df$indicator <- as.integer(rowSums(sapply(df[cols], grepl, pattern = code_regex)) > 0)

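Using dplyr, we can do the following. transmute_at() keeps only the searched columns, so rowSums() still works when the data frame contains other, non-numeric columns.

library(dplyr)

df$indicator <- as.integer(df %>%
              transmute_at(vars(c1:c3), ~grepl(code_regex, .)) %>%
              rowSums() > 0)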
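
With dplyr 1.0.4 or later, if_any() expresses the same row-wise "any column matches" test without building a united column, and it leaves the other columns in the data frame untouched. A minimal sketch, assuming that dplyr version is available; the `id` column is added here only to stand in for the other variables mentioned in the question.

```r
library(dplyr)

df <- data.frame(
  id = 1:4,  # hypothetical extra column that should not be searched
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)
code_regex <- paste(c("T1", "T2", "T3", "T4"), collapse = "|")

# if_any() returns TRUE for a row when the predicate holds in at least
# one of the selected columns
new <- df %>%
  mutate(indicator = as.integer(if_any(c1:c3, ~ grepl(code_regex, .x))))

new$indicator
#[1] 1 0 0 1
```

For a large set of columns, a tidyselect helper such as starts_with("c") can replace c1:c3, provided no unrelated column shares the prefix.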

answered Oct 16 '22 09:10

Ronak Shah

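We can use tidyverse. Combining the per-column matches with `|` keeps the result a 0/1 indicator even when a row matches in more than one column (summing with `+` would give a count of matches instead).

library(tidyverse)
df %>%
    mutate_all(str_detect, pattern = code_regex) %>%
    reduce(`|`) %>%
    as.integer() %>%
    mutate(df, indicator = .)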
#  c1 c2 c3 indicator
#1 T1 R4 C5         1
#2 X1 C6 C2         0
#3 T6 C7 X4         0
#4 R5 X3 T2         1

Or using base R, again combining the columns with `|` so that the result stays 0/1:

as.integer(Reduce(`|`, lapply(df, grepl, pattern = code_regex)))
#[1] 1 0 0 1
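
One caveat applies to the regex-based versions above: an unanchored pattern such as "T1" also matches longer values like "T10". If the codes should match whole values only, comparing with %in% avoids regular expressions altogether. This is a sketch under that assumption; `df2` and the value "T10" are invented here to show the difference.

```r
code.vec <- c("T1", "T2", "T3", "T4")
code_regex <- paste(code.vec, collapse = "|")

# Hypothetical data: "T10" is not one of the codes of interest
df2 <- data.frame(
  c1 = c("T10", "X1"),
  c2 = c("R4", "T2"),
  stringsAsFactors = FALSE
)

# Unanchored regex: "T10" contains "T1", so row 1 is flagged
regex_hit <- as.integer(Reduce(`|`, lapply(df2, grepl, pattern = code_regex)))

# Whole-value match: only values equal to one of the codes count
exact_hit <- as.integer(Reduce(`|`, lapply(df2, `%in%`, code.vec)))

regex_hit
#[1] 1 1
exact_hit
#[1] 0 1
```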

answered Oct 16 '22 10:10

akrun


Tags: r, dplyr, tidyverse