How to subset a dataframe using multiple regular expressions in variable names?

Question

I have a dataframe with over 500 named variables, and I want to select only the columns whose names contain either "xyz" or "abc". The first letter is sometimes capitalized and sometimes not, so I'm using regular expressions such as "[Aa]bc".

I have the full dataset in a dataframe called df, and I'm building a new dataframe called df2 by selecting variables out of df using grep(). I can do it one at a time and stick them together with cbind(), but I'd like to know how to do it all in one go.

I thought I could pass multiple conditions to grep(), but seem to be getting stuck here.

With a really simplified example:

df <- data.frame(abc=1:3, def=4:6, Xyz=7:9, Abc=10:12, xyz=13:15)

  abc def Xyz Abc xyz
1   1   4   7  10  13
2   2   5   8  11  14
3   3   6   9  12  15

I successfully got the columns I needed using two separate grep() calls and then combining the results:

df2 <- df[,grep("[Aa]bc", names(df), value=TRUE)]
df3 <- df[,grep("[Xx]yz", names(df), value=TRUE)]
df4 <- cbind(df2, df3)
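A side note on this approach: df[, cols] collapses to a plain vector when cols names only one column, which is easy to hit when a pattern happens to match a single variable out of 500. A small sketch using the def column of the toy data (the names one and one_df are just for illustration); drop = FALSE keeps the result a data frame:

```r
df <- data.frame(abc = 1:3, def = 4:6, Xyz = 7:9, Abc = 10:12, xyz = 13:15)

cols <- grep("def", names(df), value = TRUE)  # exactly one match

one    <- df[, cols]                # single column collapses to an integer vector
one_df <- df[, cols, drop = FALSE]  # drop = FALSE keeps a one-column data frame

print(class(one))     # "integer"
print(class(one_df))  # "data.frame"
```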

When I try to do all this at once using:

df2 <- df[,grep("[Aa]bc" | "[Xx]yz", names(df), value=TRUE)]

I got the following error:

Error in "[Aa]bc" | "[Xx]yz" : operations are possible only for numeric, logical or complex types
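The error comes from R itself, not from grep(): the | outside the quotes is R's element-wise logical OR, which cannot be applied to character strings. Regular expressions also write alternation as |, but it has to sit inside the pattern string. A minimal sketch on the same toy data:

```r
df <- data.frame(abc = 1:3, def = 4:6, Xyz = 7:9, Abc = 10:12, xyz = 13:15)

# "|" inside the string is regex alternation: match "[Aa]bc" OR "[Xx]yz"
cols <- grep("[Aa]bc|[Xx]yz", names(df), value = TRUE)
print(cols)  # "abc" "Xyz" "Abc" "xyz"

df2 <- df[, cols, drop = FALSE]
```

Unlike the cbind() version, which returns abc, Abc, Xyz, xyz, a single grep() keeps the columns in their original order.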

I also tried passing the conditions to grep() as a vector of strings, but it didn't work:

df2 <- df[,grep(c("[Aa]bc", "[Xx]yz"), names(df), value=TRUE)]

It used only the first element and gave a warning:

In grep(c("[Aa]bc", "[Xx]yz"), names(df), value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used

so it only selected the columns with "[Aa]bc", and skipped "[Xx]yz".
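grep() takes a single pattern, so a vector of patterns has to be collapsed first. Two base-R ways to do that are sketched below, assuming the patterns are kept in a character vector (patterns is a name chosen for the example): join them into one alternation with paste(), or run grepl() once per pattern and OR the logical results together with Reduce().

```r
df <- data.frame(abc = 1:3, def = 4:6, Xyz = 7:9, Abc = 10:12, xyz = 13:15)
patterns <- c("[Aa]bc", "[Xx]yz")

# Option 1: collapse into a single regex, "[Aa]bc|[Xx]yz"
keep1 <- grepl(paste(patterns, collapse = "|"), names(df))

# Option 2: one grepl() per pattern, then element-wise OR across the results
keep2 <- Reduce(`|`, lapply(patterns, grepl, x = names(df)))

stopifnot(identical(keep1, keep2))
df2 <- df[, keep1, drop = FALSE]
print(names(df2))  # "abc" "Xyz" "Abc" "xyz"
```

Option 2 is handy when the entries are literal strings rather than regexes, because fixed = TRUE can then be passed through lapply() to every grepl() call.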

Is there an easier way to do this?

Sotos · Accepted Answer

grep() and grepl() have an argument ignore.case which, if set to TRUE, ignores (upper/lower) case, so one pattern with the alternation inside the string is enough, i.e.

df[grepl('xyz|abc', names(df), ignore.case = TRUE)]

#   abc Xyz Abc xyz
#1   1   7  10  13
#2   2   8  11  14
#3   3   9  12  15
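Two points worth knowing about ignore.case = TRUE, shown on a hypothetical extended version of the data (the ABC_total and aBC columns are invented for the example): it is looser than "[Aa]bc" because upper case is accepted in any position, and, like the question's patterns, it matches substrings anywhere in the name. Anchoring with ^ and $ restricts the match to whole names.

```r
df <- data.frame(abc = 1:3, def = 4:6, Xyz = 7:9, Abc = 10:12, xyz = 13:15,
                 ABC_total = 16:18, aBC = 19:21)  # last two columns are hypothetical

# substring match, any case: also picks up ABC_total and aBC
loose <- names(df)[grepl("xyz|abc", names(df), ignore.case = TRUE)]

# whole-name match, any case: drops ABC_total, still keeps aBC
exact <- names(df)[grepl("^(xyz|abc)$", names(df), ignore.case = TRUE)]

# whole-name match where only the first letter may vary in case (as in the question)
first <- names(df)[grepl("^([Xx]yz|[Aa]bc)$", names(df))]
```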

Tags:

r

lschoen
