I have a dataframe with over 500 named variables, and I want to select only the columns whose names include either "xyz" or "abc". The first letter is sometimes capitalized, sometimes not, so I'm using regular expressions such as "[Aa]bc".
I have the full dataset in a dataframe called df, and I'm building a new dataframe called df2 by selecting variables out of df using grep(). I can do it one at a time and stick them together with cbind(), but I'd like to know how to do it all in one go.
I thought I could pass multiple conditions to grep(), but seem to be getting stuck here.
With a really simplified example:
df <- data.frame(abc=1:3, def=4:6, Xyz=7:9, Abc=10:12, xyz=13:15)
  abc def Xyz Abc xyz
1   1   4   7  10  13
2   2   5   8  11  14
3   3   6   9  12  15
I successfully got the columns I needed using two separate lines:
df2 <- df[,grep("[Aa]bc", names(df), value=TRUE)]
df3 <- df[,grep("[Xx]yz", names(df), value=TRUE)]
df4 <- cbind(df2, df3)
When I try to do all this at once using:
df2 <- df[,grep("[Aa]bc" | "[Xx]yz", names(df), value=TRUE)]
I got the following error:
Error in "[Aa]bc" | "[Xx]yz" : operations are possible only for numeric, logical or complex types
I also tried passing the conditions to grep() as a vector of strings, but it didn't work:
df2 <- df[,grep(c("[Aa]bc", "[Xx]yz"), names(df), value=TRUE)]
It only used the first element, with a warning:
In grep(c("[Aa]bc", "[Xx]yz"), names(df), value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
so it only selected the columns with "[Aa]bc", and skipped "[Xx]yz".
Is there an easier way to do this?
There is an argument ignore.case which, if set to TRUE, well, ignores (upper/lower) case, i.e.
df[grepl('xyz|abc', names(df), ignore.case = TRUE)]
#   abc Xyz Abc xyz
# 1   1   7  10  13
# 2   2   8  11  14
# 3   3   9  12  15
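The key point is that the "or" goes inside the pattern string as a regex alternation (`|`), not between two R strings. If the name fragments live in a vector, one way to build that single pattern is to collapse them with paste(). The sketch below assumes the example df from the question; `patterns`, `regex` and `keep` are just illustrative names.

```r
# Example data from the question
df <- data.frame(abc = 1:3, def = 4:6, Xyz = 7:9, Abc = 10:12, xyz = 13:15)

# Illustrative vector of name fragments to look for
patterns <- c("abc", "xyz")

# Collapse the fragments into one alternation pattern: "abc|xyz"
regex <- paste(patterns, collapse = "|")

# TRUE for every column name that matches either fragment, ignoring case
keep <- grepl(regex, names(df), ignore.case = TRUE)

# drop = FALSE keeps a data frame even if only one column matches
df2 <- df[, keep, drop = FALSE]
names(df2)
# [1] "abc" "Xyz" "Abc" "xyz"
```

If you would rather keep the explicit character classes from the question, the same idea works with grep("[Aa]bc|[Xx]yz", names(df), value = TRUE).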