Subset data to contain only columns whose names match a condition

Example

#  Data
df <- data.frame( ABC_1 = runif(3),
            ABC_2 = runif(3),
            XYZ_1 = runif(3),
            XYZ_2 = runif(3) )

#      ABC_1     ABC_2     XYZ_1     XYZ_2
#1 0.3792645 0.3614199 0.9793573 0.7139381
#2 0.1313246 0.9746691 0.7276705 0.0126057
#3 0.7282680 0.6518444 0.9531389 0.9673290

#  Use grepl
df[ , grepl( "ABC" , names( df ) ) ]
#      ABC_1     ABC_2
#1 0.3792645 0.3614199
#2 0.1313246 0.9746691
#3 0.7282680 0.6518444

#  grepl returns logical vector like this which is what we use to subset columns
grepl( "ABC" , names( df ) )
#[1]  TRUE  TRUE FALSE FALSE
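Two details are worth checking when using this approach: the pattern "ABC" matches anywhere in a name, and a single matching column is simplified to a plain vector unless drop = FALSE is given. The sketch below uses made-up column names (XYZ_ABC is hypothetical, not from the question) to show both.

```r
# Illustrative data; the column names are assumptions for this sketch
df <- data.frame(ABC_1 = 1:3, XYZ_1 = 4:6, XYZ_ABC = 7:9)

# Unanchored pattern: matches "ABC" anywhere in the column name
loose <- names(df)[grepl("ABC", names(df))]

# Anchored pattern: only names that start with "ABC"
keep <- grepl("^ABC", names(df))

# Only one column matches here; drop = FALSE keeps the result a data.frame
sub <- df[, keep, drop = FALSE]
```

Without drop = FALSE, df[, keep] would return an integer vector here rather than a one-column data.frame.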

To answer the second part, I'd first create the column subset and then build a logical vector that indexes the rows to keep, like this:

set.seed(1)
df <- data.frame( ABC_1 = sample(0:1,3,replace = TRUE),
            ABC_2 = sample(0:1,3,replace = TRUE),
            XYZ_1 = sample(0:1,3,replace = TRUE),
            XYZ_2 = sample(0:1,3,replace = TRUE) )

# We want to discard the second row because all of its ABC values are 0
# (sample() output for a given seed changed in R 3.6.0, so your values may differ):
#  ABC_1 ABC_2 XYZ_1 XYZ_2
#1     0     1     1     0
#2     0     0     1     0
#3     1     1     1     0


df1 <- df[ , grepl( "ABC" , names( df ) ) ]

ind <- apply( df1 , 1 , function(x) any( x > 0 ) )

df1[ ind , ]
#  ABC_1 ABC_2
#1     0     1
#3     1     1
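As an alternative to apply(), the same row filter can be written with rowSums(), which avoids calling a function once per row. This is a minimal sketch with the values typed in by hand (copied from the printed output above) so that it does not depend on the random seed.

```r
# Same ABC columns as in the printed example, entered explicitly
df1 <- data.frame(ABC_1 = c(0, 0, 1), ABC_2 = c(1, 0, 1))

# df1 > 0 is a logical matrix; rowSums() counts the TRUE values per row
ind <- rowSums(df1 > 0) > 0

# Keep only rows with at least one positive ABC value
res <- df1[ind, ]
```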

You can also use starts_with and dplyr's select() like so:

library(dplyr)  # provides %>% and select()
df <- df %>% dplyr::select(starts_with("ABC"))
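If you would rather not depend on dplyr, base R's startsWith() gives a similar prefix match without any regular expression. A small sketch, using illustrative data rather than the data from the question:

```r
# Illustrative data for this sketch
df <- data.frame(ABC_1 = 1:3, ABC_2 = 4:6, XYZ_1 = 7:9)

# startsWith() compares literal prefixes, so no regex escaping is needed
keep <- startsWith(names(df), "ABC")

# drop = FALSE keeps a data.frame even if only one column matches
sub <- df[, keep, drop = FALSE]
```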

For data.table users (that is, when df is a data.table rather than a plain data.frame), the following works for me:

df[, grep("ABC", names(df)), with = FALSE]

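Using dplyr, you can select the columns matching several patterns in one call (this assumes dplyr is attached so that %>% is available):

df <- df %>% dplyr::select(grep("ABC", names(df)), grep("XYZ", names(df)))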
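Several patterns can also be combined into a single regular expression with the alternation operator |. The sketch below does this in base R on illustrative data (the DEF_1 column is made up for the example); in dplyr the tidyselect helper matches("ABC|XYZ") should play the same role.

```r
# Illustrative data; DEF_1 is a made-up column that should be excluded
df <- data.frame(ABC_1 = 1:3, DEF_1 = 4:6, XYZ_1 = 7:9)

# "ABC|XYZ" matches names containing either substring
keep <- grepl("ABC|XYZ", names(df))

sub <- df[, keep, drop = FALSE]
```

One difference from passing two separate grep() calls to select(): the logical index keeps columns in their original order instead of placing all ABC columns before the XYZ columns.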

This worked for me (here str is a character variable holding the pattern, not the base function str()):

df[,names(df) %in% colnames(df)[grepl(str,colnames(df))]]

Simplest solution, given to me by my statistics professor:

df[,grep("pattern", colnames(df))]

That's it. Unlike grepl, grep returns the integer positions of the matching columns rather than a logical vector, so the result is simply your dataset restricted to the columns whose names match the pattern.
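The difference between grep and grepl matters mostly when you exclude columns or when nothing matches. The following sketch, on illustrative data with a made-up pattern "QRS" that matches no column, shows what each call returns.

```r
# Illustrative data for this sketch
df <- data.frame(ABC_1 = 1:3, XYZ_1 = 4:6)

idx <- grep("ABC", names(df))                # integer positions of matches
nms <- grep("ABC", names(df), value = TRUE)  # the matching names themselves

# "QRS" matches nothing, so grep() returns integer(0)
none <- grep("QRS", names(df))

# Negating integer(0) is still integer(0): this drops every column
dropped_grep <- df[, -none, drop = FALSE]

# Negating the logical vector from grepl() keeps every column, as intended
dropped_grepl <- df[, !grepl("QRS", names(df)), drop = FALSE]
```

So for "all columns except those matching a pattern", the grepl form is the safer choice.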
