How do I select variables in an R dataframe whose names contain a particular string?

Tags: regex, r

Two examples would be very helpful for me.

How would I select 1) variables whose names start with b or B (i.e. case-insensitively), or 2) variables whose names contain a 3?

df <- data.frame(a1 = factor(c("Hi", "Med", "Hi", "Low"),
  levels = c("Low", "Med", "Hi"), ordered = TRUE),
  a2 = c("A", "D", "A", "C"), a3 = c(8, 3, 9, 9),
  b1 = c(1, 1, 1, 2), b2 = c(5, 4, 3, 2), b3 = c(3, 4, 3, 4),
  B1 = c(3, 6, 4, 4))
Michael Bishop asked Sep 26 '11 22:09




2 Answers

If you just want the variable names:

grep("^[Bb]", names(df), value=TRUE)

grep("3", names(df), value=TRUE)

If you want to select those columns, use either of the following:

df[,grep("^[Bb]", names(df), value=TRUE)]
df[,grep("^[Bb]", names(df))]

The first selects by name; the second selects by a vector of column numbers.
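As a hedged variation on the same idea, the sketch below uses grepl() to build a logical index, ignore.case = TRUE in place of the [Bb] character class, and drop = FALSE so the result stays a data frame even when only one column matches. The names starts_b, has_3, df_b and df_3 are illustrative only and do not appear in the question.

```r
# Same example data frame as in the question
df <- data.frame(
  a1 = factor(c("Hi", "Med", "Hi", "Low"),
              levels = c("Low", "Med", "Hi"), ordered = TRUE),
  a2 = c("A", "D", "A", "C"), a3 = c(8, 3, 9, 9),
  b1 = c(1, 1, 1, 2), b2 = c(5, 4, 3, 2), b3 = c(3, 4, 3, 4),
  B1 = c(3, 6, 4, 4)
)

# 1) Names starting with b or B: ignore.case = TRUE replaces the [Bb] class
starts_b <- grepl("^b", names(df), ignore.case = TRUE)
df_b <- df[, starts_b, drop = FALSE]

# 2) Names containing a 3: fixed = TRUE treats "3" as a literal string
has_3 <- grepl("3", names(df), fixed = TRUE)
df_3 <- df[, has_3, drop = FALSE]

names(df_b)  # "b1" "b2" "b3" "B1"
names(df_3)  # "a3" "b3"
```

Without drop = FALSE, a pattern that matches exactly one column would return a plain vector rather than a one-column data frame.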

Brian Diggs answered Oct 22 '22 23:10



While I like the answer above, I wanted to offer a "tidyverse" solution as well. If you chain many operations together with pipes, as I often do, you may prefer this approach. I also find this code more human-readable.

The function tidyselect::vars_select() selects variables from a character vector given as its first argument (here, the names of the data frame), using a select helper such as starts_with() or matches().

library(dplyr)
library(tidyselect)


df <- data.frame(a1 = factor(c("Hi", "Med", "Hi", "Low"), 
                         levels = c("Low", "Med", "Hi"), ordered = TRUE),
             a2 = c("A", "D", "A", "C"), a3 = c(8, 3, 9, 9),
             b1 = c(1, 1, 1, 2), b2 = c( 5, 4, 3,2), b3 = c(3, 4, 3, 4),
             B1 = c(3, 6, 4, 4))

# will select the names starting with a "b" or a "B"
tidyselect::vars_select(names(df), starts_with('b', ignore.case = TRUE)) 

# use select in conjunction with the previous code
df %>%
  select(vars_select(names(df), starts_with('b', ignore.case = TRUE)))

# Alternatively
tidyselect::vars_select(names(df), matches('^[Bb]'))

Note that ignore.case already defaults to TRUE; I set it explicitly so that future readers can see how to adjust it. The .include and .exclude arguments are also very useful. For example, vars_select(names(df), matches('^[Bb]'), .include = 'a1') returns everything that starts with a "B" or a "b" and also includes "a1".
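For readers who only need the columns themselves, the same select helpers can also be passed straight to dplyr::select(), without going through vars_select(). This is a minimal sketch, assuming dplyr is installed; the names by_prefix, by_digit and by_regex are made up for illustration.

```r
library(dplyr)

# Same example data frame as in the question
df <- data.frame(
  a1 = factor(c("Hi", "Med", "Hi", "Low"),
              levels = c("Low", "Med", "Hi"), ordered = TRUE),
  a2 = c("A", "D", "A", "C"), a3 = c(8, 3, 9, 9),
  b1 = c(1, 1, 1, 2), b2 = c(5, 4, 3, 2), b3 = c(3, 4, 3, 4),
  B1 = c(3, 6, 4, 4)
)

# Columns whose names start with "b" or "B" (ignore.case defaults to TRUE)
by_prefix <- df %>% select(starts_with("b"))

# Columns whose names contain a 3
by_digit <- df %>% select(contains("3"))

# Regex match, plus one extra column named explicitly
by_regex <- df %>% select(matches("^b"), a1)

names(by_prefix)  # "b1" "b2" "b3" "B1"
names(by_digit)   # "a3" "b3"
names(by_regex)   # "b1" "b2" "b3" "B1" "a1"
```

Columns come back in the order the selections are written, which is why a1 appears last in by_regex.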

justin1.618 answered Oct 22 '22 21:10
