I am using the matches() helper function as part of an argument to select() in the dplyr package. The call looks like this for a hypothetical df data frame:
select(df, variable_1_name, matches("variable_2_name"))
At least as I'm currently using it, variable_2_name must be passed as a string to select(). However, if there is another variable in df that matches "variable_2_name", such as "variable_2_name_recode", then matches() will match both of those variables. Is it possible to match only exact matches with a dplyr function, or with a different approach?
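A minimal reproduction of the problem, assuming dplyr is installed. The three-column df below is hypothetical test data, not taken from the question:

```r
library(dplyr)

# Hypothetical data frame: one column name is a prefix of another
df <- data.frame(
  variable_1_name        = 1:3,
  variable_2_name        = 4:6,
  variable_2_name_recode = 7:9
)

# matches() treats its argument as a regular expression and looks for it
# anywhere in each column name, so both variable_2_* columns are kept:
# the result has variable_1_name, variable_2_name and variable_2_name_recode
over_matched <- select(df, variable_1_name, matches("variable_2_name"))
names(over_matched)
```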
You can of course just do the following when a string is not required:
select(df, variable_1_name, variable_2_name)
matches() takes a regular expression, so you can anchor the pattern:
# '^' anchors the match at the beginning of the string and
# '$' anchors the match at the end of the string.
select(df, variable_1_name, matches("^variable_2_name$"))
This should match only variable_2_name, exactly.
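A sketch of two refinements to the anchored pattern, assuming a tidyselect version whose matches() accepts the ignore.case and perl arguments (current releases do). By default matches() ignores case, so ignore.case = FALSE makes the match exact in case as well. And because the argument is a regular expression, a name containing metacharacters such as "." can still over-match; PCRE's \Q...\E quoting makes the name literal. The helper exact_match_pattern() and both data frames are hypothetical:

```r
library(dplyr)

df <- data.frame(
  variable_1_name        = 1:3,
  variable_2_name        = 4:6,
  variable_2_name_recode = 7:9
)

# '^' and '$' anchor the pattern to the whole column name;
# ignore.case = FALSE stops e.g. "Variable_2_Name" from also matching
exact <- select(df, variable_1_name,
                matches("^variable_2_name$", ignore.case = FALSE))

# Hypothetical helper (not part of dplyr): wrap a literal name in \Q...\E
# so regex metacharacters in it are not interpreted; needs perl = TRUE
exact_match_pattern <- function(colname) paste0("^\\Q", colname, "\\E$")

df2 <- data.frame(var.2 = 1:3, varX2 = 4:6)

# Unquoted, "." matches any character, so both columns are kept
unquoted <- select(df2, matches("^var.2$"))

# Quoted, only the column literally named "var.2" is kept
quoted <- select(df2, matches(exact_match_pattern("var.2"),
                              perl = TRUE, ignore.case = FALSE))
```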
If you have a function that does the selection based on a string holding the column name, you could do the following (as mentioned by Psidom in a comment). The first example is simpler; the second is closer to what you are looking for.
### Example 1
### Given function and the 'df' with the column 'variable_2_name'
library(dplyr)  # provides %>% and select_()
my_func <- function(df, colname) { df %>% select_(colname) }
my_func(df, 'variable_2_name') # Call with column name string
### Example 2
### Mixing a bare (unquoted) column name with a column name given as a string.
### 'df' has columns 'variable_1_name' and 'variable_2_name'
my_func <- function(df, colname) {
df %>% select_(quote(variable_1_name), colname)
}
### Call with column name returns 2 columns of data
### 'variable_1_name' and 'variable_2_name'
my_func(df, 'variable_2_name')
Edit: dplyr::select_() is now deprecated, but the code above should be easy to change to use dplyr::select() instead of dplyr::select_().
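One way to drop the deprecated select_() is to pass the string through tidyselect's all_of() inside select(), assuming dplyr 1.0 or later. all_of() selects columns whose names exactly equal the given strings, with no regular-expression matching, so it also answers the original question directly. my_func2 is a renamed version of Example 2 so both functions can coexist, and the data frame is hypothetical:

```r
library(dplyr)

# Example 1 without select_(): all_of() keeps only columns whose names
# equal the string exactly, and raises an error if a name is missing
my_func <- function(df, colname) {
  df %>% select(all_of(colname))
}

# Example 2 without select_(): a bare column name mixed with a string one
my_func2 <- function(df, colname) {
  df %>% select(variable_1_name, all_of(colname))
}

df <- data.frame(
  variable_1_name        = 1:3,
  variable_2_name        = 4:6,
  variable_2_name_recode = 7:9
)

my_func(df, "variable_2_name")   # only variable_2_name
my_func2(df, "variable_2_name")  # variable_1_name and variable_2_name
```

If a missing column should be skipped silently instead of raising an error, any_of() is the lenient counterpart to all_of().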