
Conditional replacement of column name in tibble using dplyr

Tags: r, dplyr, tidyverse

I have the following tibble:

    df <- structure(list(gene_symbol = c("0610005C13Rik", "0610007P14Rik", 
"0610009B22Rik", "0610009L18Rik", "0610009O20Rik", "0610010B08Rik"
), foo.control.cv = c(1.16204038288333, 0.120508045270669, 0.205712615954009, 
0.504508040948641, 0.333956330117591, 0.543693011377001), foo.control.mean = c(2.66407458486012, 
187.137728870855, 142.111269303428, 16.7278587043453, 69.8602872478098, 
4.77769028710622), foo.treated.cv = c(0.905769898934564, 0.186441944401973, 
0.158552512842753, 0.551955061149896, 0.15743983656006, 0.290447431974039
), foo.treated.mean = c(2.40658723367692, 180.846795140269, 139.054032348287, 
11.8584348984435, 76.8141734599118, 2.24088124240385)), .Names = c("gene_symbol", 
"foo.control.cv", "foo.control.mean", "foo.treated.cv", "foo.treated.mean"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
6L))

Which looks like this:

# A tibble: 6 × 5
    gene_symbol foo.control.cv foo.control.mean foo.treated.cv foo.treated.mean
*         <chr>          <dbl>            <dbl>          <dbl>            <dbl>
1 0610005C13Rik      1.1620404         2.664075      0.9057699         2.406587
2 0610007P14Rik      0.1205080       187.137729      0.1864419       180.846795
3 0610009B22Rik      0.2057126       142.111269      0.1585525       139.054032
4 0610009L18Rik      0.5045080        16.727859      0.5519551        11.858435
5 0610009O20Rik      0.3339563        69.860287      0.1574398        76.814173
6 0610010B08Rik      0.5436930         4.777690      0.2904474         2.240881

What I want to do is rename every column whose name contains mean, replacing mean with mean_expr, resulting in:

    gene_symbol foo.control.cv foo.control.mean_expr foo.treated.cv foo.treated.mean_expr
1 0610005C13Rik      1.1620404              2.664075      0.9057699              2.406587
2 0610007P14Rik      0.1205080            187.137729      0.1864419            180.846795
3 0610009B22Rik      0.2057126            142.111269      0.1585525            139.054032
4 0610009L18Rik      0.5045080             16.727859      0.5519551             11.858435
5 0610009O20Rik      0.3339563             69.860287      0.1574398             76.814173
6 0610010B08Rik      0.5436930              4.777690      0.2904474              2.240881

How can I achieve that?

neversaint asked Apr 24 '17 02:04





3 Answers

With current versions of dplyr, you can use rename_at:

library(dplyr)

df %>% rename_at(vars(contains('mean')), funs(sub('mean', 'mean_expr', .)))
#> # A tibble: 6 × 5
#>     gene_symbol foo.control.cv foo.control.mean_expr foo.treated.cv
#> *         <chr>          <dbl>                 <dbl>          <dbl>
#> 1 0610005C13Rik      1.1620404              2.664075      0.9057699
#> 2 0610007P14Rik      0.1205080            187.137729      0.1864419
#> 3 0610009B22Rik      0.2057126            142.111269      0.1585525
#> 4 0610009L18Rik      0.5045080             16.727859      0.5519551
#> 5 0610009O20Rik      0.3339563             69.860287      0.1574398
#> 6 0610010B08Rik      0.5436930              4.777690      0.2904474
#> # ... with 1 more variables: foo.treated.mean_expr <dbl>

You could use rename_all as well, since names that don't match the pattern are left unchanged anyway. Further, .funs accepts a quosure or anything that rlang::as_function can coerce to a function, so you can use purrr-style notation:

df %>% rename_all(~sub('mean', 'mean_expr', .x))

Since a data frame is a list, purrr's set_names can do the same thing:

library(purrr)    # or library(tidyverse)

df %>% set_names(~sub('mean', 'mean_expr', .x))

All return the same thing.
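
In dplyr 1.0.0 and later the scoped verbs (rename_at, rename_all, rename_if) are superseded by rename_with, and funs() has been deprecated since dplyr 0.8.0, so the calls above may emit warnings on a recent install. A minimal sketch of the same rename with rename_with follows; the small data frame is only a shortened stand-in for the question's tibble, with rounded values, and is an assumption made for illustration:

```r
library(dplyr)

# Shortened stand-in for the question's data (first two rows, values rounded)
df <- data.frame(
  gene_symbol      = c("0610005C13Rik", "0610007P14Rik"),
  foo.control.cv   = c(1.162, 0.121),
  foo.control.mean = c(2.664, 187.138),
  foo.treated.cv   = c(0.906, 0.186),
  foo.treated.mean = c(2.407, 180.847)
)

# Apply the renaming function only to columns whose name contains "mean"
renamed <- df %>%
  rename_with(~ sub("mean", "mean_expr", .x), contains("mean"))

names(renamed)
```

The second argument of rename_with selects the columns; leaving it out applies the function to every column, which gives the same result here because sub() leaves non-matching names untouched.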

alistaire answered Sep 28 '22 10:09



Another option is to append the suffix with sprintf inside rename_at (using the devel version of dplyr):

library(dplyr)
df %>%
    rename_at(vars(matches('mean')), funs(sprintf('%s_expr', .)))
# A tibble: 6 × 5
#    gene_symbol foo.control.cv foo.control.mean_expr foo.treated.cv foo.treated.mean_expr
#*         <chr>          <dbl>                 <dbl>          <dbl>                 <dbl>
#1 0610005C13Rik      1.1620404              2.664075      0.9057699              2.406587    
#2 0610007P14Rik      0.1205080            187.137729      0.1864419            180.846795
#3 0610009B22Rik      0.2057126            142.111269      0.1585525            139.054032
#4 0610009L18Rik      0.5045080             16.727859      0.5519551             11.858435
#5 0610009O20Rik      0.3339563             69.860287      0.1574398             76.814173
#6 0610010B08Rik      0.5436930              4.777690      0.2904474              2.240881

Or using rename_if:

df %>%
   rename_if(grepl("mean", names(.)), funs(sprintf("%s_expr", .)))
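
Appending a suffix and substituting inside the name give the same result here only because mean is the last part of each matching name. A small base R sketch of the difference, where the name mean.of.foo is hypothetical and not taken from the question's data:

```r
# "mean.of.foo" is a made-up name where "mean" is not at the end
nms <- c("gene_symbol", "foo.control.mean", "mean.of.foo")

matched <- grepl("mean", nms)

# Suffix approach: append "_expr" to the whole name when it matches
suffixed <- ifelse(matched, sprintf("%s_expr", nms), nms)

# Substitution approach: replace "mean" in place
substituted <- sub("mean", "mean_expr", nms)

suffixed
# [1] "gene_symbol"           "foo.control.mean_expr" "mean.of.foo_expr"
substituted
# [1] "gene_symbol"           "foo.control.mean_expr" "mean_expr.of.foo"
```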
akrun answered Sep 28 '22 10:09



Here is a non-dplyr base R method:

names(df) <- sub("mean$", "mean_expr", names(df))
# or, if "mean" doesn't have to be at the end of the string:
# names(df) <- sub("mean", "mean_expr", names(df))

names(df)
#[1] "gene_symbol"           "foo.control.cv"        "foo.control.mean_expr"
#[4] "foo.treated.cv"        "foo.treated.mean_expr"
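
To see what the $ anchor changes, here is a short sketch on hypothetical column names (foo.mean.rank is not in the question's data and is included only to show the difference):

```r
# Hypothetical names, chosen so the two patterns disagree on the second one
nms <- c("foo.control.mean", "foo.mean.rank", "foo.control.cv")

anchored   <- sub("mean$", "mean_expr", nms)  # matches "mean" only at the end
unanchored <- sub("mean",  "mean_expr", nms)  # matches the first "mean" anywhere

anchored
# [1] "foo.control.mean_expr" "foo.mean.rank"         "foo.control.cv"
unanchored
# [1] "foo.control.mean_expr" "foo.mean_expr.rank"    "foo.control.cv"
```

Note that sub replaces only the first match in each name; gsub would replace every occurrence.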

If you want it to be part of the pipe, you can use the setNames function:

df %>% setNames(sub("mean", "mean_expr", names(.))) %>% names(.)
#[1] "gene_symbol"           "foo.control.cv"        "foo.control.mean_expr"
#[4] "foo.treated.cv"        "foo.treated.mean_expr"
Psidom answered Sep 28 '22 09:09
