
Add a new column that maps one character string onto a new character string based on a "Rosetta Stone" data frame?

Tags: r, dplyr, gsub

I have a data frame in R.

I'm trying to add/mutate a new column in which certain old character strings are replaced by new ones, using a map/translation/Rosetta Stone data frame that defines which strings should be replaced and what replaces them.

I was thinking something involving dplyr::mutate and some kind of function that applies gsub, but I just can't put it all together.

Starting Data Frame:

  starting_df <- read.table(header=TRUE, text="
  ID   Genotype
  VIT_123_1    0
  ROM_456_2    0
  VIT_78_1     1
  BELG_910_1   1
")

Rosetta Stone Data Frame:

  map_df <- read.table(header=TRUE, text="
  ID   New_ID
  VIT   VCO1
  ROM   VRO1
  BELG  VBE2
")

Desired Output Data Frame:

  >head(updated_df)
    ID           Genotype    New_ID
    VIT_123_1    0           VCO1_123_1
    ROM_456_2    0           VRO1_456_2
    VIT_78_1     1           VCO1_78_1
    BELG_910_1   1           VBE2_910_1
asked Nov 29 '25 07:11 by Gen


1 Answer

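You can use str_replace_all from the stringr package, which accepts a named character vector whose names are the patterns and whose values are the replacements.

First, convert your map_df data frame into such a named vector:

# Names are the old prefixes, values are the new ones.
map_v <- as.character(map_df$New_ID)
names(map_v) <- as.character(map_df$ID)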
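
To make the shape of that lookup vector concrete, here is a minimal base-R sketch. It rebuilds map_df with data.frame() instead of read.table() purely so the snippet is self-contained, and it uses setNames() as an equivalent one-line way to attach the names; neither choice comes from the original answer.

```r
# Rebuild the question's map so this snippet runs on its own.
map_df <- data.frame(
  ID     = c("VIT", "ROM", "BELG"),
  New_ID = c("VCO1", "VRO1", "VBE2"),
  stringsAsFactors = FALSE
)

# Values are the new prefixes, names are the old prefixes.
map_v <- setNames(as.character(map_df$New_ID), as.character(map_df$ID))

# Looking up an old prefix by name returns its replacement.
map_v[["VIT"]]
```

The as.character() calls only matter if the columns were read in as factors, which was the read.table() default before R 4.0.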

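Then apply the map to the ID column and store the result in a new column:

library(stringr)
res <- starting_df
# Each name in map_v is treated as a regular expression and replaced by its value.
res$New_ID <- str_replace_all(starting_df$ID, map_v)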

          ID Genotype     New_ID
1  VIT_123_1        0 VCO1_123_1
2  ROM_456_2        0 VRO1_456_2
3   VIT_78_1        1  VCO1_78_1
4 BELG_910_1        1 VBE2_910_1
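
One caveat: str_replace_all treats the names of the map as regular expressions and replaces them wherever they occur, so a prefix such as VIT would also be rewritten if it appeared in the middle of an ID. If the prefix is always the text before the first underscore (an assumption based on the example data), a base-R lookup with match() sidesteps that. This is only a sketch of an alternative, not the answer's method, and the helper names prefix, rest, idx, and new_prefix are illustrative.

```r
# Rebuild the question's data so this snippet runs on its own.
starting_df <- data.frame(
  ID       = c("VIT_123_1", "ROM_456_2", "VIT_78_1", "BELG_910_1"),
  Genotype = c(0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)
map_df <- data.frame(
  ID     = c("VIT", "ROM", "BELG"),
  New_ID = c("VCO1", "VRO1", "VBE2"),
  stringsAsFactors = FALSE
)

# Split each ID at the first underscore: "VIT_123_1" -> "VIT" and "_123_1".
prefix <- sub("_.*$", "", starting_df$ID)
rest   <- sub("^[^_]*", "", starting_df$ID)

# Look up each prefix in the map; match() gives NA for unmapped prefixes.
idx <- match(prefix, map_df$ID)

# Use the mapped prefix where one exists, otherwise keep the original prefix.
new_prefix <- ifelse(is.na(idx), prefix, map_df$New_ID[idx])

updated_df <- starting_df
updated_df$New_ID <- paste0(new_prefix, rest)
```

Because the lookup compares whole prefixes rather than substrings, IDs with no entry in map_df pass through unchanged.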
answered Dec 01 '25 20:12 by Lamia