
R extract string between nth and ith instance of delimiter

Tags: regex, r

I have a vector of strings, similar to this one, but with many more elements:

s <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "TC-V-576_T_90.0_DV_151_0", "TCA-DV-X_T_6.0_D_A2_07", "T-V-Z_T_2_D_A_0", "CGA-DV-AW0_T.1_24.4.0_V_A6_7", "ACGA-DV-A4W0_T_274.46.0_DV_A266_07")

And I would like to use a function that extracts the string between the nth and ith instances of the delimiter "_". For example, the string between the 2nd (n = 2) and 3rd (i = 3) instances, to get this:

[1] "90.67.0"  "90.0"     "6.0"      "2"        "24.4.0"   "274.46.0"

Or if n = 4 and i = 5:

[1] "1541" "151"  "A2"   "A"    "A6"   "A266"

Any suggestions? Thank you for your help!

arielle asked Dec 14 '22 22:12


1 Answer

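You can do this with sub: first strip everything up to and including the nth "_", then keep only the next i - n fields, and finally drop the trailing "_". Use sub rather than gsub for the first step, because gsub would also remove every later run of n fields (with n = 2 it strips four fields instead of two).

n = 4
i = 5

# Remove the first n fields, each followed by "_" (first match only)
pattern1 = paste0("^([^_]*_){", n, "}")
temp = sub(pattern1, "", s)
# Keep the next i - n fields and discard the rest of the string
pattern2 = paste0("^(([^_]*_){", i - n, "}).*")
temp = sub(pattern2, "\\1", temp)
# Drop the trailing "_"
temp = sub("_$", "", temp)
temp
[1] "1541" "151"  "A2"   "A"    "A6"   "A266"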
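Since the question asks for a function, here is a minimal sketch of an alternative that avoids regular expressions by splitting on the delimiter and rejoining the wanted fields. The name extract_between and its arguments are hypothetical (they are not from any package), and the sketch assumes every string contains at least i delimiters; strings that yield too few fields give NA.

```r
# Hypothetical helper: the name and arguments are illustrative only.
# Returns the text between the n-th and i-th occurrence of `delim`
# in each element of the character vector x.
extract_between <- function(x, n, i, delim = "_") {
  stopifnot(n >= 1, i > n)
  # fixed = TRUE treats delim as a literal string, not a regex
  parts <- strsplit(x, delim, fixed = TRUE)
  vapply(parts, function(p) {
    # strsplit yielded fewer than i + 1 fields: treat as missing
    if (length(p) < i + 1) return(NA_character_)
    # Fields n + 1 through i sit between the n-th and i-th delimiter
    paste(p[(n + 1):i], collapse = delim)
  }, character(1))
}

s <- c("CGA-DV-558_T_90.67.0_DV_1541_07", "TC-V-576_T_90.0_DV_151_0",
       "TCA-DV-X_T_6.0_D_A2_07", "T-V-Z_T_2_D_A_0",
       "CGA-DV-AW0_T.1_24.4.0_V_A6_7", "ACGA-DV-A4W0_T_274.46.0_DV_A266_07")

extract_between(s, 2, 3)
# [1] "90.67.0"  "90.0"     "6.0"      "2"        "24.4.0"   "274.46.0"
extract_between(s, 4, 5)
# [1] "1541" "151"  "A2"   "A"    "A6"   "A266"
```

When i - n is greater than 1, the delimiters inside the extracted span are kept, so extract_between(s, 1, 3) returns values such as "T_90.67.0".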
G5W answered Jan 18 '23 23:01