
R regular expression to obtain all text before the second underscore

Tags: regex, split, r

s <- "1-343-43Hello_2_323.14_fdh-99H"

In R, I want to use a regex to get the substring before the nth (say, the 2nd) underscore. How can this be done with a single regex? The alternative would be to split on '_' and then paste the first two pieces back together, something along these lines:

paste(sapply(strsplit(s, "_"),"[", 1:2), collapse = "_")

Gives:

[1] "1-343-43Hello_2"

But how can I write a regex that does the same?

user3375672 asked Jul 14 '16 12:07


2 Answers

In general, to answer the question in the title, the pattern is

sub("^(([^_]*_){n}[^_]*).*", "\\1", s)

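where n is a placeholder for the number of underscores to keep in the result, so it must be replaced with an actual integer before the pattern is used. For the substring before the 2nd underscore, use n = 1.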
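As a rough sketch of how n might be substituted into the pattern, the helper below builds the regex with sprintf. The name keep_n_underscores is hypothetical and not part of the answer above; it only wraps the same sub call.

```r
# Hypothetical helper (not from the original answer): keep the part of each
# string that contains exactly n underscores, i.e. everything before
# underscore number n + 1.
keep_n_underscores <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1)
  # Insert n into the quantifier, e.g. n = 1 gives "^(([^_]*_){1}[^_]*).*"
  pattern <- sprintf("^(([^_]*_){%d}[^_]*).*", as.integer(n))
  sub(pattern, "\\1", x)
}

s <- "1-343-43Hello_2_323.14_fdh-99H"
res1 <- keep_n_underscores(s, 1)  # text before the 2nd underscore
res2 <- keep_n_underscores(s, 2)  # text before the 3rd underscore
print(c(res1, res2))
```

If a string contains fewer than n underscores, the pattern does not match and sub returns the string unchanged.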

horcrux answered Oct 19 '22 13:10


You can use sub:

sub("^([^_]*_[^_]*).*", "\\1", s)

See the regex demo

R code demo:

s <- "1-343-43Hello_2_323.14_fdh-99H"
sub("^([^_]*_[^_]*).*", "\\1", s)
## => [1] "1-343-43Hello_2"

Pattern details:

  • ^ - start of string
  • ([^_]*_[^_]*) - Group 1 capturing 0+ characters other than _, then a _ and again 0+ non-_s
  • .* - rest of the string (note that the TRE regex . matches newlines, too).

The \\1 replacement is a backreference to Group 1, so the whole match is replaced with just the text captured in Group 1.
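To illustrate the remark about newlines, here is a small sketch comparing the default TRE engine with perl = TRUE. The input s2 is a made-up string, not taken from the question; with PCRE, . does not match a newline unless the (?s) flag is set.

```r
# Made-up input with a newline after the second underscore.
s2 <- "a_b_c\nd"

# Default (TRE) engine: . also matches the newline, so .* consumes the rest.
tre_res <- sub("^([^_]*_[^_]*).*", "\\1", s2)

# PCRE engine: .* stops at the newline, so the text after it is left in place.
pcre_res <- sub("^([^_]*_[^_]*).*", "\\1", s2, perl = TRUE)

# PCRE with the (?s) flag: . matches newlines again, same result as TRE.
pcre_dotall <- sub("(?s)^([^_]*_[^_]*).*", "\\1", s2, perl = TRUE)

print(c(tre_res, pcre_res, pcre_dotall))
```

So if the pattern is ever switched to perl = TRUE and the input may span several lines, adding (?s) keeps the behaviour described above.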

Wiktor Stribiżew answered Oct 19 '22 15:10