
Keep only numbers before the FIRST hyphen AND the hyphen itself

Tags: regex, r

I am trying to get rid of all the digits/characters that come AFTER the FIRST hyphen. Here are some examples:

15-103025-01
800-40170-02
68-4974-01

My desired output:

15-
800-
68-

I've read through posts like these:

  1. Using gsub to extract character string before white space in R
  2. truncate string from a certain character in R
  3. Truncating the end of a string in R after a character that can be present zero or more times

But they are not what I'm looking for, because the methods mentioned in those posts also remove the hyphen (leaving me with only the first 2 or 3 digits).

Here's what I've tried so far:

gsub(pattern = '[0-9]*-$', replacement = "", x = data$id)
grep(pattern = '[0-9]*-', replacement = "", x = data$id)
regexpr(pattern = '[0-9]*-', text = data$id)

But none of these work as I expected.

alwaysaskingquestions asked May 27 '16 22:05



1 Answer

There are several ways to achieve this; here is one:

have <- c("15-103025-01", "800-40170-02", "68-4974-01")
want <- sub(pattern = "(^\\d+\\-).*", replacement = "\\1", x = have)

So in your regular expression, you'll have one group created with ()'s, which matches the start of the string (^) followed by one or more digits (\\d+) and the hyphen (\\-). Outside the group, .* matches whatever characters follow, up to the end of the string.

In the replacement part, you specify \\1 to refer to the first (and only) group of the regular expression. Because the replacement contains nothing else, everything matched outside the group (the rest of the string) is dropped.
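
As a rough sketch of two other ways to get the same result, assuming the same `have` vector as above (the names `alt1` and `alt2` are only illustrative): the first replaces the first hyphen and everything after it with a bare hyphen, and the second extracts the leading digits and hyphen instead of deleting the rest.

```r
have <- c("15-103025-01", "800-40170-02", "68-4974-01")

# Alternative 1: "-.*" matches from the first hyphen to the end of the string;
# replacing that match with "-" keeps the leading digits and the hyphen.
alt1 <- sub(pattern = "-.*", replacement = "-", x = have)

# Alternative 2: regexpr() locates the leading "digits + hyphen" in each string,
# and regmatches() pulls out exactly that matched text.
alt2 <- regmatches(have, regexpr(pattern = "^[0-9]+-", text = have))

print(alt1)
print(alt2)
```

One difference to keep in mind: for a string with no hyphen, sub() returns it unchanged, while regmatches() with regexpr() drops the elements that did not match, so the result can be shorter than the input.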

Dominic Comtois answered Nov 14 '22 23:11
