Using gsub or sub function to only get part of a string?

Question

      Col
WBU-ARGU*06:03:04
WBU-ARDU*08:01:01
WBU-ARFU*11:03:05
WBU-ARFU*03:456

I have a column with 75 rows of values like the Col column above. I am not sure how to use gsub or sub to keep everything up to and including the digits that follow the first colon.

Expected output:

      Col
WBU-ARGU*06:03
WBU-ARDU*08:01
WBU-ARFU*11:03
WBU-ARFU*03:456

I tried this but it doesn't seem to work:

gsub("*..:","", df$col)

RavinderSingh13 · Accepted Answer

The following may help you here too.

sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)

The output will be as follows:

> sub("([^:]*):([^:]*).*","\\1:\\2",df$dat)
[1] "WBU-ARGU*06:03"   "WBU-ARDU*08:01"   "WBU-ARFU*11:03"   "WBU-ARFU*03:456b"

The input data frame is created as follows:

dat <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456b")
df <- data.frame(dat)
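A self-contained version of the accepted answer, as a minimal sketch: it reuses the answer's sample data and writes the backreferences as `\\1` and `\\2`, because inside an R string literal a single backslash would be read as an escape. `stringsAsFactors = FALSE` is set explicitly so the column is character on R versions before 4.0.0 as well.

```r
# Sample data from the answer above (note: the last value is "03:456b")
dat <- c("WBU-ARGU*06:03:04", "WBU-ARDU*08:01:01",
         "WBU-ARFU*11:03:05", "WBU-ARFU*03:456b")
# Keep the column as character (the default since R 4.0.0)
df <- data.frame(dat, stringsAsFactors = FALSE)

# Keep the part before the 1st colon, the colon, and the part up to the 2nd colon
df$dat <- sub("([^:]*):([^:]*).*", "\\1:\\2", df$dat)
print(df$dat)
```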

Explanation (the breakdown below is for explanation purposes only and is not runnable as written):

sub("      ## sub() replaces only the first match in each element (gsub() would replace every match).
([^:]*)     ## Parentheses create capture group 1 (the first "memory" slot, reusable later as \1): everything up to the first colon.
:           ## A literal colon.
([^:]*)     ## Capture group 2 (reusable as \2): everything up to the next colon.
.*"         ## .* matches the rest of the string after the second field, so it gets dropped.
,"\\1:\\2"  ## Replacement: group 1, a colon, then group 2 (backslashes are doubled inside R strings).
,df$dat)    ## The dat column of the data frame df.
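To see what the two "memory" slots actually hold, one option (a sketch using base R's `regexec()` and `regmatches()`, not part of the original answer) is to extract the capture groups for a single value:

```r
x <- "WBU-ARGU*06:03:04"
# regexec() records the position of the whole match and of each capture group;
# regmatches() converts those positions into substrings
parts <- regmatches(x, regexec("([^:]*):([^:]*).*", x))[[1]]
print(parts)
# parts[1] is the whole match, parts[2] is group 1 (\1), parts[3] is group 2 (\2)
```

Here group 1 is everything before the first colon and group 2 is the digits between the first and second colon, which is exactly what the replacement `"\\1:\\2"` stitches back together.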

Wiktor Stribiżew · Answer

You may use

df$col <- sub("(\\d:\\d+):\\d+$", "\\1", df$col)

Details

(\d:\d+) - Capturing group 1 (its value will be accessible via \1 in the replacement pattern): a digit, a colon and 1+ digits.
: - a colon
\d+ - 1+ digits
$ - end of string.
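Because the pattern is anchored with `$` and needs two colons, values that have only one colon simply do not match, and `sub()` returns them unchanged. A small sketch using `grepl()` to make that visible on the question's data:

```r
col <- c("WBU-ARGU*06:03:04", "WBU-ARDU*08:01:01",
         "WBU-ARFU*11:03:05", "WBU-ARFU*03:456")
pattern <- "(\\d:\\d+):\\d+$"

# Which values end in digit:digits:digits?
matched <- grepl(pattern, col)
print(matched)

# Non-matching values (here "WBU-ARFU*03:456") pass through sub() as-is
result <- sub(pattern, "\\1", col)
print(result)
```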

R Demo:

col <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456")
sub("(\\d:\\d+):\\d+$", "\\1", col)
## => [1] "WBU-ARGU*06:03"  "WBU-ARDU*08:01"  "WBU-ARFU*11:03"  "WBU-ARFU*03:456"

Alternative approach:

df$col <- sub("^(.*?:\\d+).*", "\\1", df$col, perl=TRUE)

Here,

^ - start of string
(.*?:\d+) - Group 1: any 0+ chars, as few as possible (due to the lazy *? quantifier), then : and 1+ digits
.* - the rest of the string.
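The lazy `*?` quantifier is what makes this work. A quick sketch comparing it with a greedy `.*` on one value (both run with `perl = TRUE` so only the quantifier differs):

```r
x <- "WBU-ARGU*06:03:04"

# Greedy: .* first grabs the whole string, then backtracks only as far as the
# LAST ":" followed by digits, so Group 1 ends up being the entire string
greedy <- sub("^(.*:\\d+).*", "\\1", x, perl = TRUE)

# Lazy: .*? grows one character at a time, so Group 1 stops at the FIRST ":" + digits
lazy <- sub("^(.*?:\\d+).*", "\\1", x, perl = TRUE)

print(c(greedy = greedy, lazy = lazy))
```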

However, this pattern should be used with the PCRE regex engine, so pass perl=TRUE:

col <- c("WBU-ARGU*06:03:04","WBU-ARDU*08:01:01","WBU-ARFU*11:03:05","WBU-ARFU*03:456")
sub("^(.*?:\\d+).*", "\\1", col, perl=TRUE)
## => [1] "WBU-ARGU*06:03"  "WBU-ARDU*08:01"  "WBU-ARFU*11:03"  "WBU-ARFU*03:456"
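The three patterns agree on the question's data, but they are not interchangeable in general. As an illustration, assuming a hypothetical value with four numeric fields (not taken from the question), the end-anchored pattern trims only the last field, while the other two keep just the first two fields:

```r
# Hypothetical input with four colon-separated numeric fields
x <- "WBU-ARGU*06:03:04:02"

accepted <- sub("([^:]*):([^:]*).*", "\\1:\\2", x)      # accepted answer
anchored <- sub("(\\d:\\d+):\\d+$", "\\1", x)            # end-anchored pattern
lazy     <- sub("^(.*?:\\d+).*", "\\1", x, perl = TRUE)  # lazy alternative

print(c(accepted = accepted, anchored = anchored, lazy = lazy))
```

If the real data can have more than three fields, the first-colon-based patterns match the stated goal ("up until the integers after the first colon") more closely.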

Tags: regex, r, gsub

nathan
