
Exclude everything after the second occurrence of a certain string

Tags: regex, r

I have the following string

string <- c('a - b - c - d',
            'z - c - b',
            'y',
            'u - z')

I would like to subset it such that everything after the second occurrence of ' - ' is thrown away.

The result would be this:

> string
[1]  "a - b" "z - c" "y"     "u - z"

I tried substr(x = string, 1, regexpr(string, pattern = '[^ - ]*$') - 4), but it cuts at the last occurrence of ' - ' rather than the second, which is not what I want.

D Pinto asked Mar 06 '17 14:03


1 Answer

Note that you cannot use a negated character class to negate a sequence of characters. [^ - ]*$ matches zero or more characters other than a space (it matches - too, because the - inside the brackets defines a range from a space to a space), followed by the end-of-string anchor ($).
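To make that concrete, here is a small check in base R. It first tests single characters against the class, then re-runs the attempt from the question on the same data. The variable names class_check, last_token_start and attempt are only illustrative.

```r
# [^ - ] excludes only the space: the inner "-" forms a space-to-space range,
# so a literal hyphen is still matched by the class.
class_check <- grepl("^[^ - ]$", c("-", " ", "a"))
class_check

# The attempt from the question, on the same data.
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# Start position of the final run of non-space characters in each element.
last_token_start <- regexpr('[^ - ]*$', string)

# Cutting 4 characters before that start drops only the last piece.
attempt <- substr(string, 1, last_token_start - 4)
attempt
```

The cut position is always computed from the last piece of each string, so the first element keeps 'a - b - c' instead of 'a - b', and elements with fewer than two separators are truncated even though they should stay unchanged.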

You may use a sub function with the following regex:

^(.*? - .*?) - .*

and replace the match with \1, the text captured by Group 1.

R code:

> string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
> sub("^(.*? - .*?) - .*", "\\1", string)
[1] "a - b" "z - c" "y"     "u - z"

Details:

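  • ^ - start of the string
  • (.*? - .*?) - Group 1 (referred to with the \1 backreference in the replacement pattern): any 0+ chars, matched lazily, up to the first ' - ' (space, hyphen, space), then that ' - ', then again any 0+ chars, matched lazily, up to the next leftmost ' - '
  • ' - ' - a space, a hyphen and a space (the second separator)
  • .* - any zero or more chars up to the end of the string.

If a string contains fewer than two occurrences of ' - ', the pattern does not match and sub returns the string unchanged, which is why "y" and "u - z" are left as they are.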
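
For comparison, here is a sketch of the same result without a regex, for cases where the separator or the occurrence count varies. It swaps in a different technique (split on the literal separator, keep the first n pieces, join them again) and is not part of the answer above. The helper name keep_before_nth and its arguments sep and n are hypothetical, chosen for illustration.

```r
# Hypothetical helper: keep the text before the n-th occurrence of `sep`.
# Uses fixed = TRUE so `sep` is treated as a literal string, not a regex.
keep_before_nth <- function(x, sep = " - ", n = 2) {
  parts <- strsplit(x, sep, fixed = TRUE)
  # Keep at most the first n pieces of each element and glue them back together.
  vapply(parts, function(p) paste(head(p, n), collapse = sep), character(1))
}

string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
result <- keep_before_nth(string)
result
```

Strings with fewer than n pieces pass through unchanged, because head() returns whatever pieces exist. Calling keep_before_nth(string, n = 3) would keep everything before the third ' - ' instead.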

Wiktor Stribiżew answered Nov 11 '22 03:11