Regex extraction data before vs after comma in R

Question

I am a regex beginner, as I don't usually process text. I have a very simple question. I managed to construct the following regex to extract data after a comma:

sub('.*,\\s*','', X)

where X is the column I am searching.

I now separately want to extract the data before the comma, but am struggling with the regex syntax. Appreciate the help.

MrFreezer · Accepted Answer

The following expression:

sub('\\s*,.*','', X)

replaces everything from the first comma (and any whitespace directly before it) to the end of the string with an empty string. Therefore, it returns the text before the first comma in the string.
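As a rough illustration of how this behaves when a string holds more than one comma, here is a minimal sketch. The sample vector X and the names before_first and before_last are made up for the example; the second pattern is an alternative for cutting at the last comma and is not part of the answer above.

```r
# Hypothetical sample data standing in for the asker's column X.
X <- c("Smith, John", "a, b, c", "no comma here")

# Cut at the first comma: the leftmost match starts at the first comma,
# and the greedy .* then runs to the end of the string.
before_first <- sub("\\s*,.*", "", X)

# Cut at the last comma: match a comma followed only by non-comma
# characters up to the end of the string.
before_last <- sub(",[^,]*$", "", X)

print(before_first)
print(before_last)
```

Strings without a comma are returned unchanged by both calls, because sub leaves its input alone when the pattern does not match.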

Shawn Mehan · Answer

Your regex

sub('.*,\\s*','', X)

does not extract text; it substitutes the second argument for whatever the first argument matches. So any run of characters up to and including the last comma, plus any whitespace that follows it, gets replaced with nothing in X.
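To make the difference between substituting and extracting concrete, here is a small sketch that assumes base R's regexpr and regmatches are acceptable for the task. The sample vector X and the names substituted and extracted are illustrative only.

```r
# Hypothetical sample data: one string with a comma, one without.
X <- c("here is a test string, tail", "no comma here")

# Substitution: removes the matched part and returns one result per input.
# A string that does not match is returned unchanged.
substituted <- sub(".*,\\s*", "", X)

# Extraction: returns only the matched text. With regexpr, elements that
# do not match are dropped. The lookahead (?=,) needs perl = TRUE.
extracted <- regmatches(X, regexpr("^[^,]*(?=,)", X, perl = TRUE))

print(substituted)
print(extracted)
```

Note that extracted is shorter than X here, so it cannot be assigned straight back into a column of the same length; sub keeps the lengths aligned.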

You can see what your pattern matches in the linked demo. I am not certain what you are trying to achieve, but if you want to match the text that sits before a comma, the regex below matches it, and this is how you would replace it in your sub call while keeping the captured remainder through a backreference.

In R

X2 = "here is another test string, with following text"
Y <- sub('.*(,.*)','\\1', X2)

yielding

> Y
[1] ", with following text"
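The same capture-group idea can be turned around to keep either side of the comma. This is only a sketch, assuming each string holds a single comma; the names pattern, before and after are illustrative.

```r
X2 <- "here is another test string, with following text"

# Group 1: everything before the comma. Group 2: everything after the
# comma and any whitespace that follows it.
pattern <- "^([^,]*),\\s*(.*)$"

before <- sub(pattern, "\\1", X2)  # keep only group 1
after  <- sub(pattern, "\\2", X2)  # keep only group 2

print(before)
print(after)
```

Using [^,]* instead of .* for the first group stops the match at the first comma, so no backtracking is involved.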

In R, your code produces:

X = "here is a test string, "
Y <- sub('.*,\\s*','', X)

yielding

> Y
[1] ""

Tags:

regex

r

gsub

RichS
