I am a regex beginner, as I don't usually process text. I have a very simple question. I managed to construct the following regex to extract data after a comma:
sub('.*,\\s*','', X)
where X is the column I am searching.
I now separately want to extract the data before the comma, but am struggling with the regex syntax. Appreciate the help.
The following expression:
sub('\\s*,.*','', X)
replaces everything from the first comma (together with any whitespace directly before it) to the end of the string with an empty string. Therefore, it will return the text before the first comma in the string.
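To check which comma the pattern cuts at, a minimal sketch on made-up data may help. The vector X below is hypothetical sample data, not the asker's actual column:

```r
# Hypothetical sample data standing in for the asker's column X
X <- c("alpha, beta", "one , two, three", "no comma here")

# Remove the first comma (plus any whitespace before it) and everything after it
before <- sub('\\s*,.*', '', X)
print(before)
```

Strings that contain no comma come back unchanged, because sub only replaces when the pattern matches.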
Your regex
sub('.*,\\s*','', X)
is not extracting text; it is substituting the second argument for whatever the first one matches. Because .* is greedy, the pattern matches everything up to and including the last comma in X, plus any whitespace that follows it (\\s* means zero or more whitespace characters, not exactly one space),
and that match gets replaced with nothing.
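The effect of the greedy .* only shows up when a string has more than one comma. A small sketch, assuming a made-up string s and a negated character class as one possible way to stop at the first comma instead:

```r
# Hypothetical sample string with two commas
s <- "first, second, third"

# Greedy '.*,' runs up to the LAST comma, so only the final field survives
after_last <- sub('.*,\\s*', '', s)

# '[^,]*' cannot cross a comma, so this version stops at the FIRST comma
after_first <- sub('^[^,]*,\\s*', '', s)

print(after_last)
print(after_first)
```

Which variant is appropriate depends on whether the column can contain several commas per value.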
You can see what you are hitting in the demo linked above. I am not certain what you are trying to achieve, but if you want to separate the text that sits before a comma from the rest, the pattern below matches the text before the last comma with .* and captures the comma and everything after it in a group. Using \\1 as the replacement in your sub then keeps only the captured part:
In R
X2 = "here is another test string, with following text"
Y <- sub('.*(,.*)','\\1', X2)
yielding
> Y
[1] ", with following text"
In R, your code produces:
X = "here is a test string, "
Y <- sub('.*,\\s*','', X)
yielding
> Y
[1] ""
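Since sub replaces rather than extracts, one alternative is to pull out the matching part directly with base R's regexpr and regmatches. This is only a sketch of that approach, reusing the example string from above under the hypothetical name x:

```r
# Example string from the answer above, under a hypothetical variable name
x <- "here is another test string, with following text"

# regexpr() locates the run of non-comma characters at the start of the string;
# regmatches() then returns that matched text instead of replacing it
before_comma <- regmatches(x, regexpr('^[^,]*', x))
print(before_comma)
```

Unlike the sub version, this does not strip whitespace sitting directly before the comma; wrapping the result in trimws() would be one way to handle that.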