
Finding second space after each comma

Tags: regex, r, strsplit

This is a follow-up to this question: Concatenate previous and latter words to a word that match a condition in R

I am looking for a regex that splits the string at the second space after each comma. Look at the example below:

vector <- c("Paulsen", "Kehr,", "Diego",
            "Schalper", "Sepúlveda,", "Alejandro",
            "Von Housen", "Kush,", "Terry")

X <- paste(vector, collapse = " ")
X

## this is the string I am looking to split:
"Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"

Second space after each comma is the criterion for my regex. So, my output will be:

"Paulsen Kehr, Diego"
"Schalper Sepúlveda, Alejandro"
"Von Housen Kush, Terry"

I came up with a pattern, but it does not quite work:

[^ ]+ [^ ]+, [^ ]+( )

Using it with strsplit removes all the words instead of splitting only at group 1 (the space captured at the end of the pattern). I think I just need to exclude the full match and match only the space that follows it.

strsplit(X, "[^ ]+ [^ ]+, [^ ]+( )")

# [1] "" [2] "" [3] "Von Housen Kush, Terry"
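
To see why the words disappear, it may help to list what the pattern actually matches. This is a minimal sketch, assuming the same X as above (rebuilt here so the snippet runs on its own; `pattern`, `matches` and `pieces` are just illustrative names). strsplit treats each whole match as the delimiter and drops it, so the capture group has no effect on where the split happens.

```r
X <- "Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"
pattern <- "[^ ]+ [^ ]+, [^ ]+( )"

# List every full match of the pattern; these are the delimiters strsplit removes.
matches <- regmatches(X, gregexpr(pattern, X))[[1]]
print(matches)

# Splitting therefore leaves only what lies between or after the matches.
pieces <- strsplit(X, pattern)[[1]]
print(pieces)
```

The first two names are consumed entirely by the match, which is why only the last one survives the split.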

Can anyone think of a regex for finding the second space after each comma?

M-- asked Oct 25 '19 14:10


1 Answer

You may use

> strsplit(X, ",\\s+\\S+\\K\\s+", perl=TRUE)
[[1]]
[1] "Paulsen Kehr, Diego"           "Schalper Sepúlveda, Alejandro" "Von Housen Kush, Terry"

Details

  • , - a comma
  • \s+ - 1+ whitespaces
  • \S+ - 1+ non-whitespaces
  • \K - match reset operator: discards all text matched so far from the reported match, so only the whitespace that follows is treated as the delimiter
  • \s+ - 1+ whitespaces
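
The breakdown above can be tried end to end with the sketch below. It rebuilds X so it runs on its own, and it also shows one possible alternative that avoids \K by inserting a marker and splitting on it. The names `parts`, `marked` and `parts_alt` are illustrative, and the alternative assumes the input contains no newline characters.

```r
X <- "Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"

# \K drops the ", <word>" part from the reported match, so strsplit only
# removes the whitespace that follows it.
parts <- strsplit(X, ",\\s+\\S+\\K\\s+", perl = TRUE)[[1]]
print(parts)

# Alternative without \K: keep ", <word>" via a capture group, replace the
# following whitespace with a newline marker, then split on that marker.
# Assumption: X itself contains no newline.
marked <- gsub("(,\\s+\\S+)\\s+", "\\1\n", X, perl = TRUE)
parts_alt <- strsplit(marked, "\n", fixed = TRUE)[[1]]
print(parts_alt)
```

Both approaches should give the same three pieces; the \K version is shorter, while the marker version may be easier to adapt to regex engines that lack \K.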

Wiktor Stribiżew answered Oct 10 '22 04:10