This is a follow up to this question: Concatenate previous and latter words to a word that match a condition in R
I am looking for a regex that splits the string at the second space after each comma. See the example below:
vector <- c("Paulsen", "Kehr,", "Diego",
"Schalper", "Sepúlveda,", "Alejandro",
"Von Housen", "Kush,", "Terry")
X <- paste(vector, collapse = " ")
X
## this is the string I am looking to split:
"Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"
The second space after each comma is the split criterion, so the desired output is:
"Paulsen Kehr, Diego"
"Schalper Sepúlveda, Alejandro"
"Von Housen Kush, Terry"
I came up with a pattern but it is not quite working.
[^ ]+ [^ ]+, [^ ]+( )
Using it with strsplit removes all the matched words instead of splitting only at group 1, i.e. the ( ) at the end of [^ ]+ [^ ]+, [^ ]+( ). I think I just need to exclude the full match and split on the trailing space only (regex demo):
strsplit(X, "[^ ]+ [^ ]+, [^ ]+( )")
# [1] "" [2] "" [3] "Von Housen Kush, Terry"
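To illustrate why the words disappear, here is a minimal sketch on a toy string (the string `s` and its pattern are made up for illustration, not taken from the data above). As far as I can tell, strsplit uses the entire match as the delimiter, and a capture group does not narrow what gets removed:

```r
# Toy string, made up for illustration.
s <- "ab-cd-ef"

# The whole match "b-c" is treated as the delimiter and dropped;
# the capture group around "-" does not limit what is removed.
parts <- strsplit(s, "b(-)c")[[1]]
print(parts)
```

The same thing happens with the pattern above: each full "word word, word " match is consumed, leaving empty strings.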
Can anyone think of a regex for finding the second space after each comma?
You may use
> strsplit(X, ",\\s+\\S+\\K\\s+", perl=TRUE)
[[1]]
[1] "Paulsen Kehr, Diego" "Schalper Sepúlveda, Alejandro" "Von Housen Kush, Terry"
See the regex demo
Details
, - a comma
\s+ - 1+ whitespaces
\S+ - 1+ non-whitespaces
\K - match reset operator discarding all text matched so far
\s+ - 1+ whitespaces
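The effect of \K can be checked directly. The sketch below assumes X holds the string from the question (the variable names `pattern`, `delims` and `pieces` are only for illustration). It first lists what the pattern reports as its match, then splits on it:

```r
X <- "Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"
pattern <- ",\\s+\\S+\\K\\s+"

# With \K, the reported match is only the text after the reset point,
# i.e. the whitespace that follows the first word after each comma.
delims <- regmatches(X, gregexpr(pattern, X, perl = TRUE))[[1]]
print(delims)

# strsplit() removes only those reported matches, so the names stay intact.
pieces <- strsplit(X, pattern, perl = TRUE)[[1]]
print(pieces)
```

Note that perl = TRUE is required in both calls, since \K is a PCRE feature and is not supported by R's default regex engine.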