I tried to search for the solution, but it appears that there is no clear one for R.
I try to split the string by the pattern of, let's say, space and capital letter and I use stringr package for that.
x <- "Foobar foobar, Foobar foobar"
str_split(x, " [:upper:]")
Normally I would get:
[[1]]
[1] "Foobar foobar," "oobar foobar"
The output I would like to get, however, should include the letter from the delimiter:
[[1]]
[1] "Foobar foobar," "Foobar foobar"
Probably there is no out of box solution in stringr like back-referencing, so I would be happy to get any help.
split() The method split() splits a String into multiple Strings given the delimiter that separates them. The returned object is an array which contains the split Strings. We can also pass a limit to the number of elements in the returned array.
Using split() The following example defines a function that splits a string into an array of strings using separator .
The Split method extracts the substrings in this string that are delimited by one or more of the strings in the separator parameter, and returns those substrings as elements of an array.
You may split with 1+ whitespaces that are followed with an uppercase letter:
> str_split(x, "\\s+(?=[[:upper:]])")
[[1]]
[1] "Foobar foobar," "Foobar foobar"
Here,
\\s+
- 1 or more whitespaces(?=[[:upper:]])
- a positive lookahead (a non-consuming pattern) that only checks for an uppercase letter immediately to the right of the current location in string without adding it to the match value, thus, preserving it in the output.Note that \s
matches various whitespace chars, not just plain regular spaces. Also, it is safer to use [[:upper:]]
rather than [:upper:]
- if you plan to use the patterns with other regex engines (like PCRE, for example).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With