
How to Split Strings based on conditions in R?

Tags:

regex

r

gsub

I would like to split a single string into multiple strings at the words 'split here', but only where they occur between '>' and '<'. Nothing other than the words 'split here' should be removed.

text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")

Expected output:

text[1] = "Don't split here > yes here "
text[2] = "and blah blah < again don't (anything could be here) split here >"

I tried

gsub(">(.*split here.*)<","", text)

but that removes everything from the '>' to the '<' and returns a single string instead of splitting it. Can someone with regex experience help me out here?

Tejas Bawaskar asked Apr 24 '19 14:04



1 Answer

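Replace each ' split here ' that lies between '>' and '<' with the control character "\1", then split on that character. In the replacement string, \\1 and \\2 are backreferences to the two captured groups, while the bare \1 is the single character with code 1, which is unlikely to occur in ordinary text: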

strsplit(gsub("(>[^<]+) split here ([^<]+<)", "\\1\1\\2", text), "\1")
## [[1]]
## [1] "Don't split here > yes here"                                      
## [2] "and blah blah < again don't (anything could be here) split here >"

strsplit() always returns a list with one element per input string, even when the input is a single string. To flatten the result into a plain character vector, use unlist(s), where s is the result of the line of code above.
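The same approach can be written out step by step, which makes the list result and the flattening easier to see. This is only a sketch: the second input string and the variable names (marker, marked, parts, flat) are made up for illustration and are not part of the question.

```r
# Input from the question, plus one hypothetical string with no '>' ... '<' pair.
text <- c(
  "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >",
  "nothing to split here"
)

# "\001" is the same single control character as the "\1" used above.
marker <- "\001"

# Step 1: swap " split here " for the marker, but only between '>' and '<'.
marked <- gsub("(>[^<]+) split here ([^<]+<)",
               paste0("\\1", marker, "\\2"), text)

# Step 2: split on the marker; fixed = TRUE treats it as a literal string.
parts <- strsplit(marked, marker, fixed = TRUE)

# Step 3: strsplit() returned a list with one element per input string;
# unlist() flattens it into a single character vector.
flat <- unlist(parts)
print(flat)
```

Because [^<]+ is greedy and matches cannot overlap, this pattern replaces only one ' split here ' per '>' ... '<' pair (the last one); a string with several occurrences inside the same pair would need the gsub() step repeated.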

G. Grothendieck answered Oct 15 '22 03:10