
split string with regex

Tags: regex, r, strsplit

I'm looking to split a string of a generic form, where the square brackets denote the "sections" of the string. Ex:

x <- "[a] + [bc] + 1"

And return a character vector that looks like:

"[a]"  " + "  "[bc]" " + 1"

EDIT: Ended up using this:

x <- "[a] + [bc] + 1"
x <- gsub("\\[",",[",x)
x <- gsub("\\]","],",x)
strsplit(x,",")
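
Note that the approach above also inserts a comma in front of the leading "[", so the result of strsplit begins with an empty string, and strsplit returns a list rather than a character vector. A minimal sketch of one way to tidy that up, assuming the input contains no commas of its own and that empty pieces are never wanted (the names `marked` and `pieces` are illustrative only):

```r
x <- "[a] + [bc] + 1"

# Mark the section boundaries with commas, as in the edit above.
# Assumes the input itself contains no commas.
marked <- gsub("\\]", "],", gsub("\\[", ",[", x))

# strsplit() returns a list; take its first element and drop empty strings,
# such as the one produced by the comma in front of the leading "[".
pieces <- strsplit(marked, ",")[[1]]
pieces <- pieces[nzchar(pieces)]
pieces
```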
Jeff Keller asked Mar 22 '13 15:03



3 Answers

I've seen TylerRinker's code and suspect it may be clearer than this, but this may serve as a way to learn a different set of functions. (I liked his better before I noticed that it split on spaces.) I tried adapting this to work with strsplit, but that function always removes the separators. Maybe this could be adapted to make a newstrsplit that splits at the separators but leaves them in? It would probably need to avoid splitting at the first or last position and to distinguish between opening and closing separators.
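
One possible way to get a split that keeps the separators, sketched here with gregexpr() and regmatches() rather than a modified strsplit: instead of splitting, match every piece directly, where a piece is either a bracketed section or a run of characters containing no opening bracket. The function name `split_keep_brackets` is hypothetical, and the sketch assumes the brackets are balanced and never nested.

```r
# Hypothetical helper: returns the bracketed sections and the text between
# them, in order. Assumes brackets are balanced and never nested.
split_keep_brackets <- function(x) {
  # Either "[...]" with no "]" inside, or a run of characters that are not "[".
  pattern <- "\\[[^\\]]*\\]|[^\\[]+"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

split_keep_brackets("[a] + [bc] + 1")
```

Because nothing is removed from the string, there is no need to treat the first or last position specially.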

x <- "[a] + [bc] + 1"
# use scan to separate the pieces after inserting commas
scan(text = gsub("\\]", "],",             # put a comma after each "]"
                 gsub(".\\[", ",[", x)),  # replace the character before each non-initial "[" with a comma
     what = "", sep = ",")                # read character fields separated by ","
#Read 4 items
#[1] "[a]"  " +"   "[bc]" " + 1"
IRTFM answered Oct 04 '22 08:10


This is one lazy approach:

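FUN <- function(x) {
    # split on runs of whitespace (this drops the spaces themselves)
    all <- unlist(strsplit(x, "\\s+"))
    # glue the last two tokens back together behind a single leading space
    last <- paste(c(" ", tail(all, 2)), collapse="")
    c(head(all, -2), last)
}

x <- "[a] + [bc] + 1"
FUN(x)

## > FUN(x)
## [1] "[a]"  "+"    "[bc]" " +1"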
Tyler Rinker answered Oct 04 '22 08:10


You can compute the split points manually and use substring:

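x <- "[a] + [bc] + 1"
# start positions and lengths of every "[...]" section (non-greedy match)
split.pos <- gregexpr('\\[.*?]', x)[[1]]
split.length <- attr(split.pos, "match.length")
# a piece starts at each section and right after each section
split.start <- sort(c(split.pos, split.pos + split.length))
# each piece ends just before the next one starts; the last ends with the string
split.end <- c(split.start[-1] - 1, nchar(x))
substring(x, split.start, split.end)
#  [1] "[a]"  " + "  "[bc]" " + 1"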
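
As written, this relies on the string starting with a section and not ending with one: text before the first "[" is dropped, and a trailing "]" produces an extra empty string. A hedged sketch of a wrapper that covers those cases follows; the function name `split_at_brackets` is hypothetical, and it assumes non-nested brackets.

```r
# Hypothetical wrapper around the substring() approach above.
# Assumes bracketed sections are not nested.
split_at_brackets <- function(x) {
  m <- gregexpr("\\[.*?\\]", x, perl = TRUE)[[1]]
  if (m[1] == -1) return(x)  # no bracketed section: nothing to split
  len <- attr(m, "match.length")
  # Piece boundaries: the start of the string, the start of each section,
  # and the position just after each section.
  starts <- sort(unique(c(1, m, m + len)))
  # Drop a boundary that falls past the end (string ends with a section).
  starts <- starts[starts <= nchar(x)]
  ends <- c(starts[-1] - 1, nchar(x))
  substring(x, starts, ends)
}

split_at_brackets("[a] + [bc] + 1")
split_at_brackets("1 + [a]")
```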
juba answered Oct 04 '22 06:10