
String split on a number word pattern

I have a data frame that looks like this:

V1                        V2
peanut butter sandwich    2 slices of bread 1 tablespoon peanut butter

What I'm aiming to get is:

V1                        V2
peanut butter sandwich    2 slices of bread
peanut butter sandwich    1 tablespoon peanut butter

I've tried strsplit(df$V2, " "), but that splits on every space. Is there a way to split only at the spaces that come before a number, so that each piece runs from one number up to the next?

yokota asked Dec 21 '15 02:12


1 Answer

You can split the string as follows:

txt <- "2 slices of bread 1 tablespoon peanut butter"

strsplit(txt, " (?=\\d)", perl=TRUE)[[1]]
#[1] "2 slices of bread"          "1 tablespoon peanut butter"

The regex matches a space that is followed by a digit. The zero-width positive lookahead, (?=...), asserts that a digit (\\d) comes next without making that digit part of the match. Why the lookahead? strsplit discards whatever the pattern matches, and we don't want the digit used as a splitting character; we just want to match any space that is followed by a digit. Note that lookaheads need perl=TRUE, because R's default regex engine doesn't support them.
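
To see what the lookahead buys you, it can help to compare it with a pattern that consumes the digit. This is only a minimal sketch; the names consumed and kept are made up for illustration:

```r
txt <- "2 slices of bread 1 tablespoon peanut butter"

# " \\d" matches the space AND the digit, so the digit becomes part of
# the separator and is dropped from the result.
consumed <- strsplit(txt, " \\d", perl = TRUE)[[1]]

# " (?=\\d)" matches only the space; the lookahead checks for the digit
# without consuming it, so the digit stays at the start of the next piece.
kept <- strsplit(txt, " (?=\\d)", perl = TRUE)[[1]]

print(consumed)
print(kept)
```

With the first pattern the second piece loses its leading 1; with the lookahead both pieces keep their quantities.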

To use that idea and construct your data frame, see this example:

item <- c("peanut butter sandwich", "onion carrot mix", "hash browns")
txt <- c("2 slices of bread 1 tablespoon peanut butter", "1 onion 3 carrots", "potato")
df <- data.frame(item, txt, stringsAsFactors=FALSE)

# thanks to Ananda for recommending setNames
split.strings <- setNames(strsplit(df$txt, " (?=\\d)", perl=TRUE), df$item) 
# alternately: 
#split.strings <- strsplit(df$txt, " (?=\\d)", perl=TRUE)
#names(split.strings) <- df$item

stack(split.strings)
#                      values                    ind
#1          2 slices of bread peanut butter sandwich
#2 1 tablespoon peanut butter peanut butter sandwich
#3                    1 onion       onion carrot mix
#4                  3 carrots       onion carrot mix
#5                     potato            hash browns
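
If you'd rather keep the question's V1/V2 layout instead of stack()'s values/ind columns, one option is to repeat each V1 entry once per piece. A rough sketch, assuming the columns are named V1 and V2 as in the question (parts and long are illustrative names, and lengths() needs R 3.2.0 or later):

```r
df <- data.frame(
  V1 = c("peanut butter sandwich", "onion carrot mix", "hash browns"),
  V2 = c("2 slices of bread 1 tablespoon peanut butter",
         "1 onion 3 carrots",
         "potato"),
  stringsAsFactors = FALSE
)

# One character vector of pieces per row of df
parts <- strsplit(df$V2, " (?=\\d)", perl = TRUE)

# Repeat each V1 value once for every piece its V2 string produced,
# then flatten the pieces into a single column
long <- data.frame(
  V1 = rep(df$V1, lengths(parts)),
  V2 = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

Rows with no number after a space, like "potato", come through unchanged as a single row.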
Jota answered Oct 20 '22 09:10