I have a data frame that looks like this:
V1 V2
peanut butter sandwich 2 slices of bread 1 tablespoon peanut butter
What I'm aiming to get is:
V1                      V2
peanut butter sandwich  2 slices of bread
peanut butter sandwich  1 tablespoon peanut butter
I've tried to split the string using strsplit(df$V2, " "), but that splits on every " ". I'm not sure whether it's possible to split the string only at each number, taking the characters from one number up to the next.
You can split the string as follows:
txt <- "2 slices of bread 1 tablespoon peanut butter"
strsplit(txt, " (?=\\d)", perl=TRUE)[[1]]
#[1] "2 slices of bread" "1 tablespoon peanut butter"
The regex used here looks for a space followed by a digit. It uses a zero-width positive lookahead, (?=...), to say that if the space is followed by a digit (\\d), then it's the type of space we want to split on. Why the zero-width lookahead? Because we don't want the digit to be part of the splitting pattern (strsplit drops whatever the pattern matches); we just want to match any space that is followed by a digit.
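To see why the lookahead matters, here is a small comparison sketch. The variable names with_lookahead and without_lookahead are made up for illustration. Without the lookahead, the digit becomes part of the match and is thrown away along with the space:

```r
txt <- "2 slices of bread 1 tablespoon peanut butter"

# With the lookahead: only the space is matched, so the digit stays
# at the start of the second piece.
with_lookahead <- strsplit(txt, " (?=\\d)", perl = TRUE)[[1]]
with_lookahead
#[1] "2 slices of bread"          "1 tablespoon peanut butter"

# Without the lookahead: the match is " 1", so the "1" is removed
# together with the space.
without_lookahead <- strsplit(txt, " \\d", perl = TRUE)[[1]]
without_lookahead
#[1] "2 slices of bread"         " tablespoon peanut butter"
```

Note that the second version loses the quantity, which is exactly what the lookahead avoids.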
To use that idea and construct your data frame, see this example:
item <- c("peanut butter sandwich", "onion carrot mix", "hash browns")
txt <- c("2 slices of bread 1 tablespoon peanut butter", "1 onion 3 carrots", "potato")
df <- data.frame(item, txt, stringsAsFactors=FALSE)
# thanks to Ananda for recommending setNames
split.strings <- setNames(strsplit(df$txt, " (?=\\d)", perl=TRUE), df$item)
# alternately:
#split.strings <- strsplit(df$txt, " (?=\\d)", perl=TRUE)
#names(split.strings) <- df$item
stack(split.strings)
#                      values                    ind
#1          2 slices of bread peanut butter sandwich
#2 1 tablespoon peanut butter peanut butter sandwich
#3                    1 onion       onion carrot mix
#4                  3 carrots       onion carrot mix
#5                     potato            hash browns
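If you would rather get the result with the V1 / V2 column layout from the question (item first, ingredient second, both as character columns), one possible alternative to stack() is to repeat each item once per piece. This is only a sketch: the names parts and result are made up for illustration, and lengths() assumes R 3.2.0 or later.

```r
item <- c("peanut butter sandwich", "onion carrot mix", "hash browns")
txt <- c("2 slices of bread 1 tablespoon peanut butter", "1 onion 3 carrots", "potato")
df <- data.frame(V1 = item, V2 = txt, stringsAsFactors = FALSE)

# Split each ingredient string at spaces that are followed by a digit.
parts <- strsplit(df$V2, " (?=\\d)", perl = TRUE)

# Repeat each item once per piece it was split into, then flatten the pieces.
result <- data.frame(
  V1 = rep(df$V1, lengths(parts)),
  V2 = unlist(parts),
  stringsAsFactors = FALSE
)
result
```

This should give five rows, one per ingredient, with the item name repeated in V1.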