How can I split a character column into 3 columns using %, -, and + as the possible delimiters, keeping the delimiters in the new columns?
Example Data:
library(data.table)
data <- data.table(x=c("92.1%+100-200","90.4%-1000+200", "92.8%-200+100", "99.2%-500-200","90.1%+500-200"))
Example desired data:
data.desired <- data.table(x1=c("92.1%", "90.4%", "92.8%","99.2%","90.1%")
, x2=c("+100","-1000","-200","-500","+500")
, x3=c("-200","+200","+100","-200","-200"))
Happy to award the points for a good answer and some help on this one!
We can use separate() from tidyr for splitting, with a positive lookahead to keep the delimiters:
library(tidyr)  # provides separate() and re-exports %>%
data %>% separate(x, c("x1", "x2", "x3"), sep = "(?=\\+|-)")
#       x1    x2   x3
# 1: 92.1%  +100 -200
# 2: 90.4% -1000 +200
# 3: 92.8%  -200 +100
# 4: 99.2%  -500 -200
# 5: 90.1%  +500 -200
Note that splitting simply by \\+|- would consume the delimiters, so we would get
data %>% separate(x, c("x1", "x2", "x3"), sep = "\\+|-")
#       x1   x2  x3
# 1: 92.1%  100 200
# 2: 90.4% 1000 200
# 3: 92.8%  200 100
# 4: 99.2%  500 200
# 5: 90.1%  500 200
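Using (?=\\+|-) instead, we split at "nothing" (a zero-width position) whenever the next character is + or -. The lookahead only checks for the sign and does not match it, so the sign stays at the start of the following piece.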
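To see where the lookahead matches, here is a minimal base R sketch that puts a visible marker at each split position instead of splitting. The marker character | and the variable names marked and consumed are only for illustration and are not part of the original answer.

```r
# Three of the example strings from the question.
x <- c("92.1%+100-200", "90.4%-1000+200", "92.8%-200+100")

# The lookahead matches a position, not a character: a marker is inserted
# before every + or -, and the sign itself is left untouched.
marked <- gsub("(?=[+-])", "|", x, perl = TRUE)
print(marked)
# [1] "92.1%|+100|-200"  "90.4%|-1000|+200" "92.8%|-200|+100"

# For comparison, a consuming pattern replaces the sign itself,
# which is why a plain split on + or - loses the delimiters.
consumed <- gsub("[+-]", "|", x)
print(consumed)
# [1] "92.1%|100|200"  "90.4%|1000|200" "92.8%|200|100"
```

Each | in the first result marks a place where a split on the lookahead pattern cuts the string.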
In data.table, the equivalent is tstrsplit:
data[, c("x1","x2","x3") := tstrsplit(x, "(?<=.)(?=[+-])", perl=TRUE) ]
data
#                x    x1    x2   x3
#1:  92.1%+100-200 92.1%  +100 -200
#2: 90.4%-1000+200 90.4% -1000 +200
#3:  92.8%-200+100 92.8%  -200 +100
#4:  99.2%-500-200 99.2%  -500 -200
#5:  90.1%+500-200 90.1%  +500 -200
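The tstrsplit pattern has an extra (?<=.) lookbehind that the tidyr pattern does not. The answer does not explain why. A likely reason is that tstrsplit builds on base strsplit(), which treats zero-length matches specially. The sketch below compares the two patterns with strsplit() directly; the variable names naive and guarded are only for illustration.

```r
# Two of the example strings from the question.
x <- c("92.1%+100-200", "90.4%-1000+200")

# Lookahead alone: after the first split, the remaining text starts with a
# sign, the pattern matches at its very start with zero length, and
# strsplit() then emits that single character as its own piece.
naive <- strsplit(x, "(?=[+-])", perl = TRUE)
print(naive[[1]])
# [1] "92.1%" "+"     "100"   "-"     "200"

# Adding (?<=.) requires a preceding character, so the pattern no longer
# matches at the start of the remaining text and the sign stays attached.
guarded <- strsplit(x, "(?<=.)(?=[+-])", perl = TRUE)
print(guarded[[1]])
# [1] "92.1%" "+100"  "-200"
```

With the lookbehind in place, each string yields exactly three pieces, which is what tstrsplit needs to fill x1, x2 and x3.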