I have a data.frame and I want to split one of its columns into two based on a regular expression. More specifically, the strings have a suffix in parentheses that needs to be extracted into a column of its own.
For example, I want to get from here:
dfInit <- data.frame(VAR = paste0(c(1:10),"(",c("A","B"),")"))
to here:
dfFinal <- data.frame(VAR1 = c(1:10), VAR2 = c("A","B"))
1) gsubfn::read.pattern The read.pattern function in the gsubfn package can do that. The matches to the parenthesized portions of the regular expression are regarded as the fields:
library(gsubfn)
read.pattern(text = as.character(dfInit$VAR), pattern = "(.*)[(](.*)[)]$")
giving:
V1 V2
1 1 A
2 2 B
3 3 A
4 4 B
5 5 A
6 6 B
7 7 A
8 8 B
9 9 A
10 10 B
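If you would rather not depend on gsubfn, the same "capture groups become fields" idea can be sketched in base R with utils::strcapture (available since R 3.4.0). This is a swapped-in alternative, not what read.pattern does internally; the column names VAR1 and VAR2 are taken from the question, and the prototype data frame `proto` is just an illustrative name.

```r
dfInit <- data.frame(VAR = paste0(1:10, "(", c("A", "B"), ")"))

# Each parenthesized group in the pattern fills one column of `proto`;
# the column types of `proto` decide how the captured text is converted.
proto <- data.frame(VAR1 = integer(), VAR2 = character())

dfFinal <- strcapture("^(.*)[(](.*)[)]$", as.character(dfInit$VAR), proto)
str(dfFinal)
```

Here the prototype also takes care of naming the columns and converting the numeric part, so no extra renaming step is needed.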
2) sub Another way is to use sub:
data.frame(V1=sub("\\(.*", "", dfInit$VAR), V2=sub(".*\\((.)\\)$", "\\1", dfInit$VAR))
giving the same result.
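Note that the (.) in the second pattern above matches exactly one character inside the parentheses, and that sub returns character vectors. A minimal sketch of a variant that allows longer suffixes and converts the numeric part, assuming every string has the form digits followed by one parenthesized suffix (the sample vector `x` is made up for illustration):

```r
# Hypothetical input with suffixes of different lengths
x <- c("1(A)", "22(BC)", "3(xyz)")

res <- data.frame(
  # drop everything from the opening parenthesis onwards, then convert
  VAR1 = as.integer(sub("\\(.*$", "", x)),
  # keep only the text between the parentheses
  VAR2 = sub("^.*\\((.*)\\)$", "\\1", x)
)
res
```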
3) read.table This solution does not use a regular expression:
read.table(text = as.character(dfInit$VAR), sep = "(", comment.char = ")")
giving the same result.
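The read.table approach works because "(" is treated as the field separator and ")" as the comment character, so the closing parenthesis and anything after it is discarded. A small sketch that also sets the column names asked for in the question, using the standard col.names argument:

```r
dfInit <- data.frame(VAR = paste0(1:10, "(", c("A", "B"), ")"))

# "1(A)" is split at "(" into "1" and "A)"; the ")" starts a comment
# and is dropped, leaving the two fields "1" and "A".
dfFinal <- read.table(
  text = as.character(dfInit$VAR),
  sep = "(",
  comment.char = ")",
  col.names = c("VAR1", "VAR2"),
  stringsAsFactors = FALSE
)
str(dfFinal)
```

Since read.table guesses column types, VAR1 comes back as an integer column without any explicit conversion.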
You could also use extract from tidyr:
library(tidyr)
extract(dfInit, VAR, c("VAR1", "VAR2"), "(\\d+).([[:alpha:]]+).", convert=TRUE) # convert=TRUE (added as per @aosmith's comments) makes VAR1 an integer column
# VAR1 VAR2
#1 1 A
#2 2 B
#3 3 A
#4 4 B
#5 5 A
#6 6 B
#7 7 A
#8 8 B
#9 9 A
#10 10 B