How to split an R data.frame column based on a regular expression

Question

I have a data.frame and I want to split one of its columns into two based on a regular expression. More specifically, each string has a suffix in parentheses that needs to be extracted into a column of its own.

So e.g. I want to get from here:

dfInit <- data.frame(VAR = paste0(c(1:10),"(",c("A","B"),")"))

to here:

dfFinal <- data.frame(VAR1 = c(1:10), VAR2 = c("A","B"))

G. Grothendieck · Accepted Answer

1) gsubfn::read.pattern. read.pattern in the gsubfn package can do that. The matches to the parenthesized portions of the regular expression are treated as the fields:

library(gsubfn)
read.pattern(text = as.character(dfInit$VAR), pattern = "(.*)[(](.*)[)]$")

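giving a data frame with two columns: V1 holds the numbers 1 to 10 and V2 holds the letters from inside the parentheses.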

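2) sub. Another way is to use sub, once to strip everything from the opening parenthesis onwards and once to keep only the character inside the parentheses:

data.frame(V1 = sub("\\(.*", "", dfInit$VAR), V2 = sub(".*\\((.)\\)$", "\\1", dfInit$VAR))

giving the same values, although V1 is not numeric here because sub returns character strings.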
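
If a numeric first column and the column names from the question are wanted, the sub approach could be extended roughly as below. This is only a sketch, not part of the original answer: the name dfSub is made up for illustration, the second pattern uses (.*) instead of (.) on the assumption that the suffix might be longer than one character, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Rebuild the example data from the question.
dfInit <- data.frame(VAR = paste0(1:10, "(", c("A", "B"), ")"))
x <- as.character(dfInit$VAR)  # guards against VAR being a factor

# dfSub is a hypothetical name used only in this sketch.
dfSub <- data.frame(
  VAR1 = as.integer(sub("\\(.*", "", x)),  # drop everything from the first "("
  VAR2 = sub(".*\\((.*)\\)$", "\\1", x),   # keep what sits inside the trailing "(...)"
  stringsAsFactors = FALSE
)
str(dfSub)
```

Note that as.integer would return NA (with a warning) for any string whose prefix is not a whole number.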

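3) read.table. This solution does not use a regular expression. It splits each string at "(" and treats ")" as a comment character, so the closing parenthesis is discarded:

read.table(text = as.character(dfInit$VAR), sep = "(", comment.char = ")")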

giving the same result.
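
To get the column names asked for in the question directly, read.table's col.names argument can be added. The following is a sketch under the assumption that "(" and ")" occur only as the delimiters around the suffix; dfTab is a made-up name, and stringsAsFactors = FALSE is spelled out so the behaviour is the same on older R versions.

```r
# Rebuild the example data from the question.
dfInit <- data.frame(VAR = paste0(1:10, "(", c("A", "B"), ")"))

# dfTab is a hypothetical name used only in this sketch.
dfTab <- read.table(
  text = as.character(dfInit$VAR),
  sep = "(",                      # split each line at the opening parenthesis
  comment.char = ")",             # ignore ")" and anything after it
  col.names = c("VAR1", "VAR2"),  # names requested in the question
  stringsAsFactors = FALSE
)
str(dfTab)
```

read.table guesses the column types itself, so VAR1 comes back as integer without an explicit conversion.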

akrun · Answer

You could also use extract from tidyr:

library(tidyr)
extract(dfInit, VAR, c("VAR1", "VAR2"), "(\\d+).([[:alpha:]]+).", convert=TRUE) # edited and added `convert=TRUE` as per @aosmith's comments.

#    VAR1 VAR2
#1     1    A
#2     2    B
#3     3    A
#4     4    B
#5     5    A
#6     6    B
#7     7    A
#8     8    B
#9     9    A
#10   10    B
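
For anyone who would rather avoid an extra package, a similar capture-group extraction can be sketched with utils::strcapture, which ships with R (3.4.0 or later). This is not from the answer above, just a base-R analogue under those assumptions; dfCap and proto are illustrative names, and [0-9] is used in place of \\d purely as a conservative choice.

```r
# Rebuild the example data from the question.
dfInit <- data.frame(VAR = paste0(1:10, "(", c("A", "B"), ")"))

# The prototype fixes the names and types of the output columns.
proto <- data.frame(VAR1 = integer(), VAR2 = character(),
                    stringsAsFactors = FALSE)

# dfCap is a hypothetical name used only in this sketch.
# Each capture group in the pattern fills one column of the prototype.
dfCap <- utils::strcapture("^([0-9]+)\\(([[:alpha:]]+)\\)$",
                           as.character(dfInit$VAR), proto)
str(dfCap)
```

Strings that do not match the pattern produce NA in both columns rather than an error, which can make malformed rows easier to spot.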

Tags: regex, dataframe, r

Question by Antti · 2 Answers: G. Grothendieck, akrun