Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split column at delimiter in data frame [duplicate]

Tags:

dataframe

r

People also ask

How do you split a column in a Dataframe on a delimiter?

Split column by delimiter into multiple columnsApply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.

How do you split a delimiter in R?

Use str_split to Split String by Delimiter in R Alternatively, the str_split function can also be utilized to split string by delimiter. str_split is part of the stringr package. It almost works in the same way as strsplit does, except that str_split also takes regular expressions as the pattern.

How do I split a column into two Dataframe in R?

To split a column into multiple columns in the R Language, we use the separator() function of the dplyr package library. The separate() function separates a character column into multiple columns with a regular expression or numeric locations.

Which splits a data frame and returns a data frame?

The Pandas provide the feature to split Dataframe according to column index, row index, and column values, etc.


@Taesung Shin is right, but then just some more magic to make it into a data.frame. I added a "x|y" line to avoid ambiguities:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
foo <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

Or, if you want to replace the columns in the existing data.frame:

within(df, FOO<-data.frame(do.call('rbind', strsplit(as.character(FOO), '|', fixed=TRUE))))

Which produces:

  ID FOO.X1 FOO.X2
1 11      a      b
2 12      b      c
3 13      x      y

The newly popular tidyr package does this with separate. It uses regular expressions so you'll have to escape the |

df <- data.frame(ID=11:13, FOO=c('a|b', 'b|c', 'x|y'))
separate(data = df, col = FOO, into = c("left", "right"), sep = "\\|")

#   ID left right
# 1 11    a     b
# 2 12    b     c
# 3 13    x     y

though in this case the defaults are smart enough to work (it looks for non-alphanumeric characters to split on).

separate(data = df, col = FOO, into = c("left", "right"))

Hadley has a very elegant solution to do this inside data frames in his reshape package, using the function colsplit.

require(reshape)
> df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
> df
  ID FOO
1 11 a|b
2 12 b|c
3 13 x|y
> df = transform(df, FOO = colsplit(FOO, split = "\\|", names = c('a', 'b')))
> df
  ID FOO.a FOO.b
1 11     a     b
2 12     b     c
3 13     x     y

Just came across this question as it was linked in a recent question on SO.

Shameless plug of an answer: Use cSplit from my "splitstackshape" package:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
library(splitstackshape)
cSplit(df, "FOO", "|")
#   ID FOO_1 FOO_2
# 1 11     a     b
# 2 12     b     c
# 3 13     x     y

This particular function also handles splitting multiple columns, even if each column has a different delimiter:

df <- data.frame(ID=11:13, 
                 FOO=c('a|b','b|c','x|y'), 
                 BAR = c("A*B", "B*C", "C*D"))
cSplit(df, c("FOO", "BAR"), c("|", "*"))
#   ID FOO_1 FOO_2 BAR_1 BAR_2
# 1 11     a     b     A     B
# 2 12     b     c     B     C
# 3 13     x     y     C     D

Essentially, it's a fancy convenience wrapper for using read.table(text = some_character_vector, sep = some_sep) and binding that output to the original data.frame. In other words, another A base R approach could be:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
cbind(df, read.table(text = as.character(df$FOO), sep = "|"))
  ID FOO V1 V2
1 11 a|b  a  b
2 12 b|c  b  c
3 13 x|y  x  y

strsplit(c('a|b','b|c'),'|',fixed=TRUE)