
tidyr use separate_rows over multiple columns

Tags: r, apply, tidyr

I have a data.frame where some cells contain strings of comma-separated values:

d <- data.frame(a=c(1:3), 
       b=c("name1, name2, name3", "name4", "name5, name6"),
       c=c("name7","name8, name9", "name10" ))

I want to split those strings so that each name ends up in its own cell. This is easy with

tidyr::separate_rows(d, b, sep=",") 

if it is done one column at a time. But I can't do this for both columns "b" and "c" in a single call, since that requires the number of names in each string to be the same. Instead of writing

d2 <- tidyr::separate_rows(d, b, sep=",")
d2 <- tidyr::separate_rows(d2, c, sep=",")

is there a way to do this in a one-liner, e.g. with apply? Something like

apply(d, 2, separate_rows(...)) 

I'm not sure how to pass the arguments to the separate_rows() function.

user23413 asked Oct 07 '16 17:10


2 Answers

Here's an alternative approach using splitstackshape::cSplit and zoo::na.locf.

library(splitstackshape)
library(zoo)

df <- cSplit(d, 1:ncol(d), "long", sep = ",")
na.locf(df[rowSums(is.na(df)) != ncol(df),])
#    a     b      c
#1:  1 name1  name7
#2:  1 name2  name7
#3:  1 name3  name7
#4:  2 name4  name8
#5:  2 name4  name9
#6:  3 name5 name10
#7:  3 name6 name10
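
The fill step above relies on zoo::na.locf ("last observation carried forward"). As a minimal sketch of that step in isolation, assuming a made-up toy vector `a` that mimics a column padded with NA after a long split:

```r
library(zoo)

# Toy column (hypothetical data): only the first row of each group carries
# the value, the padded rows are NA.
a <- c(1, NA, NA, 2, NA, 3, NA)

# na.locf() replaces each NA with the most recent non-NA value above it.
filled <- na.locf(a)
filled
# [1] 1 1 1 2 2 3 3
```

Applied to the split table, the same idea copies each group's values down into the rows that the split left empty.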
mtoto answered Oct 25 '22 01:10

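You can use a pipe. Note that sep = ", " does not need to be specified: the default sep, "[^[:alnum:].]+", splits on any run of characters other than letters, digits and periods, so it already matches ", ".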

d %>% separate_rows(b) %>% separate_rows(c)
#   a     b      c
# 1 1 name1  name7
# 2 1 name2  name7
# 3 1 name3  name7
# 4 2 name4  name8
# 5 2 name4  name9
# 6 3 name5 name10
# 7 3 name6 name10

Note: Using tidyr version 0.6.0, where the %>% operator is included in the package.
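
To illustrate the point about the separator, here is a small sketch comparing an explicit sep = "," with the default. The two-row data frame `d_small` is made up for the illustration:

```r
library(tidyr)

# Hypothetical toy data, not the data from the question.
d_small <- data.frame(
  a = 1:2,
  b = c("name1, name2", "name3"),
  stringsAsFactors = FALSE
)

# Splitting on a bare comma keeps the space that followed it.
with_comma <- separate_rows(d_small, b, sep = ",")
with_comma$b
# [1] "name1"  " name2" "name3"

# The default separator also consumes the space.
with_default <- separate_rows(d_small, b)
with_default$b
# [1] "name1" "name2" "name3"
```

One caveat: the default pattern would also split values that themselves contain spaces or hyphens, so for such data an explicit sep such as ",\\s*" is probably the safer choice.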


Update: Following @doscendodiscimus's comment, we could use a for() loop and reassign d in each iteration. This way we can handle as many columns as we like. Since the column names are held in a character vector, we need to switch to the standard evaluation version, separate_rows_.

cols <- c("b", "c")
for(col in cols) {
    d <- separate_rows_(d, col)
}

which gives the updated d

  a     b      c
1 1 name1  name7
2 1 name2  name7
3 1 name3  name7
4 2 name4  name8
5 2 name4  name9
6 3 name5 name10
7 3 name6 name10
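
The loop can also be written as a fold, which gets close to the one-liner asked for. This is only a sketch under some assumptions: it uses base R's Reduce() and tidyselect::all_of() in place of separate_rows_ (the underscore variants are deprecated in more recent tidyr releases), and it assumes a tidyr version in which separate_rows() accepts tidyselect helpers. The explicit sep = ",\\s*" is a choice made here, not part of the original answer.

```r
library(tidyr)

d <- data.frame(
  a = 1:3,
  b = c("name1, name2, name3", "name4", "name5, name6"),
  c = c("name7", "name8, name9", "name10"),
  stringsAsFactors = FALSE
)

cols <- c("b", "c")

# Fold separate_rows() over the column names: each call receives the
# result of the previous one, starting from d.
result <- Reduce(
  function(df, col) separate_rows(df, tidyselect::all_of(col), sep = ",\\s*"),
  cols,
  d
)

nrow(result)
# [1] 7
```

Because each column is separated in its own call, the number of names per cell does not have to match across columns.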
Rich Scriven answered Oct 25 '22 02:10