tidyr use separate_rows over multiple columns

I have a data.frame where some cells contain strings of comma-separated values:

d <- data.frame(a=c(1:3), 
       b=c("name1, name2, name3", "name4", "name5, name6"),
       c=c("name7","name8, name9", "name10" ))

I want to split those strings so that each name ends up in its own cell. This is easy with

tidyr::separate_rows(d, b, sep=",")

if it is done one column at a time. But I can't do this for both columns "b" and "c" in a single call, since that requires the number of names in each string to be the same. Instead of writing

d <- tidyr::separate_rows(d, b, sep=",")
d <- tidyr::separate_rows(d, c, sep=",")

is there a way to do this in a one-liner, e.g. with apply? Something like

apply(d, 2, separate_rows(...))

I'm not sure how to pass the arguments to the separate_rows() function.

asked Oct 07 '16 17:10

2 Answers

Here's an alternative approach using splitstackshape::cSplit and zoo::na.locf.

library(splitstackshape)
library(zoo)

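# Split every column on commas into long format; cells with fewer values are padded with NA
df <- cSplit(d, 1:ncol(d), "long", sep = ",")
# Drop rows that are entirely NA, then fill the remaining NAs with the last observed value
na.locf(df[rowSums(is.na(df)) != ncol(df),])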
#    a     b      c
#1:  1 name1  name7
#2:  1 name2  name7
#3:  1 name3  name7
#4:  2 name4  name8
#5:  2 name4  name9
#6:  3 name5 name10
#7:  3 name6 name10
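
For comparison, the same table can be sketched in base R without extra packages. This is only an illustration, not part of the approach above: it assumes every row should be expanded to all combinations of its b and c names (which reproduces the output shown for this data), and the helper name expand_row is made up for the example.

```r
# Sample data from the question (strings kept as character, not factor)
d <- data.frame(a = 1:3,
                b = c("name1, name2, name3", "name4", "name5, name6"),
                c = c("name7", "name8, name9", "name10"),
                stringsAsFactors = FALSE)

# Hypothetical helper: expand row i into one row per (b, c) combination
expand_row <- function(i) {
  expand.grid(a = d$a[i],
              b = strsplit(d$b[i], ",\\s*")[[1]],
              c = strsplit(d$c[i], ",\\s*")[[1]],
              stringsAsFactors = FALSE)
}

# Stack the per-row expansions into one data.frame
res <- do.call(rbind, lapply(seq_len(nrow(d)), expand_row))
rownames(res) <- NULL
res
```

Splitting on ",\\s*" removes the comma together with any spaces after it, so no trimming step is needed.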

answered Oct 25 '22 01:10

mtoto

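You can use a pipe. Note that sep does not need to be specified: the default, "[^[:alnum:].]+", is a regular expression that matches any run of non-alphanumeric characters, including ", ".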

library(tidyr)

d %>% separate_rows(b) %>% separate_rows(c)
#   a     b      c
# 1 1 name1  name7
# 2 1 name2  name7
# 3 1 name3  name7
# 4 2 name4  name8
# 5 2 name4  name9
# 6 3 name5 name10
# 7 3 name6 name10

Note: This uses tidyr version 0.6.0, which exports the %>% operator.
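
A small sketch of why the separator matters, assuming a tidyr version whose default sep is the regular expression "[^[:alnum:].]+". The data frame x is made up for illustration.

```r
library(tidyr)

x <- data.frame(id = 1, v = "name1, name2", stringsAsFactors = FALSE)

# Splitting on a bare comma keeps the space that followed it
with_comma <- separate_rows(x, v, sep = ",")$v

# The default separator consumes the comma and the space together
with_default <- separate_rows(x, v)$v

with_comma
with_default
```

If an explicit separator is preferred, a pattern such as ",\\s*" should avoid the leftover whitespace.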

Update: Following @doscendodiscimus's comment, we could use a for() loop and reassign d in each iteration. This way we can handle as many columns as we like. Because the column names are held in a character vector, we need to switch to the standard evaluation version, separate_rows_.

cols <- c("b", "c")
for(col in cols) {
    d <- separate_rows_(d, col)
}

which gives the updated d

  a     b      c
1 1 name1  name7
2 1 name2  name7
3 1 name3  name7
4 2 name4  name8
5 2 name4  name9
6 3 name5 name10
7 3 name6 name10
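
In later tidyr releases the underscore variants such as separate_rows_() are deprecated. Below is a hedged sketch of the same loop written as a single expression, assuming a tidyr version that accepts tidyselect helpers in separate_rows(). The name split_cols is invented here.

```r
library(tidyr)

# Sample data from the question (strings kept as character, not factor)
d <- data.frame(a = 1:3,
                b = c("name1, name2, name3", "name4", "name5, name6"),
                c = c("name7", "name8, name9", "name10"),
                stringsAsFactors = FALSE)

cols <- c("b", "c")

# Fold separate_rows() over the column names, starting from d;
# all_of() selects the column named by the character value in col
split_cols <- Reduce(
  function(df, col) separate_rows(df, tidyselect::all_of(col)),
  cols,
  d
)
split_cols
```

Reduce() feeds the result of each call into the next one, which is what the for() loop does by reassigning d, but without overwriting the original data.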

answered Oct 25 '22 02:10

Rich Scriven

Tags: r, apply, tidyr

user23413