I have a data.frame where some cells contain strings of comma-separated values:
d <- data.frame(a=c(1:3),
b=c("name1, name2, name3", "name4", "name5, name6"),
c=c("name7","name8, name9", "name10" ))
I want to split those strings so that each name gets its own cell. This is easy with
tidyr::separate_rows(d, b, sep=",")
if it is done one column at a time. But I can't do this for both columns "b" and "c" in a single call, since that requires the number of names in each string to be the same. Instead of writing
tidyr::separate_rows(d, b, sep=",")
tidyr::separate_rows(d, c, sep=",")
is there a way to do this in a one-liner, e.g. with apply? Something like
apply(d, 2, separate_rows(...))
I'm not sure how to pass the arguments to the separate_rows() function.
To split a column into multiple columns in R, we can use the str_split_fixed() function from the stringr package. str_split_fixed() splits a string into a fixed number of pieces.
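As a rough illustration of that function, here is a minimal sketch applied to one cell from the question's column b. The variable names x and parts are made up for this example, and it assumes the stringr package is installed.

```r
library(stringr)

# One cell from column b of the question's data
x <- "name1, name2, name3"

# Split on ", " into exactly 3 pieces; the result is a 1 x 3 character matrix
parts <- str_split_fixed(x, ", ", n = 3)
parts
```

Note that this yields extra columns rather than extra rows, so it solves a slightly different problem from the one asked about.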
The easiest way to separate a column into rows in R is the separate_rows() function from the tidyr package. Be careful with the delimiter if the values are also separated by whitespace.
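To make the whitespace caveat concrete, here is a small sketch using the data from the question. The names raw and clean are only for illustration, and the regular expression ",\\s*" (a comma followed by optional whitespace) is one possible choice of delimiter, not the only one.

```r
library(tidyr)

d <- data.frame(a = 1:3,
                b = c("name1, name2, name3", "name4", "name5, name6"),
                c = c("name7", "name8, name9", "name10"))

# Splitting on a bare comma keeps the space that followed it
raw <- separate_rows(d, b, sep = ",")
raw$b    # some values start with a space, e.g. " name2"

# Treating "comma plus optional whitespace" as the delimiter avoids that
clean <- separate_rows(d, b, sep = ",\\s*")
clean$b
```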
Here's an alternative approach using splitstackshape::cSplit and zoo::na.locf.
library(splitstackshape)
library(zoo)
df <- cSplit(d, 1:ncol(d), "long", sep = ",")
na.locf(df[rowSums(is.na(df)) != ncol(df),])
# a b c
#1: 1 name1 name7
#2: 1 name2 name7
#3: 1 name3 name7
#4: 2 name4 name8
#5: 2 name4 name9
#6: 3 name5 name10
#7: 3 name6 name10
You can use a pipe. Note that no sep argument is needed: the default, sep = "[^[:alnum:].]+", already matches ", ".
d %>% separate_rows(b) %>% separate_rows(c)
# a b c
# 1 1 name1 name7
# 2 1 name2 name7
# 3 1 name3 name7
# 4 2 name4 name8
# 5 2 name4 name9
# 6 3 name5 name10
# 7 3 name6 name10
Note: This uses tidyr version 0.6.0, where the %>% operator is included in the package.
Update: Following @doscendodiscimus's comment, we could use a for() loop and reassign d in each iteration. This way we can handle as many columns as we like. We will use a character vector of column names, so we'll need to switch to the standard evaluation version, separate_rows_.
cols <- c("b", "c")
for(col in cols) {
d <- separate_rows_(d, col)
}
which gives the updated d
a b c
1 1 name1 name7
2 1 name2 name7
3 1 name3 name7
4 2 name4 name8
5 2 name4 name9
6 3 name5 name10
7 3 name6 name10
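For readers on a newer tidyr, where the underscore variants are deprecated, the same idea can be sketched as a fold over the column names. This is not part of the original answer: it swaps separate_rows_ for separate_rows() with tidyselect::all_of(), uses base R's Reduce() in place of the for() loop, and assumes a tidyr version that supports tidyselect helpers. The name out is made up for the example.

```r
library(tidyr)

d <- data.frame(a = 1:3,
                b = c("name1, name2, name3", "name4", "name5, name6"),
                c = c("name7", "name8, name9", "name10"))

cols <- c("b", "c")

# Fold separate_rows() over the column names, feeding each result
# into the next call; d is the starting value of the fold
out <- Reduce(
  function(df, col) separate_rows(df, tidyselect::all_of(col), sep = ",\\s*"),
  cols,
  d
)
out
```

Because each column is split in its own call, the strings in b and c do not need to contain the same number of names, which is what a single separate_rows(d, b, c) call would require.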