I have a data frame with comma-separated strings:
df <- data.frame(x = c("a,b,c", "a", "a,b"))
I'd like to split the strings into separate columns, resulting in 3 new columns. For the rows with fewer than 3 strings, columns should be filled with missing values.
What I have tried so far is to use the strsplit command:
dfb <- strsplit(df, ",")
Returns an error:
non-character argument
I have also tried separate, and this would provide the additional "fill right" feature:
dfnew2 <- separate(df, c("X","Y"), sep = ",", fill = "right")
This Returns Error:
varmust evaluate to a single number or a column name, not a character vector
My expected result should be a data frame like:
X Y Z
a b c
a n/a n/a
a b n/a
Do you have any suggestions? Many thanks!
Use read.table:
read.table(text = as.character(df$x), sep = ",", as.is = TRUE, fill = TRUE,
na.strings = "")
giving:
V1 V2 V3
1 a b c
2 a <NA> <NA>
3 a b <NA>
One tidyverse possibility could be:
df %>%
rowid_to_column() %>%
mutate(x = strsplit(x, ",", fixed = TRUE)) %>%
unnest() %>%
spread(x, x) %>%
select(-rowid)
a b c
1 a b c
2 a <NA> <NA>
3 a b <NA>
With more compact column names:
df %>%
rowid_to_column() %>%
mutate(x = strsplit(x, ",", fixed = TRUE)) %>%
unnest() %>%
spread(x, x) %>%
rename_at(2:length(.), ~ paste("var", 1:length(.), sep = "_")) %>%
select(-rowid)
var_1 var_2 var_3
1 a b c
2 a <NA> <NA>
3 a b <NA>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With