Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split unequally occurring comma-separated strings to columns and fill with missing values

I have a data frame with comma-separated strings:

df <- data.frame(x = c("a,b,c", "a", "a,b"))

I'd like to split the strings into separate columns, resulting in 3 new columns. For the rows with fewer than 3 strings, columns should be filled with missing values.

What I have tried so far is to use the strsplit command:

dfb <- strsplit(df, ",")

Returns an error:

non-character argument

I have also tried separate, and this would provide the additional "fill right" feature:

dfnew2 <- separate(df, c("X","Y"), sep = ",", fill = "right")

This Returns Error:

var must evaluate to a single number or a column name, not a character vector

My expected result should be a data frame like:

X Y   Z
a b   c
a n/a n/a
a b   n/a

Do you have any suggestions? Many thanks!

like image 232
Zipfer Avatar asked Nov 27 '25 18:11

Zipfer


2 Answers

Use read.table:

read.table(text = as.character(df$x), sep = ",", as.is = TRUE, fill = TRUE,
  na.strings = "")

giving:

  V1   V2   V3
1  a    b    c
2  a <NA> <NA>
3  a    b <NA>
like image 171
G. Grothendieck Avatar answered Nov 29 '25 08:11

G. Grothendieck


One tidyverse possibility could be:

df %>%
 rowid_to_column() %>%
 mutate(x = strsplit(x, ",", fixed = TRUE)) %>%
 unnest() %>%
 spread(x, x) %>%
 select(-rowid)

  a    b    c
1 a    b    c
2 a <NA> <NA>
3 a    b <NA>

With more compact column names:

df %>%
 rowid_to_column() %>%
 mutate(x = strsplit(x, ",", fixed = TRUE)) %>%
 unnest() %>%
 spread(x, x) %>%
 rename_at(2:length(.), ~ paste("var", 1:length(.), sep = "_")) %>%
 select(-rowid)

  var_1 var_2 var_3
1     a     b     c
2     a  <NA>  <NA>
3     a     b  <NA>
like image 39
tmfmnk Avatar answered Nov 29 '25 08:11

tmfmnk



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!