How to convert a list of vectors of different lengths to a usable data frame in R?

I have a (fairly long) list of vectors. The vectors consist of Russian words that I got by using the strsplit() function on sentences.

The following is what head() returns:

[1] "модно"     "создавать" "резюме"    "в"         "виде"     

[1] "ты"        "начианешь" "работать"  "с"         "этими"    

[1] "модно"            "называть"         "блогер-рилейшенз" "―"                "начинается"       "задолго"         

[1] "видел" "по"    "сыну," "что"   "он"   

[1] "четырнадцать," "я"             "поселился"     "на"            "улице"        

[1] "широко"     "продолжали" "род."

Note that the vectors are of different lengths.

What I want is to be able to read the first word of each sentence, then the second word, the third, and so on.

The desired result would be something like this:

    P1              P2           P3                 P4    P5           P6
[1] "модно"         "создавать"  "резюме"           "в"   "виде"       NA
[2] "ты"            "начианешь"  "работать"         "с"   "этими"      NA
[3] "модно"         "называть"   "блогер-рилейшенз" "―"   "начинается" "задолго"         
[4] "видел"         "по"         "сыну,"            "что" "он"         NA
[5] "четырнадцать," "я"          "поселился"        "на"  "улице"      NA
[6] "широко"        "продолжали" "род."             NA    NA           NA

I tried simply using data.frame(), but that didn't work because the rows are of different lengths. I also tried rbind.fill() from the plyr package, but that function can only process matrices.

I found some other questions here (that's where I got the plyr suggestion from), but those were all about combining, for instance, two data frames of different sizes.

Thanks for your help.

3 Answers

One liner with plyr

plyr::ldply(word.list, rbind)

Try this:

word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
n.obs <- sapply(word.list, length)
seq.max <- seq_len(max(n.obs))
mat <- t(sapply(word.list, "[", i = seq.max))

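The trick is that indexing a vector beyond its length returns NA for the missing positions. For example,

c(1:2)[1:4]

returns the vector plus two NAs.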
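
To get from the padded matrix to the data frame layout asked for in the question, something along these lines might work. This is only a sketch: word.list here is the toy list of letters rather than the real strsplit() output, and the names words.df and n.max and the P1, P2, ... column labels are illustrative choices, not part of the answer above.

```r
# Toy stand-in for the list returned by strsplit()
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])

# Pad every vector with NA up to the longest length; one row per sentence
n.max <- max(lengths(word.list))
mat <- t(sapply(word.list, "[", seq_len(n.max)))

# Convert to a data frame and label the word positions P1, P2, ...
words.df <- as.data.frame(mat, stringsAsFactors = FALSE)
names(words.df) <- paste0("P", seq_len(n.max))

words.df$P1   # first word of every sentence
words.df$P5   # fifth word; NA where the sentence is shorter
```

Each column then holds the words at one position, so words.df$P2 gives all second words, and so on.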

Another option is stri_list2matrix from library(stringi)

stri_list2matrix(l, byrow=TRUE)
#    [,1] [,2] [,3] [,4]
#[1,] "a"  "b"  "c"  NA  
#[2,] "a2" "b2" NA   NA  
#[3,] "a3" "b3" "c3" "d3"

NOTE: Data from @juba's post.

Or as @Valentin mentioned in the comments

sapply(l, "length<-", max(lengths(l)))
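
One thing to watch with this base R variant: sapply() puts each list element in its own column, so the result is transposed relative to the stri_list2matrix output above. A small sketch, assuming l is the three-element list implied by that printed output (the names padded and mat are just illustrative):

```r
# Sample data inferred from the printed stri_list2matrix output
l <- list(c("a", "b", "c"), c("a2", "b2"), c("a3", "b3", "c3", "d3"))

# `length<-` extends each vector to the target length, filling with NA
padded <- sapply(l, "length<-", max(lengths(l)))
dim(padded)   # one column per list element

# Transpose to get one row per list element instead
mat <- t(padded)
mat
```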

Or using tidyverse

library(tibble)
library(tidyr)

tibble(V = l) %>%
   unnest_wider(V, names_sep = "")
# A tibble: 3 × 4
#   V1    V2    V3    V4
#   <chr> <chr> <chr> <chr>
# 1 a     b     c     <NA>
# 2 a2    b2    <NA>  <NA>
# 3 a3    b3    c3    d3
