I have a (fairly long) list of vectors. The vectors consist of Russian words that I got by using the strsplit() function on sentences.
The following is what head() returns:
[[1]]
[1] "модно" "создавать" "резюме" "в" "виде"
[[2]]
[1] "ты" "начианешь" "работать" "с" "этими"
[[3]]
[1] "модно" "называть" "блогер-рилейшенз" "―" "начинается" "задолго"
[[4]]
[1] "видел" "по" "сыну," "что" "он"
[[5]]
[1] "четырнадцать," "я" "поселился" "на" "улице"
[[6]]
[1] "широко" "продолжали" "род."
Note that the vectors are of different lengths.
What I want is to be able to read the first word of each sentence, then the second word, the third, and so on.
The desired result would be something like this:
P1 P2 P3 P4 P5 P6
[1] "модно" "создавать" "резюме" "в" "виде" NA
[2] "ты" "начианешь" "работать" "с" "этими" NA
[3] "модно" "называть" "блогер-рилейшенз" "―" "начинается" "задолго"
[4] "видел" "по" "сыну," "что" "он" NA
[5] "четырнадцать," "я" "поселился" "на" "улице" NA
[6] "широко" "продолжали" "род." NA NA NA
I have tried to just use data.frame() but that didn't work because the rows are of different length. I also tried rbind.fill() from the plyr package, but that function can only process matrices.
I found some other questions here (that's where I got the plyr help from), but those were all about combining for instance two data frames of different size.
Thanks for your help.
For example, a vector x can be converted to a data frame with as.data.frame(x), and the same works for a matrix.
A data frame can be built from vectors in R. To do so, first create the vectors that hold the data, then pass them to data.frame().
data.frame() can also convert a list to a data frame when its components have the same length. If you want the list elements arranged column-wise, use cbind; otherwise use rbind.
Lists can have components of the same type or mode, or components of different types or modes, so they can combine different components (numeric, logical, ...) in a single object. A data frame is simply a list with the class "data.frame".
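A minimal base R sketch of the points above. The vectors words and counts are made up for illustration; the last step shows why data.frame() cannot be applied directly to vectors of different lengths, as in the question.

```r
# Hypothetical example vectors of equal length
words  <- c("a", "b", "c")
counts <- c(1L, 2L, 3L)

# Build a data frame from the vectors
df <- data.frame(words, counts, stringsAsFactors = FALSE)

is.list(df)   # a data frame is a list ...
class(df)     # ... whose class is "data.frame"

# Vectors of unequal length (3 vs 2) cannot be combined directly;
# try() captures the error instead of stopping the script
res <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(res, "try-error")
```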
One liner with plyr
plyr::ldply(word.list, rbind)
Try this:
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
n.obs <- sapply(word.list, length)
seq.max <- seq_len(max(n.obs))
mat <- t(sapply(word.list, "[", i = seq.max))
The trick is that
c(1:2)[1:4]
returns the vector plus two NAs, because indexing past the end of a vector yields NA.
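The padding trick can be sketched end to end as follows. This is only an illustration: the sample word.list and the column names P1, P2, ... (taken from the layout the question asks for) are assumptions, not part of the answer itself.

```r
# Indexing past the end of a vector pads it with NA
x <- c(1:2)
padded <- x[1:4]

# Hypothetical sample data with vectors of different lengths
word.list <- list(c("a", "b", "c"),
                  c("a2", "b2"),
                  c("a3", "b3", "c3", "d3"))

# Length of the longest vector
n.max <- max(lengths(word.list))

# Extract positions 1..n.max from every vector; short ones get NA.
# sapply() returns one column per vector, so transpose to get rows.
mat <- t(sapply(word.list, "[", seq_len(n.max)))

# Convert to a data frame and name the columns P1, P2, ...
df <- as.data.frame(mat, stringsAsFactors = FALSE)
names(df) <- paste0("P", seq_len(n.max))
```

After this, df$P1 holds the first word of every sentence, df$P2 the second, and so on.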
Another option is stri_list2matrix from library(stringi)
library(stringi)
stri_list2matrix(l, byrow=TRUE)
# [,1] [,2] [,3] [,4]
#[1,] "a" "b" "c" NA
#[2,] "a2" "b2" NA NA
#[3,] "a3" "b3" "c3" "d3"
NOTE: Data from @juba's post.
Or, as @Valentin mentioned in the comments (this gives one column per list element, so wrap it in t() to get one row per element)
sapply(l, "length<-", max(lengths(l)))
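A short base R sketch of how this behaves. The list l is reconstructed here from the stri_list2matrix output shown earlier, which is an assumption about the original data.

```r
# Sample list, reconstructed from the output shown above (an assumption)
l <- list(c("a", "b", "c"),
          c("a2", "b2"),
          c("a3", "b3", "c3", "d3"))

# `length<-` extends each vector to the maximum length, padding with NA.
# sapply() then binds the padded vectors as columns of a matrix.
m <- sapply(l, "length<-", max(lengths(l)))

# Transpose so that each list element becomes a row
m_rows <- t(m)
```

Here m has one column per list element, while m_rows has the layout the question asks for.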
Or using tidyverse
library(purrr)
library(tidyr)
library(dplyr)
tibble(V = l) %>%
  unnest_wider(V, names_sep = "")
# A tibble: 3 × 4
#   V1    V2    V3    V4
#   <chr> <chr> <chr> <chr>
# 1 a     b     c     <NA>
# 2 a2    b2    <NA>  <NA>
# 3 a3    b3    c3    d3