Converting a column of type 'list' to multiple columns in a data frame

Question

I have a data frame with one column which is a list, like so:

>head(movies$genre_list)
[[1]]
[1] "drama"   "action"  "romance"
[[2]]
[1] "crime" "drama"
[[3]]
[1] "crime"   "drama"   "mystery"
[[4]]
[1] "thriller" "indie"  
[[5]]
[1] "thriller"
[[6]]
[1] "drama"  "family"

I want to convert this one column to multiple columns, one for each unique element across the lists (in this case, genres), and have them as binary columns. I'm looking for an elegant solution, which doesn't involve first finding out how many genres are there, and then creating a column for each, and then checking each list element to then populate the genre columns. I tried unlist, but it doesn't work with a vector of lists in the way I want.

Thanks!

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

Here are a few approaches:

movies <- data.frame(genre_list = I(list(
   c("drama",   "action",  "romance"),
   c("crime", "drama"),
   c("crime",   "drama",   "mystery"),
   c("thriller", "indie"),  
   c("thriller"),
   c("drama",  "family"))))

Update, years later....

You can use the mtabulate function from "qdapTools" or the unexported charMat function from my "splitstackshape" package.

Syntax would be:

library(qdapTools)
mtabulate(movies$genre_list)
#   action crime drama family indie mystery romance thriller
# 1      1     0     1      0     0       0       1        0
# 2      0     1     1      0     0       0       0        0
# 3      0     1     1      0     0       1       0        0
# 4      0     0     0      0     1       0       0        1
# 5      0     0     0      0     0       0       0        1
# 6      0     0     1      1     0       0       0        0

or

splitstackshape:::charMat(movies$genre_list, fill = 0)
#      action crime drama family indie mystery romance thriller
# [1,]      1     0     1      0     0       0       1        0
# [2,]      0     1     1      0     0       0       0        0
# [3,]      0     1     1      0     0       1       0        0
# [4,]      0     0     0      0     1       0       0        1
# [5,]      0     0     0      0     0       0       0        1
# [6,]      0     0     1      1     0       0       0        0

Update: A couple of more direct approaches

Improved option 1: Use table somewhat directly:

table(rep(1:nrow(movies), sapply(movies$genre_list, length)), 
      unlist(movies$genre_list, use.names=FALSE))

Improved option 2: Use a for loop.

x <- unique(unlist(movies$genre_list, use.names=FALSE))
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))
for (i in 1:nrow(m)) {
  m[i, movies$genre_list[[i]]] <- 1
}
m

Below is the OLD answer

Convert the list to a list of tables (in turn converted to data.frames):

tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})

Use Reduce to merge the resulting list. If I understand your end goal correctly, this results in the transposed form of the result you are interested in.

merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
#      Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1   action        1       NA       NA       NA       NA       NA
# 2    drama        1        1        1       NA       NA        1
# 3  romance        1       NA       NA       NA       NA       NA
# 4    crime       NA        1        1       NA       NA       NA
# 5  mystery       NA       NA        1       NA       NA       NA
# 6    indie       NA       NA       NA        1       NA       NA
# 7 thriller       NA       NA       NA        1        1       NA
# 8   family       NA       NA       NA       NA       NA        1

Transposing and converting NA to 0 is pretty straightforward. Just drop the first column and re-use it as the column names for the new data.frame

movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0
movie_genres

Converting a column of type 'list' to multiple columns in a data frame

Tags:

list

dataframe

r

New High Score

1 Answers

Update, years later....

Update: A couple of more direct approaches

A5C1D2H2I1M1N2O1R2T1

Recent Activity

Donate For Us

Converting a column of type 'list' to multiple columns in a data frame

Tags:

list

dataframe

r

New High Score

1 Answers

Update, years later....

Update: A couple of more direct approaches

A5C1D2H2I1M1N2O1R2T1

Related questions

Recent Activity

Donate For Us