R: Convert a column of lists into separate columns using their values as names (dummy variables) [duplicate]

Tags: split, dataframe, r

I have a table of movie data. Its last column lists the categories each movie belongs to, separated by |.

  movieId                              title                   category
       1                   Toy Story (1995)  Animation|Children|Comedy
       2                     Jumanji (1995) Adventure|Children|Fantasy
       3            Grumpier Old Men (1995)             Comedy|Romance
       4           Waiting to Exhale (1995)               Comedy|Drama
       5 Father of the Bride Part II (1995)                     Comedy
       6                        Heat (1995)      Action|Crime|Thriller

I want to create one column per category, holding 1 if the movie belongs to that category and 0 if not. Something like:

movieId title   animation   comedy  drama
1        xx        1           0      1
2        xy        1           0      0
3        yy        1           1      0

So far, I have only converted the string to a list with:

f<-function(x) {strsplit(x, split='|', fixed=TRUE)}
movies2$m<-lapply(movies2$category, f)

But I don't know how to do the rest. I was thinking of something like Python dictionaries, but I don't know how to do this in R.

Data

df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = " movieId                              title                   category
                  1                   'Toy Story (1995)'  Animation|Children|Comedy
                  2                     'Jumanji (1995)' Adventure|Children|Fantasy
                  3            'Grumpier Old Men (1995)'             Comedy|Romance
                  4           'Waiting to Exhale (1995)'               Comedy|Drama
                  5 'Father of the Bride Part II (1995)'                     Comedy
                  6                        'Heat (1995)'      Action|Crime|Thriller")
asked Jun 17 '16 17:06 by GabyLP


1 Answer

We can use mtabulate from qdapTools after splitting the category strings on |:

library(qdapTools)
cbind(df1[-3],mtabulate(strsplit(df1$category, "[|]")))
# movieId                              title Action Adventure Animation Children Comedy Crime Drama Fantasy Romance Thriller
#1       1                   Toy Story (1995)      0         0         1        1      1     0     0       0       0        0
#2       2                     Jumanji (1995)      0         1         0        1      0     0     0       1       0        0
#3       3            Grumpier Old Men (1995)      0         0         0        0      1     0     0       0       1        0
#4       4           Waiting to Exhale (1995)      0         0         0        0      1     0     1       0       0        0
#5       5 Father of the Bride Part II (1995)      0         0         0        0      1     0     0       0       0        0
#6       6                        Heat (1995)      1         0         0        0      0     1     0       0       0        1
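A note on the split pattern "[|]" used above: in a regular expression a bare | means alternation, so it has to be wrapped in a character class (or escaped) to match a literal pipe. The sketch below, using one category string from the question, compares the three spellings. The variable names are only for illustration.

```r
x <- "Comedy|Drama"

# A bare "|" is treated as a regex alternation, not a literal pipe,
# so the string is not split into the two categories
naive <- strsplit(x, "|")[[1]]

# Character class: matches a literal pipe (the form used in the answer)
by_class <- strsplit(x, "[|]")[[1]]

# fixed = TRUE: no regex at all (the form used in the question)
by_fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

print(by_class)
print(identical(by_class, by_fixed))
```

Either of the last two forms should work for this data, so the strsplit call from the question can be reused as it is.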

Or using base R:

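# split the categories, name each piece by movieId, stack into long format,
# then cross-tabulate movieId against category and bind to the id/title columns
cbind(df1[-3], as.data.frame.matrix(table(stack(setNames(strsplit(df1$category,
                           "[|]"), df1$movieId))[2:1])))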
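The base R one-liner above is dense, so here is a rough step-by-step sketch of the same idea. It uses only the first three movies from the question to keep it short, and the intermediate names (cats, long, counts, dummies, result) are made up for illustration.

```r
# First three rows of the question's data
df1 <- data.frame(
  movieId  = 1:3,
  title    = c("Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"),
  category = c("Animation|Children|Comedy",
               "Adventure|Children|Fantasy",
               "Comedy|Romance"),
  stringsAsFactors = FALSE
)

# 1. Split each category string on the literal pipe
cats <- strsplit(df1$category, "|", fixed = TRUE)

# 2. Name each list element by its movieId so stack() can record the source row
names(cats) <- df1$movieId

# 3. Long format: one row per (category, movie) pair, in columns values and ind
long <- stack(cats)

# 4. Cross-tabulate movie (rows) against category (columns)
counts <- table(long$ind, long$values)

# 5. Turn the table into a data frame and attach the id and title columns
dummies <- as.data.frame.matrix(counts)
result  <- cbind(df1[-3], dummies)

print(result)
```

The cells are counts rather than strict 0/1 flags, so a category listed twice for the same movie would show up as 2. With the data in the question each category appears at most once per movie.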
answered Oct 17 '22 07:10 by akrun