I have a table of movie data, and its last column lists the categories each movie belongs to.
movieId  title                               category
1        Toy Story (1995)                    Animation|Children|Comedy
2        Jumanji (1995)                      Adventure|Children|Fantasy
3        Grumpier Old Men (1995)             Comedy|Romance
4        Waiting to Exhale (1995)            Comedy|Drama
5        Father of the Bride Part II (1995)  Comedy
6        Heat (1995)                         Action|Crime|Thriller
I want to create one column per category, holding 1 if that category appears in the movie's list and 0 if it does not. Something like:
movieId  title  animation  comedy  drama
1        xx     1          0       1
2        xy     1          0       0
3        yy     1          1       0
So far, I have only converted the string to a list with:
f<-function(x) {strsplit(x, split='|', fixed=TRUE)}
movies2$m<-lapply(movies2$category, f)
But I don't know how to do the rest.
In Python I would reach for dictionaries, but I don't know how to do the equivalent in R.
Data
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = " movieId title category
1 'Toy Story (1995)' Animation|Children|Comedy
2 'Jumanji (1995)' Adventure|Children|Fantasy
3 'Grumpier Old Men (1995)' Comedy|Romance
4 'Waiting to Exhale (1995)' Comedy|Drama
5 'Father of the Bride Part II (1995)' Comedy
6 'Heat (1995)' Action|Crime|Thriller")
We can use mtabulate from qdapTools after splitting:
library(qdapTools)
cbind(df1[-3],mtabulate(strsplit(df1$category, "[|]")))
#  movieId                              title Action Adventure Animation Children Comedy Crime Drama Fantasy Romance Thriller
#1       1                   Toy Story (1995)      0         0         1        1      1     0     0       0       0        0
#2       2                     Jumanji (1995)      0         1         0        1      0     0     0       1       0        0
#3       3            Grumpier Old Men (1995)      0         0         0        0      1     0     0       0       1        0
#4       4           Waiting to Exhale (1995)      0         0         0        0      1     0     1       0       0        0
#5       5 Father of the Bride Part II (1995)      0         0         0        0      1     0     0       0       0        0
#6       6                        Heat (1995)      1         0         0        0      0     1     0       0       0        1
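A side note on the split pattern: strsplit() treats its split argument as a regular expression by default, and a bare | is the regex alternation operator rather than a literal pipe. The small sketch below (the sample string x is made up for illustration) shows three ways of splitting on the literal character that should give the same result.

```r
x <- "Comedy|Drama"  # hypothetical sample value, same shape as df1$category

# Character class: inside [] the pipe is a literal character
by_class <- strsplit(x, "[|]")[[1]]

# Escaped pipe: the backslash itself must be doubled in an R string
by_escape <- strsplit(x, "\\|")[[1]]

# fixed = TRUE: no regex at all, as in the question's own helper
by_fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

print(by_class)
# [1] "Comedy" "Drama"
```

All three forms are interchangeable here; fixed = TRUE is arguably the most readable when no regex features are needed.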
Or using base R:
# stack() gives long (category, movieId) pairs; [2:1] puts movieId first for table()
cbind(df1[-3], as.data.frame.matrix(table(stack(setNames(
  strsplit(df1$category, "[|]"), df1$movieId))[2:1])))
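If the nested base R call is hard to follow, the same idea can be spelled out step by step, starting from the list the question already builds with strsplit. This is only a sketch of one possible alternative, not part of the answer above; the names cats, lvls, ind and out are made up for illustration.

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = " movieId title category
1 'Toy Story (1995)' Animation|Children|Comedy
2 'Jumanji (1995)' Adventure|Children|Fantasy
3 'Grumpier Old Men (1995)' Comedy|Romance
4 'Waiting to Exhale (1995)' Comedy|Drama
5 'Father of the Bride Part II (1995)' Comedy
6 'Heat (1995)' Action|Crime|Thriller")

# One character vector of categories per movie
cats <- strsplit(df1$category, "|", fixed = TRUE)

# Every distinct category, sorted; these become the new column names
lvls <- sort(unique(unlist(cats)))

# For each movie, flag which categories are present (1) or absent (0).
# vapply() returns one column per movie, so transpose to one row per movie.
ind <- t(vapply(cats, function(x) as.integer(lvls %in% x),
                integer(length(lvls))))
colnames(ind) <- lvls

# Drop the original category column and attach the indicator columns
out <- cbind(df1[-3], ind)
print(out)
```

Compared with table(), this version records presence or absence rather than counts, so a category repeated within one movie's string would still yield 1.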