I have a variable which contains the actor names.
(actor=structure(c(4L, 1L, 6L, 2L, 5L, 3L), .Label = c("Christian Bale, Tom Hardy, Anne Hathaway, Gary Oldman",
"Jamie Foxx, Christoph Waltz, Leonardo DiCaprio, Kerry Washington",
"Jennifer Lawrence, Josh Hutcherson, Liam Hemsworth, Stanley Tucci",
"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page, Ken Watanabe",
"Leonardo DiCaprio, Mark Ruffalo, Ben Kingsley, Max von Sydow",
"Robert Downey Jr., Chris Evans, Scarlett Johansson, Jeremy Renner"
), class = "factor"))
# [1] Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page, Ken Watanabe
# [2] Christian Bale, Tom Hardy, Anne Hathaway, Gary Oldman
# [3] Robert Downey Jr., Chris Evans, Scarlett Johansson, Jeremy Renner
# [4] Jamie Foxx, Christoph Waltz, Leonardo DiCaprio, Kerry Washington
# [5] Leonardo DiCaprio, Mark Ruffalo, Ben Kingsley, Max von Sydow
# [6] Jennifer Lawrence, Josh Hutcherson, Liam Hemsworth, Stanley Tucci
# 6 Levels: Christian Bale, Tom Hardy, Anne Hathaway, Gary Oldman ...
I want to extract all the complete actor names from it (name + surname) and make them columns in an output matrix.
To extract the unique actor names, convert the factor to character strings with as.character, split each string on the commas with strsplit, flatten the resulting list into a single vector with unlist, and drop the duplicates with unique:
(all.actors <- unique(unlist(strsplit(as.character(actor), ", "))))
# [1] "Leonardo DiCaprio" "Joseph Gordon-Levitt" "Ellen Page" "Ken Watanabe"
# [5] "Christian Bale" "Tom Hardy" "Anne Hathaway" "Gary Oldman"
# [9] "Robert Downey Jr." "Chris Evans" "Scarlett Johansson" "Jeremy Renner"
# [13] "Jamie Foxx" "Christoph Waltz" "Kerry Washington" "Mark Ruffalo"
# [17] "Ben Kingsley" "Max von Sydow" "Jennifer Lawrence" "Josh Hutcherson"
# [21] "Liam Hemsworth" "Stanley Tucci"
Because it uses as.character(actor), this code picks up only the actors that actually appear in the factor actor, even if that factor has additional, unused levels. If you use levels(actor) instead, you will get every actor named in the factor's levels, regardless of whether that level is used in actor. Use whichever you prefer when defining all.actors.
If you want a matrix indicating which actors appear in each element of actor (one row per actor, one column per element), you can then do:
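The difference only shows up when the factor has unused levels, which the example data does not. A minimal sketch, assuming a small made-up factor f (hypothetical data, not from the question) whose third level is never used:

```r
# Hypothetical factor: the level "Tom Hardy, Gary Oldman" is declared but never used
f <- factor(c("Ellen Page, Ken Watanabe", "Chris Evans, Ellen Page"),
            levels = c("Chris Evans, Ellen Page",
                       "Ellen Page, Ken Watanabe",
                       "Tom Hardy, Gary Oldman"))

# Only actors that appear in the elements of f
used.only <- unique(unlist(strsplit(as.character(f), ", ")))

# Every actor named in any level of f, used or not
all.levels <- unique(unlist(strsplit(levels(f), ", ")))

used.only
# [1] "Ellen Page"   "Ken Watanabe" "Chris Evans"
setdiff(all.levels, used.only)
# [1] "Tom Hardy"   "Gary Oldman"
```

If the unused levels are leftovers from subsetting, calling droplevels(f) first should make the two approaches return the same set of names.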
mat <- sapply(strsplit(as.character(actor), ", "), function(x) all.actors %in% x)
row.names(mat) <- all.actors
mat
# [,1] [,2] [,3] [,4] [,5] [,6]
# Leonardo DiCaprio TRUE FALSE FALSE TRUE TRUE FALSE
# Joseph Gordon-Levitt TRUE FALSE FALSE FALSE FALSE FALSE
# Ellen Page TRUE FALSE FALSE FALSE FALSE FALSE
# Ken Watanabe TRUE FALSE FALSE FALSE FALSE FALSE
# Christian Bale FALSE TRUE FALSE FALSE FALSE FALSE
# Tom Hardy FALSE TRUE FALSE FALSE FALSE FALSE
# Anne Hathaway FALSE TRUE FALSE FALSE FALSE FALSE
# Gary Oldman FALSE TRUE FALSE FALSE FALSE FALSE
# Robert Downey Jr. FALSE FALSE TRUE FALSE FALSE FALSE
# Chris Evans FALSE FALSE TRUE FALSE FALSE FALSE
# Scarlett Johansson FALSE FALSE TRUE FALSE FALSE FALSE
# Jeremy Renner FALSE FALSE TRUE FALSE FALSE FALSE
# Jamie Foxx FALSE FALSE FALSE TRUE FALSE FALSE
# Christoph Waltz FALSE FALSE FALSE TRUE FALSE FALSE
# Kerry Washington FALSE FALSE FALSE TRUE FALSE FALSE
# Mark Ruffalo FALSE FALSE FALSE FALSE TRUE FALSE
# Ben Kingsley FALSE FALSE FALSE FALSE TRUE FALSE
# Max von Sydow FALSE FALSE FALSE FALSE TRUE FALSE
# Jennifer Lawrence FALSE FALSE FALSE FALSE FALSE TRUE
# Josh Hutcherson FALSE FALSE FALSE FALSE FALSE TRUE
# Liam Hemsworth FALSE FALSE FALSE FALSE FALSE TRUE
# Stanley Tucci FALSE FALSE FALSE FALSE FALSE TRUE
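The question asks for the actors as columns, whereas the matrix here has one row per actor. One way to get the other orientation is to transpose the result with t. The sketch below assumes a shortened, three-element version of actor (hypothetical data, for brevity) and also converts the logical values to 0/1, which is optional:

```r
# Shortened stand-in for the question's factor (hypothetical data)
actor <- factor(c("Leonardo DiCaprio, Ellen Page",
                  "Christian Bale, Tom Hardy",
                  "Jamie Foxx, Leonardo DiCaprio"))

parts <- strsplit(as.character(actor), ", ")
all.actors <- unique(unlist(parts))

# sapply gives one column per element of actor; t() flips that so each
# row is an element of actor and each column is an actor
mat <- t(sapply(parts, function(x) all.actors %in% x))
colnames(mat) <- all.actors

# Optional: store 0/1 instead of FALSE/TRUE
storage.mode(mat) <- "integer"

dim(mat)
# [1] 3 5
```

With this layout, colSums(mat) counts how many elements of actor each name appears in.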