I've got this data frame with data from IMDb in it. One of the columns has the movie title with the year attached in parentheses. Looks like this:
The Shawshank Redemption (1994)
What I really want is to have the title and the year separate. I've tried a couple of different things (split, strsplit), but I've had no success. I try to split on the first parenthesis, but both split functions seem to reject non-character arguments. Anyone have any thoughts?
strsplit only works on character vectors. So if the column is of class factor (which data.frame() produced for strings by default before R 4.0.0), we need to convert it to character first with as.character(..). Here, I'm splitting on either zero or more spaces (\\s*) followed by an opening parenthesis (\\(), or (|) a closing parenthesis (\\)).
strsplit(as.character(d1$v1), '\\s*\\(|\\)')[[1]]
#[1] "The Shawshank Redemption" "1994"
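To see where the "non-character argument" complaint in the question comes from, here is a small sketch that feeds strsplit() a factor directly and then the as.character() version. The variable names f, err and res are made up for illustration; the pattern is the one from the answer.

```r
# A one-element factor, like a string column created by data.frame() before R 4.0.0
f <- factor("The Shawshank Redemption (1994)")

# Passing the factor straight to strsplit() raises an error; capture its message
err <- tryCatch(strsplit(f, "\\s*\\(|\\)"),
                error = function(e) conditionMessage(e))
print(err)

# Converting to character first lets the same pattern split title and year
res <- strsplit(as.character(f), "\\s*\\(|\\)")[[1]]
print(res)
```

The only difference between the failing and the working call is the as.character() conversion; the regular expression is unchanged.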
Or we can place the parentheses inside a character class ([]) so that we don't have to escape them with \\ (as commented by @Avinash Raj)
strsplit(as.character(d1$v1), '\\s*[()]')[[1]]
#[1] "The Shawshank Redemption" "1994"
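Both patterns split on every parenthesis, so a title that itself contains parentheses would be cut into more than two pieces. If that can happen in your data, one alternative (a different technique, sub() anchored at the end of the string, not what the answer uses) is to match only the trailing "(YYYY)". This sketch assumes every value ends with a four-digit year in parentheses; the second title below is a hypothetical example.

```r
# Hypothetical titles; the second contains parentheses of its own
x <- c("The Shawshank Redemption (1994)", "Some Film (Director's Cut) (2001)")

# Title: remove only the final "(YYYY)" group and any spaces before it
title <- sub("\\s*\\(([0-9]{4})\\)$", "", x)

# Year: keep just the four digits inside the final parentheses
year <- as.integer(sub("^.*\\(([0-9]{4})\\)$", "\\1", x))

print(title)
print(year)
```

A value without a trailing year is left unchanged by both sub() calls, so its year becomes NA (with a coercion warning) rather than being silently mis-split.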
# data
v1 <- 'The Shawshank Redemption (1994)'
d1 <- data.frame(v1)
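The answer uses [[1]] to inspect a single title. To actually get separate title and year columns for every row, one way is to split the whole column and pull the first and second piece out of each list element. A minimal sketch, assuming every value has the "Title (YYYY)" shape; the second movie row is hypothetical sample data.

```r
# Hypothetical data: a factor column v1, as data.frame() produced before R 4.0.0
d1 <- data.frame(v1 = c("The Shawshank Redemption (1994)", "The Godfather (1972)"),
                 stringsAsFactors = TRUE)

# Split every element; each list element should be c(title, year)
parts <- strsplit(as.character(d1$v1), "\\s*[()]")

# Extract the first and second piece of each element into new columns
d1$title <- vapply(parts, `[`, character(1), 1)
d1$year  <- as.integer(vapply(parts, `[`, character(1), 2))

print(d1)
```

vapply() is used instead of sapply() so that the result is guaranteed to be a character vector of the same length as the column.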