Splitting a column in a data frame?

Tags:

dataframe

r

I've got this data frame with data from IMDb in it. One of the columns has the movie title with the year attached in parentheses. Looks like this:

The Shawshank Redemption (1994)

What I really want is to have the title and year in separate columns. I've tried a couple of different things (split, strsplit) without success. I tried to split on the opening parenthesis, but neither function seems to accept non-character arguments. Does anyone have any thoughts?

milk asked May 20 '26 07:05


1 Answer

strsplit works on character vectors, so if the column is of class factor, it first needs to be converted to character with as.character(). Here, the split pattern matches zero or more spaces (\\s*) followed by an opening parenthesis (\\(), or (|) a closing parenthesis (\\)):

strsplit(as.character(d1$v1), '\\s*\\(|\\)')[[1]]
#[1] "The Shawshank Redemption" "1994"         

Or we can place the parentheses inside a character class ([]) so that they don't have to be escaped with \\ (as commented by @Avinash Raj):

strsplit(as.character(d1$v1), '\\s*[()]')[[1]]

data

v1 <- 'The Shawshank Redemption (1994)'
d1 <- data.frame(v1)
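
To go one step further and store the two pieces as separate columns, something like the following should work. This is only a minimal sketch: the second movie row and the column names title and year are made up for illustration, and it assumes every value has exactly one "(year)" group at the end (a title that itself contains parentheses would split into more pieces and needs a stricter pattern).

```r
# Example data; stringsAsFactors = TRUE reproduces the factor column
# that strsplit() refuses to work on directly
d1 <- data.frame(v1 = c("The Shawshank Redemption (1994)",
                        "The Godfather (1972)"),
                 stringsAsFactors = TRUE)

# Split every row, then row-bind the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(as.character(d1$v1), "\\s*[()]"))

# Attach the pieces as new columns (hypothetical names)
d1$title <- parts[, 1]
d1$year  <- as.integer(parts[, 2])
```

do.call(rbind, ...) is used because strsplit() returns a list with one character vector per row; binding them gives a matrix whose columns can be assigned back to the data frame.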
akrun answered May 21 '26 21:05


