
Split string and integer in Pandas series - Python

I have one column in a Pandas DataFrame that holds both the movie title and the year in the same string (e.g. "Toy Story (1995)"). I need to split them into two separate columns, and the year must be an integer. I tried the method below, but the year column keeps the "object" dtype because the values still include the parentheses. It also doesn't work for one movie (there's still a title)...

split_movie = movies["Movie"].str.rsplit(" ", n = 1, expand=True)
movies["Movie Title"] = split_movie[0]
movies["Movie Year"] = split_movie[1]

I don't know if I can use the pd.year method or if I have to split the string in Python by creating a list...

Thanks for your help!

Simone Bucci asked Jan 21 '26 17:01


2 Answers

Use str.extractall:

>>> df.join(df['Movie'].str.extractall(r'\s*(.*\S)\s*\((\d{4})\)') \
                       .rename(columns={0: 'Movie Title', 1: 'Movie Year'}) \
                       .reset_index(drop=True))

              Movie Movie Title Movie Year
0  Toy Story (1995)   Toy Story       1995

The regular expression was improved by @Bill.
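
Since each title carries a single year, str.extract (one match per row, original index preserved) may be a simpler fit than extractall, and the captured year still has to be converted to get an integer column. A minimal sketch, assuming a small made-up frame with the question's "Movie" column; the sample rows and the group names Title and Year are hypothetical:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's "Movie" column
movies = pd.DataFrame({"Movie": ["Toy Story (1995)", "Heat (1995)", "Sabrina (1954)"]})

# Named groups become column names; str.extract keeps the original index,
# so no reset_index is needed before assigning back
parts = movies["Movie"].str.extract(r"^\s*(?P<Title>.*\S)\s*\((?P<Year>\d{4})\)\s*$")

movies["Movie Title"] = parts["Title"]
# astype raises if any row failed to match (its year would be missing)
movies["Movie Year"] = parts["Year"].astype("int64")

print(movies.dtypes)
```

Because the result stays aligned with the original index, a row that does not match shows up as a missing value in that same row instead of shifting the rows below it.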

Corralien answered Jan 23 '26 06:01


Keeping closer to your original code...

Try:

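# Split on the last "(" so that parentheses inside a title are left alone
movies[['Title', 'Year']] = movies["Movie"].str.rsplit("(", n=1, expand=True)
movies['Title'] = movies['Title'].str.strip()  # drop the space left before "("
movies['Year'] = movies['Year'].str.replace(')', '', regex=False)
movies['Year'] = movies['Year'].astype('int64')
movies.info()  # info() prints its report itself and returns None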

Outputs:

 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Movie    15 non-null     object 
 1   Title    15 non-null     object
 2   Year     15 non-null     int64 
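
The question mentions one movie for which the split does not work. If that row has no "(year)" suffix, astype('int64') would raise on the missing value. A hedged sketch of one way to tolerate such rows, assuming a hypothetical frame in which the second entry lacks a year; it swaps the plain astype('int64') for pd.to_numeric with errors="coerce" followed by the nullable "Int64" dtype:

```python
import pandas as pd

# Hypothetical data: the second row has no "(year)" suffix
movies = pd.DataFrame({"Movie": ["Toy Story (1995)", "Untitled Project", "Heat (1995)"]})

# Same split as above; rows without "(" get a missing value in the second column
movies[["Title", "Year"]] = movies["Movie"].str.rsplit("(", n=1, expand=True)
movies["Title"] = movies["Title"].str.strip()
movies["Year"] = movies["Year"].str.replace(")", "", regex=False)

# errors="coerce" turns anything non-numeric into NaN instead of raising;
# the nullable "Int64" dtype then stores those gaps as <NA>
movies["Year"] = pd.to_numeric(movies["Year"], errors="coerce").astype("Int64")

print(movies)
```

Rows that end up with a missing year can then be listed with movies[movies["Year"].isna()] and inspected by hand.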

MDR answered Jan 23 '26 07:01