
Adding a function to a string split command in Pandas

Tags: python, pandas

I have a dataframe that has 20 or so columns in it. One of the columns is called 'director_name' and has values such as 'John Doe' or 'Jane Doe'. I want to split this into 2 columns, 'First_Name' and 'Last_Name'. When I run the following it works as expected and splits the string into 2 columns:

data[['First_Name', 'Last_Name']] = data.director_name.str.split(' ', expand=True)
data

First_Name    Last_Name
John          Doe

It works great; however, it does NOT work when I have NULL (NaN) values under 'director_name'. It throws the following error:

'Columns must be same length as key'

I'd like to add a function that checks whether the value is not null and, if so, runs the command above; otherwise it should enter 'NA' for both 'First_Name' and 'Last_Name'.

Any ideas how I would go about that?
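For illustration, the kind of function described above might look like the following minimal sketch. The helper name `split_name` and the three sample rows are made up for this example, and putting everything after the first space into the last name is an assumption, not a requirement from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real 20-column dataframe.
data = pd.DataFrame({'director_name': ['John Doe', 'Jane Doe', np.nan]})

def split_name(value):
    """Return (first, last) for a string, or ('NA', 'NA') for a missing value."""
    if pd.isna(value):
        return ('NA', 'NA')
    # Split at most once, so everything after the first space is the last name.
    parts = value.split(' ', 1)
    first = parts[0]
    last = parts[1] if len(parts) > 1 else 'NA'
    return (first, last)

# Apply the helper row by row, then assign each new column separately.
names = [split_name(v) for v in data['director_name']]
data['First_Name'] = [first for first, _ in names]
data['Last_Name'] = [last for _, last in names]
```

A plain Python loop like this is slower than the vectorized `.str` methods on large frames, but it makes the null check explicit.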

EDIT:

I just checked the file and I'm not sure NULL is the issue. Some names are three or four words long, e.g.:

John Allen Doe
John Allen Doe Jr

Maybe I can't split this into First_Name and Last_Name.

Hmmmm
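A quick check along these lines would probably confirm that suspicion. This is a minimal sketch with made-up sample names: with `expand=True` the split yields one column per word, so longer names produce more than two columns, which is presumably what the "same length as key" error is complaining about. Passing `n=1` limits the split to the first space and always gives at most two columns.

```python
import pandas as pd

# Hypothetical names of different lengths, mirroring the examples above.
names = pd.Series(['John Doe', 'John Allen Doe', 'John Allen Doe Jr'])

# Unlimited split: one column per word, so the longest name decides the width.
parts = names.str.split(' ', expand=True)
print(parts.shape)  # (3, 4): four columns cannot be assigned to two names

# Split on the first space only: at most two columns, whatever the name length.
two = names.str.split(' ', n=1, expand=True)
print(two.shape)  # (3, 2)
```

With `n=1`, the second column holds everything after the first word ('Allen Doe Jr' for the last row), which may or may not be what is wanted as a last name.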

JD2775 asked Dec 08 '22 18:12



1 Answer

One way is to split on spaces and take the first two values as the first name and last name. Given this dataframe:

    Id  name
0   1   James Cameron
1   2   Martin Sheen
2   3   John Allen Doe
3   4   NaN


# Split once on spaces; expand=True returns one column per word
parts = df['name'].str.split(' ', expand=True)
df['First_Name'] = parts[0]
df['Last_Name'] = parts[1]

You get

    Id  name            First_Name  Last_Name
0   1   James Cameron   James       Cameron
1   2   Martin Sheen    Martin      Sheen
2   3   John Allen Doe  John        Allen
3   4   NaN             NaN         None
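Note that row 2 ends up with the middle name 'Allen' as Last_Name. If the last word should be the last name instead, a variation like the sketch below may be closer to what is wanted. It rebuilds the sample frame above, and the `fillna('NA')` step is an assumption based on the question's request for 'NA' in missing rows.

```python
import numpy as np
import pandas as pd

# Same sample data as in the table above.
df = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'name': ['James Cameron', 'Martin Sheen', 'John Allen Doe', np.nan],
})

# Without expand=True each row becomes a list of words; NaN rows stay NaN.
tokens = df['name'].str.split()

# .str[0] and .str[-1] pick the first and last word of each list.
df['First_Name'] = tokens.str[0].fillna('NA')
df['Last_Name'] = tokens.str[-1].fillna('NA')
```

This still has limits: a suffix such as 'Jr' would be taken as the last name, and a single-word name would appear in both columns.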
Vaishali answered Jan 05 '23 00:01
