Splitting a string in a Python DataFrame

I have a DataFrame in Python with a column with names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri and so forth).

I want to get a new column with the last names: Haydn, Mozart, Salieri and so forth.

I know how to split a single string, but I could not find a way to apply the split to a Series or a DataFrame column.

asked Sep 06 '15 15:09 by Rene Decol


1 Answer

Suppose you have:

import pandas
data = pandas.DataFrame({"composers": [ 
    "Joseph Haydn", 
    "Wolfgang Amadeus Mozart", 
    "Antonio Salieri",
    "Eumir Deodato"]})

To get only the first name (without middle names such as Amadeus):

data.composers.str.split(r'\s+').str[0]

will give:

0      Joseph
1    Wolfgang
2     Antonio
3       Eumir
dtype: object
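To see what `.str[0]` is indexing into, here is a small sketch that rebuilds the example frame (written with `import pandas as pd` rather than `import pandas`) and keeps the intermediate result: `str.split` returns a Series whose elements are lists of words, and the `.str` accessor then picks an element from each list row by row.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Each element of `tokens` is the list of words in one name.
# The raw string r"\s+" avoids Python's invalid-escape warning.
tokens = data.composers.str.split(r"\s+")
print(tokens)

# .str.len() counts the words per row; .str[0] takes the first word of
# every list, so rows with different word counts are handled independently.
print(tokens.str.len())
print(tokens.str[0])
```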

You can assign this to a new column in the same DataFrame:

data['firstnames'] = data.composers.str.split(r'\s+').str[0]

Last names would be:

data.composers.str.split(r'\s+').str[-1]

which gives:

0      Haydn
1     Mozart
2    Salieri
3    Deodato
dtype: object
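Putting this together for the original question, a minimal sketch that stores the last names in a new column (the column name `lastnames` is just an illustrative choice). It also shows an alternative, `str.rsplit(n=1)`, which splits at most once from the right on whitespace, so only the final word is separated; for these names both approaches give the same result.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Last word of each name, as in the answer above.
data["lastnames"] = data.composers.str.split(r"\s+").str[-1]

# Alternative: with pat=None, rsplit splits on runs of whitespace;
# n=1 limits it to a single split starting from the right.
alt = data.composers.str.rsplit(n=1).str[-1]

print(data)
```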

(See also the question "Python Pandas: selecting element in array column" for accessing elements in an 'array' column.)

To get everything except the last name, apply " ".join(..) to all but the last element ([:-1]) of each row:

data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))

which gives:

0              Joseph
1    Wolfgang Amadeus
2             Antonio
3               Eumir
dtype: object
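If you need both pieces, one option (a sketch, not the only way) is `str.rsplit(n=1, expand=True)`, which returns a DataFrame whose column 0 holds everything before the last whitespace run and column 1 holds the last word. The column names `given_names` and `lastname` are illustrative choices.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split once from the right; expand=True spreads the pieces into columns.
parts = data.composers.str.rsplit(n=1, expand=True)
data["given_names"] = parts[0]  # everything except the last word
data["lastname"] = parts[1]     # the last word

print(data[["given_names", "lastname"]])
```

Note that a single-word name would end up in `given_names`, with a missing value in `lastname`, so check for that if your data can contain such entries.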

answered Oct 07 '22 23:10 by Andre Holzner