I have a DataFrame in Python with a column with names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri and so forth).
I want to get a new column with the last names: Haydn, Mozart, Salieri and so forth.
I know how to split a string, but I could not find a way to apply it to a Series or a DataFrame column.
In Python, str.split() splits a string into a list of substrings and returns that new list; it does not change the original string. If " " is used as the separator, the string is split between words; if the newline character (\n) is used, it is split into lines.
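As a quick illustration of the plain-string behaviour described above, here is a minimal sketch. The sample strings and variable names are made up for this example.

```python
# Plain-Python illustration; the sample strings are assumptions for this sketch.
name = "Wolfgang Amadeus Mozart"

words = name.split(" ")              # split between words
lines = "Haydn\nMozart".split("\n")  # split at newline characters

print(words)  # ['Wolfgang', 'Amadeus', 'Mozart']
print(lines)  # ['Haydn', 'Mozart']
print(name)   # Wolfgang Amadeus Mozart (the original string is unchanged)
```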
If you have:
import pandas
data = pandas.DataFrame({"composers": [
"Joseph Haydn",
"Wolfgang Amadeus Mozart",
"Antonio Salieri",
"Eumir Deodato"]})
Assuming you want only the first name (and not a middle name like Amadeus):
data.composers.str.split(r'\s+').str[0]
will give:
0 Joseph
1 Wolfgang
2 Antonio
3 Eumir
dtype: object
You can assign this to a new column in the same DataFrame:
data['firstnames'] = data.composers.str.split(r'\s+').str[0]
Last names would be:
data.composers.str.split(r'\s+').str[-1]
which gives:
0 Haydn
1 Mozart
2 Salieri
3 Deodato
dtype: object
(see also Python Pandas: selecting element in array column for accessing elements in an 'array' column)
For everything but the last name, you can apply " ".join(..) to all but the last element ([:-1]) of each row:
data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))
which gives:
0 Joseph
1 Wolfgang Amadeus
2 Antonio
3 Eumir
dtype: object
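If both parts are wanted at once, one possible alternative is to split each name a single time from the right. This is a minimal sketch rather than the approach used above: it assumes the last whitespace-separated word is always the last name, and the column names given_names and last_name are hypothetical.

```python
import pandas

data = pandas.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# rsplit with n=1 splits once on whitespace, starting from the right;
# expand=True returns the two pieces as DataFrame columns 0 and 1.
parts = data["composers"].str.rsplit(n=1, expand=True)

# Hypothetical column names, chosen for this sketch only.
data["given_names"] = parts[0]
data["last_name"] = parts[1]

print(data)
```

A name consisting of a single word would leave the second piece missing, so that case may need separate handling.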