I have a dataframe:
id info
1 Name: John Age: 12 Sex: Male
2 Name: Sara Age: 22 Sex: Female
3 Name: Mac Donald Age: 32 Sex: Male
I'm looking to split the info column into 3 columns such that i get the final output as:
id Name Age Sex
1 John 12 Male
2 Sara 22 Female
3 Mac Donald 32 Male
I tried using pandas split function.
df[['Name','Age','Sex']] = df.info.split(['Name'])
I might have to do this multiple times to get desired output.
Is there a better way to achieve this?
PS: The info column also contains NaN
values
Using Regex with named groups.
Ex:
df = pd.DataFrame({"Col": ['Name: John Age: 12 Sex: Male', 'Name: Sara Age: 22 Sex: Female', 'Name: Mac Donald Age: 32 Sex: Male']})
df = df['Col'].str.extract(r"Name:\s*(?P<Name>[A-Za-z\s]+)\s*Age:\s*(?P<Age>\d+)\s*Sex:\s*(?P<Sex>Male|Female)") # Or if spacing is standard use df['Col'].str.extract(r"Name: (?P<Name>[A-Za-z\s]+) Age: (?P<Age>\d+) Sex: (?P<Sex>Male|Female)")
print(df)
Output:
Name Age Sex
0 John 12 Male
1 Sara 22 Female
2 Mac Donald 32 Male
The regex is pretty tough to write / read, so you could replace with ,
for where you want separate into new columns and use str.split()
and pass expand=True
. You will need to set the result back to three new columns that you create with df[['Name', 'Age', 'Sex']]
:
df[['Name', 'Age', 'Sex']] = (df['info'].replace(['Name: ', ' Age: ', ' Sex: '], ['',',',','], regex=True)
.str.split(',', expand=True))
df
Out[1]:
id info Name Age Sex
0 1 Name: John Age: 12 Sex: Male John 12 Male
1 2 Name: Sara Age: 22 Sex: Female Sara 22 Female
2 3 Name: Mac Donald Age: 32 Sex: Male Mac Donald 32 Male
A quick oneliner can be
df[['Name', 'Age', 'Sex']] = df['info'].str.split('\s?\w+:\s?', expand=True).iloc[:, 1:]
Split using someword:
and then add new columns.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With