How do I convert this dataframe
   location                              value
0  (Richmond, Virginia, nan, USA)        100
1  (New York City, New York, nan, USA)   200
to this:
   city           state     region  country  value
0  Richmond       Virginia  nan     USA      100
1  New York City  New York  nan     USA      200
Note that the location column in the first dataframe contains tuples. I want to create four columns out of the location column.
We can use str.split() to split one column into multiple columns by passing expand=True. We can use str.extract() to extract multiple columns using a regular expression that defines multiple capturing groups. Both operate on string values, so they apply when the location is stored as text rather than as tuples.
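As a rough sketch of both string methods, assuming the locations were stored as comma-separated text instead of tuples (the df_str frame and its three-part strings are made up for illustration):

```python
import pandas as pd

# Hypothetical data: locations as comma-separated strings, not tuples.
df_str = pd.DataFrame(
    {"location": ["Richmond, Virginia, USA", "New York City, New York, USA"]}
)

# expand=True returns a DataFrame with one column per split piece.
parts = df_str["location"].str.split(", ", expand=True)
parts.columns = ["city", "state", "country"]

# str.extract returns one column per capturing group;
# named groups become the column names.
pattern = r"(?P<city>[^,]+), (?P<state>[^,]+), (?P<country>[^,]+)"
extracted = df_str["location"].str.extract(pattern)

print(parts)
print(extracted)
```

Neither call works on a column of tuples, so for the dataframe in the question one of the tuple-based approaches is the better fit.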
To split a tuple, just list the variable names separated by commas on the left-hand side of an equals sign, and then a tuple on the right-hand side.
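For a single tuple, that unpacking might look like the following; the variable names are only illustrative and mirror the target columns from the question:

```python
# One location tuple, shaped like the rows in the question.
location = ("Richmond", "Virginia", float("nan"), "USA")

# The number of names on the left must match the tuple length.
city, state, region, country = location

print(city, state, region, country)
```

This handles one tuple at a time, so for a whole column it still has to be applied row by row or replaced with a vectorised approach.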
To convert Python tuples to a DataFrame, pass them to the pd.DataFrame() constructor. Given a list of tuples, it returns a DataFrame with one row per tuple.
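Applied to the dataframe in the question, a minimal sketch could look like this. It assumes every tuple has exactly four elements; the names expanded and result are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "location": [
            ("Richmond", "Virginia", np.nan, "USA"),
            ("New York City", "New York", np.nan, "USA"),
        ],
        "value": [100, 200],
    }
)

cols = ["city", "state", "region", "country"]

# Each tuple becomes one row; reusing df.index keeps the rows aligned.
expanded = pd.DataFrame(df["location"].tolist(), index=df.index, columns=cols)

# Attach the value column back by index.
result = expanded.join(df[["value"]])
print(result)
```

Building the frame from a list in one call avoids a Python-level apply per row, which tends to matter on larger frames.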
new_col_list = ['city', 'state', 'region', 'country']
for n, col in enumerate(new_col_list):
    # Take the n-th element of each location tuple as its own column
    df[col] = df['location'].apply(lambda location: location[n])
df = df.drop('location', axis=1)
If you return a Series built from each location tuple, apply produces a DataFrame, which you can join (join merges on the index) directly with your value column.
import pandas as pd

addr = ['city', 'state', 'region', 'country']
df[['value']].join(df.location.apply(lambda loc: pd.Series(loc, index=addr)))
value city state region country
0 100 Richmond Virginia NaN USA
1 200 New York City New York NaN USA