I have a pandas dataframe with a column named 'City, State, Country'. I want to separate this column into three new columns, 'City', 'State' and 'Country'.
0    HUN
1    ESP
2    GBR
3    ESP
4    FRA
5    ID, USA
6    GA, USA
7    Hoboken, NJ, USA
8    NJ, USA
9    AUS
Splitting the column into three columns is trivial enough:
location_df = df['City, State, Country'].apply(lambda x: pd.Series(x.split(',')))
However, this creates left-aligned data:
         0    1    2
0      HUN  NaN  NaN
1      ESP  NaN  NaN
2      GBR  NaN  NaN
3      ESP  NaN  NaN
4      FRA  NaN  NaN
5       ID  USA  NaN
6       GA  USA  NaN
7  Hoboken   NJ  USA
8       NJ  USA  NaN
9      AUS  NaN  NaN
How would one go about creating the new columns with the data right-aligned? Would I need to iterate through every row, count the number of commas and handle the contents individually?
Split column by delimiter into multiple columns: apply the pandas Series str.split() function on the column (“Address” in the quoted example) and pass the delimiter (a comma in this case) on which you want to split. Also pass expand=True so that the result comes back as separate columns.
split(): pandas provides a method to split a string around a passed separator/delimiter. The result can be stored as a list in a Series, or it can be used to create multiple DataFrame columns from a single delimited string.
We can use the pandas Series.str.split() function to break up strings into multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire DataFrame column.
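To make the behaviour of str.split() with expand=True concrete, here is a minimal sketch on three of the question's values. The names s and left are only illustrative. Note that shorter rows are padded on the right, which is the left-aligned result the question wants to avoid.

```python
import pandas as pd

# Three sample values taken from the question's column
s = pd.Series(['HUN', 'ID, USA', 'Hoboken, NJ, USA'])

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values on the right
left = s.str.split(',', expand=True)

# Splitting on ',' keeps the space that followed each comma, so strip it
left = left.apply(lambda col: col.str.strip())

print(left)
```

In this sketch the country ends up in column 0, 1 or 2 depending on how many parts the row has, so the columns cannot be labelled City, State and Country directly.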
I'd do something like the following:
# Split on commas, strip stray spaces, and reverse so the country comes first
foo = lambda x: pd.Series([i.strip() for i in reversed(x.split(','))])
rev = df['City, State, Country'].apply(foo)
print(rev)
     0    1        2
0  HUN  NaN      NaN
1  ESP  NaN      NaN
2  GBR  NaN      NaN
3  ESP  NaN      NaN
4  FRA  NaN      NaN
5  USA   ID      NaN
6  USA   GA      NaN
7  USA   NJ  Hoboken
8  USA   NJ      NaN
9  AUS  NaN      NaN
I think that gets you what you want but if you also want to pretty things up and get a City, State, Country column order, you could add the following:
# Name the reversed columns, then reorder them as City, State, Country
rev.rename(columns={0:'Country',1:'State',2:'City'},inplace=True)
rev = rev[['City','State','Country']]
print(rev)
      City State Country
0      NaN   NaN     HUN
1      NaN   NaN     ESP
2      NaN   NaN     GBR
3      NaN   NaN     ESP
4      NaN   NaN     FRA
5      NaN    ID     USA
6      NaN    GA     USA
7  Hoboken    NJ     USA
8      NaN    NJ     USA
9      NaN   NaN     AUS
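A variation on the same idea, sketched here as one possible alternative rather than the answer's own method: instead of reversing each row, pad it on the left so the last piece always lands in the Country column. The helper name right_align and its width parameter are hypothetical, and the sketch assumes no value has more than three comma-separated parts.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'City, State, Country': [
    'HUN', 'ESP', 'GBR', 'ESP', 'FRA',
    'ID, USA', 'GA, USA', 'Hoboken, NJ, USA', 'NJ, USA', 'AUS',
]})

def right_align(value, width=3):
    """Split on commas, strip spaces, and left-pad with None up to `width`.

    Hypothetical helper; assumes `value` has at most `width` parts.
    """
    parts = [p.strip() for p in value.split(',')]
    return [None] * (width - len(parts)) + parts

rows = [right_align(v) for v in df['City, State, Country']]
location_df = pd.DataFrame(rows, columns=['City', 'State', 'Country'],
                           index=df.index)

print(location_df)
```

This avoids the rename and reorder step, at the cost of a plain Python loop over the column.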
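Assuming the column is named target (note that this fills the new columns from the left, so it only gives the intended result when every row has all three parts):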
df[["City", "State", "Country"]] = df["target"].str.split(pat=",", expand=True)