Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries

Tags: python, split, pandas

I have a pandas DataFrame with a column named 'City, State, Country'. I want to separate this column into three new columns: 'City', 'State' and 'Country'.

    0                 HUN
    1                 ESP
    2                 GBR
    3                 ESP
    4                 FRA
    5             ID, USA
    6             GA, USA
    7    Hoboken, NJ, USA
    8             NJ, USA
    9                 AUS

Splitting the column into three columns is trivial enough:

location_df = df['City, State, Country'].apply(lambda x: pd.Series(x.split(',')))

However, this creates left-aligned data:

         0       1       2
    0    HUN     NaN     NaN
    1    ESP     NaN     NaN
    2    GBR     NaN     NaN
    3    ESP     NaN     NaN
    4    FRA     NaN     NaN
    5    ID      USA     NaN
    6    GA      USA     NaN
    7    Hoboken NJ      USA
    8    NJ      USA     NaN
    9    AUS     NaN     NaN

How would one go about creating the new columns with the data right-aligned? Would I need to iterate through every row, count the number of commas and handle the contents individually?


asked Apr 26 '14 22:04

jamesbev

2 Answers

I'd do something like the following:

    import pandas as pd

    # Split on commas, strip stray spaces, and reverse so the country lands in column 0
    foo = lambda x: pd.Series([i.strip() for i in reversed(x.split(','))])
    rev = df['City, State, Country'].apply(foo)
    print(rev)

         0    1        2
    0  HUN  NaN      NaN
    1  ESP  NaN      NaN
    2  GBR  NaN      NaN
    3  ESP  NaN      NaN
    4  FRA  NaN      NaN
    5  USA   ID      NaN
    6  USA   GA      NaN
    7  USA   NJ  Hoboken
    8  USA   NJ      NaN
    9  AUS  NaN      NaN

I think that gets you what you want, but if you also want to tidy things up and get a City, State, Country column order, you could add the following:

    # Name the reversed columns, then reorder them
    rev.rename(columns={0: 'Country', 1: 'State', 2: 'City'}, inplace=True)
    rev = rev[['City', 'State', 'Country']]
    print(rev)

          City State Country
    0      NaN   NaN     HUN
    1      NaN   NaN     ESP
    2      NaN   NaN     GBR
    3      NaN   NaN     ESP
    4      NaN   NaN     FRA
    5      NaN    ID     USA
    6      NaN    GA     USA
    7  Hoboken    NJ     USA
    8      NaN    NJ     USA
    9      NaN   NaN     AUS
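For anyone who wants to run this end to end on Python 3, here is a minimal sketch of the same reverse-then-rename idea wrapped in a function. The sample data comes from the question; the helper name `split_right_aligned` and its `names` parameter are my own and not part of pandas. It assumes every entry is a string, and it silently drops the leftmost parts if an entry has more parts than there are names.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"City, State, Country": [
    "HUN", "ESP", "GBR", "ESP", "FRA",
    "ID, USA", "GA, USA", "Hoboken, NJ, USA", "NJ, USA", "AUS",
]})


def split_right_aligned(series, names):
    """Hypothetical helper: split on commas and right-align the parts under `names`."""
    # Reverse each split so the last part (the country) always ends up in column 0
    rev = series.apply(
        lambda s: pd.Series([part.strip() for part in reversed(s.split(","))])
    )
    # Guarantee exactly len(names) columns, even if no row has that many parts
    rev = rev.reindex(columns=range(len(names)))
    # Column 0 holds the last name, column 1 the second-to-last, and so on
    rev.columns = list(reversed(names))
    # Restore the requested left-to-right order
    return rev[list(names)]


location_df = split_right_aligned(
    df["City, State, Country"], ["City", "State", "Country"]
)
print(location_df)
```

The result can be attached to the original frame with `df.join(location_df)` if you want everything in one place.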

answered Sep 24 '22 04:09

Karl D.

Assuming the column is named target:

df[["City", "State", "Country"]] = df["target"].str.split(pat=",", expand=True)
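Note that `str.split(..., expand=True)` fills the new columns from the left, so on its own it gives the same left-aligned result shown in the question. One possible way to get right-aligned columns without reversing is to pad each split list on the left before building the frame. This is only a sketch: the column name `target` follows this answer, the variable names are mine, and it assumes every entry is a string (entries with more than three parts keep only their last three).

```python
import pandas as pd

# Sample data from the question, stored in a column named "target" as in this answer
df = pd.DataFrame({"target": [
    "HUN", "ESP", "GBR", "ESP", "FRA",
    "ID, USA", "GA, USA", "Hoboken, NJ, USA", "NJ, USA", "AUS",
]})

cols = ["City", "State", "Country"]

# Without expand=True, each row becomes a plain Python list of parts
parts = df["target"].str.split(",")

# Left-pad every list with None so the last part always sits in the last column
padded = [
    [None] * (len(cols) - len(p)) + [s.strip() for s in p[-len(cols):]]
    for p in parts
]

# Build the new columns on the same index and attach them to the original frame
aligned = pd.DataFrame(padded, index=df.index, columns=cols)
df = df.join(aligned)
print(df)
```

Missing values show up as `None` here rather than `NaN`; `pd.isna` treats both as missing.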

answered Sep 23 '22 04:09

Dolittle Wang
