
How to split one column into multiple columns in Pandas using regular expression?

Tags: python, pandas

For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named 'address'. I would like to split it into columns 'street', 'city', 'state', respectively.

What is the best way to achieve this using pandas?

I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex"),

but I got the error: Must have equal len keys and value when setting with an iterable.

Thank you for your help :)

designil asked May 02 '17 05:05

2 Answers

You can split on the regex ,\s+ (a comma followed by one or more whitespace characters):

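# borrowing the sample DataFrame from Allen's answer
# raw string, so \s reaches the regex engine without an invalid-escape warning
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)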
                              address id             street          city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue   Chevy Chase   
1         72 Main St, Chevy Chase, MD  b         72 Main St   Chevy Chase   

  state  
0    MD  
1    MD  

If you also need to remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
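
Since the question asks for a regular expression specifically, here is a minimal sketch using Series.str.extract with named groups. It is not part of the original answer. The pattern is an assumption about the address format (three comma-separated parts ending in a two-letter uppercase state code), so adjust it to your data. Rows that do not match the pattern come back as NaN in all three columns.

```python
import pandas as pd

df = pd.DataFrame({
    "address": ["71 Pilgrim Avenue, Chevy Chase, MD",
                "72 Main St, Chevy Chase, MD"],
    "id": ["a", "b"],
})

# Assumed format: "<street>, <city>, <two-letter state>".
# Each named group becomes a column in the extracted DataFrame.
pattern = r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})$"
parts = df["address"].str.extract(pattern)

# Attach the new columns to the original frame by index.
df = df.join(parts)
print(df)
```

Unlike str.findall, which returns a single Series of lists, str.extract returns one column per capture group, so the result lines up with the three target columns.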
jezrael answered Sep 21 '22 16:09



import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply split on the comma.
parts = pd.DataFrame(df['address'].str.split(',').tolist(),
                     columns=['street', 'city', 'state'], index=df.index)
df2 = df.join(parts)
# Strip the whitespace left around each piece (every column here holds strings).
df2 = df2.apply(lambda col: col.str.strip())
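
If the format is not fully consistent, for example when the street part itself contains a comma, one possible workaround is to split from the right at most twice with Series.str.rsplit. This is only a sketch: the second address below is a made-up row for illustration, and it still assumes that city and state are always the last two parts.

```python
import pandas as pd

addresses = pd.Series([
    "71 Pilgrim Avenue, Chevy Chase, MD",
    "Suite 5, 72 Main St, Chevy Chase, MD",  # hypothetical row with an extra comma
])

# Split from the right at most twice, so extra commas stay in the street part.
parts = addresses.str.rsplit(",", n=2, expand=True)
parts.columns = ["street", "city", "state"]

# Remove the whitespace left over after each comma.
parts = parts.apply(lambda col: col.str.strip())
print(parts)
```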
Allen answered Sep 23 '22 16:09