For example, if I have a home address like this:
71 Pilgrim Avenue, Chevy Chase, MD
in a column named 'address'. I would like to split it into three columns: 'street', 'city', and 'state'.
What is the best way to achieve this using pandas?
I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex"),
but the error I got is: Must have equal len keys and value when setting with an iterable.
Thank you for your help :)
Split column by delimiter into multiple columns
Apply the pandas Series.str.split() function to the 'address' column and pass the delimiter (a comma in this case) on which you want to split. Also pass expand=True so the pieces come back as separate columns rather than as a single column of lists.
To split a pandas column of lists into multiple columns, create a new DataFrame from the result of calling tolist() on the column. You can also pass the names of the new columns as a list through the columns argument.
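A minimal sketch of that approach, assuming a hypothetical column named 'parts' whose rows are lists of equal length (the column name and sample data are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: each row of 'parts' is a list with three items.
df = pd.DataFrame({
    'parts': [['71 Pilgrim Avenue', 'Chevy Chase', 'MD'],
              ['72 Main St', 'Chevy Chase', 'MD']],
})

# Build a new DataFrame from the list of lists; reuse the original index
# so the new rows still line up with df.
split_df = pd.DataFrame(df['parts'].tolist(),
                        index=df.index,
                        columns=['street', 'city', 'state'])
print(split_df)
```

If the lists have different lengths, the shorter rows are padded with missing values, so this works best when the format is consistent.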
Method 2: Divide two pandas columns using the div() function
It divides the columns elementwise and accepts a scalar, Series, or DataFrame as the divisor. When a Series is passed to DataFrame.div(), axis=0 aligns the Series with the row index and axis=1 aligns it with the column labels.
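As a rough illustration of div() (this is numeric division, unrelated to splitting strings), here is a small sketch with made-up column names 'total' and 'count':

```python
import pandas as pd

# Hypothetical numeric data, for illustration only.
df = pd.DataFrame({'total': [10, 20], 'count': [2, 5]})

# Elementwise division of one column by another.
ratio = df['total'].div(df['count'])

# Divide every row by its own divisor: axis=0 aligns the Series
# with the row index of df.
row_divisors = pd.Series([2, 10], index=df.index)
scaled = df.div(row_divisors, axis=0)

print(ratio)
print(scaled)
```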
We can use the pandas Series.str.split() function to break up strings into multiple columns around a given separator or delimiter. It is similar to the Python string split() method but applies to every value in a DataFrame column.
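To make the comparison with Python's own split() concrete, here is a short sketch; the variable names are illustrative, and the sample addresses are the ones used elsewhere on this page:

```python
import pandas as pd

addresses = pd.Series(['71 Pilgrim Avenue, Chevy Chase, MD',
                       '72 Main St, Chevy Chase, MD'])

# Plain Python: split a single string into a list.
single = addresses[0].split(', ')

# Series.str.split without expand: a Series whose values are lists.
as_lists = addresses.str.split(', ')

# With expand=True: a DataFrame with one column per piece.
as_columns = addresses.str.split(', ', expand=True)

print(single)
print(as_lists)
print(as_columns)
```

The expand=True form is the one that can be assigned directly to several new columns.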
You can use split with the regex ,\s+ (a comma followed by one or more whitespace characters):
# borrowing the sample from `Allen`
# raw string so that \s reaches the regex engine without an escape warning
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street         city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue  Chevy Chase
1         72 Main St, Chevy Chase, MD  b         72 Main St  Chevy Chase

  state
0    MD
1    MD
If you also need to remove the address column, add drop:
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
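import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply split on the comma.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(),
                           index=df.index,
                           columns=['street', 'city', 'state']))
# Strip the spaces left behind by the split. Every column here holds strings;
# column-wise str.strip() is used because applymap is deprecated in newer pandas.
df2 = df2.apply(lambda col: col.str.strip())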
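Since the question started from a regular expression, another option is Series.str.extract, which returns one column per capture group. The sketch below swaps in that technique; the pattern is an illustrative assumption (it is not the asker's "myregex") and expects every address to have exactly the "street, city, state" shape:

```python
import pandas as pd

df = pd.DataFrame({'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                               '72 Main St, Chevy Chase, MD']})

# Assumed pattern: three comma-separated fields. Named groups become
# the column names of the extracted DataFrame.
pattern = r'^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[^,]+)$'
parts = df['address'].str.extract(pattern)

# Rows that do not match the pattern would come back as NaN.
df = df.join(parts)
print(df)
```

Unlike findall, which gives a single column of lists, extract already returns a DataFrame, so the result lines up with the three target columns.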