I have an address column in a pandas DataFrame that holds three pieces of information: street, colony and city.
The three values are separated by one of two possible delimiters, either a ',' or a whitespace, e.g. it can be either Street1,Colony1,City1 or Street1 Colony1 City1.
I need to split this column into three columns labelled 'Street', 'Colony' and 'City', with the values from the Address column split accordingly.
What is the most efficient way to do this? The pandas split function only accepts a single delimiter or a regular expression (a regex is probably what I need here, but I'm not very good with regex).
If you are certain the delimiter is either a comma , or a single whitespace, you could use:
df[['Street','Colony','City']] = df.address.str.split('[ ,]', expand=True)
Explanation: str.split accepts a pat (pattern) parameter: "String or regular expression to split on. If not specified, split on whitespace." Because we can pass a regular expression, this becomes an easy task: in regex, [ ,] is a character class that matches either a space or a comma.
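An alternative would be to use ' |,' or, if there can be several whitespace characters in a row, r'\s+|,' (written as a raw string so the backslash reaches the regex engine unchanged).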
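To see how these patterns differ, here is a minimal sketch using Python's re module rather than pandas, on the assumption that the regex behaves the same way inside str.split. The sample strings are made up for illustration, and the third pattern, [\s,]+, is not from the answer above: it is one possible variant for input where a comma is followed by a space.

```python
import re

# Hypothetical sample addresses, including awkward spacing.
samples = ["a,b,c", "a b c", "a  b   c", "a, b, c"]

# '[ ,]' splits on every single space or comma,
# so runs of delimiters leave empty strings behind.
single = [re.split(r"[ ,]", s) for s in samples]

# '\s+|,' collapses runs of whitespace, but a comma followed
# by a space still counts as two separate delimiters.
multi_ws = [re.split(r"\s+|,", s) for s in samples]

# '[\s,]+' (an assumption, not part of the original answer)
# treats any run of commas and whitespace as one delimiter.
runs = [re.split(r"[\s,]+", s) for s in samples]

for s, a, b, c in zip(samples, single, multi_ws, runs):
    print(repr(s), a, b, c)
```

Only the last pattern gives exactly three parts for all four samples, which matters because assigning to three columns fails when a row splits into more pieces.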
Full example:
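import pandas as pd
# Two rows, one per delimiter style
df = pd.DataFrame({
    'address': ['a,b,c', 'a b c']
})
# Split on a space or a comma and spread the parts over three new columns
df[['Street','Colony','City']] = df.address.str.split('[ ,]', expand=True)
print(df)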
Returns:
  address Street Colony City
0   a,b,c      a      b    c
1   a b c      a      b    c
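Try this (it needs import re in addition to pandas):
df[['Street','Colony','City']] = df.address.apply(lambda x: pd.Series(re.split(r'\W', x)))
\W matches any character that is not a word character, where word characters are letters, digits and the underscore. See the docs for the re module.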
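One thing to keep in mind with \W is that it splits on more than commas and spaces. The sketch below uses a hypothetical helper, split_address, and made-up addresses to show which characters count as delimiters.

```python
import re

def split_address(text):
    # Split on every single non-word character (anything that is
    # not a letter, a digit or an underscore).
    return re.split(r"\W", text)

# Commas and spaces both act as delimiters.
plain = split_address("Street1,Colony1,City1")

# The underscore is a word character, so it is kept.
underscore = split_address("Main_St Colony1 City1")

# A hyphen is not a word character, so it splits the street name.
hyphen = split_address("Park-Lane,Colony1,City1")

print(plain)
print(underscore)
print(hyphen)
```

If the address values can contain hyphens, dots or similar punctuation, a narrower pattern such as '[ ,]' may be the safer choice.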