I need to extract the ZIP code (and only the ZIP code) into a new column for further analysis. I am mostly using pandas in my data cleaning phase. I tried this code:
import pandas as pd
df_participant = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/dqthon-participants.csv')
df_participant['postal_code'] = df_participant['address'].str.extract(r'([0-9]\d+)')
print (df_participant[['address','postal_code']].head())
But it did not work.
This is the output:

Any help would be very much appreciated! Thank you
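.str.extract returns only the first match of the pattern in each string, so for an address that starts with a house number you get that number rather than the ZIP code. You can use the .str.findall method instead to find all the numbers in the address field and then take the last one as the ZIP code.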
Here is an example:
Data:
   customer                            address
0    shovon       1234 56th St, Bham, AL 35222
1     arsho           4th Ave, Dever, NY 25699
2  arshovon  1245 apt 9 69th St, Rio, FL 54444
3    rahman         this address has no number
Code:
import pandas as pd

data = {
    "customer": [
        "shovon", "arsho", "arshovon", "rahman"
    ],
    "address": [
        "1234 56th St, Bham, AL 35222",
        "4th Ave, Dever, NY 25699",
        "1245 apt 9 69th St, Rio, FL 54444",
        "this address has no number"
    ]
}
df = pd.DataFrame(data)
# Collect every run of two or more digits, then keep the last one ('' if there is none)
df['postal_code'] = df['address'].str.findall(r'([0-9]\d+)').apply(
    lambda x: x[-1] if len(x) >= 1 else '')
print(df)
Output:
   customer                            address postal_code
0    shovon       1234 56th St, Bham, AL 35222       35222
1     arsho           4th Ave, Dever, NY 25699       25699
2  arshovon  1245 apt 9 69th St, Rio, FL 54444       54444
3    rahman         this address has no number
Explanation:
The pattern matches every run of two or more digits in the address field, and the last match is taken as the ZIP code. If the address contains no such run, the ZIP code is set to an empty string.
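As a possible alternative, if you can assume that the ZIP code is always a five-digit number at the very end of the address (optionally followed by a "-1234" style suffix), you could keep .str.extract and anchor the pattern to the end of the string. This is only a sketch under that assumption; the name zip_pattern and the sample data are made up for illustration, and the real dataset may need a different pattern.

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "1234 56th St, Bham, AL 35222",
        "4th Ave, Dever, NY 25699",
        "1245 apt 9 69th St, Rio, FL 54444",
        "this address has no number",
    ]
})

# Assumption: the ZIP code is exactly five digits at the end of the string,
# optionally followed by a four-digit suffix such as "-1234".
# (?<!\d) stops the pattern from matching the tail of a longer number.
zip_pattern = r"(?<!\d)(\d{5})(?:-\d{4})?\s*$"

# expand=False returns a Series for a single capture group;
# rows without a match come back as missing values, replaced here with ''.
df["postal_code"] = df["address"].str.extract(zip_pattern, expand=False).fillna("")
print(df)
```

Compared with taking the last number found by .str.findall, this will not pick up a trailing house or unit number in an address that has no ZIP code, but it will also miss ZIP codes that are not at the end of the string.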