What I am doing seems simple, but I cannot figure it out.
I have a dataframe with data such as:
City State ZIP
Ames IA 50011-3617
Ankeny IA 50021
I want to split the ZIP codes on '-' and keep only the first part, saving the result in a new dataframe that has the old data but only the shortened ZIP code. I tried the following:
data_short_zip = data
df = data['ZIP'].str.split('-').str[0]
data_short_zip.join(df)
This not only throws an error but also seems unpythonic. Is there a simple way to do this?
The output should look like:
City State ZIP
Ames IA 50011
Ankeny IA 50021
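As for why the attempt above fails: data_short_zip = data only binds a second name to the same object (it is not a copy), and data['ZIP'].str.split('-').str[0] is a Series that is still named 'ZIP'. DataFrame.join therefore tries to create a second 'ZIP' column, and pandas raises a ValueError about overlapping columns because no suffix is given. The minimal sketch below reproduces this with the two sample rows (the frame construction is assumed for illustration) and shows one way around it: drop the old column before joining.

```python
import pandas as pd

# Sample data assumed from the question
data = pd.DataFrame({
    "City": ["Ames", "Ankeny"],
    "State": ["IA", "IA"],
    "ZIP": ["50011-3617", "50021"],
})

short = data["ZIP"].str.split("-").str[0]  # a Series still named "ZIP"

# join() refuses to create two columns both called "ZIP" without a suffix
try:
    data.join(short)
except ValueError as err:
    print("join failed:", err)

# Drop the old column first, then join the shortened one back in
fixed = data.drop(columns="ZIP").join(short)
print(fixed)
```

Passing lsuffix/rsuffix to join would also avoid the error, but you would then end up with two ZIP columns instead of one.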
You can use str.split to split on your delimiter and then str[0] on the result to return the first element of each split:
In [122]:
df['ZIP'] = df['ZIP'].str.split('-').str[0]
df
Out[122]:
City State ZIP
0 Ames IA 50011
1 Ankeny IA 50021
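The line above overwrites the column in place. Since the question asks for a new dataframe that keeps the old data, DataFrame.assign is one option: it returns a new frame with the replaced column and leaves the original untouched. A minimal sketch, assuming the sample rows from the question:

```python
import pandas as pd

# Sample data assumed from the question
data = pd.DataFrame({
    "City": ["Ames", "Ankeny"],
    "State": ["IA", "IA"],
    "ZIP": ["50011-3617", "50021"],
})

# assign() returns a new DataFrame; `data` itself is not modified
data_short_zip = data.assign(ZIP=data["ZIP"].str.split("-").str[0])

print(data_short_zip)
print(data["ZIP"].tolist())  # the original, full ZIP codes are still there
```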
Ultimately, you want to take the first 5 characters and reassign them to data.ZIP. Here are some alternatives that extract the first 5, all of which return the same thing:
data.ZIP.str.extract(r'^(\d{5})', expand=False)
data.ZIP.str[:5]
data.ZIP.str.split('-').str[0]
data.ZIP.str.split('-').str.get(0)
Each of them returns:
0 50011
1 50021
Name: ZIP, dtype: object
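A quick way to confirm that the four alternatives agree on the sample data is to run them side by side and compare the values. This sketch assumes the two sample rows from the question and compares plain lists, so any difference in the resulting Series names does not matter. They only agree here because every ZIP starts with five digits; on messier input they can diverge (for example, the extract version yields NaN where there is no five-digit prefix).

```python
import pandas as pd

# Sample data assumed from the question
data = pd.DataFrame({
    "City": ["Ames", "Ankeny"],
    "State": ["IA", "IA"],
    "ZIP": ["50011-3617", "50021"],
})

candidates = {
    "extract": data.ZIP.str.extract(r'^(\d{5})', expand=False),
    "slice": data.ZIP.str[:5],
    "split_index": data.ZIP.str.split('-').str[0],
    "split_get": data.ZIP.str.split('-').str.get(0),
}

# Compare values only; Series names are ignored by tolist()
for name, result in candidates.items():
    print(name, result.tolist())
```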
From timing these alternatives, it's pretty clear to me ;-) that
data.ZIP.str[:5]
is the winner.
Then just assign the result back to data.ZIP:
data.ZIP = data.ZIP.str[:5]
The larger test frames for the timings were built by repeating the sample data, for example:
data = pd.concat([data for _ in range(10000)])
data = pd.concat([data for _ in range(100)])
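If you want to reproduce the comparison yourself, a sketch using the standard timeit module follows. The frame size (10,000 copies of the two sample rows) and the repeat/number settings are arbitrary assumptions, and the absolute numbers, and possibly the ranking, will depend on your machine and pandas version, so no results are claimed here.

```python
import timeit

import pandas as pd

# Sample data assumed from the question
base = pd.DataFrame({
    "City": ["Ames", "Ankeny"],
    "State": ["IA", "IA"],
    "ZIP": ["50011-3617", "50021"],
})

# Repeat the two sample rows to get a larger frame (size chosen arbitrarily)
data = pd.concat([base for _ in range(10000)], ignore_index=True)

methods = {
    "extract": lambda: data.ZIP.str.extract(r'^(\d{5})', expand=False),
    "slice": lambda: data.ZIP.str[:5],
    "split_index": lambda: data.ZIP.str.split('-').str[0],
    "split_get": lambda: data.ZIP.str.split('-').str.get(0),
}

for name, fn in methods.items():
    # Best of 5 runs, each run calling the method 10 times
    best = min(timeit.repeat(fn, number=10, repeat=5))
    print(f"{name:12s} {best / 10 * 1000:.2f} ms per call")
```

Taking the minimum of several repeats is the usual timeit advice, since higher values mostly reflect interference from other processes rather than the code being measured.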