
Remove four last digits from string – Convert Zip+4 to Zip code

Tags: python, pandas

The following piece of code...

import numpy as np
import pandas as pd

# First row holds the column names, first column holds the index labels.
data = np.array([['','state','zip_code','collection_status'],
                ['42394','CA','92637-2854', 'NaN'],
                ['58955','IL','60654', 'NaN'],
                ['108365','MI','48021-1319', 'NaN'],
                ['109116','MI','48228', 'NaN'],
                ['110833','IL','60008-4227', 'NaN']])

df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
print(df)

... gives the following data frame:

         state            zip_code    collection_status
42394       CA          92637-2854                  NaN
58955       IL               60654                  NaN
108365      MI          48021-1319                  NaN
109116      MI               48228                  NaN
110833      IL          60008-4227                  NaN

The goal is to homogenise the "zip_code" column into a 5-digit format, i.e. I want to remove the last four digits from zip_code whenever a value has 9 digits instead of 5. BTW, the dtype of zip_code is "object".

Any idea?

Antonio Serrano asked Oct 12 '25 08:10


1 Answer

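Use indexing with the .str accessor only (thanks, John Galt). The result is written to collection_status here; assign it to df['zip_code'] instead to overwrite the original column: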

df['collection_status'] = df['zip_code'].str[:5]
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008
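
Since the question asks to homogenise zip_code itself, the same slice can be assigned back to that column. The following is only a minimal sketch: it rebuilds the sample frame from a dict rather than from the NumPy array in the question and leaves out collection_status, both of which are simplifications made for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt from a dict (an assumption made
# for brevity; the question builds it from a NumPy array instead).
df = pd.DataFrame(
    {
        "state": ["CA", "IL", "MI", "MI", "IL"],
        "zip_code": ["92637-2854", "60654", "48021-1319", "48228", "60008-4227"],
    },
    index=["42394", "58955", "108365", "109116", "110833"],
)

# Keep only the first five characters of every value, overwriting zip_code.
# Values that are already five characters long are left unchanged by the slice.
df["zip_code"] = df["zip_code"].str[:5]
print(df)
```

Because slicing a 5-character string with [:5] returns it unchanged, no explicit length check is needed for this data.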

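If you need to add a condition, use Series.where or numpy.where. Both keep the original value where the length is already 5 and take the first five characters otherwise: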

df['collection_status'] = df['zip_code'].where(df['zip_code'].str.len() == 5, 
                                               df['zip_code'].str[:5])
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008

df['collection_status'] = np.where(df['zip_code'].str.len() == 5, 
                                   df['zip_code'],
                                   df['zip_code'].str[:5])
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008
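
If the column might contain malformed values, a regular expression can check that each entry really starts with five digits. This is a hedged sketch, not part of the original answer: the Series name zips, the result name five, and the extra sample values ("926372854", "1234", None) are hypothetical and were added only to show the behaviour.

```python
import pandas as pd

# Hypothetical sample: ZIP+4 with and without a hyphen, a plain ZIP,
# a value that is too short, and a missing value.
zips = pd.Series(["92637-2854", "60654", "926372854", "1234", None])

# Capture the five leading digits; entries that do not start with
# five digits (or are missing) become NaN instead of being truncated.
five = zips.str.extract(r"^(\d{5})", expand=False)
print(five)
```

Unlike plain slicing, this leaves a missing value for anything that does not match, which makes bad rows easy to find with five.isna().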
jezrael answered Oct 14 '25 05:10