
Get length of values in pandas dataframe column

I'm trying to get the length of each zipCd value in the dataframe shown below. When I run the code below I get 958 for every record, but I expected something like 4. Does anyone see what the issue is?

Code:
zipDfCopy['zipCd'].str.len()

Data:
print(zipDfCopy[1:5])

   Zip Code  Place Name          State State Abbreviation     County  \
1       544  Holtsville       New York                 NY    Suffolk   
2      1001      Agawam  Massachusetts                 MA    Hampden   
3      1002     Amherst  Massachusetts                 MA  Hampshire   
4      1003     Amherst  Massachusetts                 MA  Hampshire   

   Latitude  Longitude                                              zipCd  
1   40.8154   -73.0451  0          501\n1          544\n2         1001...  
2   42.0702   -72.6227  0          501\n1          544\n2         1001...  
3   42.3671   -72.4646  0          501\n1          544\n2         1001...  
4   42.3919   -72.5248  0          501\n1          544\n2         1001...  
modLmakur asked Dec 13 '22 17:12


1 Answer

One way is to convert the column to strings and apply the built-in len to each value with pd.Series.map.

pd.Series.str exposes vectorized string methods (for example .str.len()), while pd.Series.astype changes the column's dtype.

import pandas as pd

df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})

df['ZipLen'] = df['ZipCode'].astype(str).map(len)

#    ZipCode  ZipLen
# 0      341       3
# 1     4624       4
# 2      536       3
# 3      123       3
# 4      462       3
# 5     4642       4
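
The 958 reported for every row is consistent with each zipCd cell holding the printed form of an entire Series (note the "0          501\n1          544..." text in the question's output). The sketch below shows one way that could happen, assuming the column was created by calling str() on the whole Series. That step is not shown in the question, so it is only a guess, and the toy values and the zipCd_bad column name are illustrative.

```python
import pandas as pd

# Toy data standing in for the question's frame (values are illustrative).
zip_df = pd.DataFrame({'Zip Code': [501, 544, 1001, 1002]})

# Assumed mistake: str() on the whole Series yields ONE long string,
# which pandas then broadcasts to every row of the new column.
zip_df['zipCd_bad'] = str(zip_df['Zip Code'])
bad_lengths = zip_df['zipCd_bad'].str.len()

# Element-wise conversion gives one short string per row.
zip_df['zipCd'] = zip_df['Zip Code'].astype(str)
good_lengths = zip_df['zipCd'].str.len()

print(bad_lengths.nunique())   # every row shares the same large length
print(good_lengths.tolist())   # per-row digit counts
```

If the column already holds one string per row, .str.len() and .map(len) should give the same result.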

A numeric alternative, valid only for positive integers, is to count digits with np.log10:

import numpy as np

df['ZipLen'] = np.floor(np.log10(df['ZipCode'].values)).astype(int) + 1
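
As a quick check, the sketch below compares the log10 digit count with the string-based count on the same sample values. It assumes all values are at least 1. Zero or negative input would not give a meaningful digit count, and leading zeros (a ZIP code stored as text such as '00501') are lost once the value is an integer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ZipCode': [341, 4624, 536, 123, 462, 4642]})

# Digit count via log10: floor(log10(n)) + 1, meaningful only for n >= 1.
log_len = np.floor(np.log10(df['ZipCode'].values)).astype(int) + 1

# Digit count via string conversion, for comparison.
str_len = df['ZipCode'].astype(str).str.len().to_numpy()

print(log_len.tolist())
print((log_len == str_len).all())
```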
jpp answered Dec 16 '22 06:12