
Why does pandas string series return NaN for len() function?

I am working with a power consumption dataset in Pandas that includes ZIP codes as a column, but the datatype for this column is an integer in the original CSV file. I'd like to change this column to a string/object datatype, and here's what I've done so far:

import pandas as pd

df = pd.read_csv('...kWh_consumption_by_ZIP.csv')
df.head()

The resulting dataframe head looks like this:

[Screenshot: output of df.head()]

As mentioned above, when I check df.dtypes, I see that ZIP is listed as int64 data type, so I run the following code to overwrite the existing series and change it to an object data type:

df['ZIP'] = df.ZIP.astype(object)

Everything looks good when I check the df.ZIP series (at least, it looks good to the naked eye):

[Screenshot: output of df.ZIP]

But when I check the length of each row in the series using the len function:

df.ZIP.str.len()

...the resulting series just returns NaN for each row (see screenshot below).

[Screenshot: output of df.ZIP.str.len(), showing NaN in every row]

Does anyone know why this is happening? Thanks in advance for the help.

Will asked May 10 '26 19:05


1 Answer

TL;DR

You have a column of integers, and casting to object has not solved your problem. Instead, typecast to str and you should be good.

df.ZIP.astype(str).str.len()
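As a minimal sketch of that fix, the snippet below uses a small made-up CSV in place of the real file (the `kWh` column and the sample values are assumptions, not taken from the question). It shows two ways to end up with string ZIP codes: casting after loading, or asking `read_csv` to parse the column as strings via its `dtype` argument.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV in the question; columns and values are assumed.
csv_text = "ZIP,kWh\n90210,1200\n10001,950\n60614,1430\n"

# Option 1: load as usual (ZIP is parsed as int64), then cast to str.
df = pd.read_csv(io.StringIO(csv_text))
df["ZIP"] = df["ZIP"].astype(str)
zip_lengths = df["ZIP"].str.len()

# Option 2: have read_csv parse ZIP as strings from the start.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"ZIP": str})
zip_lengths2 = df2["ZIP"].str.len()

print(zip_lengths.tolist())   # [5, 5, 5]
print(zip_lengths2.tolist())  # [5, 5, 5]
```

Option 2 has the side benefit of keeping any leading zeros in the file, which an integer parse would drop.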

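pandas supports the str accessor on object columns because an object column can hold any Python object, and pandas makes no assumptions about what is inside. The accessor works element by element: if the element is a string or another container with a length (a list or dict, for example), a valid result is returned. Otherwise, the result is NaN. Casting integers to object changes only the column's dtype; the elements are still integers, so every row comes back as NaN.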

Here's an example:

x = [{'a': 1}, 'abcde', None, 123, 45, [1, 2, 3, 4]]
y = pd.Series(x)

y

0        {'a': 1}
1           abcde
2            None
3             123
4              45
5    [1, 2, 3, 4]
dtype: object

y.str.len()
0    1.0
1    5.0
2    NaN
3    NaN
4    NaN
5    4.0
dtype: float64

Contrast with:

y = pd.Series([1, 2, 3, 4, 5])
y

0    1
1    2
2    3
3    4
4    5
dtype: int64

y.dtype
dtype('int64')

y.str.len()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-744-acc1c109a4a4> in <module>()
----> 1 y.str.len()

y.astype(object).str.len()

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64
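
One way to confirm what is going on is to look at the Python type of each element rather than at the column's dtype. The sketch below is only an illustration (the variable names are made up); it checks that `astype(object)` leaves the elements as integers, while `astype(str)` turns them into strings.

```python
import pandas as pd

y = pd.Series([1, 2, 3, 4, 5])

# Casting to object changes the column's dtype, not the elements themselves.
as_object = y.astype(object)
element_types = as_object.map(type).unique().tolist()

# Casting to str converts every element to a Python string.
as_str = y.astype(str)
all_strings = bool(as_str.map(lambda v: isinstance(v, str)).all())

print(as_object.dtype)  # object
print(element_types)    # [<class 'int'>]
print(all_strings)      # True
```

Depending on the pandas version, calling the str accessor on an all-integer object column may raise an AttributeError instead of returning NaN, so checking element types this way avoids relying on either behavior.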
cs95 answered May 13 '26 10:05