
How to overcome 'NoneType' object has no attribute 'lower' error?

Tags: python, pandas

I have DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'item1': ['AK', 'CK', None],
                   'item2': ['b', 'd', 'e'], 'item3': ['c', 'e', np.nan]})

I want to convert all values of the column item1 into lowercase.

I've tried:

df['item1'].apply(lambda x: x.lower())

That gave me an error:

AttributeError: 'NoneType' object has no attribute 'lower'

I know why it happens: one of the values in my column is None.

I want to ignore that value and convert the rest of the values to lowercase.

Is there a way to overcome this?

P.S.: My original DataFrame may have any number of values, as it is returned by another function. Dropping the row is not an option here, as those records are important to me.

Sociopath asked Dec 03 '22 11:12


2 Answers

Quite simply:

df['item1'].apply(lambda x: x.lower() if x is not None else x)

If you want to handle other possible types (ints, floats, etc.) that don't have a lower() method:

df['item1'].apply(lambda x: x.lower() if hasattr(x, "lower") and callable(x.lower) else x)
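
To see how the attribute check above behaves on a column that mixes types, here is a minimal sketch. The Series `mixed` and the helper name `lower_if_possible` are made up for illustration and are not part of the question's data; the helper is just the lambda above written as a named function.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, None, NaN and an integer.
mixed = pd.Series(['AK', None, np.nan, 42, 'Ck'])

def lower_if_possible(value):
    """Return value.lower() if the object has a callable lower(), else the value unchanged."""
    lower = getattr(value, "lower", None)
    return lower() if callable(lower) else value

result = mixed.apply(lower_if_possible)
print(result.tolist())
```

Strings come back lowercased, while None, NaN and the integer pass through untouched, so no row is lost.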
bruno desthuilliers answered Dec 22 '22 01:12


A more general solution that covers both None and NaN values is to use the pd.notnull function; another option is a list comprehension.

The pandas string functions also handle None and NaN values well:

df['new1'] = df['item1'].apply(lambda x: x.lower() if pd.notnull(x) else x)

df['new2'] = [x.lower() if pd.notnull(x) else x for x in df['item1']]

df['new3'] = df['item1'].str.lower()
print(df)
   id item1 item2 item3  new1  new2  new3
0   1    AK     b     c    ak    ak    ak
1   2    CK     d     e    ck    ck    ck
2   3  None     e   NaN  None  None  None

df = pd.DataFrame({'id': [1, 2, 3], 'item1': ['AK', np.nan, None],
                   'item2': ['b', 'd', 'e'], 'item3': ['c', 'e', np.nan]})
print(df)
   id item1 item2 item3
0   1    AK     b     c
1   2   NaN     d     e
2   3  None     e   NaN

df['new1'] = df['item1'].apply(lambda x: x.lower() if pd.notnull(x) else x)
df['new2'] = [x.lower() if pd.notnull(x) else x for x in df['item1']]
df['new3'] = df['item1'].str.lower()
print(df)
   id item1 item2 item3  new1  new2  new3
0   1    AK     b     c    ak    ak    ak
1   2   NaN     d     e   NaN   NaN   NaN
2   3  None     e   NaN  None  None  None
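
The examples above write to new columns for comparison. Since the question asks to convert item1 itself, the following sketch assigns the result back to the same column and checks that the missing entries are still there and that no row was dropped. It reuses the second sample DataFrame from this answer; nothing else is assumed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'item1': ['AK', np.nan, None],
                   'item2': ['b', 'd', 'e'],
                   'item3': ['c', 'e', np.nan]})

# Overwrite the column in place; missing entries stay missing.
df['item1'] = df['item1'].str.lower()

print(df['item1'].isna().tolist())  # which rows are still missing
print(len(df))                      # row count is unchanged
```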

A list comprehension is faster on large DataFrames if you do not need to check for missing values:

import random
import string

large = pd.Series([random.choice(string.ascii_uppercase) +
                   random.choice(string.ascii_uppercase)
                   for _ in range(100000)])

In [275]: %timeit [x.lower() if pd.notnull(x) else x for x in large]
73.3 ms ± 4.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [276]: %timeit large.str.lower()
28.2 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [277]: %timeit [x.lower() for x in large]
14.1 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
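
The %timeit lines above only work inside IPython. As a rough sketch of how the same comparison could be run from a plain script, the standard-library timeit module can be used instead. The dictionary names `candidates` and `timings`, the fixed seed, and the repeat counts are choices made for this illustration; absolute numbers will differ by machine and pandas version.

```python
import random
import string
import timeit

import pandas as pd

random.seed(0)  # fixed seed so repeated runs use the same data
large = pd.Series([random.choice(string.ascii_uppercase) +
                   random.choice(string.ascii_uppercase)
                   for _ in range(100000)])

# The three variants timed above, wrapped as zero-argument callables.
candidates = {
    'comprehension with notnull': lambda: [x.lower() if pd.notnull(x) else x for x in large],
    'str.lower()': lambda: large.str.lower(),
    'plain comprehension': lambda: [x.lower() for x in large],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 3 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f'{name}: {timings[name] * 1000:.1f} ms per call')
```

Note that the plain comprehension is only safe when the column contains no missing values; otherwise it raises the AttributeError from the question.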
jezrael answered Dec 22 '22 01:12