I have a DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'item1': ['AK', 'CK', None],
                   'item2': ['b', 'd', 'e'], 'item3': ['c', 'e', np.nan]})
I want to convert all values in the column item1 to lowercase.
I've tried:
df['item1'].apply(lambda x: x.lower())
That gave me an error:
AttributeError: 'NoneType' object has no attribute 'lower'
I know why it happened: one of the values in my column is None.
I want to ignore that value and convert the rest of the values to lowercase.
Is there a way to overcome this?
P.S.: My original DataFrame may have any number of values, as it is returned by another function. Dropping the row is not an option here, as those records are important to me.
Quite simply:
df['item1'].apply(lambda x: x.lower() if x is not None else x)
If you want to handle other possible types (ints, floats, etc.) which don't have a lower() method:
df['item1'].apply(lambda x: x.lower() if hasattr(x, "lower") and callable(x.lower) else x)
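As a rough illustration of the mixed-type case, here is a minimal sketch that uses a named helper instead of a lambda. The helper name lower_if_str and the sample Series are made up for this example, and it checks isinstance(value, str) rather than hasattr, which is a slightly stricter test than the one above.

```python
import pandas as pd


def lower_if_str(value):
    """Return value.lower() for strings; pass anything else through unchanged."""
    return value.lower() if isinstance(value, str) else value


# Hypothetical column mixing strings, numbers and a missing value.
mixed = pd.Series(['AK', 7, 2.5, None], dtype=object)

# Only the string entry is changed; numbers and None are left as they are.
result = mixed.apply(lower_if_str)
print(result.tolist())
```

A named function like this is also easier to reuse across several columns than an inline lambda.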
A more general solution that handles both None and NaN values is to use the pd.notnull function; another is to use a list comprehension.
The pandas string functions also work well with None and NaN values:
df['new1'] = df['item1'].apply(lambda x: x.lower() if pd.notnull(x) else x)
df['new2'] = [x.lower() if pd.notnull(x) else x for x in df['item1']]
df['new3'] = df['item1'].str.lower()
print(df)
   id item1 item2 item3  new1  new2  new3
0   1    AK     b     c    ak    ak    ak
1   2    CK     d     e    ck    ck    ck
2   3  None     e   NaN  None  None  None
df=pd.DataFrame({'id':[1,2,3],'item1':['AK',np.nan,None],
'item2':['b','d','e'],'item3':['c','e',np.nan]})
print(df)
   id item1 item2 item3
0   1    AK     b     c
1   2   NaN     d     e
2   3  None     e   NaN
df['new1'] = df['item1'].apply(lambda x: x.lower() if pd.notnull(x) else x)
df['new2'] = [x.lower() if pd.notnull(x) else x for x in df['item1']]
df['new3'] = df['item1'].str.lower()
print(df)
   id item1 item2 item3  new1  new2  new3
0   1    AK     b     c    ak    ak    ak
1   2   NaN     d     e   NaN   NaN   NaN
2   3  None     e   NaN  None  None  None
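One more variant, not part of the answers above, is Series.map with na_action='ignore', which skips missing values without an explicit check. This is only a sketch: the column name new4 is made up for the example, and the exact missing-value marker in the result (None or NaN) may depend on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'item1': ['AK', np.nan, None]})

# na_action='ignore' passes missing entries through untouched and
# calls str.lower only on the remaining values.
df['new4'] = df['item1'].map(str.lower, na_action='ignore')
print(df)
```

Unlike the apply version, this needs no conditional, but it will still raise an error if the column contains non-missing values that are not strings.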
A list comprehension is faster on large DataFrames if there is no need to check for missing values:
import random
import string

large = pd.Series([random.choice(string.ascii_uppercase) +
                   random.choice(string.ascii_uppercase)
                   for _ in range(100000)])
In [275]: %timeit [x.lower() if pd.notnull(x) else x for x in large]
73.3 ms ± 4.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [276]: %timeit large.str.lower()
28.2 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [277]: %timeit [x.lower() for x in large]
14.1 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
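The timings above come from IPython's %timeit. To repeat the comparison in a plain script, something like the following sketch with the standard timeit module should work. The labels, the fixed seed and the repeat counts are arbitrary choices for this example, and the numbers will differ by machine and pandas version.

```python
import random
import string
import timeit

import pandas as pd

random.seed(0)  # fixed seed so repeated runs use the same data
large = pd.Series([random.choice(string.ascii_uppercase) +
                   random.choice(string.ascii_uppercase)
                   for _ in range(100000)])

# The three approaches compared above, wrapped as zero-argument callables.
candidates = {
    'list comprehension with notnull check':
        lambda: [x.lower() if pd.notnull(x) else x for x in large],
    '.str.lower()':
        lambda: large.str.lower(),
    'list comprehension without check':
        lambda: [x.lower() for x in large],
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    timings[label] = best * 1000
    print(f'{label}: {timings[label]:.1f} ms per call')
```

Keep in mind that the unchecked list comprehension is only safe when the column is known to contain no missing values.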