I'm having real trouble converting a column into lowercase. It's not as simple as just using:
df['my_col'] = df['my_col'].str.lower()
because I'm iterating over a lot of dataframes, and some of them (but not all) have both strings and integers in the column of interest. This causes the lower function, if applied like above, to throw an exception:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Rather than forcing the type to string, I'd like to check whether each value is a string, convert it to lowercase if it is, and leave it as it is if it isn't. I thought this would work:
df = df.apply(lambda x: x.lower() if(isinstance(x, str)) else x)
But it doesn't work... probably because I'm overlooking something obvious, but I can't see what it is!
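A quick check on a small made-up frame shows what DataFrame.apply actually hands the lambda. It works column by column (axis=0 by default) and passes each whole column as a pandas Series, not individual cell values, so isinstance(x, str) is never true and every column comes back unchanged. The helper name show_what_apply_passes is only for illustration.

```python
import pandas as pd

# Made-up frame: the "OS" column mixes strings with an integer.
df = pd.DataFrame({"OS": ["Linux", "Unix", 42], "Count": [234, 2, 1]})

seen_types = []

def show_what_apply_passes(x):
    # Record the type of the argument DataFrame.apply hands to the function.
    seen_types.append(type(x))
    # Same test as in the question: never true here, because x is a Series.
    return x.lower() if isinstance(x, str) else x

result = df.apply(show_what_apply_passes)

print({t.__name__ for t in seen_types})  # {'Series'}
print(result["OS"].tolist())             # ['Linux', 'Unix', 42]
```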
My data looks something like this:
                          OS  Count
0          Microsoft Windows      3
1                   Mac OS X      4
2                      Linux    234
3    Don't have a preference      0
4  I prefer Windows and Unix      3
5                       Unix      2
6                        VMS      1
7         DOS or ZX Spectrum      2
The test in your lambda isn't quite right, but you weren't far off. DataFrame.apply works column by column, so check each column's dtype and lowercase it with the .str accessor:
df.apply(lambda x: x.str.lower() if x.dtype == 'object' else x)
Here it is applied to your data frame, with the resulting output:
>>> import pandas as pd
>>> df = pd.DataFrame(
[
{'OS': 'Microsoft Windows', 'Count': 3},
{'OS': 'Mac OS X', 'Count': 4},
{'OS': 'Linux', 'Count': 234},
{'OS': 'Dont have a preference', 'Count': 0},
{'OS': 'I prefer Windows and Unix', 'Count': 3},
{'OS': 'Unix', 'Count': 2},
{'OS': 'VMS', 'Count': 1},
{'OS': 'DOS or ZX Spectrum', 'Count': 2},
]
)
>>> df = df.apply(lambda x: x.str.lower() if x.dtype=='object' else x)
>>> print(df)
                          OS  Count
0          microsoft windows      3
1                   mac os x      4
2                      linux    234
3     dont have a preference      0
4  i prefer windows and unix      3
5                       unix      2
6                        vms      1
7         dos or zx spectrum      2
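One caveat worth checking: .str.lower() only lowercases string elements, and any non-string value in an object column (such as an integer mixed in with text) comes back as NaN instead of being left alone. If those values need to survive, a minimal alternative is to keep the dtype == 'object' test but lowercase element by element with Series.map, reusing the isinstance check from the question. The frame below is made up, and lower_if_str is a hypothetical helper name.

```python
import pandas as pd

# Made-up frame: the "OS" column mixes strings with an integer.
df = pd.DataFrame({"OS": ["Linux", 7, "Mac OS X"], "Count": [234, 4, 4]})

# Vectorised .str.lower(): the integer 7 becomes NaN.
str_version = df.apply(lambda x: x.str.lower() if x.dtype == "object" else x)

def lower_if_str(value):
    # Lowercase strings; return every other value untouched.
    return value.lower() if isinstance(value, str) else value

# Element-wise version: Series.map calls lower_if_str on each value.
map_version = df.apply(
    lambda x: x.map(lower_if_str) if x.dtype == "object" else x
)

print(str_version["OS"].tolist())  # ['linux', nan, 'mac os x']
print(map_version["OS"].tolist())  # ['linux', 7, 'mac os x']
```

The map version is slower than the vectorised .str call on large frames, but it is the one that matches "leave non-strings as they are".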
This also works and is very readable:
for column in df.select_dtypes("object").columns:
    df[column] = df[column].str.lower()
A possible drawback is the explicit for loop, although it only runs over the object columns (not the rows), so the overhead is usually negligible.
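Because the question involves iterating over many data frames, the loop can be wrapped in a small helper and reused. This is a minimal sketch, assuming lowercased copies are acceptable (the original frames are left alone); lowercase_strings and the sample frames are hypothetical. It tests each value with isinstance, as in the question, instead of calling .str.lower(), so non-string values pass through unchanged. Since the isinstance check does the filtering, the helper simply visits every column rather than only the object ones.

```python
import pandas as pd

def lowercase_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every str value lowercased."""
    out = df.copy()
    for column in out.columns:
        # Lowercase strings only; ints, None, NaN, etc. pass through unchanged.
        out[column] = out[column].map(
            lambda v: v.lower() if isinstance(v, str) else v
        )
    return out

# Hypothetical frames, one with a mixed string/integer column.
frames = [
    pd.DataFrame({"OS": ["Linux", "VMS"], "Count": [234, 1]}),
    pd.DataFrame({"OS": ["Unix", 3, "DOS or ZX Spectrum"]}),
]

lowered = [lowercase_strings(f) for f in frames]
for f in lowered:
    print(f["OS"].tolist())
```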