
Convert column values to lower case only if they are string

I'm having real trouble converting a column into lowercase. It's not as simple as just using:

df['my_col'] = df['my_col'].str.lower()

because I'm iterating over a lot of dataframes, and some of them (but not all) have both strings and integers in the column of interest. This causes the lower function, if applied like above, to throw an exception:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Rather than forcing the type to be a string, I'd like to assess whether values are strings and then - if they are - convert them to lowercase, and - if they are not strings - leave them as they are. I thought this would work:

df = df.apply(lambda x: x.lower() if(isinstance(x, str)) else x)

But it doesn't work... probably because I'm overlooking something obvious, but I can't see what it is!

My data looks something like this:

                          OS  Count
0          Microsoft Windows      3
1                   Mac OS X      4
2                      Linux    234
3    Don't have a preference      0
4  I prefer Windows and Unix      3
5                       Unix      2
6                        VMS      1
7         DOS or ZX Spectrum      2
asked Aug 22 '17 10:08 by user4896331


2 Answers

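The test in your lambda isn't quite right, though you weren't far off. DataFrame.apply passes each column to the function as a Series, not each individual value, so isinstance(x, str) is never true. Test the column's dtype instead: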

df.apply(lambda x: x.str.lower() if(x.dtype == 'object') else x)

With the data frame and output:

>>> import pandas as pd
>>> df = pd.DataFrame(
    [
        {'OS': 'Microsoft Windows', 'Count': 3},
        {'OS': 'Mac OS X', 'Count': 4},
        {'OS': 'Linux', 'Count': 234},
        {'OS': 'Dont have a preference', 'Count': 0},
        {'OS': 'I prefer Windows and Unix', 'Count': 3},
        {'OS': 'Unix', 'Count': 2},
        {'OS': 'VMS', 'Count': 1},
        {'OS': 'DOS or ZX Spectrum', 'Count': 2},
    ]
)
>>> df = df.apply(lambda x: x.str.lower() if x.dtype=='object' else x)
>>> print(df)
                          OS  Count
0          microsoft windows      3
1                   mac os x      4
2                      linux    234
3     dont have a preference      0
4  i prefer windows and unix      3
5                       unix      2
6                        vms      1
7         dos or zx spectrum      2
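
One caveat for columns that really do mix strings and integers: on an object column, .str.lower() returns NaN for the non-string entries instead of leaving them alone. A minimal sketch of an element-wise alternative follows; the helper name lower_if_str and the sample values are made up for illustration and are not from the question's data.

```python
import pandas as pd

def lower_if_str(value):
    """Lower-case value if it is a str; return anything else unchanged."""
    return value.lower() if isinstance(value, str) else value

# Hypothetical mixed column: strings and integers side by side.
mixed = pd.Series(["Linux", 42, "VMS"])

# The .str accessor lower-cases the strings but turns the integer into NaN.
via_accessor = mixed.str.lower()

# Series.map calls the helper once per value, so the integer survives.
via_map = mixed.map(lower_if_str)

print(via_accessor.isna().tolist())
print(via_map.tolist())
```

For a single column of a data frame this would be used as df['my_col'] = df['my_col'].map(lower_if_str).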
answered Sep 22 '22 02:09 by ysearka


This also works and is very readable:

for column in df.select_dtypes("object").columns:
    df[column] = df[column].str.lower()

A possible drawback is the explicit for loop, although it runs once per selected column rather than once per row.
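
Since the question mentions iterating over many data frames, the per-column idea can be wrapped in a small function and applied to each frame. This is only a sketch under assumptions: lowercase_strings, lower_if_str and the two sample frames are hypothetical, and it checks each value with isinstance rather than relying on the column dtype, so integers in a mixed column are left as they are.

```python
import pandas as pd

def lower_if_str(value):
    """Lower-case value if it is a str; return anything else unchanged."""
    return value.lower() if isinstance(value, str) else value

def lowercase_strings(frame):
    """Return a new frame with every string value lower-cased."""
    # apply hands over one column (a Series) at a time;
    # map then visits each value in that column.
    return frame.apply(lambda col: col.map(lower_if_str))

# Hypothetical frames; the second has a mixed string/integer column.
frames = [
    pd.DataFrame({"OS": ["Linux", "VMS"], "Count": [234, 1]}),
    pd.DataFrame({"OS": ["Unix", 3], "Count": [2, 0]}),
]

cleaned = [lowercase_strings(frame) for frame in frames]
for frame in cleaned:
    print(frame)
```

The original frames are not modified; each call returns a new frame. The trade-off is that every value is visited in Python, which is slower than the .str accessor on large columns.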

answered Sep 21 '22 02:09 by schneiderfelipe