Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to change multiple Pandas DF columns to categorical without a loop

I have a DataFrame where I want to change several columns from type 'object' to 'category'.

I can change several columns at the same time for float,

dftest[['col3', 'col4', 'col5', 'col6']] = \
    dftest[['col3', 'col4', 'col5', 'col6']].astype(float)

For 'category' I can not do it the same, I need to do one by one (or in a loop like here).

for col in ['col1', 'col2']:
    dftest[col] = dftest[col].astype('category')

Question: Is there any way of doing the change for all wanted columns at once like in the 'float' example?

If I try to do several columns at the same time I have:

dftest[['col1','col2']] = dftest[['col1','col2']].astype('category')
## NotImplementedError: > 1 ndim Categorical are not supported at this time

My current working test code:

import numpy as np
import pandas as pd 

factors= np.array([
        ['a', 'xx'],
        ['a', 'xx'],
        ['ab', 'xx'],
        ['ab', 'xx'],
        ['ab', 'yy'],
        ['cc', 'yy'],
        ['cc', 'zz'],
        ['d', 'zz'],
        ['d', 'zz'],
        ['g', 'zz'] 
        ])

values = np.random.randn(10,4).round(2)

dftest = pd.DataFrame(np.hstack([factors,values]), 
                  columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6'])

#dftest[['col1','col2']] = dftest[['col1','col2']].astype('category')
## NotImplementedError: > 1 ndim Categorical are not supported at this time

## it works with individual astype
#dftest['col2'] = dftest['col2'].astype('category')
#dftest['col1'] = dftest['col1'].astype('category')

print(dftest)

## doing a loop
for col in ['col1', 'col2']:
    dftest[col] = dftest[col].astype('category')


dftest[['col3', 'col4', 'col5', 'col6']] = \
    dftest[['col3', 'col4', 'col5', 'col6']].astype(float)

dftest.dtypes

output:

col1    category
col2    category
col3     float64
col4     float64
col5     float64
col6     float64
dtype: object

== [update] ==

I don't have a problem using the loop now that I know the trick, but I asked the question because I wanted to learn/understand WHY I need to do a loop for 'category' and not for float, if there is no other way of doing it.

like image 463
Pablo Marin-Garcia Avatar asked Aug 29 '16 14:08

Pablo Marin-Garcia


People also ask

How do I change the datatype of multiple columns in pandas?

to_numeric() The best way to convert one or more columns of a DataFrame to numeric values is to use pandas. to_numeric() . This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.

How do I replace multiple column values in pandas?

Pandas replace multiple values in column replace. By using DataFrame. replace() method we will replace multiple values with multiple new strings or text for an individual DataFrame column. This method searches the entire Pandas DataFrame and replaces every specified value.

How do I change multiple columns to float in pandas?

To convert the data type of multiple columns to float, use Pandas' apply(~) method with to_numeric(~) .


1 Answers

It's not immediately clear what the result of dftest[['col1','col2']].astype('category') should be, i.e. whether the resulting columns should share the same categories or not.

Looping over columns make each column have a separate set of categories. (I believe this is a desired result in your example.)

On the other hand, .astype(float) works differently: it ravels the underlying values to a 1d array, casts it to floats, and then reshape it back to the original shape. This way it may be faster than just iterating over columns. You can emulate this behaviour for category with higher level functions:

result = dftest[['col1', 'col2']].stack().astype('category').unstack()

But then you get a single set of categories shared by the both columns:

result['col1']
Out[36]: 
0     a
1     a
2    ab
3    ab
4    ab
5    cc
6    cc
7     d
8     d
9     g
Name: col1, dtype: category
Categories (8, object): [a < ab < cc < d < g < xx < yy < zz]
like image 145
ptrj Avatar answered Sep 30 '22 14:09

ptrj