I have a CSV dataset with 40 features that I am handling with Pandas. 7 features are continuous (int32) and the rest of them are categorical.
My question is:
Should I use Pandas' dtype('category') for the categorical features, or can I keep the default dtype('object')?
The category data type in pandas is a hybrid data type. It looks and behaves like a string in many instances, but internally it is represented by an array of integers. This allows the data to be sorted in a custom order and stored more efficiently.
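To make the "array of integers" point concrete, here is a minimal sketch that peeks at the two parts of a categorical column through the .cat accessor. The exchange names are made-up sample values, not data from the question.

```python
import pandas as pd

# Hypothetical sample values; any repeated strings would do.
s = pd.Series(["NYSE", "LSE", "NYSE", "NYSE", "LSE"], dtype="category")

# The distinct labels are stored once...
categories = list(s.cat.categories)

# ...and each row holds only a small integer pointing at one of them.
codes = s.cat.codes.tolist()

print(categories)
print(codes)
```

The values still print as strings when you look at s itself; only the storage underneath changes.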
Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited, and usually fixed, number of possible values ( categories ; levels in R).
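The categorical type is useful in two common cases. First, when a string variable consists of only a few different values, converting it to a categorical variable will save some memory. Second, when the lexical order of a variable is not the same as its logical order (“one”, “two”, “three”), converting to a categorical with an explicit category order makes sorting and min/max use the logical order instead of the lexical one.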
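Both cases can be checked with a small sketch. The column below is synthetic (three strings repeated many times), so the exact byte counts are only illustrative and will differ for real data and across pandas versions.

```python
import pandas as pd

# Synthetic column: few distinct values, lots of repetition.
sizes = pd.Series(["one", "two", "three"] * 10_000)

# Case 1: memory. deep=True counts the string payloads as well.
as_object = sizes.memory_usage(deep=True)
as_category = sizes.astype("category").memory_usage(deep=True)
print(as_object, as_category)

# Case 2: logical order. An ordered CategoricalDtype fixes the order explicitly.
order = pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
ordered = sizes.astype(order)

lexical = sorted(sizes.unique())              # plain string sorting
logical = ordered.sort_values().unique().tolist()  # follows the declared order
print(lexical)
print(logical)
print(ordered.min(), ordered.max())
```

If the order of the categories carries no meaning, a plain astype('category') is enough and the ordered dtype can be skipped.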
The astype() method casts a pandas object to a specified dtype. It can also convert any suitable existing column to the categorical type.
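For a frame like the one in the question, the conversion might look like the sketch below. The column names and the tiny inline CSV are hypothetical stand-ins for the real 40-feature file; the second half shows the alternative of declaring the dtype while reading the CSV, which avoids building the object columns first.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file.
csv_text = "age,city,plan\n31,Paris,free\n45,Rome,pro\n27,Paris,free\n45,Oslo,free\n"
categorical_cols = ["city", "plan"]  # assumed names of the categorical features

# Option 1: read with default dtypes, then cast the chosen columns.
df = pd.read_csv(io.StringIO(csv_text))
df = df.astype({col: "category" for col in categorical_cols})
print(df.dtypes)

# Option 2: ask read_csv for the category dtype up front.
df2 = pd.read_csv(
    io.StringIO(csv_text),
    dtype={col: "category" for col in categorical_cols},
)
print(df2.dtypes)
```

Either way the continuous columns are left untouched, so only the categorical features change type.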
Use a category when there is lots of repetition that you expect to exploit.
For example, suppose I want the aggregate size per exchange for a large table of trades. Using the default object is totally reasonable:
In [6]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 1.25 ms per loop
But since the list of possible exchanges is pretty small, and because there is lots of repetition, I could make this faster by using a category:
In [7]: trades['exch'] = trades['exch'].astype('category')
In [8]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 702 µs per loop
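The trades table itself is not shown, so here is a rough, self-contained way to reproduce the comparison. The table below is randomly generated (the exchange names, row count and seed are assumptions), and the sketch only checks that both dtypes give the same totals; run the two groupby lines under %timeit yourself, since timings depend on the data and the pandas version.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in for the trades table (assumed shape and values).
rng = np.random.default_rng(0)
trades = pd.DataFrame({
    "exch": rng.choice(["NYSE", "NASDAQ", "LSE"], size=100_000),
    "size": rng.integers(1, 1_000, size=100_000),
})

# Aggregate with the default string column.
by_object = trades.groupby("exch")["size"].sum()

# Same aggregate after converting the key to a category.
trades["exch"] = trades["exch"].astype("category")
# observed=True keeps only categories that actually occur in the data.
by_category = trades.groupby("exch", observed=True)["size"].sum()

same_totals = by_object.to_dict() == by_category.to_dict()
print(same_totals)
```

Passing observed explicitly is a deliberate choice here: its default for categorical keys has differed between pandas versions, so spelling it out keeps the result predictable.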
Note that categories are really a form of dynamic enumeration. They are most useful if the range of possible values is fixed and finite.