Is there any support in sklearn for using pandas' Categorical datatype directly when fitting models? From what I've seen, sklearn does not support this datatype, which is unfortunate because a Categorical both encodes the categorical data and carries the mapping scheme of that data. In addition, categorical encoding is purely a data handling/processing problem, so it seems more natural for pandas to handle it.
Note
I realize there are several ways to encode categorical variables in pandas and sklearn - that's not what I'm asking about.
The basic strategy is to convert each category value into a new column and assign a 1 or 0 (True/False) value to that column. This has the benefit of not weighting a value improperly. Many libraries support one-hot encoding, but the simplest approach is pandas' get_dummies() method.
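A minimal sketch of that strategy, using a hypothetical single-column frame:

```python
import pandas as pd

# Hypothetical toy frame with one categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# get_dummies creates one indicator column per category value.
dummies = pd.get_dummies(df["color"])
print(dummies.columns.tolist())  # ['blue', 'green', 'red']
```

Each row has exactly one 1 (True) across the new columns, so no category is weighted more heavily than another.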
Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time, or rating via Likert scales.
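To make this concrete, here is a sketch of an ordered Categorical built from hypothetical Likert-style ratings; note that the dtype stores both the values and the category mapping:

```python
import pandas as pd

# Hypothetical Likert-style responses with an explicit, ordered category set.
ratings = pd.Categorical(
    ["agree", "neutral", "agree", "disagree"],
    categories=["disagree", "neutral", "agree"],
    ordered=True,
)
s = pd.Series(ratings)

# The dtype carries the mapping scheme: categories plus integer codes.
print(s.cat.categories.tolist())  # ['disagree', 'neutral', 'agree']
print(s.cat.codes.tolist())       # [2, 1, 2, 0]
```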
Method 1: Using the replace() method. Replacing is one way to convert categorical terms into numeric values. For example, take a dataset of people's salaries based on their level of education. Education level is an ordinal categorical variable, so we convert the levels into numeric codes.
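A sketch of that conversion, assuming a hypothetical salary dataset and an arbitrary (but order-preserving) mapping:

```python
import pandas as pd

# Hypothetical salary data with an ordinal education column.
df = pd.DataFrame({
    "education": ["High School", "Bachelor", "Master", "Bachelor"],
    "salary": [40000, 60000, 80000, 65000],
})

# Map each education level to an integer rank that preserves the ordering.
mapping = {"High School": 0, "Bachelor": 1, "Master": 2}
df["education"] = df["education"].replace(mapping)
print(df["education"].tolist())  # [0, 1, 2, 1]
```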
DataFrame(dtype="category"): to create a categorical DataFrame, set the dtype argument of the DataFrame constructor to "category". All columns of a DataFrame can be converted to categorical either during construction, by specifying dtype="category", or afterwards, via astype("category").
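Both routes, sketched with made-up column names:

```python
import pandas as pd

# All columns become categorical at construction time.
df = pd.DataFrame(
    {"grade": ["a", "b", "a"], "group": ["x", "x", "y"]},
    dtype="category",
)

# Alternatively, convert a single column after construction.
df2 = pd.DataFrame({"grade": ["a", "b", "a"]})
df2["grade"] = df2["grade"].astype("category")
print(df.dtypes.tolist(), df2["grade"].dtype)
```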
Cross-posting from the issue-tracker:
I think these are at least two separate questions:
1. Can/will sklearn support pandas DataFrames with categorical features as input?
2. Can/will sklearn support operating on categorical variables via pandas' categorical datatypes?
Supporting (1) would more or less amount to converting all categorical variables into one-hot encoded features, aka dummy columns. That is really easy for the user to do themselves. We could do it "under the hood" in scikit-learn, but it would complicate the code and I don't see a great benefit.
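The "easy for the user" conversion is a single get_dummies call on the whole frame before fitting; this sketch uses hypothetical feature names and leaves the numeric column untouched:

```python
import pandas as pd

# Hypothetical mixed frame: one numeric and one categorical feature.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["NY", "SF", "NY"],
})

# Expand only the categorical columns; numeric ones pass through as-is.
# The result is an all-numeric/boolean frame that sklearn estimators accept.
X = pd.get_dummies(df, columns=["city"])
print(X.columns.tolist())  # ['age', 'city_NY', 'city_SF']
```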
(2) is basically impossible. Having a categorical datatype would be nice for the trees, but pandas has no stable C-level interface, so we can't really tap into it. Even if there were one, it would still require a substantial rewrite of the tree code, and I don't think it would be helpful for non-tree estimators.