Check which columns in DataFrame are Categorical

Tags:

python

pandas

I am new to Pandas. I want a simple and generic way to find which columns are categorical in my DataFrame without manually specifying each column's type, unlike in this SO question. The df is created with:

import pandas as pd
df = pd.read_csv("test.csv", header=None)

e.g.

           0         1         2         3        4
0   1.539240  0.423437 -0.687014   Chicago   Safari
1   0.815336  0.913623  1.800160    Boston   Safari
2   0.821214 -0.824839  0.483724  New York   Safari

UPDATE (2018/02/04): The question assumes numerical columns are NOT categorical; @Zero's accepted answer solves this.

BE CAREFUL: as @Sagarkar's comment points out, that's not always true. The difficulty is that data types and categorical/ordinal/nominal types are orthogonal concepts, so mapping between them isn't straightforward. @Jeff's answer below specifies the precise way to do the mapping manually.
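To make the caveat concrete, here is a minimal sketch. It rebuilds the sample rows from an in-memory string standing in for test.csv and adds a hypothetical column 5 of integer codes (say, region IDs) that is categorical in meaning but numeric in storage. read_csv infers dtypes from the values alone, so a numeric/non-numeric split misses column 5, and the mapping has to be declared by hand.

```python
import io

import pandas as pd

# Stand-in for test.csv: the sample rows from the question, plus a
# hypothetical column 5 of integer codes that are really categories.
csv_text = """1.539240,0.423437,-0.687014,Chicago,Safari,1
0.815336,0.913623,1.800160,Boston,Safari,2
0.821214,-0.824839,0.483724,New York,Safari,1
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# read_csv infers dtypes from the values: floats, strings and int64 here.
print(df.dtypes)

# The numeric/non-numeric split treats column 5 as non-categorical ...
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # [3, 4]

# ... so the categorical columns have to be declared explicitly.
df = df.astype({3: "category", 4: "category", 5: "category"})
print(df.select_dtypes(include="category").columns.tolist())  # [3, 4, 5]
```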

asked Apr 22 '15 16:04

pds

3 Answers

You could use df._get_numeric_data() to get the numeric columns and then find the categorical columns as the ones left over:

In [66]: cols = df.columns

In [67]: num_cols = df._get_numeric_data().columns

In [68]: num_cols
Out[68]: Index([u'0', u'1', u'2'], dtype='object')

In [69]: list(set(cols) - set(num_cols))
Out[69]: ['3', '4']
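The leading underscore marks _get_numeric_data as a private pandas helper, so it may change between releases. A minimal sketch of the same "everything that is not numeric" idea using the public select_dtypes method instead; the helper name non_numeric_columns is hypothetical, and the sample rows are read from an in-memory string standing in for test.csv.

```python
import io

import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return the non-numeric columns of df, in their original order.

    Like the answer above, this assumes every non-numeric column is
    categorical; numeric columns holding category codes are not detected.
    """
    num_cols = set(df.select_dtypes(include="number").columns)
    # Iterate over df.columns rather than subtracting sets to keep order.
    return [col for col in df.columns if col not in num_cols]


# The sample rows from the question, read the same way (header=None).
csv_text = """1.539240,0.423437,-0.687014,Chicago,Safari
0.815336,0.913623,1.800160,Boston,Safari
0.821214,-0.824839,0.483724,New York,Safari
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(non_numeric_columns(df))  # [3, 4]
```

Iterating over df.columns instead of using set(cols) - set(num_cols) is a small design choice: a set difference does not preserve column order.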

answered Sep 22 '22 08:09

Zero

The way I found was to update to Pandas v0.16.0 and then exclude the number, bool and object dtypes with:

df.select_dtypes(exclude=["number","bool_","object_"])

This works, provided no types are changed and no more are added to NumPy. @Jeff's suggestion in the question's comments, include=["category"], didn't seem to work.

NumPy Types: link
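One likely explanation for include=["category"] finding nothing, offered as an assumption rather than something the answer confirms: read_csv does not give any column the category dtype unless asked to, so string columns arrive as plain string (object) columns. A minimal sketch on the question's sample rows, read from an in-memory string standing in for test.csv, with read_csv's dtype= parameter used to opt in to category at load time:

```python
import io

import pandas as pd

csv_text = """1.539240,0.423437,-0.687014,Chicago,Safari
0.815336,0.913623,1.800160,Boston,Safari
0.821214,-0.824839,0.483724,New York,Safari
"""

# Default read: no column gets the category dtype, so this filter is empty.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df.select_dtypes(include=["category"]).columns.tolist())  # []

# Opting in at load time: dtype= maps column labels (0..4 with
# header=None) to dtypes, and "category" is accepted there.
df_cat = pd.read_csv(
    io.StringIO(csv_text), header=None, dtype={3: "category", 4: "category"}
)
print(df_cat.select_dtypes(include=["category"]).columns.tolist())  # [3, 4]
```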

answered Sep 24 '22 08:09

pds

For posterity: the canonical method to select dtypes is .select_dtypes. You can specify an actual numpy dtype or something convertible to one, or 'category', which is not a numpy dtype.

In [1]: df = DataFrame({'A' : Series(range(3)).astype('category'), 'B' : range(3), 'C' : list('abc'), 'D' : np.random.randn(3) })

In [2]: df
Out[2]: 
   A  B  C         D
0  0  0  a  0.141296
1  1  1  b  0.939059
2  2  2  c -2.305019

In [3]: df.select_dtypes(include=['category'])
Out[3]: 
   A
0  0
1  1
2  2

In [4]: df.select_dtypes(include=['object'])
Out[4]: 
   C
0  a
1  b
2  c

In [5]: df.select_dtypes(include=['object']).dtypes
Out[5]: 
C    object
dtype: object

In [6]: df.select_dtypes(include=['category','int']).dtypes
Out[6]: 
A    category
B       int64
dtype: object

In [7]: df.select_dtypes(include=['category','int','float']).dtypes
Out[7]: 
A    category
B       int64
D     float64
dtype: object
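The session above as a self-contained script with the imports it needs (D is random, so its values will differ). The second filter is an illustrative combination rather than part of the answer: everything that is not numeric, i.e. declared categoricals plus string columns.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the session above.
df = pd.DataFrame({
    "A": pd.Series(range(3)).astype("category"),
    "B": range(3),
    "C": list("abc"),
    "D": np.random.randn(3),
})

# Only columns that were explicitly given the category dtype.
declared = df.select_dtypes(include=["category"]).columns.tolist()
print(declared)  # ['A']

# Everything that is not numeric: declared categoricals plus string columns.
candidates = df.select_dtypes(exclude=["number"]).columns.tolist()
print(candidates)  # ['A', 'C']
```

Which of the two lists should count as "categorical" is a modelling decision, as the question's UPDATE points out; select_dtypes only reports what the dtypes say.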

answered Sep 22 '22 08:09

Jeff
