Categorical Variables In A Pandas Dataframe?

Tags:

python

pandas

categorical-data

I am working my way through Wes McKinney's Python for Data Analysis, and I've run into a strange problem that is not addressed in the book.

In the code below, based on page 199 of his book, I create a dataframe and then use pd.cut() to create cat_obj. According to the book, cat_obj is

"a special Categorical object. You can treat it like an array of strings indicating the bin name; internally it contains a levels array indicating the distinct category names along with a labeling for the ages data in the labels attribute"

Awesome! However, if I use the exact same pd.cut() code (In [5] below) to create a new column of the dataframe (called df['cat']), that column is not treated as a special categorical variable but simply as a regular pandas series.

How, then, do I create a column in a dataframe that is treated as a categorical variable?

In [4]:

import pandas as pd

raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'score': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['name', 'score'])

bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

In [5]:

cat_obj = pd.cut(df['score'], bins, labels=group_names)
df['cat'] = pd.cut(df['score'], bins, labels=group_names)

In [7]:

type(cat_obj)

Out[7]:

pandas.core.categorical.Categorical

In [8]:

type(df['cat'])

Out[8]:

pandas.core.series.Series


asked May 03 '14 23:05

Anton

1 Answer

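This may be happening because of how a property's getter and setter can behave: the value you read back need not have the same type as the value you assigned.

Sample getter and setter:

class a(object):
    x = 1

    @property
    def p(self):
        # The getter always returns an int, whatever was stored
        return int(self.x)

    @p.setter
    def p(self, v):
        self.x = v

t = 1.32
obj = a()
obj.p = 1.32

print(type(t))      # float
print(type(obj.p))  # int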

For now, a DataFrame only accepts Series data, and its setter converts Categorical data into a Series. Categorical support in DataFrames is due in the next pandas release.
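As a rough illustration of what that support looks like once it is available, here is a minimal sketch. It assumes pandas 0.15 or later, where a categorical dtype exists; the shortened data and the name_cat column are made up for the example. The column is still a Series, but its dtype is category, and the category names and integer codes are reachable through the .cat accessor.

```python
import pandas as pd

# Shortened version of the question's data (illustrative only)
raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner'],
            'score': [25, 94, 57, 62]}
df = pd.DataFrame(raw_data, columns=['name', 'score'])

bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

# Assumption: pandas >= 0.15, where the assigned column keeps its categorical dtype
df['cat'] = pd.cut(df['score'], bins, labels=group_names)

# type(df['cat']) is still Series; the categorical nature lives in the dtype
dtype_name = df['cat'].dtype.name

# The .cat accessor exposes the category names and the per-row integer codes
category_names = list(df['cat'].cat.categories)
codes = df['cat'].cat.codes.tolist()

# An existing column can also be converted explicitly (name_cat is hypothetical)
df['name_cat'] = df['name'].astype('category')
```

Checking the dtype rather than type(df['cat']) is the more useful test here, since a DataFrame column is always returned as a Series regardless of what it holds.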


answered Sep 24 '22 19:09

xrage
