Categorical Variables In A Pandas Dataframe?

I am working my way through Wes McKinney's Python for Data Analysis, and I've run into a strange problem that is not addressed in the book.

In the code below, based on page 199 of his book, I create a dataframe and then use pd.cut() to create cat_obj. According to the book, cat_obj is

"a special Categorical object. You can treat it like an array of strings indicating the bin name; internally it contains a levels array indicating the distinct category names along with a labeling for the ages data in the labels attribute"

Awesome! However, if I use the exact same pd.cut() code (In [5] below) to create a new column of the dataframe (called df['cat']), that column is not treated as a special categorical variable but simply as a regular pandas series.

How, then, do I create a column in a dataframe that is treated as a categorical variable?

In [4]:

import pandas as pd

raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'score': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['name', 'score'])

bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

In [5]:
cat_obj = pd.cut(df['score'], bins, labels=group_names)
df['cat'] = pd.cut(df['score'], bins, labels=group_names)
In [7]:

type(cat_obj)
Out[7]:
pandas.core.categorical.Categorical
In [8]:

type(df['cat'])
Out[8]:
pandas.core.series.Series
Anton asked May 03 '14 23:05



1 Answer

This is probably happening because of the way a property's getter and setter can change the type of the value you assign:

Sample getter and setter:

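class A:
    x = 1

    @property
    def p(self):
        # The value handed back is always an int, whatever was stored
        return int(self.x)

    @p.setter
    def p(self, v):
        self.x = v

t = 1.32
obj = A()
obj.p = 1.32  # assignment goes through the setter

print(type(t))      # <class 'float'>
print(type(obj.p))  # <class 'int'>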

For now, a DataFrame only accepts Series data, and its setter converts Categorical data into a Series. DataFrame support for categoricals is due in the next pandas release.
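
As a rough sketch of what that support looks like once it is available (this assumes a pandas version with the categorical dtype, 0.15 or later, and uses a shortened copy of the question's data): the column is still a Series, but its dtype is category, and the category names and integer codes are reachable through the .cat accessor.

```python
import pandas as pd

# Shortened version of the question's data (assumption: only 'score' is needed)
df = pd.DataFrame({'score': [25, 94, 57, 62, 70]})
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

# Same pd.cut() call as in the question, assigned straight to a column
df['cat'] = pd.cut(df['score'], bins, labels=group_names)

col_type = type(df['cat']).__name__          # the column object is a Series
col_dtype = str(df['cat'].dtype)             # but its dtype is categorical
categories = list(df['cat'].cat.categories)  # distinct category names
codes = df['cat'].cat.codes.tolist()         # integer label for each row

print(col_type, col_dtype)
print(categories)
print(codes)
```

So type(df['cat']) reporting Series does not by itself mean the categorical information was lost; checking df['cat'].dtype is the more telling test.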

xrage answered Sep 24 '22 19:09