Closest equivalent of a factor variable in Python Pandas

Tags: python, pandas, r

What is the closest equivalent to an R Factor variable in Python pandas?

Amelio Vazquez-Reina asked Feb 27 '13 23:02


People also ask

How do you get categorical variables in pandas?

Use pandas.Categorical(values, categories=None, ordered=None, dtype=None), which represents a categorical variable. Categoricals are a pandas data type that corresponds to categorical variables in statistics. Such variables take on a fixed and limited number of possible values.

What is categorical variable in Python?

Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time or rating via Likert scales.

How do you convert numerical data to categorical data in Python?

There are several ways to do this conversion. One is pandas' built-in cut function, which bins continuous numerical data into discrete categories.
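As a minimal sketch of binning with cut: the ages, bin edges, and labels below are made up purely for illustration, and a reasonably recent pandas release is assumed.

```python
import pandas as pd

# Hypothetical ages, used only for illustration.
ages = pd.Series([5, 17, 34, 52, 78])

# Bin edges and labels are assumptions. By default each bin is
# right-inclusive: (0, 18], (18, 65], (65, 120].
age_group = pd.cut(ages, bins=[0, 18, 65, 120],
                   labels=["child", "adult", "senior"])

print(age_group.dtype)     # the result has the category dtype
print(age_group.tolist())  # one label per input value
```

When labels are passed like this, the resulting categories keep the order of the bins, so the column behaves like an ordered factor.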

What does pd.Categorical do?

Categoricals can take on only a limited, and usually fixed, number of possible values (categories). In contrast to statistical categorical variables, a Categorical might have an order, but numerical operations (additions, divisions, …) are not possible.

What does pd.factorize do?

Encode the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available as both a top-level function pandas.
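A small sketch of what factorize returns, assuming a recent pandas release; the input values are made up for illustration.

```python
import pandas as pd

# Hypothetical values, used only for illustration.
values = pd.Series(["b", "a", "c", "b"])

# codes holds one integer per element; uniques holds the distinct
# values in order of first appearance.
codes, uniques = pd.factorize(values)
print(codes.tolist(), list(uniques))

# With sort=True the distinct values are sorted first, so the
# integer codes follow the sorted order instead.
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)
print(codes_sorted.tolist(), list(uniques_sorted))
```

Unlike a categorical column, the result is just a pair of arrays; the mapping back to the original values has to be carried around separately.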


1 Answer

This question is about a year old, but since it is still open, here is an update: pandas has introduced a categorical dtype that behaves much like a factor in R. See this link for more information:

http://pandas-docs.github.io/pandas-docs-travis/categorical.html

Here is a snippet reproduced from the link above, showing how to create a "factor" variable in pandas:

    In [1]: s = Series(["a","b","c","a"], dtype="category")

    In [2]: s
    Out[2]:
    0    a
    1    b
    2    c
    3    a
    dtype: category
    Categories (3, object): [a < b < c]
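For readers coming from R, here is a rough sketch of how an ordered factor with explicit levels might be written. It is not taken from the linked page: the size labels are made up, and it assumes a recent pandas release in which pd.CategoricalDtype is available.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
sizes = pd.Series(["small", "large", "medium", "small"])

# Roughly comparable to
# factor(x, levels = c("small", "medium", "large"), ordered = TRUE) in R.
size_dtype = pd.CategoricalDtype(categories=["small", "medium", "large"],
                                 ordered=True)
sizes = sizes.astype(size_dtype)

print(sizes.cat.categories.tolist())  # similar to levels(x) in R
print(sizes.cat.codes.tolist())       # zero-based codes (R's are one-based)
print((sizes > "small").tolist())     # comparisons follow the declared order
```

The .cat accessor plays the part of levels() and as.integer() in R, with the caveat that pandas codes start at 0 rather than 1.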
sriramn answered Sep 22 '22 09:09