
Is it possible to read categorical columns with pandas' read_csv?

I have tried passing the dtype parameter to read_csv as dtype={n: pandas.Categorical}, but this does not work properly (the resulting column has dtype object). The manual is unclear.

Emre asked May 16 '15 05:05



1 Answer

Since pandas 0.19.0, you can pass dtype='category' to read_csv:

    import pandas as pd
    from io import StringIO  # pd.compat.StringIO is no longer available in recent pandas

    data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
    df = pd.read_csv(StringIO(data), dtype='category')
    print(df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3

    print(df.dtypes)
    col1    category
    col2    category
    col3    category
    dtype: object
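To check what was actually stored, the `.cat` accessor exposes the distinct categories and the per-row integer codes. The following is a minimal sketch that reuses the same sample data and assumes a pandas release recent enough to use `io.StringIO` directly. Note that with dtype='category', even the numeric-looking column ends up with string categories.

```python
from io import StringIO
import pandas as pd

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
df = pd.read_csv(StringIO(data), dtype='category')

# Each categorical column stores its distinct values once ...
col1_categories = df['col1'].cat.categories.tolist()
# ... plus one small integer code per row that points into that list.
col1_codes = df['col1'].cat.codes.tolist()
# Categories read this way are strings, even for numeric-looking columns.
col3_categories = df['col3'].cat.categories.tolist()

print(col1_categories)
print(col1_codes)
print(col3_categories)
```

If numeric categories are needed for a column such as col3, one option is to read it with its normal dtype and convert afterwards with astype('category').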

To make only specific columns categorical, pass dtype as a dictionary:

    from io import StringIO  # pd.compat.StringIO is no longer available in recent pandas

    df = pd.read_csv(StringIO(data), dtype={'col1': 'category'})
    print(df)
      col1 col2  col3
    0    a    b     1
    1    a    b     2
    2    c    d     3

    print(df.dtypes)
    col1    category
    col2      object
    col3       int64
    dtype: object
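If the set of categories or their order needs to be fixed in advance, a `CategoricalDtype` instance can be used as the dictionary value instead of the string 'category'. This is a sketch rather than part of the original answer: it assumes pandas 0.21 or later, and the category list ['a', 'b', 'c'] is a made-up example in which 'b' never occurs in the data.

```python
from io import StringIO
import pandas as pd
from pandas.api.types import CategoricalDtype

data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

# Hypothetical category list for illustration: 'b' is allowed but unused.
col1_type = CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)
df = pd.read_csv(StringIO(data), dtype={'col1': col1_type})

# The declared categories and ordering are kept as given,
# while the other columns are still inferred as usual.
print(df['col1'].cat.categories.tolist())
print(df['col1'].cat.ordered)
print(df.dtypes)
```

Values in the file that are missing from the declared categories are read as NaN, so the list should cover every value expected in that column.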
jezrael answered Sep 18 '22 20:09