I have tried passing the dtype parameter to read_csv as dtype={n: pandas.Categorical}, but this does not work properly (the result is an object column). The manual is unclear.
Using the standard pandas Categorical constructor, we can create a categorical object. The second argument specifies the categories, so any value that is not present in the categories is treated as NaN. If ordered=True is also passed, the order in which the categories are listed defines how values compare: for example, categories listed as c, b, a mean that a is greater than b and b is greater than c.
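As a minimal sketch of the constructor behaviour described above (the sample values and the category list are made up for illustration and are not taken from the original post):

```python
import pandas as pd

# Illustrative values; "d" is deliberately absent from the categories.
values = ["a", "b", "c", "a", "b", "c", "d"]

# The categories (second argument, passed by keyword here) are listed
# from smallest to largest, so with ordered=True: c < b < a.
cat = pd.Categorical(values, categories=["c", "b", "a"], ordered=True)

# "d" is not a known category, so it is stored as NaN (code -1).
missing_mask = pd.isna(cat)

# min/max follow the category order, not alphabetical order.
smallest = cat.min()
largest = cat.max()

print(cat)
print(cat.codes)
print(smallest, largest)
```

The integer codes show how each value maps to a position in the category list, with -1 marking the value that fell outside it.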
Read a CSV file: in this case, the pandas read_csv() function returns a new DataFrame with the data and labels from the file data.csv, which you specified with the first argument. This string can be any valid path, including URLs.
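A small sketch of reading from a path, assuming a throwaway data.csv written to a temporary directory (the column names and rows are hypothetical):

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file contents, created only for this example.
    path = os.path.join(tmp, "data.csv")
    with open(path, "w") as f:
        f.write("name,score\nann,1\nbob,2\n")

    # The first argument is the path; the header row becomes the column labels.
    frame = pd.read_csv(path)

print(frame)
print(list(frame.columns))
```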
There is almost no difference between read_csv() and read_table(). In fact, both call the same underlying function in the source; they differ only in the default delimiter, which is a comma for read_csv() and a tab (\t) for read_table().
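To illustrate the point about delimiters, here is a short sketch that parses the same tab-separated text both ways (the sample data is invented for the example):

```python
from io import StringIO

import pandas as pd

# Invented tab-separated sample.
tsv = "col1\tcol2\na\t1\nb\t2\n"

# read_table() splits on tabs by default.
from_table = pd.read_table(StringIO(tsv))

# read_csv() gives the same result once the separator is set to a tab.
from_csv = pd.read_csv(StringIO(tsv), sep="\t")

same = from_table.equals(from_csv)
print(same)
```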
Since version 0.19.0, you can pass dtype='category' to read_csv:
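    from io import StringIO
    import pandas as pd

    data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
    # pd.compat.StringIO was removed in later pandas versions; io.StringIO works everywhere
    df = pd.read_csv(StringIO(data), dtype='category')
    print (df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3

    print (df.dtypes)
    col1    category
    col2    category
    col3    category
    dtype: object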
If you want only specific columns to be categorical, pass dtype as a dictionary:
    from io import StringIO

    # only col1 is read as category; the other columns keep their inferred dtypes
    df = pd.read_csv(StringIO(data), dtype={'col1':'category'})
    print (df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3

    print (df.dtypes)
    col1    category
    col2      object
    col3       int64
    dtype: object
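If you also need to fix the categories or their order while reading, one option is to pass a CategoricalDtype instance instead of the string 'category'. This is a sketch rather than part of the original answer: it assumes pandas 0.21 or later, and the category list is chosen only for illustration.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Illustrative assumption: col1 may only contain c, b, a, ordered c < b < a.
col1_type = CategoricalDtype(categories=["c", "b", "a"], ordered=True)

# Only col1 is converted; the other columns keep their inferred dtypes.
df = pd.read_csv(StringIO(data), dtype={"col1": col1_type})

print(df["col1"].dtype)
print(df["col1"].cat.categories)
```

Values in col1 that are not in the declared categories would be read as NaN, which matches the behaviour of the Categorical constructor.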