
How to specify the `dtype` of the index when reading a CSV file into a `DataFrame`?

Tags: python, pandas

In Python 3.4.3 and pandas 0.16, how can I specify the dtype of the index as str? This is what I have tried; the index still comes back as int64:

In [1]: from io import StringIO

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), index_col=0, dtype={'date': np.str_, 'close': np.float})

In [5]: fra.index
Out[5]: Int64Index([20140101, 20140102], dtype='int64')
Eastsun asked Apr 22 '15 09:04


1 Answer

It looks like the index_col=0 parameter takes precedence over the dtype parameter, so the dtype requested for the index column is ignored. If you drop index_col, the column is read as a string, and you can call set_index afterwards:

In [235]:

fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), dtype={'date': np.str_, 'close': float})
fra
Out[235]:
       date  close
0  20140101   10.2
1  20140102   10.5
In [236]:

fra = fra.set_index('date')
fra.index
Out[236]:
Index(['20140101', '20140102'], dtype='object')
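
For reference, the same workaround can be written as a self-contained script. This is a minimal sketch, assuming a recent pandas where the built-in str and float are accepted as dtype values; the names CSV and index_values are only illustrative and do not come from the original post.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question.
CSV = "date,close\n20140101,10.2\n20140102,10.5"

# Read 'date' as text while it is still an ordinary column,
# then promote it to the index in a second step.
fra = pd.read_csv(StringIO(CSV), dtype={"date": str, "close": float})
fra = fra.set_index("date")

# The index labels are Python strings, not integers.
index_values = list(fra.index)
print(index_values)
print(fra["close"].dtype)
```

The exact dtype name reported for the index can differ between pandas versions, so checking the element type of the labels is the more portable test.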

Alternatively, chain set_index directly onto the DataFrame returned by read_csv, so that the whole thing becomes a one-liner:

In [237]:

fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), dtype={'date': np.str_, 'close': float}).set_index('date')
fra.index
Out[237]:
Index(['20140101', '20140102'], dtype='object')
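
One reason to read the column as a string up front, rather than converting an integer column later, is that the integer parse discards leading zeros. The sketch below illustrates this with hypothetical data (the code column and its values are invented for the example and are not from the question).

```python
from io import StringIO

import pandas as pd

# Hypothetical identifiers with leading zeros (not from the original post).
CSV = "code,close\n00123,10.2\n04567,10.5"

# Reading as str first keeps the text exactly as written in the file.
kept = pd.read_csv(StringIO(CSV), dtype={"code": str}).set_index("code")

# Letting pandas infer the type parses the column as integers,
# so converting to str afterwards cannot bring the zeros back.
lost = pd.read_csv(StringIO(CSV))
lost["code"] = lost["code"].astype(str)
lost = lost.set_index("code")

kept_values = list(kept.index)
lost_values = list(lost.index)
print(kept_values)
print(lost_values)
```

For the date-like values in the question there are no leading zeros, so both routes would give the same labels there; the difference only shows up with data like the above.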

Update

This behaviour is a bug in pandas; a fix is targeted for version 0.17.0.
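
Since the behaviour depends on the installed pandas version, it may be simpler to probe for it at run time than to rely on a version number. The following is a rough sketch under that assumption; index_dtype_respected and read_with_str_index are hypothetical helper names, not part of pandas.

```python
from io import StringIO

import pandas as pd

CSV = "date,close\n20140101,10.2\n20140102,10.5"
DTYPES = {"date": str, "close": float}


def index_dtype_respected():
    """Return True if read_csv applies the requested dtype to the index column."""
    probe = pd.read_csv(StringIO(CSV), index_col=0, dtype=DTYPES)
    return all(isinstance(label, str) for label in probe.index)


def read_with_str_index(text):
    """Read CSV text so that the 'date' index holds strings either way."""
    if index_dtype_respected():
        # The installed pandas honours dtype together with index_col.
        return pd.read_csv(StringIO(text), index_col=0, dtype=DTYPES)
    # Otherwise fall back to the set_index workaround from the answer.
    return pd.read_csv(StringIO(text), dtype=DTYPES).set_index("date")


fra = read_with_str_index(CSV)
print(list(fra.index))
```

Both branches produce an index named date with string labels, so calling code does not need to know which path was taken.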

EdChum answered Oct 13 '22 09:10