Read a CSV File
In this case, the Pandas read_csv() function returns a new DataFrame with the data and labels from the file data.csv, which you specified with the first argument.
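As a minimal sketch of that call: the snippet below reads from an in-memory string instead of a real data.csv so it runs anywhere. The column names and values are made up for illustration and are not from the original file.

```python
import io

import pandas as pd

# Stand-in for the contents of data.csv (hypothetical columns and rows).
csv_text = "name,score\nalice,1.5\nbob,2.5\n"

# The first argument can be a path such as 'data.csv' or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # column labels taken from the header row
print(df.shape)             # (rows, columns)
```

With a real file you would pass the path as the first argument, e.g. pd.read_csv('data.csv').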
infer_datetime_format: If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns and, if the format can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
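A small sketch of date parsing in read_csv, assuming a made-up column called when. It uses only parse_dates; the inference flag described above is left out on purpose, because recent pandas releases deprecate it and infer the format on their own (check the documentation for the version you run).

```python
import io

import pandas as pd

# Hypothetical data with one ISO-formatted date column.
csv_text = "when,value\n2013-01-01,1\n2013-01-02,2\n"

# parse_dates lists the columns that should be converted to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

print(df.dtypes)
```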
pandas 0.10.1 has only limited support for float32.
See http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification
In 0.11 you can do it like this:
# don't use dtype converters explicitly for the columns you care about
# they will be converted to float64 if possible, or object if they cannot
df = pd.read_csv('test.csv'.....)
#### this is optional and related to the issue you posted ####
# force anything that is not numeric to NaN
# columns is the list of columns that you are interested in
df[columns] = df[columns].convert_objects(convert_numeric=True)
# astype
df[columns] = df[columns].astype('float32')
See http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion
It's not as efficient as doing it directly in read_csv (but that requires
some low-level changes).
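For readers on a much newer pandas, where convert_objects is no longer available, the same two steps (coerce non-numeric values to NaN, then downcast) can be sketched with pd.to_numeric. This is an illustrative equivalent, not part of the original answer; the sample data and the columns list are made up.

```python
import io

import pandas as pd

# Hypothetical input: column 'a' contains one value that is not numeric.
csv_text = "a,b\n0.76398,0.81394\nbad,0.91063\n"
df = pd.read_csv(io.StringIO(csv_text))

# columns is the list of columns that you are interested in
columns = ["a"]

for col in columns:
    # Anything that cannot be parsed as a number becomes NaN,
    # then the column is downcast to float32.
    df[col] = pd.to_numeric(df[col], errors="coerce").astype("float32")

print(df.dtypes)
```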
I have confirmed that with 0.11-dev this DOES work (on 32-bit and 64-bit; the results are the same):
In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)
In [6]: x
Out[6]:
         a        b
0  0.76398  0.81394
1  0.32136  0.91063
In [7]: x.dtypes
Out[7]:
a    float32
b    float64
dtype: object
In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'
In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
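The session above is from Python 2 and pandas 0.11-dev. A rough present-day equivalent, assuming Python 3 and a recent pandas, might look like the sketch below. It swaps StringIO.StringIO for io.StringIO and uses sep=r"\s+" in place of delim_whitespace=True; the data string is reconstructed from the output shown in the session, since the original data variable is not included.

```python
import io

import numpy as np
import pandas as pd

# Whitespace-separated sample matching the values printed in the session.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Request float32 for column 'a' only; 'b' keeps the default float64.
x = pd.read_csv(io.StringIO(data), dtype={"a": np.float32}, sep=r"\s+")

print(x.dtypes)
```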
In [22]: df.a.dtype = pd.np.float32
In [23]: df.a.dtype
Out[23]: dtype('float32')
The above works fine for me under pandas 0.10.1.