For example, the values in '/tmp/test.csv' (namely, 01, 02, 03) are meant to represent strings that happen to match /^\d+$/, as opposed to integers:
In [10]: print open('/tmp/test.csv').read()
A,B,C
01,02,03
By default, pandas.read_csv converts these values to integers:
In [11]: import pandas
In [12]: pandas.read_csv('/tmp/test.csv')
Out[12]:
   A  B  C
0  1  2  3
I want to tell pandas.read_csv to leave all these values alone, i.e., to perform no conversions whatsoever. Furthermore, I want this "please do nothing" directive to be applied across the board, without my having to specify any column names or numbers.
I tried this, which achieved nothing:
In [13]: import csv
In [14]: pandas.read_csv('/tmp/test.csv', quoting=csv.QUOTE_ALL)
Out[14]:
   A  B  C
0  1  2  3
The only thing that worked was to define a big ol' ConstantDict class, and use an instance of it that always returns the identity function (lambda x: x) as the value for the converters parameter, and thereby trick pandas.read_csv into doing nothing:
In [15]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:class ConstantDict(dict):
:    def __init__(self, value):
:        self.__value = value
:    def get(self, *args):
:        return self.__value
:--
In [16]: pandas.read_csv('/tmp/test.csv', converters=ConstantDict(lambda x: x))
Out[16]:
    A   B   C
0  01  02  03
That's a lot of gymnastics to get such a simple "please do nothing" request across. (It would be even more gymnastics if I were to make ConstantDict bullet-proof.)
Isn't there a simpler way to achieve this?
df = pd.read_csv('temp.csv', dtype=str)
From the docs:
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python'). Use str or object to preserve and not interpret dtype.
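As a rough, self-contained sketch of how this plays out with the data from the question (the in-memory `csv_text` and `csv_with_blank` strings are stand-ins I made up so the snippet runs without a file on disk):

```python
import io

import pandas as pd

# Hypothetical stand-in for '/tmp/test.csv' from the question.
csv_text = "A,B,C\n01,02,03\n"

# dtype=str asks pandas to keep every column as strings,
# so the leading zeros are not lost to integer parsing.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df)

# Caveat: with dtype=str alone, empty fields are still read as NaN.
# Passing na_filter=False as well turns off missing-value detection,
# so an empty field comes back as an empty string.
csv_with_blank = "A,B\n01,\n"  # hypothetical sample with an empty field
df_raw = pd.read_csv(io.StringIO(csv_with_blank), dtype=str, na_filter=False)
print(repr(df_raw.loc[0, "B"]))
```

If "no conversions whatsoever" should also cover blank fields, the second call is probably the closer match; otherwise dtype=str on its own should be enough.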