When I try to import a CSV file into a DataFrame, pandas (0.13.1) ignores the dtype parameter. Is there a way to stop pandas from inferring the data type on its own?
I am merging several CSV files, and sometimes the CUSTOMER column contains letters, so pandas imports it as a string. When I try to merge the two DataFrames I get an error because I'm trying to merge on two different types. I need everything stored as strings.
Data snippet:
|WAREHOUSE|ERROR|CUSTOMER|ORDER NO|
|---------|-----|--------|--------|
|3615 | |03106 |253734 |
|3615 | |03156 |290550 |
|3615 | |03175 |262207 |
|3615 | |03175 |262207 |
|3615 | |03175 |262207 |
|3615 | |03175 |262207 |
|3615 | |03175 |262207 |
|3615 | |03175 |262207 |
|3615 | |03175 |262207 |
Import line:
df = pd.read_csv("SomeFile.csv",
                 header=1,
                 skip_footer=1,
                 usecols=[2, 3],
                 dtype={'ORDER NO': str, 'CUSTOMER': str})
`df.dtypes` outputs this:
ORDER NO int64
CUSTOMER int64
dtype: object
Pandas 0.13.1 silently ignored the `dtype` argument because the `c` engine does not support `skip_footer`. This caused pandas to fall back to the `python` engine, which does not support `dtype`.
Solution: use `converters` and request the `python` engine explicitly.
df = pd.read_csv('SomeFile.csv',
                 header=1,
                 skip_footer=1,
                 usecols=[2, 3],
                 converters={'CUSTOMER': str, 'ORDER NO': str},
                 engine='python')
Output:
In [1]: df.dtypes
Out[2]:
CUSTOMER object
ORDER NO object
dtype: object
In [3]: type(df['CUSTOMER'][0])
Out[4]: str
In [5]: df.head()
Out[6]:
  CUSTOMER ORDER NO
0    03106   253734
1    03156   290550
2    03175   262207
3    03175   262207
4    03175   262207
Leading zeros from the original file are preserved, and all data is stored as strings.
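For anyone who wants to try this without the original file, here is a minimal, self-contained sketch of the same approach. It rests on a few assumptions that are not part of the question: the inline `RAW` text is a made-up stand-in for SomeFile.csv (one title line, the header row, three data rows, one footer line), `read_orders` and the `customers` lookup table are hypothetical names, and the keyword is written `skipfooter`, which is how recent pandas releases spell it (0.13.1 used `skip_footer`).

```python
import io

import pandas as pd

# Hypothetical stand-in for SomeFile.csv: a title line, the header row,
# a few data rows, and one footer line that should be skipped.
RAW = """Report title
WAREHOUSE,ERROR,CUSTOMER,ORDER NO
3615,,03106,253734
3615,,03156,290550
3615,,03175,262207
End of report
"""


def read_orders(buffer):
    """Read CUSTOMER and ORDER NO as strings, skipping the footer line."""
    return pd.read_csv(
        buffer,
        header=1,          # the header is on the second line
        skipfooter=1,      # assumption: recent pandas spelling of skip_footer
        usecols=[2, 3],    # CUSTOMER and ORDER NO
        converters={"CUSTOMER": str, "ORDER NO": str},
        engine="python",   # skipfooter is only handled by the python engine
    )


df = read_orders(io.StringIO(RAW))
print(df["CUSTOMER"].tolist())  # → ['03106', '03156', '03175']

# Hypothetical second frame whose CUSTOMER key contains letters.
customers = pd.DataFrame(
    {"CUSTOMER": ["03106", "A3175"], "NAME": ["Acme", "Globex"]}
)

# Both key columns hold strings, so the merge compares like with like.
merged = df.merge(customers, on="CUSTOMER", how="left")
```

Because the converters run on the raw text of each cell, the values never pass through integer inference, which is why the leading zeros survive and the merge key lines up with customer codes that contain letters.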