 

Pandas read_csv ignoring column dtypes when I pass skip_footer arg

When I import a CSV file into a DataFrame, pandas (0.13.1) ignores the `dtype` parameter. Is there a way to stop pandas from inferring the data types on its own?

I am merging several CSV files. Sometimes the CUSTOMER column contains letters, and pandas imports it as a string. When I then merge the two DataFrames I get an error because the key columns have different types. I need everything stored as strings.

Data snippet:

|WAREHOUSE|ERROR|CUSTOMER|ORDER NO|
|---------|-----|--------|--------|
|3615     |     |03106   |253734  |
|3615     |     |03156   |290550  |
|3615     |     |03175   |262207  |
|3615     |     |03175   |262207  |
|3615     |     |03175   |262207  |
|3615     |     |03175   |262207  |
|3615     |     |03175   |262207  |
|3615     |     |03175   |262207  |
|3615     |     |03175   |262207  |

Import line:

```python
import pandas as pd

df = pd.read_csv("SomeFile.csv",
                 header=1,
                 skip_footer=1,
                 usecols=[2, 3],
                 dtype={'ORDER NO': str, 'CUSTOMER': str})
```

`df.dtypes` outputs this:

```
ORDER NO    int64
CUSTOMER    int64
dtype: object
```
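The `int64` result is ordinary type inference at work. A minimal reproduction, assuming the file is comma-separated and using the header plus the first three rows of the snippet above as hypothetical sample text (`SAMPLE` is an illustrative name), shows the leading zeros disappearing once CUSTOMER is parsed as an integer:

```python
import io

import pandas as pd

# Hypothetical sample: header and first rows of the data snippet, as CSV text.
SAMPLE = """WAREHOUSE,ERROR,CUSTOMER,ORDER NO
3615,,03106,253734
3615,,03156,290550
3615,,03175,262207
"""

# No dtype or converters: pandas infers the column types on its own.
inferred = pd.read_csv(io.StringIO(SAMPLE), usecols=["CUSTOMER", "ORDER NO"])
print(inferred.dtypes)                  # both columns come back as int64
print(inferred["CUSTOMER"].tolist())    # [3106, 3156, 3175]: zeros are gone
```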
Ripster asked Jul 15 '14 14:07


1 Answer

Pandas 0.13.1 silently ignores the `dtype` argument in this case. The C engine does not support `skip_footer`, so pandas falls back to the Python engine, and in that version the Python engine does not support `dtype`.
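Later pandas releases behave differently on each of these points: `skip_footer` is spelled `skipfooter`, the fallback to the Python engine emits a `ParserWarning` instead of happening silently, and the Python engine honours `dtype`. The sketch below assumes a recent pandas and a hypothetical file with one title line above the header and one summary line at the bottom, which is what `header=1` and `skip_footer=1` suggest; the sample text is made up for illustration:

```python
import io
import warnings

import pandas as pd

# Hypothetical file: title line, header, data rows, summary footer.
SAMPLE = """Customer order export,,,
WAREHOUSE,ERROR,CUSTOMER,ORDER NO
3615,,03106,253734
3615,,03156,290550
3615,,03175,262207
TOTAL,,,3
"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No engine given: skipfooter forces a fallback from the C engine.
    df = pd.read_csv(io.StringIO(SAMPLE), header=1, skipfooter=1,
                     usecols=[2, 3], dtype={"ORDER NO": str, "CUSTOMER": str})

# The fallback is now reported as a ParserWarning.
fell_back = any(issubclass(w.category, pd.errors.ParserWarning) for w in caught)
print(fell_back)
print(df["CUSTOMER"].tolist())  # dtype is honoured: ['03106', '03156', '03175']
```

Passing `engine='python'` yourself avoids the warning, since the fallback no longer has to happen.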

Solution: use `converters` instead of `dtype`, and select the Python engine explicitly:

```python
import pandas as pd

df = pd.read_csv('SomeFile.csv',
                 header=1,
                 skip_footer=1,
                 usecols=[2, 3],
                 converters={'CUSTOMER': str, 'ORDER NO': str},
                 engine='python')
```
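For readers on a current pandas, here is the same converters approach as a self-contained script. It assumes the same kind of hypothetical layout (one title line, a header, data rows, one summary footer) and uses the newer `skipfooter` spelling. Each converter receives the raw text of a field, and `str` hands it back unchanged, so nothing is ever parsed as a number:

```python
import io

import pandas as pd

# Hypothetical file: title line, header, data rows, summary footer.
SAMPLE = """Customer order export,,,
WAREHOUSE,ERROR,CUSTOMER,ORDER NO
3615,,03106,253734
3615,,03156,290550
3615,,03175,262207
TOTAL,,,3
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=1,        # second line holds the column names
    skipfooter=1,    # drop the summary line (needs the python engine)
    usecols=[2, 3],  # CUSTOMER and ORDER NO by position
    converters={"CUSTOMER": str, "ORDER NO": str},  # keep raw text
    engine="python",
)
print(df.head())
```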

Output:

```
In [1]: df.dtypes
Out[1]:
CUSTOMER    object
ORDER NO    object
dtype: object

In [2]: type(df['CUSTOMER'][0])
Out[2]: str

In [3]: df.head()
Out[3]:
  CUSTOMER ORDER NO
0    03106   253734
1    03156   290550
2    03175   262207
3    03175   262207
4    03175   262207
```

Leading zeros from the original file are preserved, and all data is stored as strings.
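With every column read as text, the merge from the question lines up because both key columns hold strings. A small sketch with two hypothetical frames (the names `orders` and `customers` and all values are made up), including a customer number that contains a letter:

```python
import pandas as pd

# Two hypothetical frames, as if read from separate CSV files with every
# column kept as a string; CUSTOMER keys keep their leading zeros.
orders = pd.DataFrame({"CUSTOMER": ["03106", "03175"],
                       "ORDER NO": ["253734", "262207"]})
customers = pd.DataFrame({"CUSTOMER": ["03106", "03175", "A0912"],
                          "NAME": ["Acme", "Globex", "Initech"]})

# Both key columns are strings, so the join compares text to text.
merged = orders.merge(customers, on="CUSTOMER", how="left")
print(merged)
```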

Ripster answered Oct 14 '22 15:10