Avoid converting data to int automatically while reading using pandas data frame

Tags:

python

pandas

csv

I have a CSV file with no headers and around 35 columns.

I am reading this file with pandas. The issue is that when pandas reads the file, it automatically assigns a data type to each column.

How can I prevent this automatic type assignment?

I have a column C that I want to store as a string instead of an int, but pandas automatically reads it as int.

I tried two things.

1) Passing converters with the python engine:

my_df = pd.DataFrame()
my_df = pd.read_csv('my_csv_file.csv',names=['A','B','C'...'Z'],converters={'C':str},engine = 'python')

The code above gives me this error:

ValueError: Expected 37 fields in line 1, saw 35

If I remove converters={'C':str},engine = 'python', there is no error.

2) Converting the column after reading:

old_df['C'] = old_df['C'].astype(str)

The issue with this approach is that a value such as '00123' has already been read as the integer 123 by this point, so the conversion yields '123'. The leading zeros are lost because pandas treats the column as an integer.


asked Mar 04 '16 07:03

Neil

1 Answer

Use the dtype option or converters in read_csv (see the read_csv documentation). This works regardless of whether you use the python engine:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':['00123','00125'],'col2':[1,2],'col3':[1.0,2.0]})
df.to_csv('test.csv',index=False)
new_df = pd.read_csv('test.csv',dtype={'col1':str,'col2':np.int64,'col3':np.float64})
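To see the effect of the dtype mapping, it can help to compare it with the default type inference. The following is a minimal sketch of the same example, assuming an in-memory buffer (io.StringIO) in place of test.csv so that nothing is written to disk; the variable names inferred and typed are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Same data as the example above, held in memory instead of in test.csv.
csv_text = "col1,col2,col3\n00123,1,1.0\n00125,2,2.0\n"

# Default inference: col1 is parsed as an integer, so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit dtype mapping: col1 is kept as text.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"col1": str, "col2": np.int64, "col3": np.float64},
)

print(inferred["col1"].tolist())  # [123, 125]
print(typed["col1"].tolist())     # ['00123', '00125']
```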

If you simply use dtype=str, every column is read in as a string (object). You cannot do that with converters, because it expects a dictionary mapping columns to functions. You could substitute converters for dtype in the code above and get the same result.
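For a file without a header row, as in the question, the same options can be combined with names. This is a sketch under assumptions: the three-column data and the column names A, B, C are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical header-less data; the real file has around 35 columns.
csv_text = "1,x,00123\n2,y,00125\n"
names = ["A", "B", "C"]

# dtype=str applies to every column at once.
all_text = pd.read_csv(io.StringIO(csv_text), header=None, names=names, dtype=str)

# converters takes a dict of column -> callable; here only C is kept as text.
only_c = pd.read_csv(
    io.StringIO(csv_text), header=None, names=names, converters={"C": str}
)

print(all_text["A"].tolist())  # ['1', '2']
print(only_c["A"].tolist())    # [1, 2]
print(only_c["C"].tolist())    # ['00123', '00125']
```

With dtype=str the numeric column A is also read as text, while the converters dictionary leaves the other columns to the usual inference.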


answered Oct 16 '22 10:10

Zak Keirn
