 

Avoid automatic conversion to int when reading a CSV into a pandas DataFrame

Tags:

python

pandas

csv

I have a csv file with no headers. It has around 35 columns.

I am reading this file with pandas. The issue is that when it reads the file, pandas automatically infers a data type for each column.

How can I avoid this automatic type assignment?

I have a column C that I want to store as a string instead of an int, but pandas automatically converts it to int.
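A minimal reproduction of this behavior, assuming a small in-memory headerless CSV (via io.StringIO) in place of my_csv_file.csv; the three columns A, B and C are illustrative stand-ins for the real 35:

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless file: column C holds zero-padded codes.
csv_text = "1,foo,00123\n2,bar,00125\n"

# header=None: the first line is data; names supplies the column labels.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["A", "B", "C"])

print(df["C"].dtype)     # int64: pandas inferred an integer type for C
print(df["C"].tolist())  # [123, 125]: the leading zeros are already gone
```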

I tried two things.

1)

import pandas as pd
my_df = pd.DataFrame()
my_df = pd.read_csv('my_csv_file.csv',names=['A','B','C'...'Z'],converters={'C':str},engine = 'python')

The code above raises this error:

ValueError: Expected 37 fields in line 1, saw 35

If I remove converters={'C':str}, engine='python', there is no error.

2)

old_df['C'] = old_df['C'].astype(str)

The issue with this approach is that if a value in the column is '00123', read_csv has already converted it to 123, so astype then turns it into '123'. The leading zeros are lost because pandas parsed the value as an integer.
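A short sketch of why converting afterwards cannot help, again using an illustrative in-memory CSV in place of the real file: by the time astype(str) runs, the column already holds the integers 123 and 125, so there are no zeros left to keep.

```python
import io

import pandas as pd

csv_text = "1,foo,00123\n2,bar,00125\n"

# Default parsing: C is inferred as int64, so '00123' becomes 123.
old_df = pd.read_csv(io.StringIO(csv_text), header=None, names=["A", "B", "C"])

# Converting afterwards only stringifies the integers; the zeros are not restored.
old_df["C"] = old_df["C"].astype(str)
print(old_df["C"].tolist())  # ['123', '125']
```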

asked Mar 04 '16 07:03 by Neil




1 Answer

Use the dtype option or converters in read_csv (see the read_csv documentation). Either works regardless of whether you use the python engine:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':['00123','00125'],'col2':[1,2],'col3':[1.0,2.0]})
df.to_csv('test.csv',index=False)
new_df = pd.read_csv('test.csv',dtype={'col1':str,'col2':np.int64,'col3':np.float64})

If you simply pass dtype=str, every column is read in as a string (object). You cannot do that with converters, because converters expects a dictionary. You could substitute converters for dtype in the code above and get the same result.
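Applied to the headerless file from the question, the options look roughly like this. It is a sketch that assumes the names list has exactly one entry per column in the file; the three-column in-memory CSV and the names A, B, C are illustrative stand-ins for the real 35-column file.

```python
import io

import pandas as pd

csv_text = "1,foo,00123\n2,bar,00125\n"
names = ["A", "B", "C"]  # assumed: one name per column in the file

# dtype forces only C to str; A and B are still inferred.
via_dtype = pd.read_csv(io.StringIO(csv_text), header=None, names=names, dtype={"C": str})

# converters applies str to each raw value of C; here combined with the python engine.
via_converters = pd.read_csv(
    io.StringIO(csv_text), header=None, names=names,
    converters={"C": str}, engine="python",
)

# dtype=str (not a dict) reads every column as strings.
all_str = pd.read_csv(io.StringIO(csv_text), header=None, names=names, dtype=str)

print(via_dtype["C"].tolist())       # ['00123', '00125']
print(via_converters["C"].tolist())  # ['00123', '00125']
print(all_str.iloc[0].tolist())      # ['1', 'foo', '00123']
```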

answered Oct 16 '22 10:10 by Zak Keirn