I have a CSV file with no headers and about 35 columns.
I am reading this file with pandas. The issue is that when pandas reads the file, it automatically assigns a data type to each column.
How can I prevent this automatic type inference?
I have a column C that I want to store as a string instead of an int, but pandas automatically reads it as int.
I tried two things.
1)
my_df = pd.DataFrame()
my_df = pd.read_csv('my_csv_file.csv',names=['A','B','C'...'Z'],converters={'C':str},engine = 'python')
The code above raises this error:
ValueError: Expected 37 fields in line 1, saw 35
If I remove converters={'C':str} and engine='python', there is no error.
2)
old_df['C'] = old_df['C'].astype(str)
The issue with this approach is that if the value in the column is '00123', it has already been converted to 123 when the file was read, so the cast only produces '123'. The leading zeros are lost because pandas treated the column as an integer.
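The loss of leading zeros described above can be reproduced with a small sketch. The CSV content and the three column names here are made up for illustration, and the file is simulated with io.StringIO rather than read from disk:

```python
import io
import pandas as pd

# Hypothetical two-row, headerless CSV; the third column has leading zeros.
csv_text = "x,1,00123\ny,2,00456\n"

# With default type inference, the third column is parsed as integers.
inferred = pd.read_csv(io.StringIO(csv_text), header=None, names=["A", "B", "C"])

# Casting afterwards gives strings, but the zeros were dropped at read time.
late_cast = inferred["C"].astype(str).tolist()
print(late_cast)
```

Because the zeros are discarded while parsing, any fix has to be applied inside read_csv itself rather than after the DataFrame exists.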
to_numeric(): The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function tries to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
pandas can represent integer data with possibly missing values using arrays.IntegerArray.
Use the dtype option or converters in read_csv (see the read_csv documentation). Both work regardless of whether you use the python engine:
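import numpy as np
import pandas as pd
# Write a small test file whose first column has leading zeros
df = pd.DataFrame({'col1':['00123','00125'],'col2':[1,2],'col3':[1.0,2.0]})
df.to_csv('test.csv',index=False)
# Read it back with an explicit type per column so col1 stays a string
new_df = pd.read_csv('test.csv',dtype={'col1':str,'col2':np.int64,'col3':np.float64})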
If you simply use dtype=str, then it will read every column in as a string (object). But you cannot do that with converters, as it expects a dictionary. You could substitute converters for dtype in the code above and get the same result.
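For the headerless file in the question, the same idea might look like the sketch below. It assumes a hypothetical three-column file simulated with io.StringIO; the column names and data are invented, and the real file would need one name per column:

```python
import io
import pandas as pd

# Hypothetical headerless CSV; a real file would have about 35 columns.
csv_text = "x,1,00123\ny,2,00456\n"
names = ["A", "B", "C"]

# Keep only column C as text; the other columns are still inferred.
by_dtype = pd.read_csv(io.StringIO(csv_text), header=None, names=names,
                       dtype={"C": str})

# The same column handled with a converter instead of dtype.
by_conv = pd.read_csv(io.StringIO(csv_text), header=None, names=names,
                      converters={"C": str})

# Turn off inference for every column by reading everything as strings.
all_str = pd.read_csv(io.StringIO(csv_text), header=None, names=names,
                      dtype=str)

print(by_dtype["C"].tolist(), by_conv["C"].tolist(), all_str["B"].tolist())
```

In all three reads the leading zeros in column C survive, because the values are never parsed as numbers in the first place.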