
python pandas read_csv thousands separator does not work

I use pandas read_csv to read a simple CSV file. However, it raises ValueError: could not convert string to float, and I do not understand why.

The code is simply:

rawdata = pd.read_csv( r'Journal_input.csv' ,
                      dtype = { 'Base Amount' : 'float64' } , 
                      thousands = ',' ,
                      decimal = '.',
                      encoding = 'ISO-8859-1')

But I get this error

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:10415)()

pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)()

pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:11728)()

pandas\parser.pyx in pandas.parser.TextReader._convert_column_data (pandas\parser.c:13162)()

pandas\parser.pyx in pandas.parser.TextReader._convert_tokens (pandas\parser.c:14487)()

ValueError: could not convert string to float: '79,026,695.50'

How is it possible to get an error when converting the string '79,026,695.50' to float? I have already specified the two options:

thousands = ',' ,
decimal = '.',

Is it a problem in my code or a bug in pandas?

palazzo train asked Apr 16 '26 22:04


1 Answer

It seems there is a problem with quoting: if the field separator is , and the thousands separator is , too, the values have to be quoted in the CSV:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is gone in recent pandas versions
import csv

temp=u"""'a','Base Amount'
'11','79,026,695.50'"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), 
                 dtype = { 'Base Amount' : 'float64' },
                 thousands = ',' ,
                 quotechar = "'",
                 quoting = csv.QUOTE_ALL,
                 decimal = '.',
                 encoding = 'ISO-8859-1')

print (df)
    a  Base Amount
0  11   79026695.5

temp=u'''"a","Base Amount"
"11","79,026,695.50"'''
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), 
                 dtype = { 'Base Amount' : 'float64' },
                 thousands = ',' ,
                 quotechar = '"',
                 quoting = csv.QUOTE_ALL,
                 decimal = '.',
                 encoding = 'ISO-8859-1')

print (df)
    a  Base Amount
0  11   79026695.5
jezrael answered Apr 18 '26 11:04
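
To see why quoting matters when the field separator and the thousands separator are both a comma, here is a minimal sketch that uses only Python's standard csv module rather than pandas. The helper names parse_rows and to_float and the two sample strings are hypothetical, made up for illustration; they are not taken from Journal_input.csv.

```python
import csv
from io import StringIO


def parse_rows(text, quotechar='"'):
    """Split CSV text into rows of string fields with the stdlib csv reader."""
    return list(csv.reader(StringIO(text), delimiter=",", quotechar=quotechar))


def to_float(field, thousands=",", decimal="."):
    """Drop the thousands separator, normalise the decimal mark, then convert."""
    return float(field.replace(thousands, "").replace(decimal, "."))


# Hypothetical sample data: the same amount, once unquoted and once quoted.
unquoted = "a,Base Amount\n11,79,026,695.50\n"
quoted = 'a,Base Amount\n11,"79,026,695.50"\n'

unquoted_rows = parse_rows(unquoted)
quoted_rows = parse_rows(quoted)

# Without quotes, every comma is treated as a field separator,
# so the amount is split across several fields.
print(unquoted_rows[1])

# With quotes, the amount stays in one field and can be converted
# once the thousands separators are removed.
print(quoted_rows[1])
print(to_float(quoted_rows[1][1]))
```

If the file already quotes the amounts and read_csv still fails, one possible fallback (an assumption, not something the answer above tested) is to read the column as strings and apply a conversion like to_float afterwards.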


