I'm trying to read a CSV file using numpy.recfromcsv(...) where some of the fields have commas in them. The fields that contain commas are surrounded by quotes, e.g. "value1, value2". NumPy sees the quoted field as two different fields, and it doesn't work very well. The command I'm using right now is:
data = numpy.recfromcsv(dataFilename, delimiter=',', autostrip=True)
I found this question, "Read CSV file with comma within fields in Python", but it doesn't use numpy, which I'd really love to use.
So I'm hoping there is at least one of a few options here: an option to numpy.recfromcsv(...) that will allow me to read a quoted field as one field instead of multiple comma-separated fields, or some other way to read the file into a numpy array. Please advise.
It is possible to do this with pandas:
np_array = pandas.read_csv("file_with_comma_fields_quoted.csv").to_numpy()
If you would consider using the native Python csv reader (see the Python csv module documentation): it defines an optional Dialect.quotechar setting, which defaults to '"'. In the CSV format, the quote character delimits a field, and the delimiter (a comma in your case) may appear inside a quoted field. The rules for the quoting character in CSV are laid out clearly in the first section of this page.
So, with the default quoting character ", the native Python csv reader handles your problem in its default mode.
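To illustrate the default quotechar behaviour, here is a minimal sketch. The sample data and the column names are made up for the example, and an in-memory string stands in for the real file; with a real file you would pass an open file object instead.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO('name,pair,count\nfoo,"value1, value2",3\n')

# quotechar defaults to '"', so the quoted comma stays inside one field.
reader = csv.reader(sample)
header = next(reader)            # first row holds the column names
rows = [row for row in reader]   # remaining rows, each a list of strings
```

Every field comes back as a string, so numeric columns still need converting. The resulting list of rows can then be passed to numpy.array(rows) if a numpy array is what you need.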
If you want to stick to Python, why not clean your CSV file first, using a regexp to identify the quoted fields and changing the delimiter from a comma to \t, for instance? But then you are effectively parsing the CSV format yourself.
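A rough sketch of that clean-up step, assuming quoted fields never contain escaped quotes ("") or embedded newlines. The function name commas_to_tabs is just an illustrative choice, not part of any library.

```python
import re

# Matches either a whole quoted field (contents captured) or a bare comma.
_TOKEN = re.compile(r'"([^"]*)"|,')

def commas_to_tabs(line):
    """Turn field-separating commas into tabs; keep commas inside quotes."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # quoted field: keep contents, drop the quotes
        return "\t"                # bare comma: this is a field separator
    return _TOKEN.sub(repl, line)

cleaned = commas_to_tabs('1,"value1, value2",3')
```

The cleaned lines could then be read with delimiter='\t'. Anything beyond these simple cases (escaped quotes, multi-line fields) is where hand-rolled parsing starts to break down, which is the argument for the csv module.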