
numpy read CSV file where some fields have commas?

Tags: python, csv, numpy

I'm trying to read a CSV file using numpy.recfromcsv(...) where some of the fields contain commas. Those fields are surrounded by quotes, e.g. "value1, value2". Numpy sees the quoted field as two separate fields, so the file is not parsed correctly. The command I'm using right now is

    data = numpy.recfromcsv(dataFilename, delimiter=',', autostrip=True)

I found this question

Read CSV file with comma within fields in Python

But it doesn't use numpy, which I'd really like to use. So I'm hoping at least one of the following options is available:

  1. Are there options to numpy.recfromcsv(...) that will let me read a quoted field as one field instead of multiple comma-separated fields?
  2. Should I format my CSV file differently?
  3. (Alternatively, but not ideally) Read the CSV as in the linked question, with extra steps to create a numpy array.

Please advise.

jlconlin asked Oct 22 '22 19:10



2 Answers

It is possible to do this with pandas:

    import pandas
    np_array = pandas.read_csv("file_with_comma_fields_quoted.csv").to_numpy()  # quoted commas stay inside one field
random.me answered Oct 27 '22 19:10



If you are open to using the native Python csv reader (Python doc here):

The Python csv reader defines an optional Dialect.quotechar setting, which defaults to '"'. In the csv format, a field wrapped in the quote character may itself contain the delimiter (a comma in your case), and that delimiter is then read as part of the field rather than as a separator. The rules for the quoting character in the csv format are clear in the first section of this page.

So, with the quote character defaulting to ", the native Python csv reader should handle your case in its default mode.
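
As a minimal sketch of that default behaviour (the sample data and the names `sample`, `header`, and `rows` are made up for illustration, not taken from the question's file):

```python
import csv
import io

# Hypothetical sample standing in for the real file; two fields contain commas.
sample = 'id,name,notes\n1,"value1, value2",ok\n2,plain,"a, b, c"\n'

# csv.reader uses quotechar='"' by default, so commas inside quotes
# are kept as field content instead of being treated as separators.
reader = csv.reader(io.StringIO(sample))
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows, each a list of strings

for row in rows:
    print(row)
```

Because every row comes back with the same number of fields, the list should be convertible with something like numpy.array(rows), which would give an array of strings; any numeric conversion would be a further step.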

If you want to stick with numpy, why not clean your csv file first, using a regexp to identify the quoted fields and changing the delimiter from a comma to \t, for instance? But then you are effectively parsing the csv format yourself.
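
A rough sketch of that clean-up idea, assuming no field contains a tab or an embedded newline (the helper name `commas_to_tabs` and the sample line are hypothetical):

```python
import re

# Matches either a whole double-quoted field ("" is an escaped quote) or a bare comma.
# The quoted alternative is tried first, so commas inside quotes are never
# matched on their own as separators.
TOKEN = re.compile(r'"(?:[^"]|"")*"|,')

def commas_to_tabs(line):
    """Turn separator commas into tabs and unwrap quoted fields (hypothetical helper)."""
    def swap(match):
        token = match.group(0)
        if token == ",":
            return "\t"
        # Drop the surrounding quotes and collapse doubled quotes.
        return token[1:-1].replace('""', '"')
    return TOKEN.sub(swap, line)

line = '1,"value1, value2",ok'
converted = commas_to_tabs(line)
fields = converted.split("\t")
print(fields)
```

The converted text could then be written out and read with a tab delimiter, for example via numpy.genfromtxt(..., delimiter='\t'). This only covers the simple cases described above; the csv module handles the full quoting rules.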

kiriloff answered Oct 27 '22 20:10
