Getting csv.Sniffer to work with quoted values

Tags: python, csv

I'm trying to use Python's csv.Sniffer, as suggested in many StackOverflow answers, to guess whether a given CSV file is delimited by ; or ,.

It works fine with basic files, but when a value contains the delimiter and is therefore surrounded by double quotes (as the standard goes), the sniffer raises _csv.Error: Could not determine delimiter.

Has anyone experienced that before?

Here is a minimal failing CSV file:

column1,column2
0,"a, b"

And the proof of concept:

Python 3.5.1 (default, Dec  7 2015, 12:58:09) 
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open("example.csv", "r")
>>> f.seek(0);
0
>>> csv.Sniffer().sniff(f.read(), delimiters=';,')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/csv.py", line 186, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter

I have total control over the generation of the input CSV file, but sometimes it is modified by a third party using MS Office and the delimiter is replaced by semicolons, so I have to use this guessing approach. I know I could stop using commas in the input file, but I would first like to know whether I'm doing something wrong.

Asked by Antoine Bolvy, Mar 02 '16 19:03


1 Answer

You are giving the sniffer too much input. Your sample file does work if you run:

csv.Sniffer().sniff(f.readline())

which uses only the header row to determine the delimiter character. If you want to understand why the Sniffer heuristics fail for more data, there is no substitute for reading the csv.py library source code.
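The header-only approach can be sketched as below. This is a minimal illustration rather than a definitive fix: read_rows is a hypothetical helper name, io.StringIO stands in for the opened example.csv, and it assumes the header row has at least two columns and contains no quoted values. Only the sniffed delimiter is passed to csv.reader, so the other reader settings stay at their defaults.

```python
import csv
import io


def read_rows(f, candidates=";,"):
    """Hypothetical helper: sniff the delimiter from the header row only."""
    header = f.readline()  # first line only, as in the answer above
    dialect = csv.Sniffer().sniff(header, delimiters=candidates)
    f.seek(0)  # rewind so the header is parsed along with the data rows
    # Use only the guessed delimiter; keep the reader's default quoting rules.
    return list(csv.reader(f, delimiter=dialect.delimiter))


# In-memory stand-ins for the file from the question, one per delimiter.
comma_file = io.StringIO('column1,column2\n0,"a, b"\n')
semicolon_file = io.StringIO('column1;column2\n0;"a, b"\n')

comma_rows = read_rows(comma_file)
semicolon_rows = read_rows(semicolon_file)
print(comma_rows)
print(semicolon_rows)
```

If the header has a single column, or neither candidate delimiter appears in it, sniff() still raises csv.Error, so callers may want to catch that and fall back to a default delimiter.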

Answered by msw, Sep 20 '22 17:09