I'm trying to use Python's CSV sniffer tool, as suggested in many StackOverflow answers, to guess whether a given CSV file is delimited by ; or ,.
It works fine with basic files, but when a value contains a delimiter, it is surrounded by double quotes (as the standard goes), and the sniffer throws _csv.Error: Could not determine delimiter.
Has anyone experienced that before?
Here is a minimal failing CSV file:
column1,column2
0,"a, b"
And the proof of concept:
Python 3.5.1 (default, Dec 7 2015, 12:58:09)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open("example.csv", "r")
>>> f.seek(0);
0
>>> csv.Sniffer().sniff(f.read(), delimiters=';,')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/csv.py", line 186, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter
I have total control over the generation of the input CSV file, but sometimes it is modified by a third party using MS Office and the delimiter is replaced by semicolons, so I have to use this guessing approach. I know I could stop using commas in the input file, but I would like to know if I'm doing something wrong first.
You are giving the sniffer too much input. Your sample file does work if you run:
csv.Sniffer().sniff(f.readline())
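which uses only the header row to determine the delimiter character. Note that the file must be positioned at the start when you call it (for example right after f.seek(0)); after f.read() the next f.readline() returns an empty string. If you want to understand why the Sniffer heuristics fail for more data, there is no substitute for reading the csv.py library source code.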
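As a rough sketch of the header-only approach, something like the following could work. The helper names detect_delimiter and read_rows are made up for illustration, and it assumes the header row itself contains no quoted field with a delimiter inside it. Only the delimiter is taken from the sniffer; the rest of the parsing relies on the csv module's default quoting rules.

```python
import csv
import io


def detect_delimiter(text, candidates=";,"):
    """Guess the delimiter from the header row only (hypothetical helper)."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty input")
    # Sniff just the first line, restricted to the candidate delimiters.
    dialect = csv.Sniffer().sniff(lines[0], delimiters=candidates)
    return dialect.delimiter


def read_rows(text, candidates=";,"):
    """Parse the whole text using the delimiter guessed from the header."""
    delimiter = detect_delimiter(text, candidates)
    # Only the delimiter comes from the sniffer; quoting uses csv defaults.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The minimal failing file from the question, as an in-memory string.
sample = 'column1,column2\n0,"a, b"\n'
rows = read_rows(sample)
```

With a real file you would read its contents (or just its first line) instead of using the in-memory sample string.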