
Handling escaped quotes with Python's csv.reader

Using Python's csv module, I'm trying to read some CSV data.

I'm using the code:

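import csv

# csv_file is an already-open, text-mode file object
dialect = csv.Sniffer().sniff(csv_file.read(1024))
csv_file.seek(0)  # rewind after sniffing so the reader starts at the first row
reader = csv.reader(csv_file, dialect)

for line in reader:
    ...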

Everything works fine except for lines containing escaped quotes:

11837,2,NULL,"\"The Take Over, The Breaks Over\"","Fall Out Boy"

Such a line is tokenized as:

['11837', '2', 'NULL', '\\The Take Over', ' The Breaks Over\\""', 'Fall Out Boy']

The dialect contains the following properties:

dialect.quotechar = "
dialect.quoting = 0
dialect.escapechar = None
dialect.delimiter = ,
dialect.doublequote = False
dialect.lineterminator = \n

Is there anything I can do besides writing my own CSV parser?

Tregoreg asked May 27 '14 19:05




1 Answer

If I'm not mistaken, dialect.escapechar = None should be dialect.escapechar = '\\'

The examples in the docs certainly seem to suggest making that change.
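To illustrate the suggestion, here is a minimal sketch that parses the problem line from the question twice: once with the settings the Sniffer reported, and once with the backslash declared as the escape character. The names SAMPLE and parse are made up for this example, and the Sniffer is skipped so the snippet runs without an input file; the delimiter, quotechar and doublequote values are copied from the dialect shown in the question.

```python
import csv
import io

# The problem line from the question; each inner quote is preceded by a literal backslash.
SAMPLE = '11837,2,NULL,"\\"The Take Over, The Breaks Over\\"","Fall Out Boy"\n'


def parse(text, **overrides):
    """Parse CSV text with the question's dialect settings plus any overrides.

    Hypothetical helper for this example only.
    """
    # These three values mirror the sniffed dialect listed in the question.
    settings = dict(delimiter=',', quotechar='"', doublequote=False)
    settings.update(overrides)
    return list(csv.reader(io.StringIO(text), **settings))


before = parse(SAMPLE)                  # escapechar left at its default, None
after = parse(SAMPLE, escapechar='\\')  # backslash treated as the escape character

print(before[0])  # six fields, split at the comma inside the title
print(after[0])   # five fields, title kept whole with its quotes
```

With a sniffed dialect, the same effect should be reachable either by setting dialect.escapechar = '\\' before creating the reader, or by passing escapechar='\\' as a keyword argument to csv.reader, since keyword arguments override the corresponding dialect attributes.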

cwallenpoole answered Oct 12 '22 23:10