I'm trying to parse CSV files from an external system which I have no control of.
Example CSV:
qw""erty,"a""b""c""d,ef""""g"
Should be parsed as:
[['qw"erty', 'a"b"c"d,ef""g']]
However, I think Python's csv module does not expect quote characters to be escaped (doubled) when the cell is not wrapped in quote characters in the first place.
csv.reader(my_file) (with the default doublequote=True) returns:
['qw""erty', 'a"b"c"d,ef""g']
Is there any way to parse this with Python's csv module?
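The behaviour described above can be reproduced in a few lines. This sketch feeds the sample line through io.StringIO instead of a real file; the variable name `sample` is just illustrative.

```python
import csv
import io

# The sample line from the question, written as a Python string literal.
sample = 'qw""erty,"a""b""c""d,ef""""g"'

# Default dialect: doublequote=True and no escapechar.
rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # [['qw""erty', 'a"b"c"d,ef""g']]
```

The quoted cell is decoded correctly, but in the unquoted cell the doubled quotes are kept verbatim, because the reader treats a quote inside an unquoted field as an ordinary character.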
This follows up on @JackManey's comment, in which he suggested replacing all instances of '""' inside double quotes with '\\"'.
Recognizing whether we are currently inside a double-quoted cell turned out to be unnecessary: we can replace all instances of '""' with '\\"', because the escape character is honoured in unquoted cells as well.
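A minimal sketch of that blind replacement on the sample line, again using io.StringIO in place of a file. It does not yet deal with backslashes that are already present in the data.

```python
import csv
import io

sample = 'qw""erty,"a""b""c""d,ef""""g"'

# Replace every doubled quote, whether or not it sits inside a quoted cell.
escaped = sample.replace('""', '\\"')
print(escaped)  # qw\"erty,"a\"b\"c\"d,ef\"\"g"

# With doublequote=False the parser relies on escapechar instead of "".
rows = list(csv.reader(io.StringIO(escaped), doublequote=False, escapechar='\\'))
print(rows)  # [['qw"erty', 'a"b"c"d,ef""g']]
```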
The Python documentation says:
"On reading, the escapechar removes any special meaning from the following character."
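To illustrate that rule, here is a small sketch; `read_escaped` is a hypothetical helper that parses a single string with a backslash as the escape character.

```python
import csv
import io

def read_escaped(text):
    # Parse one CSV string, using a backslash as the escape character.
    return list(csv.reader(io.StringIO(text), escapechar='\\'))

# An escaped delimiter loses its special meaning and stays a literal comma.
print(read_escaped('a\\,b,c'))  # [['a,b', 'c']]

# An escaped backslash becomes a single literal backslash.
print(read_escaped('a\\\\b'))   # [['a\\b']]
```

The second case is exactly what the final solution relies on: doubling every backslash before parsing makes each one come back out as a single literal backslash.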
However, this still breaks when the original cell already contains escape characters: for example, 'qw\\\\""erty' produces [['qw\\"erty']], so backslashes from the original data are swallowed. We therefore have to escape the existing escape characters before parsing, too.
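The difference can be shown side by side. In this sketch `parse_escaped` and its `escape_backslashes` flag are hypothetical names used only for the comparison; the cell is the example above, written as a Python literal (two backslashes followed by a doubled quote).

```python
import csv
import io

def parse_escaped(text, escape_backslashes):
    # Optionally double existing backslashes, then turn every "" into \".
    if escape_backslashes:
        text = text.replace('\\', '\\\\')
    text = text.replace('""', '\\"')
    return list(csv.reader(io.StringIO(text), doublequote=False, escapechar='\\'))

cell = 'qw\\\\""erty'  # qw\\""erty: two backslashes, then a doubled quote

print(parse_escaped(cell, escape_backslashes=False))  # [['qw\\"erty']]   one backslash lost
print(parse_escaped(cell, escape_backslashes=True))   # [['qw\\\\"erty']] both backslashes kept
```

The order of the two replace calls matters: doing the '""' substitution first would double the backslashes it had just inserted.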
Final solution:
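import csv
import io
def parse_csv(file_path):  # illustrative function name (Python 3)
    with open(file_path, newline='') as f:  # text mode, newline='' as the csv docs recommend
        # Escape existing backslashes first, then turn every "" into \"
        content = f.read().replace('\\', '\\\\').replace('""', '\\"')
    reader = csv.reader(io.StringIO(content), doublequote=False, escapechar='\\')
    return [row for row in reader]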
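For quick experiments it can help to apply the same two replacements to a string instead of a file. This is a sketch of that variant; `parse_csv_text` is a hypothetical name, and the second sample row (with a Windows-style path) is made up to show backslashes surviving.

```python
import csv
import io

def parse_csv_text(text):
    """Parse CSV whose quotes are doubled even outside quoted cells."""
    # 1. Protect backslashes already present in the data.
    # 2. Turn every doubled quote into a backslash-escaped quote.
    escaped = text.replace('\\', '\\\\').replace('""', '\\"')
    reader = csv.reader(io.StringIO(escaped), doublequote=False, escapechar='\\')
    return list(reader)

data = 'qw""erty,"a""b""c""d,ef""""g"\r\nC:\\temp,"x""y"\r\n'
for row in parse_csv_text(data):
    print(row)
# ['qw"erty', 'a"b"c"d,ef""g']
# ['C:\\temp', 'x"y']
```

One case worth checking against the real data: a cell consisting only of "" is read as a single quote character, so parse_csv_text('a,"",b') returns [['a', '"', 'b']]. That matches the convention of doubling quotes in unquoted cells, but not an empty quoted string, so it depends on how the external system writes empty values.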