I'm trying to parse CSV files from an external system that I have no control over.
Example CSV:
qw""erty,"a""b""c""d,ef""""g"
Should be parsed as:
[['qw"erty', 'a"b"c"d,ef""g']]
However, I think Python's csv module does not expect quote characters to be escaped (doubled) when a cell is not wrapped in quote characters in the first place.
csv.reader(my_file) (with the default doublequote=True) returns:
['qw""erty', 'a"b"c"d,ef""g']
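To reproduce this without a file, here is a minimal sketch that feeds the sample line through io.StringIO in place of my_file (sample and rows are just illustrative names). It uses the default dialect, so doublequote=True and there is no escapechar:

```python
import csv
import io

# The sample row from the question; quotes inside the Python string are literal.
sample = 'qw""erty,"a""b""c""d,ef""""g"\n'

# Default dialect: doublequote=True, escapechar=None.
rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # [['qw""erty', 'a"b"c"d,ef""g']]
```

The quoted second cell is unescaped correctly, but in the unquoted first cell the csv module takes the quotes literally, so the "" stays doubled.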
Is there any way to parse this with Python's csv module?
Following up on @JackManey's comment, where he suggested replacing all instances of '""' inside double quotes with '\\"':
It turned out that detecting whether we are currently inside a double-quoted cell is unnecessary: we can simply replace all instances of '""' with '\\"'.
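A quick sanity check of the replace-everything idea on the sample row, again a sketch using io.StringIO instead of a real file. The only parser changes are doublequote=False and escapechar='\\', so a backslash, not a second quote, now escapes the quote character:

```python
import csv
import io

sample = 'qw""erty,"a""b""c""d,ef""""g"\n'

# Rewrite every doubled quote as a backslash-escaped quote,
# without tracking whether we are inside a quoted cell.
escaped = sample.replace('""', '\\"')

# Disable "" handling and let the backslash escape the quote instead.
reader = csv.reader(io.StringIO(escaped), doublequote=False, escapechar='\\')
rows = list(reader)
print(rows)  # [['qw"erty', 'a"b"c"d,ef""g']]
```

This matches the expected result for both the unquoted and the quoted cell.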
The Python documentation says: "On reading, the escapechar removes any special meaning from the following character."
However, this still breaks when the original cell already contains escape characters. For example, 'qw\\\\""erty' produces [['qw\\"erty']]: one of the two backslashes is consumed as an escape. So we have to escape the escape characters before parsing, too.
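To see the backslash problem and the fix side by side, here is a small comparison. parse is a hypothetical helper written only for this check; its escape_backslashes flag toggles the extra replace step that doubles existing backslashes before "" is rewritten:

```python
import csv
import io

def parse(text, escape_backslashes):
    # Hypothetical helper for this comparison only.
    if escape_backslashes:
        # Double existing backslashes so they survive as literal text.
        text = text.replace('\\', '\\\\')
    # Rewrite every "" as \" (order matters: this must come second).
    text = text.replace('""', '\\"')
    reader = csv.reader(io.StringIO(text), doublequote=False, escapechar='\\')
    return list(reader)

cell = 'qw\\\\""erty'  # the raw cell text is qw\\""erty

naive = parse(cell, escape_backslashes=False)
fixed = parse(cell, escape_backslashes=True)
print(naive)  # [['qw\\"erty']]   one backslash was eaten as an escape
print(fixed)  # [['qw\\\\"erty']] both backslashes kept
```

The order of the two replace calls is the important design choice: doubling backslashes after inserting \" would also double the new escape and break it.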
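Final solution (Python 3):
import csv
from io import StringIO

def parse_csv(file_path):  # illustrative wrapper name
    with open(file_path, newline='') as f:
        # Escape existing backslashes first, then rewrite every "" as \"
        content = f.read().replace('\\', '\\\\').replace('""', '\\"')
    return list(csv.reader(StringIO(content), doublequote=False, escapechar='\\'))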
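A usage sketch for the final solution, assuming a throwaway file created with tempfile. parse_csv is just a wrapper name I picked, and the second row (x,"",y) is an extra test case I added, not something from the external system:

```python
import csv
import io
import os
import tempfile

def parse_csv(file_path):
    # Same steps as the final solution: escape backslashes, then "" -> \"
    with open(file_path, newline='') as f:
        content = f.read().replace('\\', '\\\\').replace('""', '\\"')
    return list(csv.reader(io.StringIO(content), doublequote=False, escapechar='\\'))

# Write the sample row plus a row containing an empty quoted cell.
with tempfile.NamedTemporaryFile('w', suffix='.csv', newline='', delete=False) as tmp:
    tmp.write('qw""erty,"a""b""c""d,ef""""g"\n')
    tmp.write('x,"",y\n')
    path = tmp.name

try:
    rows = parse_csv(path)
finally:
    os.remove(path)  # clean up the temporary file

print(rows)  # [['qw"erty', 'a"b"c"d,ef""g'], ['x', '"', 'y']]
```

One thing worth checking against real data: because every '""' is rewritten blindly, an empty quoted cell "" also becomes \" and is read back as a single '"' character instead of an empty string.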