I am trying to parse a CSV file using csv.reader. My data is separated by commas, and each value starts and ends with quotation marks. Example:
"This is some data", "New data", "More \"data\" here", "test"
My problem is with the third value: quotation marks inside the data are preceded by an escape character (a backslash) to show that they are part of the data. By default, the Python csv reader does not recognize this escape character, so the line is parsed incorrectly.
I tried code like below:
import csv

with open(filepath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='\\"')
But I get an error complaining that quotechar is not a 1-character string.
My current solution is to replace every \" sequence with a single quote (') before parsing with csv.reader. However, I would like to know if there is a better way that does not modify the original data.
quotechar specifies the one-character string used to surround fields that contain special characters such as the delimiter. The default is a double quote ("). escapechar specifies the one-character string that the writer uses to escape the delimiter when quoting is disabled; when reading, it removes any special meaning from the character that follows it.
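To make the difference between the two parameters concrete, here is a minimal sketch that writes the same row twice with csv.writer: once relying on quotechar, and once with quoting turned off so that escapechar is used instead. The row contents and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical row: the first field contains the delimiter.
row = ['a,b', 'plain']

# Default quoting (QUOTE_MINIMAL): the field with a comma is wrapped in quotechar.
quoted = io.StringIO()
csv.writer(quoted, quotechar='"').writerow(row)

# Quoting disabled: the comma inside the field is prefixed with escapechar.
escaped = io.StringIO()
csv.writer(escaped, quoting=csv.QUOTE_NONE, escapechar='\\').writerow(row)

print(quoted.getvalue().strip())   # "a,b",plain
print(escaped.getvalue().strip())  # a\,b,plain
```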
Because CSV files use the comma character (,) to separate columns, values that contain commas must be handled as a special case. These fields are wrapped in double quotation marks: the first double quote marks the beginning of the column data, and the last double quote marks the end.
The issue here is that you need to define an escapechar, so that the csv reader knows to treat \" as ".
csv.reader(csv_file, quotechar='"', delimiter=',', escapechar='\\')
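For reference, here is a small self-contained sketch that runs this on the sample line from the question, read from an in-memory string rather than a file. One assumption goes beyond the line above: the sample has a space after each comma, so skipinitialspace=True is added. Without it, the reader would likely treat the space and the opening quote of the later fields as literal data.

```python
import csv
import io

# The sample line from the question (raw string, so \" stays backslash + quote).
sample = r'"This is some data", "New data", "More \"data\" here", "test"'

reader = csv.reader(
    io.StringIO(sample),
    delimiter=',',
    quotechar='"',
    escapechar='\\',        # treat \" inside a quoted field as a literal "
    skipinitialspace=True,  # assumption: skip the space that follows each comma
)
rows = list(reader)
print(rows[0])
# ['This is some data', 'New data', 'More "data" here', 'test']
```

With a real file, the same keyword arguments can be passed to csv.reader inside the with open(...) block; the csv documentation also recommends opening the file with newline=''.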