I am generating a CSV string using pandas like this:
df.to_csv(sep=',', encoding='utf-8', index=False, header=False)
I will load the csv in postgres as follows.
COPY table_name FROM STDIN (FORMAT CSV,
DELIMITER ',',
HEADER FALSE,
NULL '',
ENCODING 'utf-8');
Occasionally this fails when a carriage return is present in the CSV, so I remove them with csv_str.replace('\r', ''). However, I am not sure whether this approach might alter some of the existing data in the CSV.
Is there a more reliable way to remove carriage returns? I am fine with losing the carriage returns themselves, but I want to preserve all other data stored in the CSV as much as possible.
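To make the failure mode concrete, here is a small sketch (the frame and column names are made up). With pandas' default minimal quoting and a '\n' line terminator, a bare '\r' inside a value is not quoted, so COPY sees a stray carriage return mid-row; and a blanket replace removes it from the data, not just from the line endings:

```python
import pandas as pd

# Hypothetical frame: one field legitimately contains an embedded '\r'
df = pd.DataFrame({"note": ["first\rsecond", "plain"], "n": [1, 2]})

# Default minimal quoting with a '\n' terminator leaves the bare '\r' unquoted:
csv_str = df.to_csv(sep=',', index=False, header=False, lineterminator='\n')
print(repr(csv_str))  # 'first\rsecond,1\nplain,2\n'

# Blanket stripping fixes the COPY error but silently changes the stored value:
print(repr(csv_str.replace('\r', '')))  # 'firstsecond,1\nplain,2\n'
```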
Rather than stripping the carriage returns from your CSV file, ensure that those fields that contain them are quoted. One way is to just quote all fields:
import csv
import pandas as pd
df.to_csv(sep=',', encoding='utf-8', index=False, header=False, quoting=csv.QUOTE_ALL)
Alternatively, you can use quoting=csv.QUOTE_NONNUMERIC to quote only the non-numeric fields, which are the only ones that can contain \r.
One other way is to set the line terminator to \r\n (or just \r), which indirectly causes any field that contains \r to be quoted. This may be preferable because only the individual "cells" that require quoting get it:
df.to_csv(sep=',', encoding='utf-8', index=False, header=False, lineterminator='\r\n')
(The parameter is spelled lineterminator in pandas 1.5+; older versions use line_terminator.)
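A small sketch of that effect (hypothetical frame, `lineterminator` spelling from pandas 1.5+). Minimal quoting quotes any field containing a character of the line terminator, so once the terminator is '\r\n', only the cell with the embedded '\r' gets quoted:

```python
import pandas as pd

# Hypothetical frame: only the first 'note' value contains '\r'
df = pd.DataFrame({"note": ["first\rsecond", "plain"], "n": [1, 2]})

# With the terminator set to '\r\n', minimal quoting now treats '\r' as
# special, so only the one cell that contains it is quoted:
out = df.to_csv(sep=',', index=False, header=False, lineterminator='\r\n')
print(repr(out))  # '"first\rsecond",1\r\nplain,2\r\n'
```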