Suppose I have a CSV file like this:
a,b,c
1,"drivingme,mad",2
and I want to convert it to a TSV:
a<tab>b<tab>c
1<tab>drivingme,mad<tab>2
While I can write some Python code to do this, I found it to be slow. Is there a better awk, sed, or perl way that stays fast even when the number of rows runs into the millions?
I need to do this because I can't import the CSV file above into a SQLite database as-is: SQLite's CSV import facilities are limited.
You can use alternative delimiters like ";" or "|", but the simplest approach is usually quoting, which is supported by most (decent) CSV libraries and spreadsheets.
Since CSV files use the comma character "," to separate columns, values that themselves contain commas must be handled as a special case: such fields are wrapped in double quotation marks. The opening quote marks the beginning of the field and the closing quote marks its end. Quotation marks thus act as text qualifiers in CSV: they bind together text that should be kept as one value, as opposed to the distinct values that should be split apart.
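To make the quoting behaviour concrete, here is a small sketch using Python's stdlib csv module (one of the "decent" CSV libraries mentioned above): the writer quotes a field containing a comma on output, and the reader unwraps it again on input.

```python
import csv
import io

# Write one row where the middle field contains a comma.
buf = io.StringIO()
csv.writer(buf).writerow(["1", "drivingme,mad", "2"])
line = buf.getvalue()
# The writer has wrapped the comma-containing field in double quotes:
# 1,"drivingme,mad",2

# Reading it back recovers the original three fields, quotes stripped.
row = next(csv.reader(io.StringIO(line)))
# row == ["1", "drivingme,mad", "2"]
```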
Text::CSV_XS
(XS is the compiled C version of the module, and is faster than the pure-Perl Text::CSV) is the usual tool of choice. It:
handles quoted (and comma-containing) fields easily
can be used for both reading and writing
can switch between delimiters, so you can have a writer object that uses TAB
Example (sans error handling):
use strict;
use warnings;
use Text::CSV_XS;

# Reader parses standard CSV, including quoted, comma-containing fields.
my $csv_in  = Text::CSV_XS->new ({ binary => 1 });
# Writer emits the same rows with TAB as the field separator.
my $csv_out = Text::CSV_XS->new ({ binary => 1, sep_char => "\t", eol => "\n" });

open my $fh_in,  "<", "file_in.csv"  or die "file_in.csv: $!";
open my $fh_out, ">", "file_out.tsv" or die "file_out.tsv: $!";

while (my $row = $csv_in->getline ($fh_in)) {
    $csv_out->print ($fh_out, $row);
}

close $fh_in;
close $fh_out;