
What is a platform-independent way of converting CSV files to TSV files when the CSV fields can be quoted and contain commas inside the quoted strings?

Tags: csv, sed, awk, perl, tsv

Suppose I have a CSV file like this:

a,b,c
1,"drivingme,mad",2

and I want to convert it to TSV:

a<tab>b<tab>c
1<tab>drivingme,mad<tab>2

I can write some Python code to do this, but I found it to be slow. Is there a better awk, sed or Perl way that stays fast even when the number of rows runs into the millions?

I need this because I can't import the CSV file above into a SQLite database directly: SQLite has limited CSV import facilities.

xiaodai asked Jun 18 '13 06:06



1 Answer

Text::CSV_XS is the usual tool of choice (XS is the C implementation of the module, and it is faster than the pure-Perl implementation). It

  • handles quoted (and comma-containing) fields easily

  • can be used for both reading and writing

  • can switch between delimiters, so you can have a writer object that uses TAB.

Example (sans error handling):

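use strict;
use warnings;
use Text::CSV_XS;

# Reader: default comma separator; binary => 1 allows non-ASCII data and embedded newlines.
my $csv_in = Text::CSV_XS->new ({ binary => 1 });
# Writer: same module, but with TAB as the separator.
my $csv_out = Text::CSV_XS->new ({ binary => 1, sep_char => "\t", eol => "\n" });
open my $fh_in, "<", "file_in.csv" or die "file_in.csv: $!";
open my $fh_out, ">", "file_out.tsv" or die "file_out.tsv: $!";

# Parse one CSV record at a time and write it back out tab-separated.
while (my $row = $csv_in->getline($fh_in)) {
    $csv_out->print ($fh_out, $row);
}
close $fh_in;
close $fh_out or die "file_out.tsv: $!";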
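
Since the question mentions Python, here is a rough sketch of the same reader/writer pattern using Python's standard csv module, which does its parsing in C. The function names csv_to_tsv and convert_file and the in-memory sample are made up for illustration, and I have not benchmarked this against the Perl version, so treat it as a starting point rather than a measured recommendation.

```python
import csv
import io


def csv_to_tsv(src, dst):
    """Copy rows from a CSV text stream to a tab-separated text stream."""
    # The default dialect handles double-quoted fields with embedded commas.
    reader = csv.reader(src)
    # Same module for writing, with TAB as the delimiter.
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)


def convert_file(in_path, out_path):
    """Hypothetical helper: convert one file on disk (paths are placeholders)."""
    # newline="" lets the csv module handle line endings itself.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        csv_to_tsv(src, dst)


# Small in-memory demonstration using the sample from the question.
sample = 'a,b,c\n1,"drivingme,mad",2\n'
out = io.StringIO()
csv_to_tsv(io.StringIO(sample), out)
result = out.getvalue()
print(result, end="")
```

One caveat: if a field itself contains a TAB or a newline, the writer will wrap it in double quotes, so it is worth checking that whatever imports the TSV understands quoting.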
DVK answered Oct 04 '22 22:10