
Fastest way to convert a tab-delimited file to CSV in Linux

Tags: linux, csv

I have a tab-delimited file with over 200 million lines. What's the fastest way in Linux to convert it to a CSV file? The file starts with several lines of header information that I'll need to strip out later, but the number of header lines is known. I have seen suggestions for sed and gawk, but I wonder whether there is a "preferred" choice.

Just to clarify, there are no embedded tabs in this file.

andrewj asked Mar 29 '10 00:03



1 Answer

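If any field might contain a comma, simply replacing each tab with a comma will produce a broken CSV, because such fields have to be quoted. In that case you'll need a slightly more intelligent method. Here's a Python script that reads TSV lines from stdin and writes CSV lines to stdout: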

    import sys
    import csv

    tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
    commaout = csv.writer(sys.stdout, dialect=csv.excel)
    for row in tabin:
        commaout.writerow(row)

Run it from a shell as follows:

    python script.py < input.tsv > output.csv
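The question also mentions a known number of header lines that need to be stripped. A minimal sketch of how the same script could drop them in the same pass is shown here; the function name `tsv_to_csv` and the `skip_lines` parameter are hypothetical names chosen for illustration, not part of the original script, and the sketch assumes each header occupies exactly one physical line.

```python
import csv
import io
import itertools


def tsv_to_csv(src, dst, skip_lines=0):
    """Copy TSV rows from src to dst as CSV, dropping the first skip_lines lines.

    skip_lines is a hypothetical parameter standing in for the known
    number of header lines mentioned in the question.
    """
    # Skip the header lines lazily, without reading the whole file into memory.
    body = itertools.islice(src, skip_lines, None)
    # Same dialects as the script above: tab-separated in, comma-separated out.
    reader = csv.reader(body, dialect=csv.excel_tab)
    writer = csv.writer(dst, dialect=csv.excel)
    writer.writerows(reader)


if __name__ == "__main__":
    # Small in-memory demonstration: two header lines, one field with a comma.
    sample = io.StringIO("# header 1\n# header 2\nid\tname\n1\tSmith, John\n")
    out = io.StringIO()
    tsv_to_csv(sample, out, skip_lines=2)
    print(out.getvalue(), end="")
```

In the demonstration the field `Smith, John` is written with surrounding double quotes, which is what a plain tab-to-comma substitution would miss. For a real file, the function could be called as `tsv_to_csv(sys.stdin, sys.stdout, skip_lines=N)` after `import sys`, with `N` set to the known header count, and run with the same shell redirection as above.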
Ignacio Vazquez-Abrams answered Sep 18 '22 17:09