
How do I convert a tab-separated values (TSV) file to a comma-separated values (CSV) file in BASH?

Tags:

bash

csv

awk

tsv

I have some TSV files that I need to convert to CSV files. Is there any solution in BASH, e.g. using awk, to convert these? I could use sed, like this, but am worried it will make some mistakes:

sed 's/\t/,/g' file.tsv > file.csv

Quotes needn't be added.

How can I convert a TSV to a CSV?

asked Mar 15 '14 05:03

Village

2 Answers

Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.

To summarize the options (interestingly, they all perform about the same):

tr:

devnull's solution (provided in a comment on the question) is the simplest:

tr '\t' ',' < file.tsv > file.csv

sed:

The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):

sed 's/\t/,/g' file.tsv > file.csv

The only caveat is that on some platforms (e.g., macOS, which ships BSD sed) sed does not support the escape sequence \t, so a literal tab char. must be spliced into the command string using ANSI C quoting ($'\t'):

sed 's/'$'\t''/,/g' file.tsv > file.csv
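
As a quick sanity check, the tr and sed variants can be compared on a sample record. This is only a sketch: the sample data and variable names are made up for illustration, and it sidesteps the \t portability issue by letting printf produce the literal tab rather than relying on ANSI C quoting.

```shell
# Hypothetical sample record: three tab-separated fields.
sample=$(printf 'a\tb\tc')

# tr translates every tab character to a comma.
tr_out=$(printf '%s\n' "$sample" | tr '\t' ',')

# Portable sed: printf understands \t everywhere, so store a literal tab
# in a variable and splice it into the sed script.
tab=$(printf '\t')
sed_out=$(printf '%s\n' "$sample" | sed "s/${tab}/,/g")

printf '%s\n%s\n' "$tr_out" "$sed_out"
# prints "a,b,c" twice
```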

awk:

The caveat with awk is that FS - the input field separator - must be set to \t explicitly - by default, awk splits fields on runs of whitespace (spaces as well as tabs), which would strip leading and trailing tabs, collapse interior runs of multiple tabs into a single ,, and also split fields at embedded spaces:

awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv

Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS - the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
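
To see why FS matters, here is a small illustrative comparison on a made-up record that has an empty field and a field with an embedded space (the sample data and variable names are assumptions, not part of the original question):

```shell
# Hypothetical sample record: empty second field, space inside the third.
line=$(printf 'a\t\tb c')

# Default FS: fields are split on runs of blanks, so the empty field
# disappears and "b c" is split into two fields.
default_out=$(printf '%s\n' "$line" | awk 'BEGIN { OFS="," } { $1=$1; print }')

# FS="\t": every single tab is a separator, so the empty field and the
# embedded space are both preserved.
tab_out=$(printf '%s\n' "$line" | awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }')

printf '%s\n%s\n' "$default_out" "$tab_out"
# prints:
# a,b,c
# a,,b c
```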

Robust awk solution:

As A. Rabus points out, the above solutions do not handle unquoted input fields that themselves contain , characters correctly - you'll end up with extra CSV fields.
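
The problem is easy to reproduce. In this sketch (sample data invented for illustration), a two-field TSV record turns into three CSV fields after a naive tab-to-comma replacement:

```shell
# Hypothetical two-field record whose second field contains a comma.
line=$(printf 'Smith\tNew York, NY')

# Naive conversion: every tab becomes a comma, nothing is quoted.
naive=$(printf '%s\n' "$line" | tr '\t' ',')

# Count the comma-separated fields a CSV reader would see.
count=$(printf '%s\n' "$naive" | awk -F',' '{ print NF }')

printf '%s -> %s fields\n' "$naive" "$count"
# prints: Smith,New York, NY -> 3 fields
```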

The following awk solution fixes this, by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach).

If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. (Thanks, Wyatt Israel.)

awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for(i=1; i<=NF; ++i) {
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)
      $i = "\"" $i "\""
      rebuilt=1 
    }
  }
  if (!rebuilt) { $1=$1 }
  print
}' file.tsv > file.csv

$i ~ /[,"]/ && $i !~ /^".*"$/ : detects any field that contains , and/or " and isn't already enclosed in double quotes
gsub("\"", "\"\"", $i) : escapes embedded " chars. by doubling them
$i = "\"" $i "\"" : updates the field by enclosing it in double quotes
As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; the flag rebuilt is used to ensure that each input record is rebuilt at least once.
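
As a worked example, the quoting logic described above can be wrapped in a shell function and fed one sample record. The function name tsv_to_csv and the sample data are hypothetical, chosen only for illustration; the awk program follows the steps listed above, reading from stdin rather than a file.

```shell
# Hypothetical helper: reads TSV on stdin, writes CSV on stdout.
tsv_to_csv() {
  awk 'BEGIN { FS="\t"; OFS="," } {
    rebuilt=0
    for (i=1; i<=NF; ++i) {
      # Quote fields containing , or " unless they are already enclosed in "...".
      if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
        gsub("\"", "\"\"", $i)   # double any embedded quotes
        $i = "\"" $i "\""        # enclose the field in double quotes
        rebuilt=1
      }
    }
    # No field was modified: force a rebuild so OFS is applied.
    if (!rebuilt) { $1=$1 }
    print
  }'
}

# Sample record: a plain field, a field with a comma, a field with quotes.
out=$(printf 'plain\tNew York, NY\tsay "hi"\n' | tsv_to_csv)
printf '%s\n' "$out"
# prints: plain,"New York, NY","say ""hi"""
```

Note that a field that is already enclosed in double quotes is passed through unchanged, so any quoting inside such a field is assumed to be correct already.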

answered Oct 09 '22 10:10

mklement0

This can also be achieved with Perl:

To write the results to a new output file, you can use the following:
perl -wlp -e 's/\t/,/g;' input_file.tsv > output_file.csv

If you'd like to edit the file in place, you can add the -i option:
perl -wlpi -e 's/\t/,/g;' input_file.txt

If by some chance you find that what you are dealing with is not actually tabs, but instead runs of spaces, you can use the following to replace each run of one or more whitespace characters with a comma. Note that this also splits fields that contain a single space; use \s{2,} in place of \s+ to require at least two whitespace characters:
perl -wlpi -e 's/\s+/,/g;' input_file

Keep in mind that \s matches any whitespace character, including spaces, tabs, and newlines, and cannot be used in the replacement string.

answered Oct 09 '22 12:10

Toby
