Use zcat and sed or awk to edit compressed .gz text file

Tags: bash, sed, awk

I am trying to edit compressed fastq.gz text files by removing the first six characters of lines 2, 6, 10, 14, and so on. I have two ways of doing this, one with awk and one with sed, but both seem to work only on unzipped files. I would like to edit the files without unzipping them first, and tried the following code, but could not get it to work. Thanks.

Using sed:

zcat /dir/* | sed -i~ '2~4s/^.\{6\}//'

Using awk:

zcat /dir/* | awk 'NR%4==2 {gsub(/^....../,"")} 1'
asked Feb 17 '15 17:02 by The Nightman



2 Answers

You can't bypass compression, but you can chain the decompress/edit/recompress together in an automated fashion:

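for f in /dir/*; do
  # Keep a backup as "$f~", then rebuild "$f" from that backup.
  # 2~4 (a GNU sed extension) selects lines 2, 6, 10, ...
  cp "$f" "$f~" &&
  gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done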

If you're quite confident in the operation, you can remove the backup files by adding rm "$f~" to the end of the loop body.
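The same decompress/edit/recompress chain can be sketched with awk in place of sed, writing to a temporary file so that the original is replaced only if the whole pipeline succeeds. This is a minimal sketch rather than a tested production script: the function name trim_fastq_gz is made up for illustration, bash is assumed (for pipefail), and the replaced file takes on the permissions of the file created by mktemp.

```shell
#!/usr/bin/env bash
# trim_fastq_gz is a hypothetical helper name, not part of any tool.
# It rewrites one gzip-compressed file, dropping the first six
# characters of lines 2, 6, 10, ... (every line where NR % 4 == 2).
trim_fastq_gz() {
  local f=$1 tmp
  # Temporary file next to the original, so mv stays on one filesystem.
  tmp=$(mktemp "${f}.XXXXXX") || return 1
  # pipefail makes the pipeline fail if gzip -cd or awk fails,
  # not only if the final gzip fails.
  if (set -o pipefail
      gzip -cd -- "$f" |
        awk 'NR % 4 == 2 { $0 = substr($0, 7) } 1' |
        gzip > "$tmp"); then
    mv -- "$tmp" "$f"      # replace the original only on success
  else
    rm -f -- "$tmp"        # leave the original untouched on failure
    return 1
  fi
}
```

It would be called once per file, for example trim_fastq_gz "$f" inside the same for loop over /dir/*. Because the original is never truncated until the new file is complete, no separate "$f~" backup is needed in this variant.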

answered Oct 14 '22 04:10 by Mark Reed


I wrote a script called zawk which can do this natively. It's similar to glenn jackman's answer to a duplicate of this question, but it handles awk options and several different compression mechanisms and input methods while retaining FILENAME and FNR.

You'd use it like:

zawk 'awk logic goes here' log*.gz

This does not address sed's "in-place" flag (-i).

answered Oct 14 '22 04:10 by Adam Katz