I am trying to edit compressed fastq.gz text files by removing the first six characters of lines 2, 6, 10, 14, ... I have two ways of doing this right now, using either awk or sed, but they only seem to work if the files are unzipped. I would like to edit the files without unzipping them, and tried the following code without getting it to work. Thanks.
Using sed:
zcat /dir/* | sed -i~ '2~4s/^.\{6\}//'
Using awk:
zcat /dir/* | awk 'NR%4==2 {gsub(/^....../,"")} 1'
Awk doesn't read the .gz file. It still doesn't work.
The zcat command allows the user to expand and view a compressed file without uncompressing that file. The zcat command does not rename the expanded file or remove the .Z extension. The zcat command writes the expanded output to standard output.
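As a small illustration of that last point, here is a sketch that runs the awk command from the question on a made-up one-record sample. The workdir variable and the read contents are assumptions for the demo, and gzip -cd stands in for zcat. The edited text shows up on standard output while the compressed file stays exactly as it was, so the output still has to be recompressed and written somewhere.

```shell
# Made-up sample: one four-line FASTQ record, compressed into a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '@read1' 'NNNNNNACGT' '+' 'IIIIIIIIII' | gzip > "$workdir/sample.fastq.gz"

# The awk command from the question, unchanged: it reads the decompressed
# text from the pipe and prints the edited text to standard output.
trimmed=$(gzip -cd "$workdir/sample.fastq.gz" | awk 'NR%4==2 {gsub(/^....../,"")} 1')
printf '%s\n' "$trimmed"

# The compressed file itself is untouched, because nothing wrote back to it.
original=$(gzip -cd "$workdir/sample.fastq.gz")
printf '%s\n' "$original"
```

The same holds for sed: an in-place flag has nothing to act on when the input is a pipe, so the result has to be redirected to a file instead.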
You can't bypass compression, but you can chain the decompress/edit/recompress together in an automated fashion:
for f in /dir/*; do
    cp "$f" "$f~" &&
    gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done
If you're quite confident in the operation, you can remove the backup files by adding rm "$f~" to the end of the loop body.
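If keeping backup files around is not wanted, one possible variation is to write the edited copy to a scratch file and swap it in only once it checks out. The sketch below is an assumption-laden example rather than a tested tool: trim_fastq_gz is a hypothetical helper name, the sample data is made up, and it uses the awk command from the question so it does not depend on GNU sed's 2~4 address.

```shell
# trim_fastq_gz is a hypothetical helper name, not part of any tool mentioned here.
# It trims the first six characters of lines 2, 6, 10, ... of one gzip file,
# replacing the file only after the new copy has been written and checked.
trim_fastq_gz() {
    src=$1
    tmp="$src.tmp.$$"                    # scratch file next to the original

    gzip -t < "$src" || return 1         # refuse to touch a corrupt input

    if gzip -cd "$src" | awk 'NR%4==2 {sub(/^....../,"")} 1' | gzip > "$tmp" &&
       gzip -t < "$tmp"; then
        mv "$tmp" "$src"                 # swap the edited copy into place
    else
        rm -f "$tmp"                     # leave the original as it was
        return 1
    fi
}

# Usage on made-up sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '@r1' 'NNNNNNACGT' '+' 'IIIIIIIIII' \
              '@r2' 'TTTTTTGGCC' '+' 'JJJJJJJJJJ' | gzip > "$workdir/reads.fastq.gz"

for f in "$workdir"/*.gz; do
    trim_fastq_gz "$f" || echo "skipped $f" >&2
done

result=$(gzip -cd "$workdir/reads.fastq.gz")
printf '%s\n' "$result"
```

The gzip -t checks only confirm that the input and the recompressed output are well-formed gzip streams. They do not prove the edit itself did what was intended, so trying it on a copy of one file first is still worthwhile.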
I wrote a script called zawk which can do this natively. It's similar to glenn jackman's answer to a duplicate of this question, but it handles awk options and several different compression mechanisms and input methods while retaining FILENAME and FNR.
You'd use it like:
zawk 'awk logic goes here' log*.gz
This does not address sed's "in-place" flag (-i).