 

How to pipe the contents of a large tar.gz file to STDOUT?

Tags:

bash

I have a large.tar.gz file containing about 1 million files, about a quarter of which are HTML files, and I want to parse a few lines from each of those HTML files.

I want to avoid extracting the contents of large.tar.gz into a folder and then parsing the HTML files. Instead, how can I pipe the contents of the HTML files in large.tar.gz straight to STDOUT so that I can grep/parse the information I want from them?

I presume there must be some magic like:

tar -special_flags large.tar.gz | grep_only_files_with_extension html | xargs -n1 head -n 99999 | ./parse_contents.pl -

Any ideas?

asked Dec 09 '15 at 10:12 by 719016




1 Answer

Use this with GNU tar to extract a tgz to stdout:

tar -xOzf large.tar.gz --wildcards '*.html' | grep ...

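-O, --to-stdout: write extracted files to standard output instead of to disk. Here -x extracts, -z decompresses gzip, -f names the archive, and --wildcards makes tar treat '*.html' as a pattern for member names.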
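
One thing to keep in mind: with -O all matching files arrive as a single concatenated stream, so file boundaries are lost. If the parsing has to happen per file, GNU tar's --to-command option may be a better fit: it runs a command once per extracted member, feeds the member's contents on stdin, and exports its name in TAR_FILENAME. Below is a minimal sketch, assuming bash and GNU tar; the demo archive and the file names in it are made up purely so the commands can be tried safely, and sed -n 1p only stands in for the real parser.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a tiny demo archive (hypothetical file names, not from the question).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/site"
printf '<title>one</title>\n<p>a</p>\n' > "$workdir/site/a.html"
printf '<title>two</title>\n<p>b</p>\n' > "$workdir/site/b.html"
printf 'not html\n' > "$workdir/site/notes.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" site

# 1) Stream every *.html member to stdout as one concatenated stream.
all_html=$(tar -xOzf "$workdir/demo.tar.gz" --wildcards '*.html')

# 2) Run a command once per *.html member. tar passes the command to /bin/sh,
#    so $TAR_FILENAME is expanded there (hence the single quotes here).
#    The command prints the member name, then the first line of its contents.
per_file=$(tar -xzf "$workdir/demo.tar.gz" --wildcards '*.html' \
    --to-command='printf "%s: " "$TAR_FILENAME"; sed -n 1p')

printf '%s\n' "$all_html"
printf '%s\n' "$per_file"
```

Spawning one process per member adds overhead, so for roughly 250,000 HTML files the plain -O pipeline is likely faster whenever the parser can cope without file boundaries.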

answered Nov 15 '22 at 23:11 by Cyrus