find results piped to zcat and then to head

I'm trying to search for a certain string in a lot of gzipped CSV files. The string is located in the first row, so my idea was to get the first row of each file by combining find, zcat and head. But I can't get them to work together.

$ find . -name "*.gz" -print | xargs zcat -f | head -1
20051114083300,1070074.00,0.00000000
xargs: zcat: terminated by signal 13

Example file:
$ zcat 113.gz | head
20050629171845,1069335.50,-1.00000000
20050629171930,1069315.00,-1.00000000
20050629172015,1069382.50,-1.00000000
... and 2 million rows like these ...

Although I solved the problem with a bash script that iterates over the files and writes to a temp file, I'd like to know what I did wrong, how to do it properly, and whether there are other ways to go about it.

furedde asked Jul 27 '10 02:07

2 Answers

You should find that this will work:

find . -name "*.gz" | while IFS= read -r file; do zcat -f "$file" | head -n 1; done
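The same per-file idea can be sketched in Python and combined with the search the question was after. This is a minimal sketch, not part of the answer above: it assumes the files hold UTF-8 (or plain ASCII) text and that a simple substring match on the first line is enough, and `first_line_matches` is a hypothetical name.

```python
import gzip
import sys
from pathlib import Path


def first_line_matches(root, needle):
    """Yield *.gz files under root whose first line contains needle.

    Hypothetical helper: mirrors `find root -name "*.gz"` plus
    `zcat "$file" | head -n 1`, but runs in one process instead of
    one zcat/head pair per file.
    """
    for path in sorted(p for p in Path(root).rglob("*.gz") if p.is_file()):
        try:
            # Text mode; readline() only decompresses the start of the file.
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                first = f.readline()
        except (OSError, EOFError) as err:
            # Unlike `zcat -f`, non-gzip files are reported, not passed through.
            print(f"{path}: {err}", file=sys.stderr)
            continue
        if needle in first:
            yield path
```

For example, `list(first_line_matches(".", "20050629"))` would list every file under the current directory whose first row contains that timestamp, without reading the remaining millions of rows.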
Dennis Williamson answered Oct 16 '22 05:10

It worked as you asked it to.

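head did its job: it printed one line and exited. xargs had handed the file names to zcat in as few invocations as it could, and zcat writes the decompressed contents of its inputs back to back as one stream, so that single line was the first line of the first file. zcat, running under the auspices of xargs, then tried to write to the closed pipe and received a fatal SIGPIPE (signal 13) for its efforts. When its child died, xargs reported the whyfor.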
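The SIGPIPE part can be reproduced without zcat or xargs. This is a POSIX-only sketch with a hypothetical function name: a Python child process stands in for zcat (it restores the default SIGPIPE action, because Python ignores SIGPIPE by default), and the parent plays head by reading one line and closing the pipe.

```python
import signal
import subprocess
import sys

# Stand-in for zcat: writes far more output than the pipe can buffer.
# Python ignores SIGPIPE by default, so the child restores the default
# action (terminate the process), which is what a C program like zcat has.
CHILD = r"""
import signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
for i in range(10_000_000):
    sys.stdout.write(f"line {i}\n")
"""


def first_line_then_close():
    """Play the part of `head -n 1`: read one line, then close the pipe."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    first = proc.stdout.readline()
    proc.stdout.close()  # like head exiting: the pipe now has no reader
    proc.wait()          # the child's next write gets SIGPIPE and it dies
    return first, proc.returncode


if __name__ == "__main__":
    line, rc = first_line_then_close()
    print(line)
    # subprocess reports "killed by signal N" as returncode -N.
    print(rc, rc == -signal.SIGPIPE)
```

A negative return code is subprocess's way of saying "terminated by a signal", which is the same information xargs turned into "terminated by signal 13".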

To get the desired behaviour, you'd need a find -exec ... construction, or a custom zhead to give to xargs.
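To see why a per-file command is needed, here is a small in-memory model in Python. The helper names are hypothetical and compressed byte strings stand in for the .gz files; it only models what reaches head, not how zcat is implemented. zcat with several arguments writes their decompressed contents back to back, so one head -1 after it can only ever see the first line of the first file.

```python
import gzip


def zcat_all_then_head(blobs):
    """Model of `zcat f1.gz f2.gz ... | head -1`: one combined stream."""
    combined = b"".join(gzip.decompress(blob) for blob in blobs)
    return combined.splitlines()[0]


def head_per_file(blobs):
    """Model of running `zcat f.gz | head -1` once per file."""
    return [gzip.decompress(blob).splitlines()[0] for blob in blobs]


if __name__ == "__main__":
    blobs = [
        gzip.compress(b"20050629171845,1069335.50,-1.00000000\n"
                      b"20050629171930,1069315.00,-1.00000000\n"),
        gzip.compress(b"20051114083300,1070074.00,0.00000000\n"),
    ]
    print(zcat_all_then_head(blobs))  # only the first file's first line
    print(head_per_file(blobs))       # one first line per file
```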

Added junk code I found behind the fridge:

#!/usr/bin/env python3

"""zhead - poor man's zcat file... | head -n

   Minimal argument error checking; prefers to continue in the face of
   IO errors, with a diagnostic to stderr.

   sample usage: find ... | xargs zhead.py -1"""

import gzip
import sys

# An optional leading "-N" sets the number of lines per file (default 10).
if len(sys.argv) > 1 and sys.argv[1].startswith('-'):
    nlines = int(sys.argv[1][1:])
    start = 2
else:
    nlines = 10
    start = 1

for zfile in sys.argv[start:]:
    try:
        # 'rt' yields text lines; "with" closes the file even after an error.
        with gzip.open(zfile, 'rt', errors='replace') as zin:
            for _ in range(nlines):
                line = zin.readline()
                if not line:
                    break
                # readline() keeps the newline, so don't print another one.
                print(line, end='')
    except Exception as err:
        print(zfile, err, file=sys.stderr)

It processed 10k files in /usr/share/man in about a minute.

msw answered Oct 16 '22 06:10