I need to count the entries that contain certain characters in gzipped (.gz) files stored in an S3 bucket. How can I do it?
Specifically, my S3 bucket is s3://mys3.com/. Under it, there are thousands of prefixes (folders) like the following:
s3://mys3.com/bucket1/
s3://mys3.com/bucket2/
s3://mys3.com/bucket3/
...
s3://mys3.com/bucket2000/
Each prefix contains hundreds of gzipped (.gz) files of JSON objects, like the following:
s3://mys3.com/bucket1/file1.gz
s3://mys3.com/bucket1/file2.gz
s3://mys3.com/bucket1/file3.gz
...
s3://mys3.com/bucket1/file100.gz
Each gzipped file contains about 20,000 JSON objects, one object per line. Certain fields in the JSON objects may contain the word "request". I want to count how many JSON objects in bucket1 contain the word "request". I tried the following, but it did not work:
zcat s3cmd --recursive ls s3://mys3.com/bucket1/ | grep "request" | wc -l
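I do not have much shell experience, so could anyone help me with this? Thanks!
In case anyone is interested, the attempt above fails because zcat treats "s3cmd" and the words after it as file names instead of running s3cmd, and s3cmd ls only lists object names anyway. Each object has to be downloaded and decompressed before grep can search it: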
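s3cmd ls --recursive s3://mys3.com/bucket1/ | awk '{print $4}' | grep '\.gz$' | xargs -I@ s3cmd get @ - | zgrep 'request' | wc -l
This lists every object under the prefix, keeps the fourth column (the s3:// URL), keeps only names ending in .gz, streams each object to stdout, and counts the decompressed lines containing "request". Because each JSON object is one line, the number of matching lines equals the number of matching objects.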
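If you need to run several searches, it may be cheaper to copy the objects to disk once (for example with s3cmd sync s3://mys3.com/bucket1/ ./bucket1/) and count locally. The function below is a minimal sketch under stated assumptions: count_matches is a hypothetical helper name, each JSON object is assumed to sit on its own line, and the pattern is matched as a fixed string anywhere in the line, so an object with "request" in several fields is still counted once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines (one JSON object per line) containing
# a fixed string across every .gz file under a local directory.
# Assumes the S3 objects were already downloaded, e.g. with
#   s3cmd sync s3://mys3.com/bucket1/ ./bucket1/
count_matches() {
  local dir="$1" pattern="$2"
  # Decompress all .gz files into one stream, then count matching lines.
  # -F treats the pattern literally; -c prints the number of matching lines.
  find "$dir" -type f -name '*.gz' -exec gzip -dc -- {} + | grep -cF -- "$pattern"
}

# Usage: count_matches ./bucket1 request
```

Counting locally trades disk space for not re-downloading the data on every search; the streaming one-liner stores nothing but fetches every object each time it runs.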