Easily count words in a list of files in a folder after grep -v command

Tags: grep, bash, wc

I have been trying to make the scripts I write simpler and simpler.

There are numerous ways to get the word count of all files in a folder, or even of all files in the subdirectories of a folder.

For instance, I could write

wc */*

and I might get output like this (this is the desired output):

   0        0        0 10.53400000/YRI.GS000018623.NONSENSE.vcf
   0        0        0 10.53400000/YRI.GS000018623.NONSTOP.vcf
   0        0        0 10.53400000/YRI.GS000018623.PFAM.vcf
   0        0        0 10.53400000/YRI.GS000018623.SPAN.vcf
   0        0        0 10.53400000/YRI.GS000018623.SVLEN.vcf
   2       20      624 10.53400000/YRI.GS000018623.SVTYPE.vcf
   2       20      676 10.53400000/YRI.GS000018623.SYNONYMOUS.vcf
  13      130     4435 10.53400000/YRI.GS000018623.TSS-UPSTREAM.vcf
 425     4250   126381 10.53400000/YRI.GS000018623.UNKNOWN-INC.vcf

but if there are too many files, I might get an error message like this:

-bash: /usr/bin/wc: Argument list too long

so, I could make a variable and do one folder at a time, like so:

while read -r FOLDER
do
    wc "$FOLDER"/* >> outfile.txt
done < "$FOLDER_LIST"

so this goes from one line to 5 just like that.

Further, in one case, I want to use grep -v first, then carry out the word counting, like so:

grep -v dbsnp */* | wc

but this would suffer from two errors:

1. Argument list too long
2. If it were not too long, it would give the wc for all of the files at once, not per file.

So, to recap, I would love to be able to do this:

grep -v dbsnp */* | wc > Outfile.txt
awk '{print $4,$1}' Outfile.txt > Outfile.summary.txt

and have it return output like I showed above.

Is there a very simple way to do this? Or am I looking at a loop at minimum? Again, like the rest of us, I know 101 ways to do this with a 4-10 line script, but I would love to be able to just type two one-liners at the command prompt. My knowledge of the shell is not yet deep enough to know which approaches would let me do what I am asking of the OS.

EDIT -

A solution was proposed:

find -exec grep -v dbsnp {} \; | xargs -n 1 wc

This solution leads to the following output:

wc: 1|0:53458644:AMBIGUOUS:CCAGGGC|-16&GCCAGGGCCAGGGC|-18&GCCAGGGCC|-19&GGCCAGGGC|-19&GCCAGGGCG|-19,.:48:48,48:4,4:0,17:-48,0,-48:0,0,-17:27:3,24:24: No such file or directory
wc: 10: No such file or directory
wc: 53460829: No such file or directory
wc: .: Is a directory
      0       0       0 .
wc: AA: No such file or directory
wc: CT: No such file or directory
wc: .: Is a directory
      0       0       0 .
wc: .: Is a directory
      0       0       0 .

As nearly as I can tell, it appears to be treating each line of output as a file. I am still reviewing the other answers, and thanks for your help.

asked Jun 05 '14 06:06

Vincent Laufer

1 Answer

You mentioned that "this does not solve the problem of returning the wc in an item-by-item fashion"

The following will:

find . -type f -exec wc {} \;
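
As a side note, here is a minimal sketch of a batched variant, assuming a find that supports the "{} +" form (POSIX specifies it). The function name wc_batched and the demo files are hypothetical, made up only for illustration. Because find splits the file list into batches itself, the "Argument list too long" error should not occur, but wc prints a "total" line for every batch that contains more than one file.

```shell
#!/usr/bin/env bash
# Hypothetical demo data so the sketch can be run anywhere.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/10.53400000"
printf 'a b\nc d\n' > "$demo_dir/10.53400000/one.vcf"
printf 'e\n' > "$demo_dir/10.53400000/two.vcf"

# wc_batched (hypothetical name): run wc on every regular file under $1.
# "{} +" passes many file names to each wc invocation; find keeps each
# batch below the system's argument-length limit.
wc_batched() {
    find "$1" -type f -exec wc {} +
}

wc_batched "$demo_dir"
```

If the extra "total" lines are unwanted, the one-wc-per-file form ending in \; avoids them at the cost of starting one process per file.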

But find -exec wc on its own does not apply your grep -v filter.

If you want what I described in my comment on this answer, please check whether the following works for you:

find . -type f -exec bash -c 'printf "%s " "$1"; grep -v dbsnp "$1" | wc' _ {} \;
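
For output in the same column order as plain wc (counts first, file name last), something along these lines might work. It is only a sketch: the function name filtered_wc and the demo files are hypothetical, and it assumes file names without embedded newlines.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: one file with a non-dbsnp line, one without.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/10.53400000"
printf 'rs1 dbsnp\nnovel variant\n' > "$demo_dir/10.53400000/mixed.vcf"
printf 'rs2 dbsnp\n' > "$demo_dir/10.53400000/known.vcf"

# filtered_wc (hypothetical name): for every regular file under $1,
# drop the lines matching pattern $2, count what is left with wc,
# and print "lines words bytes filename" for that file.
filtered_wc() {
    find "$1" -type f -exec sh -c '
        pattern=$1; shift
        for f in "$@"; do
            # grep -v exits non-zero when nothing is left; wc still reports 0 0 0.
            counts=$(grep -v -e "$pattern" "$f" | wc)
            printf "%s %s\n" "$counts" "$f"
        done
    ' sh "$2" {} +
}

filtered_wc "$demo_dir" dbsnp
```

With the output redirected to Outfile.txt, the awk '{print $4, $1}' step from the question would then list each file name next to its remaining line count, as long as the file names contain no whitespace.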

answered Sep 27 '22 23:09

PradyJord
