I have been trying to make the scripts I write simpler and simpler.
There are numerous ways to get the word count of all files in a folder, or even of all files in the subdirectories of a folder.
For instance, I could write
wc */*
and I might get output like this (this is the desired output):
0 0 0 10.53400000/YRI.GS000018623.NONSENSE.vcf
0 0 0 10.53400000/YRI.GS000018623.NONSTOP.vcf
0 0 0 10.53400000/YRI.GS000018623.PFAM.vcf
0 0 0 10.53400000/YRI.GS000018623.SPAN.vcf
0 0 0 10.53400000/YRI.GS000018623.SVLEN.vcf
2 20 624 10.53400000/YRI.GS000018623.SVTYPE.vcf
2 20 676 10.53400000/YRI.GS000018623.SYNONYMOUS.vcf
13 130 4435 10.53400000/YRI.GS000018623.TSS-UPSTREAM.vcf
425 4250 126381 10.53400000/YRI.GS000018623.UNKNOWN-INC.vcf
but if there are too many files, I might get an error message like this:
-bash: /usr/bin/wc: Argument list too long
so, I could make a variable and do one folder at a time, like so:
while read -r FOLDER
do
wc "$FOLDER"/* >> outfile.txt
done < "$FOLDER_LIST"
so this goes from one line to 5 just like that.
Further, in one case, I want to use grep -v first, then carry out the word counting, like so:
grep -v dbsnp */* | wc
but this would suffer from two errors:
So, to recap, I would love to be able to do this:
grep -v dbsnp */* | wc > Outfile.txt
awk '{print $4,$1}' Outfile.txt > Outfile.summary.txt
and have it return output like I showed above.
Is there a very simple way to do this? Or am I looking at a loop at minimum? Again, I know 101 ways to do this with a 4-10 line script, just like the rest of us, but I would love to be able to just type 2 one-liners into the command prompt. My knowledge of the shell is not yet deep enough to know which approaches would allow what I am asking of the OS.
EDIT -
A solution was proposed:
find -exec grep -v dbsnp {} \; | xargs -n 1 wc
This solution leads to the following output:
wc: 1|0:53458644:AMBIGUOUS:CCAGGGC|-16&GCCAGGGCCAGGGC|-18&GCCAGGGCC|-19&GGCCAGGGC|-19&GCCAGGGCG|-19,.:48:48,48:4,4:0,17:-48,0,-48:0,0,-17:27:3,24:24: No such file or directory
wc: 10: No such file or directory
wc: 53460829: No such file or directory
wc: .: Is a directory
0 0 0 .
wc: AA: No such file or directory
wc: CT: No such file or directory
wc: .: Is a directory
0 0 0 .
wc: .: Is a directory
0 0 0 .
As nearly as I can tell, it appears to be treating each line as a file. I am still reviewing the other answers, and thanks for your help.
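The splitting can be reproduced on a tiny input. This is a minimal sketch with made-up sample lines (not real VCF data): xargs -n 1 hands each whitespace-separated token, rather than each whole line, to the command, which would explain why wc was asked to open names such as AA and 10.

```shell
# Two hypothetical lines of grep output, three tokens each.
sample='chr1 100 AA
chr2 200 CT'

# xargs splits its input on blanks and newlines, so with -n 1
# the command (echo here, wc in the question) runs once per token.
printf '%s\n' "$sample" | xargs -n 1 echo "arg:"
```

Swapping echo for wc gives one "No such file or directory" message per token, as in the output above.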
Using grep -c alone counts the number of lines that contain the matching word, not the total number of matches. The -o option tells grep to print each match on its own line, and wc -l then counts those lines. This is how the total number of matches is obtained.
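The difference is easiest to see on a file where one line holds two matches. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep both do); the file contents are made up for illustration:

```shell
# Hypothetical test file: the first line contains the word twice.
tmp=$(mktemp)
printf 'dbsnp dbsnp\nother\ndbsnp\n' > "$tmp"

# Number of lines that contain at least one match.
lines_with_match=$(grep -c dbsnp "$tmp")

# Total number of matches: -o prints one match per line, wc -l counts them.
# tr strips the padding some wc implementations add.
total_matches=$(grep -o dbsnp "$tmp" | wc -l | tr -d ' ')

echo "lines: $lines_with_match, matches: $total_matches"
rm -f "$tmp"
```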
To include all subdirectories in a search, add the -r option to the grep command. The command then prints the matches for all files in the current directory and its subdirectories, each prefixed with the path and filename. Adding the -w option restricts matches to whole words; the output format is the same.
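A small sketch of a recursive, whole-word search, run against a throwaway directory. The directory layout and file contents are made up for illustration:

```shell
# Hypothetical tree: one file at the top level, one in a subdirectory.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'dbsnp hit\nnot dbsnpX\n' > "$dir/a.txt"
printf 'another dbsnp line\n' > "$dir/sub/b.txt"

# -r descends into subdirectories; -w matches dbsnp only as a whole word,
# so the "dbsnpX" line is skipped. Each hit is printed as path:line.
matches=$(grep -rw dbsnp "$dir")
printf '%s\n' "$matches"
rm -rf "$dir"
```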
wc, short for word count, is a command in Unix and Unix-like operating systems. It is mainly used for counting lines, words, and bytes.
You mentioned that "this does not solve the problem of returning the wc in an item-by-item fashion".
The following will:
find . -type f -exec wc {} \;
But this does not include your "grep -v" filter.
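If starting one wc per file turns out to be slow, find can also batch the file names with "{} +", which keeps each command line under the system limit while still printing one line per file. A minimal sketch on a throwaway directory (the file names are made up); note that wc adds a "total" line for each batch that holds more than one file:

```shell
# Hypothetical tree with two small files.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'a b\nc d\n' > "$dir/one.vcf"
printf 'e\n' > "$dir/sub/two.vcf"

# -type f skips directories, so wc is never handed "." or "sub".
# "{} +" passes as many file names to each wc call as will fit.
counts=$(find "$dir" -type f -exec wc {} +)
printf '%s\n' "$counts"
rm -rf "$dir"
```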
If you intend to do the same as indicated by my comment on this answer, then please check whether the following works for you:
find . -type f -exec bash -c 'printf "%s " "$1"; grep -v dbsnp "$1" | wc' _ {} \;
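When only the line count matters, grep -c combined with -v counts the non-matching lines directly, so wc is not needed. This is a sketch under that assumption, not a drop-in replacement for the full wc output; it runs on a throwaway directory with made-up file names and prints one path:count pair per file:

```shell
# Hypothetical files: a.vcf has two non-dbsnp lines, b.vcf has none.
dir=$(mktemp -d)
printf 'dbsnp x\nkeep 1\nkeep 2\n' > "$dir/a.vcf"
printf 'dbsnp y\n' > "$dir/b.vcf"

# -v inverts the match, -c counts the selected lines, and -H forces
# the file name prefix even if a batch contains a single file.
counts=$(find "$dir" -type f -name '*.vcf' -exec grep -vcH dbsnp {} + | sort)
printf '%s\n' "$counts"
rm -rf "$dir"
```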