 

Easily count words in a list of files in a folder after grep -v command

Tags: grep, bash, wc

I have been trying to make the scripts I write simpler and simpler.

There are numerous ways to get the word count of all files in a folder, or even of all files in the subdirectories of a folder.

For instance, I could write

wc */* 

and I might get output like this (this is the desired output):

   0        0        0 10.53400000/YRI.GS000018623.NONSENSE.vcf
   0        0        0 10.53400000/YRI.GS000018623.NONSTOP.vcf
   0        0        0 10.53400000/YRI.GS000018623.PFAM.vcf
   0        0        0 10.53400000/YRI.GS000018623.SPAN.vcf
   0        0        0 10.53400000/YRI.GS000018623.SVLEN.vcf
   2       20      624 10.53400000/YRI.GS000018623.SVTYPE.vcf
   2       20      676 10.53400000/YRI.GS000018623.SYNONYMOUS.vcf
  13      130     4435 10.53400000/YRI.GS000018623.TSS-UPSTREAM.vcf
 425     4250   126381 10.53400000/YRI.GS000018623.UNKNOWN-INC.vcf

but if there are too many files, I might get an error message like this:

-bash: /usr/bin/wc: Argument list too long
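That message comes from the kernel, not from wc. The expanded list of names, together with the environment, exceeds the system's ARG_MAX limit on what can be passed to a new program. The following is a minimal sketch that lets find hand the names to wc in batches that fit. It assumes a find that supports -mindepth/-maxdepth (GNU and BSD find do, but these options are not POSIX), and `wc_two_levels` is a hypothetical helper name. Unlike `*/*`, find also visits hidden files, and its paths start with the directory you give it.

```shell
# Print this system's limit, in bytes; the value varies between systems.
getconf ARG_MAX

# wc_two_levels is a hypothetical helper; pass a directory (default ".").
wc_two_levels() {
    # Entries exactly two levels down, like */* (hidden files included).
    # -type f skips subdirectories, which wc would reject.
    # "-exec ... {} +" lets find split the names into batches under ARG_MAX.
    find "${1:-.}" -mindepth 2 -maxdepth 2 -type f -exec wc {} +
}
```

For example, run `wc_two_levels > outfile.txt` from the parent folder. Because the list is split into batches, wc prints a separate "total" line for each batch that contains more than one file.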

So, I could use a variable and do one folder at a time, like so:

while read -r FOLDER
do
    wc "$FOLDER"/* >> outfile.txt
done < "$FOLDER_LIST"

So this goes from one line to five, just like that.

Further, in one case I want to run grep -v first and then carry out the word count, like so:

grep -v dbsnp */* | wc

but this would suffer from two problems:

  1. Argument list too long
  2. If it were not too long, it would give the wc for all of the files at once, not per file.
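A plain `for` loop over the same glob avoids both problems. The shell expands `*/*` internally and never passes the whole list to a single program, so ARG_MAX does not apply. Running grep and wc inside the loop then gives one count per file. This is a sketch under several assumptions: `count_per_file` is a hypothetical name, "dbsnp" is the pattern, and line counts (which the awk `$4,$1` step keeps) are what you want.

```shell
# count_per_file is a hypothetical name; run it from the parent folder.
count_per_file() {
    for f in */*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        # Lines NOT containing "dbsnp"; $(( )) drops the padding some wc add.
        n=$(( $(grep -v -- dbsnp "$f" | wc -l) ))
        printf '%s %s\n' "$f" "$n"   # "file count", like awk '{print $4,$1}'
    done
}
```

For example, `count_per_file > Outfile.summary.txt`. It is still a loop, but without the comments it fits on one line.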

So, to recap, I would love to be able to do this:

grep -v dbsnp */* wc > Outfile.txt
awk '{print $4,$1}' Outfile.txt > Outfile.summary.txt

and have it return output like I showed above.

Is there a very simple way to do this, or am I looking at a loop at minimum? Again, like the rest of us, I know 101 ways to do this with a 4-10 line script, but I would love to be able to just type two one-liners at the command prompt, and my knowledge of the shell is not yet deep enough to know which approaches would let me do what I am asking.
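For this particular filter, no loop is needed, because grep can count per file itself. With -c it prints one count of selected lines per file, and with -v those are the lines that do not contain the pattern. The following is a sketch that assumes GNU or BSD grep (for -H, which keeps the file name even when find passes grep a single file) and file names without colons. `count_nondbsnp` is only a hypothetical wrapper around what is really a one-line pipeline.

```shell
# count_nondbsnp is a hypothetical wrapper; the body is the one-liner.
count_nondbsnp() {
    # -v -c: per-file count of lines WITHOUT "dbsnp"; -H: always "path:count".
    find . -mindepth 2 -maxdepth 2 -type f -exec grep -Hvc dbsnp {} + |
        awk -F: '{ print $1, $2 }'   # "path:count" -> "path count"
}
```

Run it from the parent folder, e.g. `count_nondbsnp > Outfile.summary.txt`. Note that it counts lines, not words or bytes.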

EDIT -

A solution was proposed:

find -exec grep -v dbsnp {} \; | xargs -n 1 wc

This solution leads to the following output:

wc: 1|0:53458644:AMBIGUOUS:CCAGGGC|-16&GCCAGGGCCAGGGC|-18&GCCAGGGCC|-19&GGCCAGGGC|-19&GCCAGGGCG|-19,.:48:48,48:4,4:0,17:-48,0,-48:0,0,-17:27:3,24:24: No such file or directory
wc: 10: No such file or directory
wc: 53460829: No such file or directory
wc: .: Is a directory
      0       0       0 .
wc: AA: No such file or directory
wc: CT: No such file or directory
wc: .: Is a directory
      0       0       0 .
wc: .: Is a directory
      0       0       0 .

As nearly as I can tell, it is treating each piece of grep's output as a file name. I am still reviewing the other answers; thanks for your help.
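That is essentially what happens. xargs does not know the input is file contents. It splits whatever arrives on standard input at blanks and newlines and passes each piece to the command as an argument, so with -n 1 every word of the grep output becomes a separate "file name" for wc. The following is a minimal demonstration, with echo standing in for wc and made-up input.

```shell
# xargs turns stdin text into arguments; -n 1 means one argument per call.
# Prints "arg: AA", "arg: CT" and "arg: 10" on separate lines.
printf 'AA CT\n10\n' | xargs -n 1 echo arg:
```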

Vincent Laufer asked Jun 05 '14 06:06



People also ask

How do I count words using grep?

Using grep -c alone counts the lines that contain a match, not the total number of matches. The -o option makes grep print each match on a line of its own, and wc -l then counts those lines, which gives the total number of matching words.
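Here is a small comparison on made-up text in which one line contains the word twice. Note that grep -o is supported by GNU and BSD grep but is not part of POSIX.

```shell
# Made-up input: two matches on the first line, one on the second.
text='dbsnp and dbsnp
only dbsnp
none here'

printf '%s\n' "$text" | grep -c dbsnp            # matching lines: 2
printf '%s\n' "$text" | grep -o dbsnp | wc -l    # total matches: 3
```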

How do I grep words in all files in a directory?

To include all subdirectories in a search, add the -r option to the grep command. It then searches the given directory and everything below it, and prefixes each match with the path of the file it came from. Adding -w restricts matches to whole words; the output format stays the same.
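The following is a minimal sketch that builds a throwaway tree with mktemp and then searches it recursively. The file names and contents are illustrative only.

```shell
# Build a tiny throwaway tree to search (names and contents are made up).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'dbsnp\n' > "$demo/top.txt"
printf 'dbsnp142\nsee dbsnp here\n' > "$demo/sub/inner.txt"

# -r descends into sub/; -w skips "dbsnp142" because "dbsnp" is not a whole
# word there. Each hit is printed as "path:line".
grep -rw dbsnp "$demo"

# Clean up afterwards with: rm -rf "$demo"
```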

Which command helps to count the words in a list?

wc, short for "word count", is a standard command in Unix and Unix-like operating systems. It counts the lines, words, and bytes in each file it is given, or in its standard input.
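For example, with made-up input, each of the following options selects one of the three counts:

```shell
# "one two\n" is 8 bytes and "three\n" is 6, so 14 bytes in total.
printf 'one two\nthree\n' | wc -l   # lines: 2
printf 'one two\nthree\n' | wc -w   # words: 3
printf 'one two\nthree\n' | wc -c   # bytes: 14
```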


1 Answer

You mentioned that "this does not solve the problem of returning the wc in an item-by-item fashion".

The following will:

find . -type f -exec wc {} \;

But this won't apply your "grep -v" filter.

If you want to apply the filter first, as described in my comment on this answer, check whether the following works for you:

find . -type f -exec bash -c 'echo -n "$1"; grep -v dbsnp "$1" | wc' _ {} \;
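The `\;` form starts a new bash, grep and wc for every file. The variant below, offered as a sketch, gives bash whole batches of names with `+` and loops over them. It passes names as positional parameters (`"$@"`) rather than pasting `{}` into the script text, so spaces or quotes in file names cannot break the command. `per_file_counts` is a hypothetical helper name, and it prints line counts only.

```shell
# per_file_counts is a hypothetical helper; pass a directory (default ".").
per_file_counts() {
    # "_" fills $0 of the inner bash; each batch of file names becomes "$@".
    find "${1:-.}" -type f -exec bash -c '
        for f in "$@"; do
            # Lines without "dbsnp"; $(( )) drops the padding some wc add.
            printf "%s %s\n" "$(( $(grep -v -- dbsnp "$f" | wc -l) ))" "$f"
        done' _ {} +
}
```

For example, `per_file_counts . > Outfile.summary.txt` writes one "count file" line per file, in the same column order as wc's output.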
PradyJord answered Sep 27 '22 23:09
