I often use sort | uniq -c to make count statistics.
Now, if I have two files with such count statistics, I would like to put them together and add the counts. (I know I could append the original files and count there, but let's assume only the count files are accessible.)
For example, given:
a.cnt:
1 a
2 c
b.cnt:
2 b
1 c
I would like to concatenate and get the following output:
1 a
2 b
3 c
What's the shortest way to do this in the shell?
Edit:
Thanks for the answers so far!
Some possible side-aspects one might want to consider additionally:
- what if a, b, c are arbitrary strings, containing arbitrary white-spaces?
- what if the files are too big to fit in memory? Is there some sort | uniq -c-style command line option for this case that only looks at two lines at a time?
This can work for any given number of files:
$ cat a.cnt b.cnt | awk '{a[$2]+=$1} END{for (i in a) print a[i],i}'
1 a
2 b
3 c
So if you have, say, 10 files, you just have to do cat f1 f2 ... and then pipe the output into this awk command.
If the file names happen to share a pattern, you can also do (thanks Adrian Frühwirth!):
awk '{a[$2]+=$1} END{for (i in a) print a[i],i}' *cnt
For example, this takes into account all the files whose names end in cnt.
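The order in which awk's for (i in a) loop visits the keys is unspecified, so the merged lines may come out in a different order depending on the awk implementation. A minimal sketch that pins the order by sorting on the key afterwards; the function name merge_counts and the throwaway demo files are only illustrative and not part of the original answer:

```shell
# merge_counts FILE...: sum the counts of single-word keys across count files.
merge_counts() {
    # Column 1 is the count, column 2 the key; the final sort fixes the order.
    awk '{ total[$2] += $1 } END { for (k in total) print total[k], k }' "$@" |
        LC_ALL=C sort -k2,2
}

# Demo on the sample data from the question, in a temporary directory.
demo_dir=$(mktemp -d)
printf '1 a\n2 c\n' > "$demo_dir/a.cnt"
printf '2 b\n1 c\n' > "$demo_dir/b.cnt"
merge_counts "$demo_dir"/*.cnt
rm -rf "$demo_dir"
```

On the sample data this prints 1 a, 2 b, 3 c, one per line, in key order.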
Some possible side-aspects one might want to consider additionally:
- what if a, b, c are arbitrary strings, containing arbitrary white-spaces?
- what if the files are too big to fit in memory? Is there some sort | uniq -c-style command line option for this case that only looks at two lines at a time?
In that case, you can use the rest of the columns as indexes for the counter:
awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}' *cnt
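One caveat with assigning to $1: awk then rebuilds the record using single spaces, so runs of whitespace inside a key collapse, and the key keeps a leading separator. If the exact whitespace matters, a possible variant strips the count from a copy of the line instead. This is a sketch that assumes each line starts with optional blanks, an integer count, and one separator character (the layout uniq -c produces); merge_counts_ws is a hypothetical name:

```shell
# merge_counts_ws FILE...: sum counts, keeping each key byte-for-byte.
merge_counts_ws() {
    awk '{
        count = $1
        key = $0
        # Drop leading blanks, the count, and at most one separator character.
        sub(/^[ \t]*[0-9]+[ \t]?/, "", key)
        total[key] += count
    } END { for (k in total) print total[k], k }' "$@"
}

# Demo: "and  some" (two spaces) and "and some" stay separate keys.
demo_dir=$(mktemp -d)
printf '2 text here\n1 and  some\n' > "$demo_dir/a.cnt"
printf '5 and some\n3 and  some\n1 text here\n' > "$demo_dir/b.cnt"
merge_counts_ws "$demo_dir"/*.cnt | LC_ALL=C sort
rm -rf "$demo_dir"
```

The trailing sort in the demo is only there to make the output order predictable.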
Note that you don't actually need to run sort | uniq -c, redirect to a cnt file, and then perform this re-counting. You can do it all in one step with something like this:
awk '{a[$0]++} END{for (i in a) print a[i], i}' file
Test with count files whose keys contain spaces:
$ cat a.cnt
1 and some
2 text here
$ cat b.cnt
4 and some
4 and other things
2 text here
9 blabla
$ cat *cnt | awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}'
4 text here
9 blabla
4 and some
4 and other things
Regarding the second comment (counting directly from the unsorted file):
$ cat b
and some
text here
and some
and other things
text here
blabla
$ awk '{a[$0]++} END{for (i in a) print a[i], i}' b
2 and some
2 text here
1 and other things
1 blabla
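For the case where the files are too big to fit in memory, one possible approach (a sketch, not something taken from the answers here) is to let sort bring equal keys next to each other and then add up runs of adjacent lines, much like uniq -c counts them. Implementations such as GNU sort fall back to temporary files for large inputs, and the awk part only holds the current key and its running total. It assumes single-word keys in column 2; merge_counts_sorted is a hypothetical name:

```shell
# merge_counts_sorted FILE...: sum counts per key, one key in memory at a time.
merge_counts_sorted() {
    # LC_ALL=C gives plain byte ordering; -k2b,2 sorts on the key column only.
    LC_ALL=C sort -k2b,2 "$@" |
        awk '
            # Key changed: emit the finished total and start a new one.
            # ($2 "") forces a string comparison even for numeric-looking keys.
            NR > 1 && ($2 "") != key { print sum, key; sum = 0 }
            { key = $2 ""; sum += $1 }
            END { if (NR > 0) print sum, key }
        '
}

# Demo on the sample data from the question.
demo_dir=$(mktemp -d)
printf '1 a\n2 c\n' > "$demo_dir/a.cnt"
printf '2 b\n1 c\n' > "$demo_dir/b.cnt"
merge_counts_sorted "$demo_dir"/*.cnt
rm -rf "$demo_dir"
```

As a side effect the result is already ordered by key.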
Using awk:
awk 'FNR==NR{a[$2]=$1;next} {a[$2]+=$1} END{for (k in a) print a[k],k}' a.cnt b.cnt | sort -k2
1 a
2 b
3 c