how to aggregate counts in a bash one-liner

Tags:

bash

unix

uniq

I often use sort | uniq -c to produce count statistics. Now, if I have two files with such count statistics, I would like to merge them and add up the counts. (I know I could concatenate the original files and count there, but let's assume only the count files are accessible.)

For example given:

a.cnt:

   1 a
   2 c

b.cnt:

   2 b
   1 c

I would like to concatenate and get the following output:

   1 a
   2 b
   3 c

What's the shortest way to do this in the shell?

Edit:

Thanks for the answers so far!

Some possible side-aspects one might want to consider additionally:

What if a, b, c are arbitrary strings containing arbitrary whitespace?
What if the files are too big to fit in memory? Is there some sort | uniq -c-style command line option for this case that only looks at two lines at a time?


asked Mar 13 '14 15:03

benroth

2 Answers

This can work for any given number of files:

$ cat a.cnt b.cnt | awk '{a[$2]+=$1} END{for (i in a) print a[i],i}'
1 a
2 b
3 c

So if you have, say, 10 files, you just run cat f1 f2 ... and pipe the output into this awk command.

If the file names happen to share a pattern, you can also do (thanks Adrian Frühwirth!):

awk '{a[$2]+=$1} END{for (i in a) print a[i],i}' *cnt

This processes every file in the current directory whose name ends in cnt.
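One thing to keep in mind: awk does not guarantee the order in which for (i in a) visits the keys, so the output order can differ between awk implementations. A minimal sketch, assuming single-word keys and that you want the result ordered by key; the scratch directory and the variable name merged are only there for illustration:

```shell
#!/bin/sh
# Recreate the sample count files in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '   1 a\n   2 c\n' > "$tmp/a.cnt"
printf '   2 b\n   1 c\n' > "$tmp/b.cnt"

# Sum the counts per key, then sort on the key column (field 2)
# so the output order no longer depends on awk's array traversal.
merged=$(awk '{a[$2]+=$1} END{for (i in a) print a[i],i}' "$tmp"/*.cnt | LC_ALL=C sort -k2)
printf '%s\n' "$merged"

rm -rf "$tmp"
```

LC_ALL=C is only used here to make the sort order independent of the locale.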

Some possible side-aspects one might want to consider additionally:

what if a, b, c are arbitrary strings containing arbitrary whitespace?

what if the files are too big to fit in memory? Is there some sort | uniq -c-style command line option for this case that only looks at two lines at a time?

For keys that contain whitespace, you can use everything after the first column as the index for the counter:

awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}' *cnt

Note that you don't actually need to run sort | uniq -c, redirect to a cnt file, and then re-count. You can count the raw lines in one step with something like this:

awk '{a[$0]++} END{for (i in a) print a[i], i}' file

Example

$ cat a.cnt
   1 and some
   2 text here

$ cat b.cnt
   4 and some
   4 and other things
   2 text here
   9 blabla

$ cat *cnt | awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}'
4  text here
9  blabla
5  and some
4  and other things
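The double space in this output comes from $1="": assigning to a field makes awk rebuild $0 with single spaces between fields, so the key keeps a leading space and any run of whitespace inside a key is collapsed. If that matters, a possible variant (a sketch, assuming the uniq -c layout of optional leading spaces, the count, then exactly one space) strips the count with sub() and leaves the rest of the line untouched. The file names x.cnt and y.cnt and the variable result are made up for the example:

```shell
#!/bin/sh
# Two hypothetical count files whose keys differ only in internal spacing.
tmp=$(mktemp -d)
printf '      1 foo  bar\n      2 foo bar\n' > "$tmp/x.cnt"
printf '      3 foo  bar\n' > "$tmp/y.cnt"

# Save the count, then delete "leading spaces + count + one space" from $0.
# sub() edits $0 in place without re-joining the fields, so the key keeps
# its original whitespace.
result=$(awk '{count=$1; sub(/^ *[0-9]+ /, ""); a[$0]+=count}
              END{for (k in a) print a[k], k}' "$tmp"/*.cnt | LC_ALL=C sort)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Here "foo  bar" (two spaces) and "foo bar" (one space) stay separate keys, whereas the $1="" version would merge them into one.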

And for the second remark, counting directly from the raw file:

$ cat b
and some
text here
and some
and other things
text here
blabla

$ awk '{a[$0]++} END{for (i in a) print a[i], i}' b
2 and some
2 text here
1 and other things
1 blabla
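The array-based commands above keep every distinct key in memory, so they do not address the "too big to fit in memory" case. One possible approach, sketched here under the assumption of single-word keys, is to sort the count lines by key first and then let awk add up adjacent lines, so that only one running total is held at a time. This relies on sort coping with large inputs itself (GNU sort, for instance, spills to temporary files on disk). The variable name streamed is illustrative:

```shell
#!/bin/sh
# Recreate the sample count files in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '   1 a\n   2 c\n' > "$tmp/a.cnt"
printf '   2 b\n   1 c\n' > "$tmp/b.cnt"

# Sort by key so equal keys become adjacent, then stream through awk:
# whenever the key changes, print the finished total and start a new one.
streamed=$(LC_ALL=C sort -k2 "$tmp"/*.cnt |
  awk 'NR>1 && $2!=prev {print sum, prev; sum=0}
       {sum+=$1; prev=$2}
       END{if (NR) print sum, prev}')
printf '%s\n' "$streamed"

rm -rf "$tmp"
```

The awk part only ever remembers the previous key and its running sum, which is the closest analogue here to uniq looking at two lines at a time.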

answered Sep 29 '22 04:09

fedorqui 'SO stop harming'

Using awk:

awk 'FNR==NR{a[$2]=$1;next} {a[$2]+=$1} END{for (k in a) print a[k], k}' a.cnt b.cnt | sort -k2
1 a
2 b
3 c

answered Sep 29 '22 04:09

anubhava
