Create bins with awk histogram-like

Question

Here's my input file :

I would like to create bins of a size of my choice to get histogram-like output, e.g. something like this for 0.1 bins, starting from 0 :

0 0.1 0
...
0.5 0.6 0
0.6 0.7 1
...
1.0 1.1 1
1.1 1.2 0
1.2 1.3 2
1.3 1.4 1
...

My file is too big for R, so I'm looking for an awk solution (also open to anything else that I can understand, as I'm still a Linux beginner).

This was sort of already answered in this post : awk histogram in buckets but the solution is not working for me.

Ed Morton · Accepted Answer

This should be very close if not exactly right. Consider it a starting point at least and verify/figure out the math yourself (in particular decide/verify which bucket(s) an exact boundary match like 0.2 should go into - 0.1 to 0.2 and/or 0.2 to 0.3?):

$ cat tst.awk
BEGIN { delta = (delta == "" ? 0.1 : delta) }
{
    bucketNr = int(($0+delta) / delta)
    cnt[bucketNr]++
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
        end = beg + delta
        printf "%0.1f %0.1f %d
", beg, end, cnt[bucketNr]
        beg = end
    }
}

$ awk -f tst.awk file
0.0 0.1 0
0.1 0.2 0
0.2 0.3 0
0.3 0.4 0
0.4 0.5 0
0.5 0.6 0
0.6 0.7 1
0.7 0.8 0
0.8 0.9 0
0.9 1.0 0
1.0 1.1 1
1.1 1.2 0
1.2 1.3 2
1.3 1.4 1
1.4 1.5 1
1.5 1.6 0
1.6 1.7 0
1.7 1.8 1

Note that you can assign the bucket delta size on the command line, 0.1 is just the default value:

$ awk -v delta='0.3' -f tst.awk file
0.0 0.3 0
0.3 0.6 0
0.6 0.9 1
0.9 1.2 1
1.2 1.5 4
1.5 1.8 1

$ awk -v delta='0.5' -f tst.awk file
0.0 0.5 0
0.5 1.0 1
1.0 1.5 5
1.5 2.0 1

Create bins with awk histogram-like

Tags:

bash

unix

dataframe

awk

grouping

EvenStar69

1 Answers

Ed Morton

Recent Activity

Donate For Us

Create bins with awk histogram-like

Tags:

bash

unix

dataframe

awk

grouping

EvenStar69

1 Answers

Ed Morton

Related questions

Recent Activity

Donate For Us