Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I make a logarithmic histogram in SQL?

My table:

val
1
2
3
4
5
6
10
15

Desired results:

bin | qty 
1   | 1
2   | 2
4   | 3
8   | 3

Which means there are, with inclusive/exclusive ranges,

  • 1 value between 1-2,
  • 2 values between 2-4,
  • 3 values between 4-8,
  • 3 values between 8-16.
like image 212
gberger Avatar asked Sep 12 '25 23:09

gberger


1 Answers

Your bin size, in this case, in logarithmic terms, is 2.

If you wish to use another bin size, substitute the 2s in the script below.

select 
    pow(2, floor(ln(val) / ln(2))) as bin,
    count(bin) as qty
from 
    mytable
group by
    bin;

Explanation

First, we take the log of your values in base 2. log(val, 2) might work in some RDBMS, but if not, just remember the log property that log(val, 2) = ln(val) / ln(2).

val | ln(val) / ln(2)
1   | 0
2   | 1
3   | 1.58496250072
4   | 2
5   | 2.32192809489

Then we floor this:

val | floor(ln(val) / ln(2))
1   | 0
2   | 1
3   | 1
4   | 2
5   | 2

Finally, we use the power of two to transform these floored values into the logarithmic bin values.

val | pow(2, floor(ln(val) / ln(2)))
1   | 1
2   | 2
3   | 2
4   | 4
5   | 4

The rest is simply grouping by the logarithmic bins and counting.

Caveats

No pow

If your RDBMS does not support pow(x, y), you can use exp(y * ln(x)). The expression then becomes:

    exp(floor(ln(val) / ln(2)) * ln(2))

Zero

log(0) is undefined. In the RDBMS I tested, it returns null.

If your table has values of 0, you most likely want to bin them between 0 and 1. To do that, you can wrap the whole expression with ifnull(..., 0), like so:

    ifnull(pow(2, floor(ln(val) / ln(2))), 0)

Negative

The log of negative numbers is undefined... But you probably want to bin them like [0 to -1), [-1 to -2), [-2 to -4), [-4 to -8), etc.

If your database has negative values, you can achieve that binning by first using abs in your value, then in the end restoring its original signal by multiplying the result by val/abs(val). Your expression then becomes:

pow(2, floor(ln(abs(val)) / ln(2))) * val/abs(val)

Negative and zero

If your database has both negative and zero values, you should wrap the ifnull around everything else. Otherwise, the val/abs(val) part would make you divide by zero, re-introducing nulls.

ifnull(pow(2, floor(ln(abs(val)) / ln(2))) * val/abs(val), 0)
like image 101
gberger Avatar answered Sep 15 '25 13:09

gberger