
Algorithm for nice graph labels for time/date axis?

I'm looking for a "nice numbers" algorithm for determining the labels on a date/time value axis. I'm familiar with Paul Heckbert's Nice Numbers algorithm.

I have a plot that displays time/date on the X axis and the user can zoom in and look at a smaller time frame. I'm looking for an algorithm that picks nice dates to display on the ticks.

For example:

  • Looking at a day or so: 1/1 12:00, 1/1 4:00, 1/1 8:00...
  • Looking at a week: 1/1, 1/2, 1/3...
  • Looking at a month: 1/09, 2/09, 3/09...

The nice label ticks don't need to correspond to the first visible point, but close to it.

Is anybody familiar with such an algorithm?

Aaron asked Sep 14 '09 00:09


4 Answers

The 'nice numbers' article you linked to mentioned that

the nicest numbers in decimal are 1, 2, 5 and all power-of-10 multiples of these numbers

So I think for doing something similar with date/time you need to start by similarly breaking down the component pieces. So take the nice factors of each type of interval:

  • If you're showing seconds or minutes use 1, 2, 3, 5, 10, 15, 30 (I skipped 6, 12, 20 because they don't "feel" right).
  • If you're showing hours use 1, 2, 3, 4, 6, 8, 12
  • For days use 1, 2, 7
  • For weeks use 1, 2, 4 (13 and 26 fit the model but seem too odd to me)
  • For months use 1, 2, 3, 4, 6
  • For years use 1, 2, 5 and power-of-10 multiples

Now obviously this starts to break down as you get into larger amounts. Certainly you don't want to show 5 weeks' worth of minutes, even in "pretty" intervals of 30 minutes or something. On the other hand, when you only have 48 hours' worth, you don't want to show 1 day intervals. The trick, as you have already pointed out, is finding decent transition points.

Just on a hunch, I would say a reasonable crossover point would be about twice as much as the next interval. That would give you the following (min and max number of intervals shown afterwards)

  • use seconds if you have less than 2 minutes worth (1-120)
  • use minutes if you have less than 2 hours worth (2-120)
  • use hours if you have less than 2 days worth (2-48)
  • use days if you have less than 2 weeks worth (2-14)
  • use weeks if you have less than 2 months worth (2-8/9)
  • use months if you have less than 2 years worth (2-24)
  • otherwise use years (although you could continue with decades, centuries, etc if your ranges can be that long)

Unfortunately, our inconsistent time intervals mean that some cases can have over a hundred intervals while others have at most 8 or 9. So you'll want to pick the size of your intervals such that you don't have more than 10-15 intervals (or fewer than 5, for that matter). Also, you could break from a strict definition of 2 times the next biggest interval if you think it's easy to keep track of. For instance, you could use hours up to 3 days (72 hours) and weeks up to 4 months. A little trial and error might be necessary.

So to go back over, choose the interval type based on the size of your range, then choose the interval size by picking one of the "nice" numbers that will leave you with between 5 and about 15 tick marks. Or if you know and/or can control the actual number of pixels between tick marks you could put upper and lower bounds on how many pixels are acceptable between ticks (if they are spaced too far apart the graph may be hard to read, but if there are too many ticks the graph will be cluttered and your labels may overlap).
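
One possible way to pick the interval size, sketched under a few assumptions: the range has already been converted into the chosen unit, the cap is 15 intervals, and the smallest nice step that stays under the cap wins. The names `NICE_STEPS` and `pick_step` are hypothetical, and years (1, 2, 5 times powers of 10) are left out to keep it short.

```python
# Nice step sizes per unit, copied from the lists earlier in this answer.
NICE_STEPS = {
    "seconds": [1, 2, 3, 5, 10, 15, 30],
    "minutes": [1, 2, 3, 5, 10, 15, 30],
    "hours": [1, 2, 3, 4, 6, 8, 12],
    "days": [1, 2, 7],
    "weeks": [1, 2, 4],
    "months": [1, 2, 3, 4, 6],
}


def pick_step(n_units, unit, max_intervals=15):
    """Pick the smallest nice step giving at most max_intervals intervals.

    n_units is the visible range expressed in `unit` (hypothetical signature).
    Falls back to the largest nice step if none is coarse enough.
    """
    steps = NICE_STEPS[unit]
    for step in steps:
        if n_units / step <= max_intervals:
            return step
    return steps[-1]
```

For example, `pick_step(48, "hours")` returns 4, which gives 12 intervals over two days. The pixel-based variant would replace `max_intervals` with the axis width divided by the minimum spacing you accept.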

Rob Van Dam answered Nov 20 '22 02:11


Have a look at

http://tools.netsa.cert.org/netsa-python/doc/index.html

It has a nice.py (python/netsa/data/nice.py) which I think is stand-alone, and should work fine.

Arvind answered Nov 20 '22 02:11


Still no answer to this question... I'll throw my first idea in then! I assume you have the range of the visible axis.

This is probably how I would do it.

Rough pseudocode:

// quantify range
rangeLength = endOfVisiblePart - startOfVisiblePart;

// qualify range resolution
if (rangeLength < "1.5 day") {
    resolution = "day";  // it can be a number, e.g.: ..., 3 for day, 4 for week, ...
} else if (rangeLength < "9 days") {
    resolution = "week";
} else if (rangeLength < "35 days") {
    resolution = "month";
} // you can expand this in both ways to get from nanoseconds to geological eras if you wish

After that, it should (depending on what you have easy access to) be quite easy to determine the value of each nice label tick. Depending on the 'resolution', you format it differently. E.g.: MM/DD for "week", MM:SS for "minute", etc., just like you said.
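
A minimal Python sketch of the same idea, assuming the visible range is given as two `datetime` values. The thresholds are the ones from the pseudocode; the `strftime` format strings, the "year" fallback, and the names `RESOLUTIONS` and `pick_resolution` are my own assumptions.

```python
from datetime import datetime, timedelta

# Thresholds come from the pseudocode; the format strings are assumptions
# chosen to resemble the examples in the question.
RESOLUTIONS = [
    (timedelta(days=1.5), "day", "%m/%d %H:%M"),
    (timedelta(days=9), "week", "%m/%d"),
    (timedelta(days=35), "month", "%m/%d"),
]


def pick_resolution(start, end):
    """Return (resolution name, strftime format) for the visible range."""
    range_length = end - start
    for limit, name, fmt in RESOLUTIONS:
        if range_length < limit:
            return name, fmt
    return "year", "%m/%y"  # fallback for longer ranges (assumption)
```

A tick label is then `tick.strftime(fmt)`, so the formatting stays in one table alongside the thresholds.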

Joanis answered Nov 20 '22 01:11


[Edit - I expanded this a little more at http://www.acooke.org/cute/AutoScalin0.html ]

A naive extension of the "nice numbers" algorithm seems to work for base 12 and 60, which gives good intervals for hours and minutes. This is code I just hacked together:

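from math import ceil, floor, log, log10

# (base, rounding thresholds [(upper bound, nice value)], non-rounding nice values)
LIM10 = (10, [(1.5, 1), (3, 2), (7, 5)], [1, 2, 5])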
LIM12 = (12, [(1.5, 1), (3, 2), (8, 6)], [1, 2, 6])
LIM60 = (60, [(1.5, 1), (20, 15), (40, 30)], [1, 15, 40])


def heckbert_d(lo, hi, ntick=5, limits=None):
    '''
    Tick spacing from Heckbert's "nice numbers" algorithm ("Graphics Gems"),
    generalised to the base and nice fractions given in limits.
    '''
    if limits is None:
        limits = LIM10
    (base, rfs, fs) = limits
    def nicenum(x, round):
        step = base ** floor(log(x)/log(base))
        f = float(x) / step
        nf = base
        if round:
            for (a, b) in rfs:
                if f < a:
                    nf = b
                    break
        else:
            for a in fs:
                if f <= a:
                    nf = a
                    break
        return nf * step
    delta = nicenum(hi-lo, False)
    return nicenum(delta / (ntick-1), True)


def heckbert(lo, hi, ntick=5, limits=None):
    '''
    Tick labels covering lo..hi, spaced by heckbert_d (Heckbert's
    "nice numbers" algorithm for graph ranges, from "Graphics Gems").
    '''
    def _heckbert():
        d = heckbert_d(lo, hi, ntick=ntick, limits=limits)
        graphlo = floor(lo / d) * d
        graphhi = ceil(hi / d) * d
        fmt = '%' + '.%df' %  max(-floor(log10(d)), 0)
        value = graphlo
        while value < graphhi + 0.5*d:
            yield fmt % value
            value += d
    return list(_heckbert())

So, for example, if you want to display seconds from 0 to 60,

>>> heckbert(0, 60, limits=LIM60)
['0', '15', '30', '45', '60']

or hours from 0 to 5:

>>> heckbert(0, 5, limits=LIM12)
['0', '2', '4', '6']
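
To go from a tick spacing to actual date/time ticks, the same floor/ceil rounding used for graphlo and graphhi can be applied to seconds since the epoch. This is only a sketch: `aligned_ticks` is a hypothetical helper, it assumes timezone-aware UTC datetimes, and it only handles fixed-length steps (so not months or years).

```python
from datetime import datetime, timedelta, timezone
from math import ceil, floor

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def aligned_ticks(start, end, step_seconds):
    """Return datetimes on multiples of step_seconds that cover start..end.

    Mirrors the graphlo/graphhi rounding; assumes timezone-aware UTC
    datetimes and a fixed-length step.
    """
    lo = (start - EPOCH).total_seconds()
    hi = (end - EPOCH).total_seconds()
    first = floor(lo / step_seconds) * step_seconds  # round down to a multiple
    last = ceil(hi / step_seconds) * step_seconds    # round up to a multiple
    count = int(round((last - first) / step_seconds)) + 1
    return [EPOCH + timedelta(seconds=first + i * step_seconds)
            for i in range(count)]
```

With a 15-minute step (900 seconds), a view from 00:10 to 01:05 gets ticks from 00:00 to 01:15, so the first tick lands near the first visible point rather than exactly on it.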

andrew cooke answered Nov 20 '22 01:11