
Is there an algorithm to figure out a 'nice' number formatting for a sequence of numbers of arbitrary order of magnitude?

I'm currently using an implementation of the Extended Wilkinson Algorithm to generate a sequence of axis tick values. The algorithm is given a value range [min,max] and a number n of desired tick marks, and it outputs an array of evenly spaced values in the interval [min,max]. I need to create String labels from these values, but depending on their order of magnitude I would like to switch between scientific notation and decimal notation.

For example, for the sequence {0.00001, 0.000015, 0.00002, 0.000025} I would like to use scientific notation: {'1.0e-05','1.5e-05','2.0e-05','2.5e-05'}. For the sequence {0,8,16,24,32} I'd like decimal notation. I also don't want unnecessary trailing zeros like 0.001000 or 1.500e-05, but in scientific notation I do want a trailing zero when other numbers in the sequence need more decimal places, e.g. '1.00e-05' alongside '1.05e-05'. There is more: for {20.0000001, 20.0000002, 20.0000003} the interesting part is the very small deviation of 0.0000001 between values, but the 20 is still important, so something like '20+1.0e-07' may be desirable because counting zeros is tedious. Mixing scientific and decimal notation within one set of labels is also undesirable, e.g. {8000, 9000, 1.0e04, 1.1e04} is bad.

The goal is consistent labeling that lets one distinguish the values and reads nicely, with very small or very large values shown in scientific notation to save display space.

So the representation to use does not depend on a single value; the whole sequence has to be taken into account. Is there a software package or a research paper that deals with this?

I have tried to implement something myself, but it does not work very well. Sometimes it outputs the same string for different numbers, e.g. '86.0001', '86.0001', '86.0002', '86.0002' for {86.0001, 86.00015, 86.0002, 86.00025}.

protected String[] labelsForTicks(double[] ticks){
   String str1 = String.format(Locale.US, "%.4g", ticks[0]);
   String str2 = String.format(Locale.US, "%.4g", ticks[ticks.length-1]);
   String[] labels = new String[ticks.length];
   if(str1.contains("e") || str2.contains("e")){
      for(int i=0; i<ticks.length; i++){
         String l = String.format(Locale.US, "%.4e", ticks[i]);
         String[] Esplit = l.split("e", -2);
         String[] dotsplit = Esplit[0].split("\\.",-2);
         dotsplit[1] = ('#'+dotsplit[1])
               .replaceAll("0", " ")
               .trim()
               .replaceAll(" ", "0")
               .replaceAll("#", "");
         dotsplit[1] = dotsplit[1].isEmpty() ? "0":dotsplit[1];
         l = dotsplit[0]+'.'+dotsplit[1]+'e'+Esplit[1];
         labels[i] = l;
      }
   } else {
      for(int i=0; i<ticks.length; i++){
         String l = String.format(Locale.US, "%.4f", ticks[i]);
         if(l.contains(".")){
            String[] dotsplit = l.split("\\.",-2);
            dotsplit[1] = ('#'+dotsplit[1])
                  .replaceAll("0", " ")
                  .trim()
                  .replaceAll(" ", "0")
                  .replaceAll("#", "");
            if(dotsplit[1].isEmpty()){
               l = dotsplit[0];
            } else {
               l = dotsplit[0]+'.'+dotsplit[1];
            }
         }
         labels[i] = l;
      }
   }
   return labels;
}

It tries to decide whether to use scientific or decimal notation using the String format 'g' option on the first and last value in the sequence, and then tries to strip away the unnecessary zeros.

asked Sep 04 '25 17:09 by hageldave


1 Answer

The first problem when receiving the tick doubles is to round them to the smallest number of digits that keeps them distinct. This is what the function ScaleForTicks below does. It finds the largest power of 10 that can scale all ticks to integers while keeping them distinct. For ticks >= 1, scaling means dividing by the power of 10, and for ticks < 1, it means multiplying by the power of 10. Once the ticks have been scaled, we round them to 0 decimals. This gives us our base labels. They still require additional processing depending on the power of 10 applied.
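
As a rough illustration of that search, here is a minimal Java sketch (Java because the question uses it). The names TickScale, largestDistinctExponent and distinctAfterScaling are made up for this example. It works on the shortest decimal form of each double via BigDecimal instead of Math.Round, so it is not a line-by-line port of ScaleForTicks, and it assumes the ticks are sorted and not all zero.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TickScale {
    /** Largest exponent e for which the ticks, divided by 10^e and rounded, stay distinct. */
    static int largestDistinctExponent(double[] ticks) {
        BigDecimal[] values = new BigDecimal[ticks.length];
        BigDecimal maxAbs = BigDecimal.ZERO;
        for (int i = 0; i < ticks.length; i++) {
            // Double.toString yields a short decimal form, e.g. "1.5E-5" for 0.000015.
            values[i] = new BigDecimal(Double.toString(ticks[i]));
            maxAbs = maxAbs.max(values[i].abs());
        }
        if (maxAbs.signum() == 0) {
            throw new IllegalArgumentException("all ticks are zero");
        }
        // Exponent of the leading digit of the largest magnitude, e.g. 5 for 130000.
        int start = maxAbs.precision() - maxAbs.scale() - 1;
        // A double carries at most 17 significant digits, so stop after that many steps.
        for (int e = start; e >= start - 17; e--) {
            if (distinctAfterScaling(values, e)) {
                return e;
            }
        }
        throw new IllegalArgumentException("ticks cannot be told apart");
    }

    /** True if neighbouring ticks differ after dividing by 10^e and rounding to an integer. */
    static boolean distinctAfterScaling(BigDecimal[] values, int e) {
        BigDecimal previous = null;
        for (BigDecimal v : values) {
            BigDecimal scaled = v.scaleByPowerOfTen(-e).setScale(0, RoundingMode.HALF_UP);
            if (previous != null && scaled.compareTo(previous) == 0) {
                return false;
            }
            previous = scaled;
        }
        return true;
    }
}
```

For {100000, 110000, 120000, 130000} this gives 4 (the ticks become 10, 11, 12, 13), and for {20.0000001, 20.0000002, 20.0000003} it gives -7.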

The question did not say how many consecutive 0's are acceptable in a label, so I added the maxZeroDigits parameter to the LabelsForTicks function. A label is displayed in decimal notation if it contains maxZeroDigits or fewer consecutive 0's; otherwise, scientific notation is used.
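
The decision can be reduced to counting the padding zeros that plain decimal notation would need. The Java sketch below is only an illustration of that rule; NotationRule and useScientific are hypothetical names, exponent is the power of 10 found by the scaling step, and digitCount is the number of digits in the scaled integer labels.

```java
final class NotationRule {
    /**
     * Decides between decimal and scientific notation for a whole tick sequence.
     *
     * @param exponent      power of 10 that scales the ticks to distinct integers
     * @param digitCount    number of digits of those scaled integers
     * @param maxZeroDigits longest run of padding zeros still shown in decimal notation
     */
    static boolean useScientific(int exponent, int digitCount, int maxZeroDigits) {
        if (exponent >= 0) {
            // Decimal notation would pad every label with `exponent` trailing zeros.
            return exponent > maxZeroDigits;
        }
        // Decimal notation would need this many zeros between "0." and the digits.
        int leadingZeros = -exponent - digitCount;
        return leadingZeros > maxZeroDigits;
    }
}
```

For the ticks 0.0001 0.00015 0.0002 0.00025 the scaled labels are 10 15 20 25 with exponent -5, so three leading zeros are needed: decimal notation with maxZeroDigits 3, scientific notation with maxZeroDigits 2.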

Another difficulty is illustrated by the ticks 20.0000001 20.0000002 20.0000003 in the question. The problem is to extract the offset common to all labels so as to show the actual small variation 1.0e-07 2.0e-07 3.0e-07. This is solved by extracting the common offset from the set of integer labels obtained after scaling. The maxZeroDigits parameter is used to determine whether to format the offset in scientific notation or not.
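
Here is a small Java sketch of the offset search on the scaled integer labels. CommonOffset and offsetLength are names invented for this example, and it assumes all labels have the same length. It only finds where the shared part ends; whether that part is actually pulled out as an offset still depends on it containing more than maxZeroDigits consecutive 0's.

```java
final class CommonOffset {
    /** Number of leading characters, shared by all labels, that could be split off as an offset. */
    static int offsetLength(String[] labels) {
        int length = labels[0].length();
        boolean[] varies = new boolean[length];
        int lastVarying = -1;
        // Mark every position where at least one label differs from the first one.
        for (int j = 0; j < length; j++) {
            for (String label : labels) {
                if (label.charAt(j) != labels[0].charAt(j)) {
                    varies[j] = true;
                    lastVarying = j;
                }
            }
        }
        // Walk left from the last varying digit to the nearest shared '0';
        // the offset ends right after that zero.
        for (int k = lastVarying; k > 0; k--) {
            if (!varies[k] && labels[0].charAt(k) == '0') {
                return k + 1;
            }
        }
        return 0; // no shared zero found, so there is no offset
    }
}
```

For the labels 200000001 200000002 200000003 the result is 8, which splits them into the offset digits 20000000 and the remainders 1 2 3. For 100 110 120 130 the result is 0.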

The question asked for fully formatted labels consisting of an optional offset, a label, and an optional exponent. Because the offset and the exponent are the same for all labels, they can be returned as separate parts. This is what the LabelsForTicks function below does. For n ticks, the first n elements of the returned array are the formatted labels without offset and exponent. The next two elements are the label and the exponent of the offset. The last element is the exponent of the labels. The different parts may be assembled to get fully formatted labels, or they may be used separately, for example to indicate a multiplying factor (x10^2) or an offset (+1.34e+04) for the labels along the graph axes.
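
If full labels are wanted, the parts can be glued back together. The Java sketch below assumes the array layout just described, with null for parts that are absent, and joins offset and label with "+" as in the '20+1.0e-07' example from the question. LabelAssembler and assemble are hypothetical names, and negative ticks would need a different joiner.

```java
final class LabelAssembler {
    /**
     * Builds full labels from an array laid out as:
     * n labels, offset label, offset exponent, label exponent (absent parts are null).
     */
    static String[] assemble(String[] parts) {
        int n = parts.length - 3;
        String offsetLabel = parts[n];
        String offsetExponent = parts[n + 1];
        String labelExponent = parts[n + 2];

        // Prefix shared by all labels, e.g. "20+" or "2.0e+04+"; empty without an offset.
        String prefix = "";
        if (offsetLabel != null) {
            prefix = offsetLabel + (offsetExponent == null ? "" : offsetExponent) + "+";
        }
        String suffix = labelExponent == null ? "" : labelExponent;

        String[] full = new String[n];
        for (int i = 0; i < n; i++) {
            full[i] = prefix + parts[i] + suffix;
        }
        return full;
    }
}
```

With the parts 1.0 2.0 3.0, offset 20, no offset exponent and exponent e-07, the first full label is 20+1.0e-07.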

Here is the code, in C#.

// Requires: using System; using System.Globalization; using System.Linq;
static string[] LabelsForTicks(double[] ticks, int maxZeroDigits)
{
    int scale = ScaleForTicks(ticks);

    string[] labels = new string[ticks.Length + 3];

    if (scale >= 0)
    {
        if (scale >= maxZeroDigits + 1)
        {
            for (int i = 0; i < ticks.Length; i++)
                labels[i] = ((long)Math.Round(ticks[i] / Math.Pow(10, scale))).ToString(CultureInfo.InvariantCulture);
        }
        else
        {
            for (int i = 0; i < ticks.Length; i++)
                labels[i] = ((long)ticks[i]).ToString(CultureInfo.InvariantCulture);
        }
    }
    else
    {
        for (int i = 0; i < ticks.Length; i++)
            labels[i] = ((long)Math.Round(ticks[i] * Math.Pow(10, -scale))).ToString(CultureInfo.InvariantCulture);
    }

    // Find common offset.
    char[] mask = labels[0].ToCharArray();
    for (int i = 1; i < ticks.Length; i++)
    {
        for (int j = 0; j < labels[0].Length; j++)
            if (mask[j] != labels[i][j])
                mask[j] = 'x';
    }
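    // Locate the rightmost position where the labels differ.
    int k = mask.Length - 1;
    while (k >= 0 && mask[k] != 'x') k--;
    // Walk left to the nearest shared '0'; the common offset ends just after it.
    for (; k > 0; k--)
    {
        if (mask[k] == '0')
        {
            k++;
            break;
        }
    }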

    // If there is an offset, and it contains a sequence of more than maxZeroDigits.
    string common = new string(mask, 0, k);
    if (common.Contains(new string('0', maxZeroDigits + 1)))
    {
        // Remove common offset from all labels.
        for (int i = 0; i < ticks.Length; i++)
            labels[i] = labels[i].Substring(k);
        // Add offset as the third-to-last element (its exponent follows it).
        labels[ticks.Length] = common + new string('0', labels[0].Length);
        // Reduce offset.
        string[] offset = LabelForNumber(Convert.ToDouble(labels[ticks.Length]) * Math.Pow(10, scale), maxZeroDigits);
        labels[ticks.Length] = offset[0];
        labels[ticks.Length + 1] = offset[1];
    }

    if (scale < 0)
    {
        int leadingDecimalDigits = (-scale) - labels[0].Length;
        if (leadingDecimalDigits <= maxZeroDigits)
        {
            string zeros = new string('0', leadingDecimalDigits);
            for (int i = 0; i < ticks.Length; i++)
                labels[i] = "0." + zeros + labels[i];
            scale = 0;
        }
        else
        {
            // If only one digit, append "0".
            if (labels[0].Length == 1)
            {
                scale -= 1;
                for (int i = 0; i < ticks.Length; i++)
                    labels[i] = labels[i] + "0";
            }
            // Put decimal point immediately after the first digit.
            scale += labels[0].Length - 1;
            for (int i = 0; i < ticks.Length; i++)
                labels[i] = labels[i][0] + "." + labels[i].Substring(1);
        }
    }
    else if (scale > maxZeroDigits)
    {
        // If only one digit, append "0" and lower the exponent to compensate,
        // as in the negative-scale branch above.
        if (labels[0].Length == 1)
        {
            scale -= 1;
            for (int i = 0; i < ticks.Length; i++)
                labels[i] = labels[i] + "0";
        }
        // Put decimal point immediately after the first digit.
        scale += labels[0].Length - 1;
        for (int i = 0; i < ticks.Length; i++)
            labels[i] = labels[i][0] + "." + labels[i].Substring(1);
    }

    // Add exponent as the last element.
    if (scale < 0 || scale > maxZeroDigits)
    {
        string exponent;
        if (scale < 0)
        {
            exponent = (-scale).ToString();
            if (exponent.Length == 1) exponent = "0" + exponent;
            exponent = "-" + exponent;
        }
        else
        {
            exponent = scale.ToString();
            if (exponent.Length == 1) exponent = "0" + exponent;
            exponent = "+" + exponent;
        }
        labels[ticks.Length + 2] = "e" + exponent;
    }

    return labels;
}

static int ScaleForTicks(double[] ticks)
{
    int scale = -1 + (int)Math.Ceiling(Math.Log10(ticks.Last()));

    int bound = Math.Max(scale - 15, 0);

    while (scale >= bound)
    {
        double t1 = Math.Round(ticks[0] / Math.Pow(10, scale));
        bool success = true;
        for (int i = 1; i < ticks.Length; i++)
        {
            double t2 = Math.Round(ticks[i] / Math.Pow(10, scale));
            if (t1 == t2)
            {
                success = false;
                break;
            }
            t1 = t2;
        }
        if (success)
            return scale;

        scale--;
    }

    bound = Math.Min(-1, scale - 15);

    while (scale >= bound)
    {
        double t1 = Math.Round(ticks[0] * Math.Pow(10, -scale));
        bool success = true;
        for (int i = 1; i < ticks.Length; i++)
        {
            double t2 = Math.Round(ticks[i] * Math.Pow(10, -scale));
            if (t1 == t2)
            {
                success = false;
                break;
            }
            t1 = t2;
        }
        if (success)
            return scale;

        scale--;
    }

    return scale;
}

static string[] LabelForNumber(double number, int maxZeroDigits)
{
    int scale = ScaleNumber(number);

    string[] labels = new string[2];

    if (scale >= 0)
    {
        if (scale >= maxZeroDigits + 1)
            labels[0] = ((long)Math.Round(number / Math.Pow(10, scale))).ToString(CultureInfo.InvariantCulture);
        else
            labels[0] = ((long)number).ToString(CultureInfo.InvariantCulture);
    }
    else
    {
        labels[0] = ((long)Math.Round(number * Math.Pow(10, -scale))).ToString(CultureInfo.InvariantCulture);
    }

    if (scale < 0)
    {
        int leadingDecimalDigits = (-scale) - labels[0].Length;
        if (leadingDecimalDigits <= maxZeroDigits)
        {
            string zeros = new string('0', leadingDecimalDigits);
            labels[0] = "0." + zeros + labels[0].TrimEnd(new char[] { '0' });
            scale = 0;
        }
        else
        {
            // Put decimal point immediately after the first digit.
            scale += labels[0].Length - 1;
            labels[0] = labels[0][0] + "." + labels[0].Substring(1);
            labels[0] = labels[0].TrimEnd(new char[] { '0' });
            // If only one digit, append "0".
            if (labels[0].Length == 2)
                labels[0] = labels[0] + "0";
        }
    }
    else if (scale > maxZeroDigits)
    {
        // Put decimal point immediately after the first digit.
        // Moving the point left by (digits - 1) places raises the exponent by the same amount.
        scale += labels[0].Length - 1;
        labels[0] = labels[0][0] + "." + labels[0].Substring(1);
        labels[0] = labels[0].TrimEnd(new char[] { '0' });
        // If only one digit, append "0".
        if (labels[0].Length == 2)
            labels[0] = labels[0] + "0";
    }

    // Add exponent as the last element.
    if (scale < 0 || scale > maxZeroDigits)
    {
        string exponent;
        if (scale < 0)
        {
            exponent = (-scale).ToString();
            if (exponent.Length == 1) exponent = "0" + exponent;
            exponent = "-" + exponent;
        }
        else
        {
            exponent = scale.ToString();
            if (exponent.Length == 1) exponent = "0" + exponent;
            exponent = "+" + exponent;
        }
        labels[1] = "e" + exponent;
    }

    return labels;
}

static int ScaleNumber(double number)
{
    int scale = (int)Math.Ceiling(Math.Log10(number));

    int bound = Math.Max(scale - 15, 0);

    while (scale >= bound)
    {
        if (Math.Round(number / Math.Pow(10, scale)) == number / Math.Pow(10, scale))
            return scale;
        scale--;
    }

    bound = Math.Min(-1, scale - 15);

    while (scale >= bound)
    {
        if (Math.Round(number * Math.Pow(10, -scale)) == number * Math.Pow(10, -scale))
            return scale;
        scale--;
    }

    return scale;
}

Here are several examples with maxZeroDigits set to 3 and 2.

Ticks: 1 2 3 4 
MaxZeroDigits: 3
Labels: 1 2 3 4 
Exponent: 
Offset: 

Ticks: 10 11 12 13 
MaxZeroDigits: 3
Labels: 10 11 12 13 
Exponent: 
Offset: 

Ticks: 100 110 120 130 
MaxZeroDigits: 3
Labels: 100 110 120 130 
Exponent: 
Offset: 

Ticks: 1000 1100 1200 1300 
MaxZeroDigits: 3
Labels: 1000 1100 1200 1300 
Exponent: 
Offset: 

Ticks: 10000 11000 12000 13000 
MaxZeroDigits: 3
Labels: 10000 11000 12000 13000 
Exponent: 
Offset: 

Ticks: 100000 110000 120000 130000 
MaxZeroDigits: 3
Labels: 1.0 1.1 1.2 1.3 
Exponent: e+05
Offset: 

Ticks: 1.8E+15 1.9E+15 2E+15 2.1E+15 
MaxZeroDigits: 3
Labels: 1.8 1.9 2.0 2.1 
Exponent: e+15
Offset: 

Ticks: 1.8E+35 1.9E+35 2E+35 2.1E+35 
MaxZeroDigits: 3
Labels: 1.8 1.9 2.0 2.1 
Exponent: e+35
Offset: 

Ticks: 2000.000001 2000.0000015 2000.000002 2000.0000025 
MaxZeroDigits: 3
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-06
Offset: 2000

Ticks: 20000.00000105 20000.0000011 20000.00000115 20000.0000012 
MaxZeroDigits: 3
Labels: 1.05 1.10 1.15 1.20 
Exponent: e-06
Offset: 2.0e+04

Ticks: 2.000001 2.000002 2.000003 2.000004 
MaxZeroDigits: 3
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-06
Offset: 2

Ticks: 20.000001 20.000002 20.000003 20.000004 
MaxZeroDigits: 3
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-06
Offset: 20

Ticks: 200.000001 200.0000015 200.000002 200.0000025 
MaxZeroDigits: 3
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-06
Offset: 200

Ticks: 200000.000001 200000.000002 200000.000003 200000.000004 
MaxZeroDigits: 3
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-06
Offset: 2.0e+05

Ticks: 2.0000001E+35 2.0000002E+35 2.0000003E+35 2.0000004E+35 
MaxZeroDigits: 3
Labels: 1.0 2.0 3.0 4.0 
Exponent: e+28
Offset: 2.0e+35

Ticks: 0.1 0.15 0.2 0.25 
MaxZeroDigits: 3
Labels: 0.10 0.15 0.20 0.25 
Exponent: 
Offset: 

Ticks: 0.01 0.015 0.02 0.025 
MaxZeroDigits: 3
Labels: 0.010 0.015 0.020 0.025 
Exponent: 
Offset: 

Ticks: 0.001 0.0015 0.002 0.0025 
MaxZeroDigits: 3
Labels: 0.0010 0.0015 0.0020 0.0025 
Exponent: 
Offset: 

Ticks: 0.0001 0.00015 0.0002 0.00025 
MaxZeroDigits: 3
Labels: 0.00010 0.00015 0.00020 0.00025 
Exponent: 
Offset: 

Ticks: 1E-05 1.5E-05 2E-05 2.5E-05 
MaxZeroDigits: 3
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-05
Offset: 

Ticks: 1E-06 1.5E-06 2E-06 2.5E-06 
MaxZeroDigits: 3
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-06
Offset: 

Ticks: 1.8E-13 1.9E-13 2E-13 2.1E-13 
MaxZeroDigits: 3
Labels: 1.8 1.9 2.0 2.1 
Exponent: e-13
Offset: 

Ticks: 1.8E-33 1.9E-33 2E-33 2.1E-33 
MaxZeroDigits: 3
Labels: 1.8 1.9 2.0 2.1 
Exponent: e-33
Offset: 

Ticks: 2.0000001E-33 2.0000002E-33 2.0000003E-33 2.0000004E-33 
MaxZeroDigits: 3
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-40
Offset: 2.0e-33

Ticks: 2.00000000015E-30 2.0000000002E-30 2.00000000025E-30 2.0000000003E-30 
MaxZeroDigits: 3
Labels: 1.5 2.0 2.5 3.0 
Exponent: e-40
Offset: 2.0e-30

Ticks: 0.0010000010001 0.0010000010002 0.0010000010003 0.0010000010004 
MaxZeroDigits: 3
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-13
Offset: 0.001000001

Ticks: 0.0010000010001 0.00100000100015 0.0010000010002 0.00100000100025 
MaxZeroDigits: 3
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-13
Offset: 0.001000001

Ticks: 1000001000.1 1000001000.2 1000001000.3 1000001000.4 
MaxZeroDigits: 3
Labels: 0.1 0.2 0.3 0.4 
Exponent: 
Offset: 1000001000

Ticks: 1 2 3 4 
MaxZeroDigits: 2
Labels: 1 2 3 4 
Exponent: 
Offset: 

Ticks: 10 11 12 13 
MaxZeroDigits: 2
Labels: 10 11 12 13 
Exponent: 
Offset: 

Ticks: 100 110 120 130 
MaxZeroDigits: 2
Labels: 100 110 120 130 
Exponent: 
Offset: 

Ticks: 1000 1100 1200 1300 
MaxZeroDigits: 2
Labels: 1000 1100 1200 1300 
Exponent: 
Offset: 

Ticks: 10000 11000 12000 13000 
MaxZeroDigits: 2
Labels: 1.0 1.1 1.2 1.3 
Exponent: e+04
Offset: 

Ticks: 100000 110000 120000 130000 
MaxZeroDigits: 2
Labels: 1.0 1.1 1.2 1.3 
Exponent: e+05
Offset: 

Ticks: 1.8E+15 1.9E+15 2E+15 2.1E+15 
MaxZeroDigits: 2
Labels: 1.8 1.9 2.0 2.1 
Exponent: e+15
Offset: 

Ticks: 1.8E+35 1.9E+35 2E+35 2.1E+35 
MaxZeroDigits: 2
Labels: 1.8 1.9 2.0 2.1 
Exponent: e+35
Offset: 

Ticks: 2000.000001 2000.0000015 2000.000002 2000.0000025 
MaxZeroDigits: 2
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-06
Offset: 2.0e+03

Ticks: 20000.00000105 20000.0000011 20000.00000115 20000.0000012 
MaxZeroDigits: 2
Labels: 1.05 1.10 1.15 1.20 
Exponent: e-06
Offset: 2.0e+04

Ticks: 2.000001 2.000002 2.000003 2.000004 
MaxZeroDigits: 2
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-06
Offset: 2

Ticks: 20.000001 20.000002 20.000003 20.000004 
MaxZeroDigits: 2
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-06
Offset: 20

Ticks: 200.000001 200.0000015 200.000002 200.0000025 
MaxZeroDigits: 2
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-06
Offset: 200

Ticks: 200000.000001 200000.000002 200000.000003 200000.000004 
MaxZeroDigits: 2
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-06
Offset: 2.0e+05

Ticks: 2.0000001E+35 2.0000002E+35 2.0000003E+35 2.0000004E+35 
MaxZeroDigits: 2
Labels: 1.0 2.0 3.0 4.0 
Exponent: e+28
Offset: 2.0e+35

Ticks: 0.1 0.15 0.2 0.25 
MaxZeroDigits: 2
Labels: 0.10 0.15 0.20 0.25 
Exponent: 
Offset: 

Ticks: 0.01 0.015 0.02 0.025 
MaxZeroDigits: 2
Labels: 0.010 0.015 0.020 0.025 
Exponent: 
Offset: 

Ticks: 0.001 0.0015 0.002 0.0025 
MaxZeroDigits: 2
Labels: 0.0010 0.0015 0.0020 0.0025 
Exponent: 
Offset: 

Ticks: 0.0001 0.00015 0.0002 0.00025 
MaxZeroDigits: 2
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-04
Offset: 

Ticks: 1E-05 1.5E-05 2E-05 2.5E-05 
MaxZeroDigits: 2
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-05
Offset: 

Ticks: 1E-06 1.5E-06 2E-06 2.5E-06 
MaxZeroDigits: 2
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-06
Offset: 

Ticks: 1.8E-13 1.9E-13 2E-13 2.1E-13 
MaxZeroDigits: 2
Labels: 1.8 1.9 2.0 2.1 
Exponent: e-13
Offset: 

Ticks: 1.8E-33 1.9E-33 2E-33 2.1E-33 
MaxZeroDigits: 2
Labels: 1.8 1.9 2.0 2.1 
Exponent: e-33
Offset: 

Ticks: 2.0000001E-33 2.0000002E-33 2.0000003E-33 2.0000004E-33 
MaxZeroDigits: 2
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-40
Offset: 2.0e-33

Ticks: 2.00000000015E-30 2.0000000002E-30 2.00000000025E-30 2.0000000003E-30 
MaxZeroDigits: 2
Labels: 1.5 2.0 2.5 3.0 
Exponent: e-40
Offset: 2.0e-30

Ticks: 0.0010000010001 0.0010000010002 0.0010000010003 0.0010000010004 
MaxZeroDigits: 2
Labels: 1.0 2.0 3.0 4.0 
Exponent: e-13
Offset: 0.001000001

Ticks: 0.0010000010001 0.00100000100015 0.0010000010002 0.00100000100025 
MaxZeroDigits: 2
Labels: 1.0 1.5 2.0 2.5 
Exponent: e-13
Offset: 0.001000001

Ticks: 1000001000.1 1000001000.2 1000001000.3 1000001000.4 
MaxZeroDigits: 2
Labels: 0.1 0.2 0.3 0.4 
Exponent: 
Offset: 1.000001e+09
answered Sep 07 '25 11:09 by RobertBaron