Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Writing IEEE 754-1985 double as ASCII on a limited 16 bytes string

This is a follow-up to my original post. But I'll repeat it for clarity:

As per DICOM standard, a type of floating point can be stored using a Value Representation of Decimal String. See Table 6.2-1. DICOM Value Representations:

Decimal String: A string of characters representing either a fixed point number or a floating point number. A fixed point number shall contain only the characters 0-9 with an optional leading "+" or "-" and an optional "." to mark the decimal point. A floating point number shall be conveyed as defined in ANSI X3.9, with an "E" or "e" to indicate the start of the exponent. Decimal Strings may be padded with leading or trailing spaces. Embedded spaces are not allowed.

"0"-"9", "+", "-", "E", "e", "." and the SPACE character of Default Character Repertoire. 16 bytes maximum

The standard is saying that the textual representation is fixed point vs. floating point. The standard only refers to how the values are represented within in the DICOM data set itself. As such there is not requirement to load a fixed point textual representation into a fixed-point variable.

So now that this is clear that DICOM standard implicitely recommend double (IEEE 754-1985) for representing a Value Representation of type Decimal String (maximum of 16 significant digits). My question is how do I use the standard C I/O library to convert back this binary representation from memory into ASCII onto this limited sized string ?

From random source on internet, this is non-trivial, but a generally accepted solution is either:

printf("%1.16e\n", d); // Round-trippable double, always with an exponent

or

printf("%.17g\n", d); // Round-trippable double, shortest possible

Of course both expression are invalid in my case since they can produce output much longer than my limited maximum of 16 bytes. So what is the solution to minimizing the loss in precision when writing out an arbitrary double value to a limited 16 bytes string ?


Edit: if this is not clear, I am required to follow the standard. I cannot use hex/uuencode encoding.

Edit 2: I am running the comparison using travis-ci see: here

So far the suggested codes are:

  1. Serge Ballesta
  2. chux
  3. Mark Dickinson
  4. chux

Results I see over here are:

  • compute1.c leads to a total sum error of: 0.0095729050923877828
  • compute2.c leads to a total sum error of: 0.21764383725715469
  • compute3.c leads to a total sum error of: 4.050031792674619
  • compute4.c leads to a total sum error of: 0.001287056579548422

So compute4.c leads to the best possible precision (0.001287056579548422 < 4.050031792674619), but triple (x3) the overall execution time (only tested in debug mode using time command).

like image 233
malat Avatar asked Sep 17 '15 13:09

malat


2 Answers

It is trickier than first thought.

Given the various corner cases, it seems best to try at a high precision and then work down as needed.

  1. Any negative number prints the same as a positive number with 1 less precision due to the '-'.

  2. '+' sign not needed at the beginning of the string nor after the 'e'.

  3. '.' not needed.

  4. Dangerous to use anything other than sprintf() to do the mathematical part given so many corner cases. Given various rounding modes, FLT_EVAL_METHOD, etc., leave the heavy coding to well established functions.

  5. When an attempt is too long by more than 1 character, iterations can be saved. E.g. If an attempt, with precision 14, resulted with a width of 20, no need to try precision 13 and 12, just go to 11.

  6. Scaling of the exponent due to the removal of the '.', must be done after sprintf() to 1) avoid injecting computational error 2) decrementing a double to below its minimum exponent.

  7. Maximum relative error is less than 1 part in 2,000,000,000 as with -1.00000000049999e-200. Average relative error about 1 part in 50,000,000,000.

  8. 14 digit precision, the highest, occurs with numbers like 12345678901234e1 so start with 16-2 digits.


static size_t shrink(char *fp_buffer) {
  int lead, expo;
  long long mant;
  int n0, n1;
  int n = sscanf(fp_buffer, "%d.%n%lld%ne%d", &lead, &n0, &mant, &n1, &expo);
  assert(n == 3);
  return sprintf(fp_buffer, "%d%0*llde%d", lead, n1 - n0, mant,
          expo - (n1 - n0));
}

int x16printf(char *dest, size_t width, double value) {
  if (!isfinite(value)) return 1;

  if (width < 5) return 2;
  if (signbit(value)) {
    value = -value;
    strcpy(dest++, "-");
    width--;
  }
  int precision = width - 2;
  while (precision > 0) {
    char buffer[width + 10];
    // %.*e prints 1 digit, '.' and then `precision - 1` digits
    snprintf(buffer, sizeof buffer, "%.*e", precision - 1, value);
    size_t n = shrink(buffer);
    if (n <= width) {
      strcpy(dest, buffer);
      return 0;
    }
    if (n > width + 1) precision -= n - width - 1;
    else precision--;
  }
  return 3;
}

Test code

double rand_double(void) {
  union {
    double d;
    unsigned char uc[sizeof(double)];
  } u;
  do {
    for (size_t i = 0; i < sizeof(double); i++) {
      u.uc[i] = rand();
    }
  } while (!isfinite(u.d));
  return u.d;
}

void x16printf_test(double value) {
  printf("%-27.*e", 17, value);
  char buf[16+1];
  buf[0] = 0;
  int y = x16printf(buf, sizeof buf - 1, value);
  printf(" %d\n", y);
  printf("'%s'\n", buf);
}


int main(void) {
  for (int i = 0; i < 10; i++)
    x16printf_test(rand_double());
}

Output

-1.55736829786841915e+118   0
'-15573682979e108'
-3.06117209691283956e+125   0
'-30611720969e115'
8.05005611774356367e+175    0
'805005611774e164'
-1.06083057094522472e+132   0
'-10608305709e122'
3.39265065244054607e-209    0
'33926506524e-219'
-2.36818580315246204e-244   0
'-2368185803e-253'
7.91188576978592497e+301    0
'791188576979e290'
-1.40513111051994779e-53    0
'-14051311105e-63'
-1.37897140950449389e-14    0
'-13789714095e-24'
-2.15869805640288206e+125   0
'-21586980564e115'
like image 67
chux - Reinstate Monica Avatar answered Oct 09 '22 02:10

chux - Reinstate Monica


C library formatter has no direct format for your requirement. At a simple level, if you can accept the waste of characters of the standard %g format (e20 is written e+020: 2 chars wasted), you can:

  • generate the output for the %.17g format
  • if it is greater the 16 characters, compute the precision that would lead to 16
  • generate the output for that format.

Code could look like:

void encode(double f, char *buf) {
    char line[40];
    char format[8];
    int prec;
    int l;

    l = sprintf(line, "%.17g", f);
    if (l > 16) {
        prec = 33 - strlen(line);
        l = sprintf(line, "%.*g", prec, f);
        while(l > 16) {
            /* putc('.', stdout);*/
            prec -=1;
            l = sprintf(line, "%.*g", prec, f);
        }
    }
    strcpy(buf, line);
}

If you really try to be optimal (meaning write e30 instead of e+030), you could try to use %1.16e format and post-process the output. Rationale (for positive numbers):

  • the %1.16e format allows you to separate the mantissa and the exponent (base 10)
  • if the exponenent is between size-2 (included) and size (excluded): just correctly round the mantissa to the int part and display it
  • if the exponent is between 0 and size-2 (both included): display the rounded mantissa with the dot correctly placed
  • if the exponent is between -1 and -3 (both included): start with a dot, add eventual 0 and fill with rounded mantissa
  • else use a e format with minimal size for the exponent part and fill with the rounded mantissa

Corner cases:

  • for negative numbers, put a starting - and add the display for the opposite number and size-1
  • rounding : if first rejected digit is >=5, increase preceding number and iterate if it was a 9. Process 9.9999999999... as a special case rounded to 10

Possible code:

void clean(char *mant) {
    char *ix = mant + strlen(mant) - 1;
    while(('0' == *ix) && (ix > mant)) {
        *ix-- = '\0';
    }
    if ('.' == *ix) {
        *ix = '\0';
    }
}

int add1(char *buf, int n) {
    if (n < 0) return 1;
    if (buf[n] == '9') {
        buf[n] = '0';
        return add1(buf, n-1);
    }
    else {
        buf[n] += 1;
    }
    return 0;
}

int doround(char *buf, unsigned int n) {
    char c;
    if (n >= strlen(buf)) return 0;
    c = buf[n];
    buf[n] = 0;
    if ((c >= '5') && (c <= '9')) return add1(buf, n-1);
    return 0;
}

int roundat(char *buf, unsigned int i, int iexp) {
    if (doround(buf, i) != 0) {
        iexp += 1;
        switch(iexp) {
            case -2:
                strcpy(buf, ".01");
                break;
            case -1:
                strcpy(buf, ".1");
                break;
            case 0:
                strcpy(buf, "1.");
                break;
            case 1:
                strcpy(buf, "10");
                break;
            case 2:
                strcpy(buf, "100");
                break;
            default:
                sprintf(buf, "1e%d", iexp);
        }
        return 1;
    }
    return 0;
}

void encode(double f, char *buf, int size) {
    char line[40];
    char *mant = line + 1;
    int iexp, lexp, i;
    char exp[6];

    if (f < 0) {
        f = -f;
        size -= 1;
        *buf++ = '-';
    }
    sprintf(line, "%1.16e", f);
    if (line[0] == '-') {
        f = -f;
    size -= 1;
    *buf++ = '-';
    sprintf(line, "%1.16e", f);
    }
    *mant = line[0];
    i = strcspn(mant, "eE");
    mant[i] = '\0';
    iexp = strtol(mant + i + 1, NULL, 10);
    lexp = sprintf(exp, "e%d", iexp);
    if ((iexp >= size) || (iexp < -3)) {
        i = roundat(mant, size - 1 -lexp, iexp);
        if(i == 1) {
            strcpy(buf, mant);
            return;
        }
        buf[0] = mant[0];
        buf[1] = '.';
        strncpy(buf + i + 2, mant + 1, size - 2 - lexp);
        buf[size-lexp] = 0;
        clean(buf);
        strcat(buf, exp);
    }
    else if (iexp >= size - 2) {
        roundat(mant, iexp + 1, iexp);
        strcpy(buf, mant);
    }
    else if (iexp >= 0) {
        i = roundat(mant, size - 1, iexp);
        if (i == 1) {
            strcpy(buf, mant);
            return;
        }
        strncpy(buf, mant, iexp + 1);
        buf[iexp + 1] = '.';
        strncpy(buf + iexp + 2, mant + iexp + 1, size - iexp - 1);
        buf[size] = 0;
        clean(buf);
    }
    else {
        int j;
        i = roundat(mant, size + 1 + iexp, iexp);
        if (i == 1) {
            strcpy(buf, mant);
            return;
        }
        buf[0] = '.';
        for(j=0; j< -1 - iexp; j++) {
            buf[j+1] = '0';
        }
        if ((i == 1) && (iexp != -1)) {
            buf[-iexp] = '1';
            buf++;
        }
        strncpy(buf - iexp, mant, size + 1 + iexp);
        buf[size] = 0;
        clean(buf);
    }
}
like image 20
Serge Ballesta Avatar answered Oct 09 '22 00:10

Serge Ballesta