
What is the result of `strtod("3ex", &end)` supposed to be? What about `sscanf`?

In my experiments this expression

double d = strtod("3ex", &end);

initializes d with 3.0 and leaves the end pointer at the 'e' character of the input string. This is exactly how I would expect it to behave. The 'e' character might look like the beginning of an exponent part, but since the actual exponent value (required by 6.4.4.2) is missing, that 'e' should be treated as a completely independent character.

However, when I do

double d;
char c;
sscanf("3ex", "%lf%c", &d, &c);

I notice that sscanf consumes both '3' and 'e' for the %lf format specifier. Variable d receives the value 3.0. Variable c ends up with 'x' in it. This looks strange to me for two reasons.

Firstly, since the language specification refers to strtod when describing the behavior of the %f format specifier, I intuitively expected %lf to treat the input the same way strtod does (i.e. choose the same position as the termination point). However, I know that historically scanf was supposed to return no more than one character back to the input stream. That limits any look-ahead scanf can perform to one character, and the example above requires at least two characters of look-ahead. So, let's say I accept the fact that %lf consumed both '3' and 'e' from the input stream.

But then we run into the second issue. Now sscanf has to convert that "3e" to type double. "3e" is not a valid representation of a floating-point constant (again, according to 6.4.4.2 the exponent value is not optional). I would expect sscanf to treat this input as erroneous: terminate during %lf conversion, return 0 and leave d and c unchanged. However, the above sscanf completes successfully (returning 2).

This behavior is consistent between GCC and MSVC implementations of standard library.

So, my question is, where exactly in the C language standard document does it allow sscanf to behave as described above, referring to the above two points: consuming more than strtod does and successfully converting such sequences as "3e"?

By looking at my experiment results I can probably "reverse engineer" sscanf's behavior: consume as much as "looks right", never stepping back, and then just pass the consumed sequence to strtod. That way the 'e' gets consumed by %lf and is then simply ignored by strtod. But where exactly is all that in the language specification?

AnT asked Oct 13 '14 06:10


1 Answer

I found the following description of strtod on die.net:

The strtod(), strtof(), and strtold() functions convert the initial portion of the string pointed to by nptr to double, float, and long double representation, respectively.

The expected form of the (initial portion of the) string is optional leading white space as recognized by isspace(3), an optional plus ('+') or minus sign ('-') and then either (i) a decimal number, or (ii) a hexadecimal number, or (iii) an infinity, or (iv) a NAN (not-a-number).

A decimal number consists of a nonempty sequence of decimal digits possibly containing a radix character (decimal point, locale-dependent, usually '.'), optionally followed by a decimal exponent. A decimal exponent consists of an 'E' or 'e', followed by an optional plus or minus sign, followed by a nonempty sequence of decimal digits, and indicates multiplication by a power of 10.

A hexadecimal number consists of a "0x" or "0X" followed by a nonempty sequence of hexadecimal digits possibly containing a radix character, optionally followed by a binary exponent. A binary exponent consists of a 'P' or 'p', followed by an optional plus or minus sign, followed by a nonempty sequence of decimal digits, and indicates multiplication by a power of 2. At least one of radix character and binary exponent must be present.

An infinity is either "INF" or "INFINITY", disregarding case.

A NAN is "NAN" (disregarding case) optionally followed by '(', a sequence of characters, followed by ')'. The character string specifies in an implementation-dependent way the type of NAN.

Then I ran an experiment, compiling and executing the code below with gcc:

#include <stdlib.h>
#include <stdio.h>

char head[1024], *tail;

/* Convert stmt with strtod and report how many characters were consumed. */
void core(const char *stmt){
    snprintf(head, sizeof head, "%s", stmt);
    double d=strtod(head, &tail);
    /* tail-head has type ptrdiff_t, so print it with %td */
    printf("cover %s to %.2f with length=%td.\n", head, d, tail-head);
}

int main(){
    core("3.0x");
    core("3e");
    core("3ex");
    core("3e0x");

    return 0;
}

and get the result

cover 3.0x to 3.00 with length=3.
cover 3e to 3.00 with length=1.
cover 3ex to 3.00 with length=1.
cover 3e0x to 3.00 with length=3.

So it seems that strtod treats 'e' as part of the number only when at least one digit follows it.

For sscanf, I ran another experiment, again compiled with gcc:

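#include <stdlib.h>
#include <stdio.h>

char head[1024];

/* Scan a hexadecimal number followed by a string and report both. */
void core(const char *stmt){
    unsigned int i = 0;  /* %x expects a pointer to unsigned int */
    head[0] = '\0';      /* clear text left over from the previous call */
    sscanf(stmt, "%x%1023s", &i, head);
    printf("sscanf %s catch %u with '%s'.\n", stmt, i, head);
}

int main(){
    core("0");
    core("0x0g");
    core("0x1g");
    core("0xg");

    return 0;
}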

then get the output below:

sscanf 0 catch 0 with ''.
sscanf 0x0g catch 0 with 'g'.
sscanf 0x1g catch 1 with 'g'.
sscanf 0xg catch 0 with 'g'.

It seems that sscanf keeps consuming characters for as long as the input read so far could still be the start of a valid number, and does not roll back when that prefix turns out to be incomplete (as with "0x" followed by a non-hex character).

JKi Wang answered Nov 09 '22 15:11