I ran a static code analyzer over some C code, and one thing that surprised me was a warning about:
int val;
scanf("%d", &val);
which said that for large enough input this may result in a segfault. And sure enough, this can actually happen. The fix is simple enough (specify a field width; after all, we know the maximum number of digits a valid integer can have on a given architecture), but what I'm wondering is WHY this happens in the first place, and why it isn't regarded as a bug in libc (and a simple one to fix at that).
I assume there's some reason for this behavior that I'm missing?
Edit: OK, since the question doesn't seem to be so clear-cut, a bit more explanation: no, the code analyzer doesn't warn about scanf in general, only about scanf reading an integer without a width specified.
So here's a minimal working example:
#include <stdlib.h>
#include <stdio.h>
int main() {
    int val;
    scanf("%d", &val);
    printf("Number not large enough.\n");
    return 0;
}
We can get a segfault by sending a gigantic number (using e.g. Python):
import subprocess
cmd = "./test"
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, shell=True)
p.communicate("9"*50000000000000)
# the program will segfault; if it does not, make the number larger
If the static analyzer is cppcheck, then it is warning about it because of a bug in glibc which has since been fixed: http://sources.redhat.com/bugzilla/show_bug.cgi?id=13138
(Edited, since I missed the fact that you fed this to a static code analyzer.)
If the format %d matches the size of int, then what overflows should not be what is written into val through the pointer, since that is always an int. Try passing a pointer to long int and see whether the analyzer still gives the warning. Then try changing %d to %ld, keeping the long int pointer, and see whether the warning appears again.
I suppose the standard says something about %d and the type it requires. Maybe the analyzer is worried that on some systems int could be shorter than what %d expects? That would sound odd to me.
Running your example compiled with gcc (with Python 2.6.6), I get:
Traceback (most recent call last):
  File "./feed.py", line 4, in <module>
    p.communicate("9"*50000000000000)
OverflowError: cannot fit 'long' into an index-sized integer
Number not large enough.
Then I tried running this instead:
perl -e 'print "1"x6000000000000000;' |./test
and modified the C part to write
printf("%d Number not large enough.\n", val);
I obtain as output
5513204 Number not large enough.
where the number changes on every run. It never segfaults, so the GNU scanf implementation appears to be safe here, though the resulting number is wrong.
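The wrong number is not surprising: the C standard leaves the behavior undefined when the converted value cannot be represented in the target object, so scanf has no obligation to report it. If you want the overflow reported rather than ignored, one option is to parse with strtol, which sets errno to ERANGE on overflow. A minimal sketch (parse_int_checked is a name I made up for illustration, not a library function):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal int from text and report overflow
   instead of silently producing a wrong value.
   Returns 0 on success, -1 if no digits were found, -2 if out of range. */
static int parse_int_checked(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;                      /* strtol only sets errno on failure */
    v = strtol(text, &end, 10);
    if (end == text)
        return -1;                  /* nothing was converted */
    if (errno == ERANGE || v > INT_MAX || v < INT_MIN)
        return -2;                  /* value does not fit in an int */
    *out = (int)v;
    return 0;
}
```

Called on a line read with fgets, this returns -2 for the string of ones above instead of handing back garbage.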
The first step in processing an integer is to isolate the sequence of digits. If that sequence is longer than expected, it may overflow a fixed-length buffer, leading to a segmentation fault.
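For contrast, digits can in principle be consumed without storing them at all, by keeping only a running value and an overflow flag. The following is a sketch of that idea under my own assumptions, not what any particular libc actually does; scan_digits is a hypothetical name, and it works on a string for simplicity, though the same loop could read from a stream with getc:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical sketch, not glibc's actual code: consume a run of decimal
   digits of any length using constant memory. Returns the number of
   digits consumed; *value saturates at INT_MAX if the run overflows. */
static size_t scan_digits(const char *s, int *value, int *overflowed)
{
    size_t n = 0;
    int acc = 0;

    *overflowed = 0;
    while (s[n] >= '0' && s[n] <= '9') {
        int d = s[n] - '0';
        if (!*overflowed) {
            /* acc * 10 + d would exceed INT_MAX: stop accumulating */
            if (acc > (INT_MAX - d) / 10)
                *overflowed = 1;
            else
                acc = acc * 10 + d;
        }
        n++;                        /* keep consuming digits regardless */
    }
    *value = *overflowed ? INT_MAX : acc;
    return n;
}
```

The point is only that the length of the digit sequence need not determine how much memory is used.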
You can achieve a similar effect with doubles. Pushed to extremes, you can write 1 followed by one thousand zeroes, and an exponent of -1000 (net value is 1). Actually, when I was testing this a few years ago, Solaris handled 1000 digits with aplomb; it was at a little over 1024 that it ran into trouble.
So, there is an element of QoI - quality of implementation.  There is also an element of 'to follow the C standard, scanf() cannot stop reading before it comes across a non-digit'.  These are conflicting goals.
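The caller can sidestep the conflict with the field width mentioned in the question, since scanf then stops after at most that many characters. A minimal sketch, assuming int is at least 32 bits so that any 9-character field fits (read_int_bounded is a hypothetical helper, and it uses sscanf on a string so the example is easy to try; the same "%9d" format works with scanf):

```c
#include <stdio.h>

/* Hypothetical helper: convert at most 9 characters of an integer field.
   Assumes int is at least 32 bits, so any 9-digit value fits.
   Returns 1 if a number was converted, 0 otherwise. */
static int read_int_bounded(const char *input, int *out)
{
    return sscanf(input, "%9d", out) == 1;
}
```

Note that with a width the excess digits are left unread, so when reading from stdin you would still need to discard the rest of the line before the next conversion.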