
scanf %d segfault at large input

Tags: c, scanf

So I ran a static code analyzer over some C code, and one thing that surprised me was a warning about:

int val;
scanf("%d", &val);

which said that for large enough input this may result in a segfault. And sure enough, this can actually happen. The fix is simple enough (specify a width; after all, we know the maximum number of digits a valid integer can have on a given architecture), but what I'm wondering is WHY this happens in the first place, and why it isn't regarded as a bug in libc (and a simple one to fix at that).
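A minimal sketch of the width fix mentioned above, assuming a 32-bit int (so any 9-digit value fits). The helper name read_int_with_width is hypothetical, and it uses sscanf on a string rather than scanf on stdin only so that it is easy to try out:

```c
#include <stdio.h>

/* Hypothetical helper: parse an int from text using a field width.
   Assumes a 32-bit int, where every 9-digit decimal value is in range.
   Returns 1 if an integer was read, 0 otherwise. */
int read_int_with_width(const char *text, int *out)
{
    /* %9d consumes at most 9 characters (a leading sign counts toward
       that limit), so the conversion never sees an unbounded digit run. */
    return sscanf(text, "%9d", out) == 1;
}
```

With this, read_int_with_width("123456789012345", &v) stores 123456789 in v. Note the trade-off: the digits beyond the width are left unread, so with scanf on a stream they would still be sitting in the input for the next read.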

Now I assume there's some reason for this behavior in the first place that I'm missing?

Edit: OK, since the question doesn't seem to be so clear-cut, a bit more explanation: no, the code analyzer doesn't warn about scanf in general, but specifically about scanf reading an integer with %d when no width is specified.

So here's a minimal working example:

#include <stdlib.h>
#include <stdio.h>

int main() {
    int val;
    scanf("%d", &val);
    printf("Number not large enough.\n");
    return 0;
}

We can get a segfault by sending a gigantic number (using, e.g., Python):

import subprocess
cmd = "./test"
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, shell=True)
p.communicate("9"*50000000000000)
# program will segfault, if not make number larger
Voo asked Jul 02 '11 02:07


3 Answers

If the static analyzer is cppcheck, then it is warning about it because of a bug in glibc which has since been fixed: http://sources.redhat.com/bugzilla/show_bug.cgi?id=13138

Philip Craig answered Oct 17 '22 04:10


Edited, since I missed the fact that you fed this to a static code analyzer.

If the format %d matches the size of int, whatever overflows should not be the value written into val through the pointer, since that is always exactly one int. Try passing a pointer to long int and see whether the analyzer still gives the warning. Then try changing %d to %ld, keeping the long int pointer, and see whether the warning appears again.

I suppose the standard says something about %d and the type it requires. Maybe the analyzer is worried that on some systems int could be shorter than what %d expects? That would sound odd to me.


Running your example, compiled with gcc (and with Python 2.6.6), I obtain:

Traceback (most recent call last):
  File "./feed.py", line 4, in <module>
    p.communicate("9"*50000000000000)
OverflowError: cannot fit 'long' into an index-sized integer
Number not large enough.

Then I tried running this instead:

perl -e 'print "1"x6000000000000000;' |./test

and modified the C part to write

printf("%d Number not large enough.\n", val);

I obtain as output

5513204 Number not large enough.

where the number changes at every run. It never segfaults, so the GNU scanf implementation appears to be safe here, although the resulting number is wrong.
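The wrong number is consistent with the C standard, which leaves the behavior of scanf undefined when the converted value cannot be represented in the target object. If the goal is to detect out-of-range input rather than get garbage, one common alternative is to read the text and convert it with strtol, which reports range errors through errno. A minimal sketch, with parse_int_checked and its return codes being hypothetical names chosen for illustration:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the leading decimal integer in text.
   Returns 0 on success, -1 if no digits were found,
   -2 if the value does not fit in an int. */
int parse_int_checked(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;                      /* strtol only sets errno on failure */
    value = strtol(text, &end, 10);

    if (end == text)
        return -1;                  /* nothing was converted */
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
        return -2;                  /* out of range for long or for int */

    *out = (int)value;
    return 0;
}
```

This still requires the input to be in memory as a string first, so for untrusted input the line read itself (for example with fgets into a fixed-size buffer) has to be bounded as well.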

ShinTakezou answered Oct 17 '22 02:10


The first step in processing an integer is to isolate the sequence of digits. If that sequence is longer than expected, it may overflow a fixed-length buffer, leading to a segmentation fault.

You can achieve a similar effect with doubles. Pushed to extremes, you can write 1 followed by one thousand zeroes, and an exponent of -1000 (net value is 1). Actually, when I was testing this a few years ago, Solaris handled 1000 digits with aplomb; it was at a little over 1024 that it ran into trouble.

So, there is an element of QoI - quality of implementation. There is also an element of 'to follow the C standard, scanf() cannot stop reading before it comes across a non-digit'. These are conflicting goals.
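For integers, at least, the two goals can in principle be reconciled by accumulating the value digit by digit instead of copying the digits into a buffer first. The following is only a sketch of that idea, not how glibc or any other real libc implements scanf; read_digits and its return codes are hypothetical, and it handles unsigned decimal digit runs only:

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical sketch: read a run of decimal digits from a stream in
   constant memory. Every digit is consumed, however long the run is.
   Returns 1 on success, 0 if no digit was found, -1 on overflow. */
int read_digits(FILE *in, int *out)
{
    int c;
    int seen = 0;
    int overflow = 0;
    int value = 0;

    while ((c = getc(in)) != EOF && isdigit(c)) {
        int d = c - '0';
        seen = 1;
        /* value * 10 + d would exceed INT_MAX: remember it, keep reading */
        if (value > (INT_MAX - d) / 10)
            overflow = 1;
        else
            value = value * 10 + d;
    }
    if (c != EOF)
        ungetc(c, in);              /* push back the first non-digit */

    if (!seen)
        return 0;
    if (overflow)
        return -1;
    *out = value;
    return 1;
}
```

The same trick is much harder for doubles, where correct rounding generally needs more than a running total, which may be why floating-point input is where implementations are more likely to buffer.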

Jonathan Leffler answered Oct 17 '22 04:10