Today, I took a short "C++ skills test" from Elance.com. One question was the following:
What is the security vulnerability of the following line of code:
printf("%s", argv[1]);
Option 1: Format String
Option 2: Stack Overflow <-- This was marked by Elance as the correct answer
The user was provided 10 seconds to answer this question after an initial few seconds of seeing the question (or automatically fail the question). (There were also two other clearly irrelevant answers that were not marked as the correct answer by Elance.)
I was looking for buffer overrun or buffer overflow as an option.
I instinctively did not like the answer stack overflow, because in my 10 seconds I mentally used what I believe is the standard definition of "Stack Overflow":
In software, a stack overflow occurs when the stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program ...
According to this definition of "Stack Overflow", a buffer overrun is entirely possible without a stack overflow; it is a stack overflow only if the program attempts to write outside the calling program's total stack allocation (whether due to a buffer overrun, or whether it would otherwise be a legitimate write, such as allocating memory for stack-based variables an excessive number of times).
My 10-second instinct told me that "buffer overrun" is a more accurate description of the problematic line of code, above - because often (in my experience) there are sufficient null characters ('\0'
) peppered through garbage data in RAM to often avoid an actual stack overflow in cases like this, but a buffer overrun in the implementation seems reasonably possible or even likely. (But the possibility that printf
reads garbage here might assume that argc == 1
, such that there was no user-provided argv[1]
; if argv[1]
is present, perhaps one can assume it's likely that the calling function has not inserted NULL
's. It was not stated in the problem whether argv[1]
was present.)
Because I imagined that there could be a buffer overrun problem here, even without a stack overflow, I answered Format String, because simply by passing a different format string such as "%.8s"
, the problem can be mostly avoided, so it seemed like an overall more generic, and therefore better, answer.
My answer was marked as wrong. The correct answer was marked as "Stack Overflow".
It now occurs to me that perhaps if one assumes that argv[1]
is present, that the only possible buffer overrun is a stack overflow, in which case, stack overflow might in fact be the correct answer. However, even in this case, would it not be considered odd to call this a stack overflow? Isn't buffer overflow a better way to describe this problem, even assuming argv[1]
is present? And, if argv[1]
is not present, isn't it pretty much incorrect to state that the problem is stack overflow, rather than the more accurate buffer overrun?
I would like the opinion of professionals on this site: Is "stack overflow" the proper way to define the memory safety problem with the above line of code? Or, rather, is "buffer overflow" or "buffer overrun" clearly a better way to describe the problem? Finally, given the two options provided for the question's answer (above), is the answer ambiguous, or is "stack overflow" (or "format string") clearly the better answer?
Tangential Comments regarding the Elance test (Not related to the question in this posting)
None of the Elance "C++ skills test" questions pertained to any C++-specific features such as classes, templates, anything in the STL, or any aspect of polymorphism. Every question was a down-and-out, straight-from-C question.
Because there were many (at least 3) other questions in Elance's so-called "C++ skills test" that were unarguably wrong (such as this question: given sizeof(int) == sizeof(int*)
and sizeof(int) == 4
, then in the code int *a, *b; a=b; b++; b-a;
, what is b-a
, with the correct answer listed as 4
, rather than the actual correct answer of 1
), and given the fact that there were no C++-specific questions on the test, I have contacted Elance and plan to seriously pursue their problematic test with the organization. However, for the question discussed in this posting, I am uncertain whether the question/answers are problematic.
There is no potential stack overflow here.
The standard guarantees that argc
is non-negative, which means it can be 0
. If argc
is positive, argv[0]
through argv[argc-1]
are pointers to strings.
If argc == 0
, then argv[1]
is not merely a null pointer -- it doesn't exist at all. In that case, argv[1]
attempts to access a nonexistent array element. (argv[1]
is equivalent to *(argv+1)
; the pointer addition is permitted, but the dereference has undefined behavior.) Note that in this case the program name, which would otherwise be accessible via argv[0]
is unavailable.
If argc==1
, then argv[1] == NULL
. Evaluating argv[1]
is perfectly valid, but it yields a null pointer. Passing a null pointer to printf
with a "%s"
option has undefined behavior. I suppose you could call this a format string problem, but the real problem is using a null pointer when a non-null pointer to a string is required.
If argc >= 2
, then argv[1]
is guaranteed to point to a string, printf("%s", argv[1])
will simply print the characters of that string, up to but not including the terminating '\0'
(which is guaranteed to be exist).
There is still a potential vulnerability in that case. Quoting N1570 7.21.6.1 paragraph 15:
The number of characters that can be produced by any single conversion shall be at least 4095.
(N1570 is a draft of the C standard; C++ refers to the C standard for portions of its standard library.)
Which means that an implementation may limit the number of characters produced by the printf
call. In practice, there's probably no reason to impose a fixed limit; printf
can simply print characters, one at a time, until it reaches the end of the string. But in principle, if strlen(argv[1]) > 4095
, and if the current implementation imposes such a limit, then the behavior could be undefined.
Still, this isn't what I'd call a "stack overflow" -- particularly since the C++ standard does not use the word "stack" (except for a couple of brief references to "stack unwinding").
Most of these problems can be avoided by checking first:
if (argc >= 2) {
printf("%s", argv[1]);
}
or, if you're feeling paranoid:
if (argc >= 2 && argv[1] != NULL) {
printf("%s", argv[1]);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With