I have the following code:
#include <stdio.h>

int main(void)
{
    unsigned char c;

    setbuf(stdin, NULL);
    scanf("%2hhx", &c);
    printf("%d\n", (int)c);
    return 0;
}
I set stdin to be unbuffered, then ask scanf to read at most 2 hex characters. Indeed, scanf does as asked; for example, having compiled the code above as foo:
$ echo 23 | ./foo
35
However, if I strace the program, I find that libc actually read 3 characters. Here is a partial log from strace:
$ echo 234 | strace ./foo
read(0, "2", 1) = 1
read(0, "3", 1) = 1
read(0, "4", 1) = 1
35 # prints the correct result
So scanf is giving the expected result. However, this extra character being read is detectable, and it happens to break the communications protocol I am trying to implement (in my case, GDB remote debugging).
The scanf(3) man page says this about the field width:
Reading of characters stops either when this maximum is reached or when a nonmatching character is found, whichever happens first.
This seems a bit deceptive, at least; or is it in fact a bug? Is it too much to hope that with unbuffered stdin, scanf might read no more than the amount of input I asked for?
(I'm running on Ubuntu 18.04 with glibc 2.27; I've not tried this on other systems.)
The %s specifier in fscanf reads words, so it stops when it reaches white space. Use fgets to read a whole line and then parse it in memory; a sketch follows the quoted excerpts below.
The documentation says:
A white space character causes fscanf(), scanf(), and sscanf() to read, but not to store, all consecutive white space characters in the input up to the next character that is not white space.
On the return value:
The fscanf() function returns the number of fields that it successfully converted and assigned. The return value does not include fields that the fscanf() function read but did not assign. The return value is EOF if an input failure occurs before any conversion, or the number of input items assigned if successful.
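Here is a minimal sketch of that fgets-plus-sscanf pattern. One caveat: fgets still consumes input through the end of the line, so this only helps if the protocol is line-oriented, which is an assumption and not something stated in the question.
#include <stdio.h>

int main(void)
{
    char line[32];
    unsigned char c;

    /* Read a whole line first, then parse it in memory with sscanf,
       so the parsing step never touches the stream itself. */
    if (fgets(line, sizeof line, stdin) == NULL)
        return 1;
    if (sscanf(line, "%2hhx", &c) == 1)
        printf("%d\n", (int)c);
    return 0;
}
Fed the input 23, this prints 35, the same as the original program.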
This seems a bit deceptive, at least; or is it in fact a bug?
IMO, no.
An input item is read from the stream, ... An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure. C17dr § 7.21.6.2 9
Code such as "%hhx"
(without a width limit) certainly must get 1 past the hex characters to know it is done. That excess character is pushed-back into stdin
for the next input operation.
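A minimal sketch of that push-back behaviour (the trailing Z in the example input is arbitrary, chosen only to terminate the number):
#include <stdio.h>

int main(void)
{
    unsigned char c;

    /* Without a width, %hhx can only stop when it meets a non-hex
       character; that character is pushed back and stays available
       to the next read from the stream. */
    if (scanf("%hhx", &c) == 1)
        printf("value=%d next=%c\n", (int)c, getchar());
    return 0;
}
Fed the input 23Z, this prints value=35 next=Z: the Z that ended the conversion was read, pushed back, and then seen again by getchar().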
The "The first character, if any, after the input item remains unread" implies to me then a disassociation of reading characters from the stream at the lowest level and reading characters from the stream as a stream can pushed-back at least 1 character and consider that as "remains unread". The width limit of 2 does not save code as 3 characters can be read from the stream and 1 pushed back.
The width of 2 limits the maximum number of characters to interpret, not the number of characters read at the lowest level.
Is it too much to hope that with unbuffered stdin, scanf might read no more than the amount of input I asked for?
Yes. Buffered or not, I think a stream like stdin is allowed to push characters back and consider them unread.
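That push-back facility is part of the stream interface itself; a small sketch with ungetc() shows the guarantee, which holds even for an unbuffered stream:
#include <stdio.h>

int main(void)
{
    setbuf(stdin, NULL);                /* unbuffered, as in the question */

    int ch = getchar();
    if (ch != EOF) {
        ungetc(ch, stdin);              /* make it "unread" again */
        printf("re-read: %c\n", getchar());
    }
    return 0;
}
Given the input A, this prints re-read: A; at least one character of push-back is guaranteed regardless of buffering mode.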
Anyways, "%2hhx"
is brittle to expect not more than 2 characters read given leading white-space do not count. "These white-space characters are not counted against a specified field width."
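A small sketch that makes the white-space point visible (the input below is an arbitrary example):
#include <stdio.h>

int main(void)
{
    unsigned char c;
    int remaining = 0;

    /* Leading white space is skipped by %2hhx and does not count
       toward the field width, so more than 2 characters can be
       consumed by the conversion. */
    if (scanf("%2hhx", &c) == 1)
        printf("value=%d\n", (int)c);

    /* Count what the conversion left behind in the stream. */
    while (getchar() != EOF)
        remaining++;
    printf("left in stream: %d\n", remaining);
    return 0;
}
Fed three spaces, 23, and a newline, this prints value=35 and left in stream: 1; the conversion consumed five characters even though its field width is 2.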
The "I set stdin to be unbuffered" does not stop a stream from reading an excess character and later pushing it back.
Given "this extra character being read is detectable, and it happens to break the communications protocol" I recommend a new approach that does not use a stream.