
fscanf read()s more than the number of characters I asked for

Tags: c, stdio

I have the following code:

#include <stdio.h>

int main(void)
{
  unsigned char c;

  setbuf(stdin, NULL);
  scanf("%2hhx", &c);
  printf("%d\n", (int)c);
  return 0;
}

I set stdin to be unbuffered, then ask scanf to read up to 2 hex characters. Indeed, scanf does as asked; for example, having compiled the code above as foo:

$ echo 23 | ./foo
35

However, if I strace the program, I find that libc actually read 3 characters. Here is a partial log from strace:

$ echo 234 | strace ./foo
read(0, "2", 1)                         = 1
read(0, "3", 1)                         = 1
read(0, "4", 1)                         = 1
35 # prints the correct result

So scanf is giving the expected result. However, this extra character being read is detectable, and it happens to break the communications protocol I am trying to implement (in my case, GDB remote debugging).

The man page for scanf says about the field width:

Reading of characters stops either when this maximum is reached or when a nonmatching character is found, whichever happens first.

This seems a bit deceptive, at least; or is it in fact a bug? Is it too much to hope that with unbuffered stdin, scanf might read no more than the amount of input I asked for?

(I'm running on Ubuntu 18.04 with glibc 2.27; I've not tried this on other systems.)

Reuben Thomas asked Jul 19 '20 21:07




1 Answer

"This seems a bit deceptive, at least; or is it in fact a bug?"

IMO, no.

"An input item is read from the stream, ... An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure." (C17dr § 7.21.6.2 ¶9)

A directive such as "%hhx" (without a width limit) certainly must read 1 character past the hex digits to know it is done. That excess character is pushed back into stdin for the next input operation.

The sentence "The first character, if any, after the input item remains unread" implies, to me, a distinction between reading characters from the underlying file at the lowest level and reading characters from the stream: a stream can push back at least 1 character and treat it as one that "remains unread". The width limit of 2 does not prevent this, as 3 characters can be read at the lowest level and 1 pushed back.

The width of 2 limits the maximum number of characters to interpret; it is not a limit on the number of characters read at the lowest level.
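
To illustrate the stream-level view, here is a minimal sketch, assuming a hosted system where tmpfile() works. The helper name scan_then_peek is hypothetical and is not part of any library. It scans up to two hex digits with "%2hhx" and then asks the same stream for its next character. Whatever the library read underneath, the stream should still deliver the character right after the input item.

```c
#include <stdio.h>

/* Hypothetical helper: writes `input` to a temporary stream, scans up to two
 * hex digits from it, and returns the next character the stream delivers
 * (possibly EOF). The scanned value is stored in *value.
 * Returns -2 on setup or conversion failure. */
int scan_then_peek(const char *input, unsigned char *value)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -2;

    if (fputs(input, fp) == EOF) {
        fclose(fp);
        return -2;
    }
    rewind(fp);

    int next = -2;
    if (fscanf(fp, "%2hhx", value) == 1)
        next = fgetc(fp);   /* first character after the input item */

    fclose(fp);
    return next;
}
```

Calling scan_then_peek("234", &v) stores 0x23 in v and returns '4': at the stdio level the third character is still "unread", even if a lower-level read() already fetched it.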

"Is it too much to hope that with unbuffered stdin, scanf might read no more than the amount of input I asked for?"

Yes. Buffered or not, I think a stream like stdin is allowed to push characters back and consider them unread.

Anyway, expecting "%2hhx" to read no more than 2 characters is brittle, given that leading white-space does not count: "These white-space characters are not counted against a specified field width."
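
The white-space point can be checked with a small sketch. The helper name consumed_by_scan is hypothetical; it uses sscanf on a string rather than a stream, and relies on the standard "%n" directive to report how many characters were consumed.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters of `s` were consumed by
 * the "%2hhx" directive (reported through %n), or -1 if the conversion
 * failed. The converted value is stored in *value. */
int consumed_by_scan(const char *s, unsigned char *value)
{
    int consumed = -1;

    /* %n does not count toward the return value, so success is 1. */
    if (sscanf(s, "%2hhx%n", value, &consumed) != 1)
        return -1;
    return consumed;
}
```

With the input "   234", the three leading spaces are skipped without counting against the width, so 5 characters are consumed for a field width of 2, and the value is still 0x23.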


Setting stdin to be unbuffered, as in "I set stdin to be unbuffered", does not stop a stream from reading an excess character and later pushing it back.


Given that "this extra character being read is detectable, and it happens to break the communications protocol", I recommend a new approach that does not use a stream.
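
One possible shape for such an approach, as a sketch only and assuming a POSIX system: call read() directly for exactly two bytes and decode them by hand. The names hex_digit and read_hex_byte are hypothetical. Unlike scanf, this version skips no white-space, requires exactly two hex digits, and does not retry on EINTR.

```c
#include <stddef.h>
#include <unistd.h>

/* Convert one ASCII hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(unsigned char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Hypothetical helper: reads exactly two bytes from fd and decodes them as
 * one hex byte. Returns 0 on success, -1 on EOF, read error, or a byte that
 * is not a hex digit. Never requests more than two bytes in total. */
int read_hex_byte(int fd, unsigned char *out)
{
    unsigned char buf[2];
    size_t got = 0;

    while (got < sizeof buf) {
        ssize_t n = read(fd, buf + got, sizeof buf - got);
        if (n <= 0)
            return -1;          /* EOF or error */
        got += (size_t)n;
    }

    int hi = hex_digit(buf[0]);
    int lo = hex_digit(buf[1]);
    if (hi < 0 || lo < 0)
        return -1;

    *out = (unsigned char)(hi * 16 + lo);
    return 0;
}
```

Calling read_hex_byte(0, &c) on the input "234" yields 0x23 and leaves the "4" in the pipe or socket for whatever reads next.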

chux - Reinstate Monica answered Oct 23 '22 06:10