
What is the range of the "number of characters read" in fscanf?

Tags: c, scanf

fscanf() specifies the "%n" directive as a means to write "the number of characters read from the input stream so far by this call to the fscanf function" (C11dr §7.21.6.2 ¶12).

Let us call this number: ncount.


The "%n" directive may be preceded by length modifiers hh, h, l, ll, j and others. Examples:

FILE *stream = stdin;
int n_i;
fscanf(stream, "%*s%n", &n_i);    // save as int
signed char n_hh;
fscanf(stream, "%*s%hhn", &n_hh); // save as signed char
long long n_ll;
fscanf(stream, "%*s%lln", &n_ll); // save as long long

What is the type or the minimal expected range of ncount?
What happens, or should happen, when "the number of characters read from the input stream" is large?

My findings:
The C spec appears quiet on the definition of the minimal range/type of ncount. ncount is usually saved via "%n" which specifies an int destination though not an int source.

By experimentation, ncount appears to be treated like an int or long on my platform, which is no real surprise. (My int/long/long long are 4/4/8 bytes.) When saving ncount to a long long, the value saved does not exceed INT_MAX/LONG_MAX. ncount could have been unsigned, giving twice the usable range when assigned to a long long, yet this is an extreme corner case and perhaps not considered by implementors.

My tests below showed no extended range of ncount past an int range, even when saved as a long long.

My interest stemmed from using "%*[^\n]%lln" to determine an (extreme) line length.
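One way to measure a very long line without depending on the range of ncount is to bound how much each fscanf() call may consume and to add up the per-call counts in a wider type. The following is only a sketch under that assumption; the helper name line_length and the 4095-character chunk size are made up for illustration.

```c
#include <stdio.h>

// Hypothetical helper: length of the rest of the current line, not counting
// the '\n' (which is left unread). Returns -1 on a read error.
static long long line_length(FILE *stream) {
  long long total = 0;
  for (;;) {
    int n = 0;
    // The field width caps each call at 4095 characters, so the count
    // written through %n always fits in an int.
    int rc = fscanf(stream, "%*4095[^\n]%n", &n);
    if (ferror(stream)) {
      return -1;
    }
    if (rc == EOF || n == 0) {
      break;  // end of file, or the next character is '\n'
    }
    total += n;
    if (n < 4095) {
      break;  // short chunk: stopped at '\n' or at end of file
    }
  }
  return total;
}
```

Because the scanset stops at '\n' without consuming it, the caller still has to read that character before measuring the next line.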


Implementation notes:

GNU C11 (GCC) version 6.4.0 (i686-pc-cygwin) compiled by GNU C version 6.4.0, GMP version 6.1.2, MPFR version 3.1.5-p10, MPC version 1.0.3, isl version 0.14 or 0.13

glibc 2.26 released.

Intel Xeon W3530, 64-bit OS (Windows 7)


Test code

#include <limits.h>
#include <stdio.h>
#include <string.h>

int print(FILE *stream, long long size, int ch) {
  char buf[4096];
  memset(buf, ch, sizeof buf);
  while (size > 0) {
    size_t len = size < (long long) sizeof buf ? (size_t) size : sizeof buf;
    size_t y = fwrite(buf, 1, len, stream);
    if (len != y) {
      perror("fwrite");
      return 1;
    }
    size -= len;
  }
  return 0;
}

int scan(FILE *stream) {
  rewind(stream);
  long long n = -42;
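  // "%*s" suppresses assignment and "%n" is not counted as an assignment,
  // so a successful call returns 0; EOF indicates an input failure.
  int cnt = fscanf(stream, "%*s%lln", &n);
  printf("cnt:%d n:%lld ", cnt, n);
  return cnt != 0;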
}

int testf(long long n) {
  printf("%10lld ", n);
  FILE *f = fopen("x.txt", "w+b");
  if (f == NULL) {
    perror("fopen");
    return 1;
  }
  if (print(f, n, 'x')) {
    perror("print");
    fclose(f);
    return 2;
  }
  if (scan(f)) {
    perror("scan");
    fclose(f);
    return 3;
  }
  fclose(f);
  puts("OK");
  fflush(stdout);
  return 0;
}

int main(void) {
  printf("%d %ld %lld\n", INT_MAX, LONG_MAX, LLONG_MAX);
  testf(1000);
  testf(1000000);
  testf(INT_MAX);
  testf(INT_MAX + 1LL);
  testf(UINT_MAX);
  testf(UINT_MAX + 1LL);
  testf(1);
  return 0;
}

Test output

2147483647 2147483647 9223372036854775807

File length      Reported bytes read
      1000 cnt:0 n:1000 OK
   1000000 cnt:0 n:1000000 OK
2147483647 cnt:0 n:2147483647 OK
2147483648 cnt:0 n:-2147483648 OK  // implies ncount is internally an `int/long`
4294967295 cnt:0 n:-1 OK
4294967296 cnt:0 n:-1088421888 OK  // This `n` value may not be consistent. -1 also seen
         1 cnt:0 n:1 OK

[Edit]

With some runs of testf(UINT_MAX + 1LL);, I received other inconsistent results like '4294967296 cnt:0 n:1239482368 OK'. Hmmmm.

Sample fscanf() support source code uses an int for ncount.
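For comparison, the values a 32-bit two's complement counter would hold can be computed with well-defined arithmetic. This is only a model of what the output suggests, not the library's actual code, and the name wrap_to_int32 is hypothetical.

```c
#include <stdint.h>

// Hypothetical model: the value a 32-bit two's complement counter would hold
// after `count` characters, computed without relying on overflow behavior.
static int32_t wrap_to_int32(uint64_t count) {
  uint32_t low = (uint32_t)(count & 0xFFFFFFFFu);  // keep the low 32 bits
  if (low <= (uint32_t)INT32_MAX) {
    return (int32_t)low;
  }
  // low is in [2^31, 2^32 - 1]; the two's complement value is low - 2^32.
  return (int32_t)((int64_t)low - 4294967296LL);
}
```

This model reproduces the rows up to 4294967295 (for example, 2147483648 maps to -2147483648 and 4294967295 maps to -1). For 4294967296 it yields 0, which matches none of the values observed, so that row is not explained by simple wrapping.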

chux - Reinstate Monica asked Dec 12 '17 20:12


1 Answer

What is the type or the minimal expected range of ncount?

The standard does not specify any particular minimum. It flatly says

The corresponding argument shall be a pointer to signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function.

(C2011, 7.21.6.2/12)

This leaves no room for a conforming implementation to store a different number in the destination variable, except inasmuch as the standard specifies for all conversions, including %n, that

if the result of the conversion cannot be represented in the [destination] object, the behavior is undefined.

(C2011 7.21.6.2/10)

What happens, or should happen, when "the number of characters read from the input stream" is large?

If the pointer corresponding to the %n directive is correctly typed for the directive's length specifier (or lack thereof), and if the true count of characters read up to that point by that scanf() call can be represented in an object of that type, then the true count should in fact be stored. Otherwise, the behavior is undefined.
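As an illustration of the well-defined case, the sketch below pairs each length modifier with its matching signed destination type and scans a short token whose length fits in every one of them. The struct counts and the function count_token are hypothetical names, and sscanf() is used only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical container: one destination per length modifier.
struct counts {
  signed char hh;
  short h;
  int i;
  long l;
  long long ll;
  intmax_t j;
  ptrdiff_t t;
};

// Scans the first whitespace-delimited token of `text` once per modifier.
// The caller must keep the token short enough (at most 127 characters) that
// the count is representable even in a signed char; otherwise the behavior
// is undefined.
static struct counts count_token(const char *text) {
  struct counts c = {0};
  sscanf(text, "%*s%hhn", &c.hh);  // signed char
  sscanf(text, "%*s%hn", &c.h);    // short
  sscanf(text, "%*s%n", &c.i);     // int
  sscanf(text, "%*s%ln", &c.l);    // long
  sscanf(text, "%*s%lln", &c.ll);  // long long
  sscanf(text, "%*s%jn", &c.j);    // intmax_t
  sscanf(text, "%*s%tn", &c.t);    // ptrdiff_t
  return c;
}
```

For the input "hello world", every member ends up holding 5, the number of characters consumed by "%*s".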

John Bollinger answered Oct 11 '22 02:10