Suppose I forgot to close the right square bracket (]) of a scanset. What will happen then? Does it invoke undefined behavior?
Example:
char str[] = "Hello! One Two Three";
char s1[50] = {0}, s2[50] = {0};
sscanf(str, "%s %[^h", s1, s2); /* UB? */
printf("s1='%s' s2='%s'\n", s1, s2);
I get a warning from GCC when compiling:
source_file.c: In function ‘main’:
source_file.c:11:5: warning: no closing ‘]’ for ‘%[’ format [-Wformat=]
sscanf(str, "%s %[^h", s1, s2); /* UB? */
and the output is
s1='Hello!' s2=''
I've also noticed that sscanf returns 1. But what exactly is going on here?
I've checked the C11 standard, but found no information related to this.
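The scanset is a conversion specifier supported by the scanf family of functions, written as %[...]. Inside the brackets you list a single character or a set of characters (case-sensitive). When the scanset is processed, scanf() consumes only characters that belong to the set and stops at the first character that does not.
We define a scanset by putting characters inside square brackets; remember that scansets are case-sensitive. The characters are written one after another without separators: a comma inside the brackets is not a delimiter but simply another member of the set. For example, scanf("%[A-Z_abc]", str); reads a run of capital letters, underscores and the letters a, b and c, whereas %[A-Z,_,a,b,c] would additionally accept commas. (Whether A-Z is treated as a range is implementation-defined, although glibc, for example, does treat it as one.)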
When passed as part of a `scanf` format string, “%*c” means “read and ignore a character”. There has to be a character there for the conversion to succeed, but other than that, the character is ignored. A typical use-case would be reading up to some delimiter, then ignoring the delimiter. For example: char s[20];
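Just use scanf("%s", stringName); or, in C++, cin >> stringName;. Tip: if you want to store the length of the string while you scan it, use scanf("%s %n", stringName, &stringLength);, where stringName is a character array and stringLength is an int. Note that %n records the number of characters consumed so far, so with this format the value also counts any leading whitespace and the whitespace consumed by the space after %s.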
Excellent! You should file a defect report for C11!
Here is the relevant part of C11 7.21.6.2:
... The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket.
A strict interpretation of "The characters between the brackets" is that, in the absence of a closing bracket, there are no such characters; but when ^ is the first character after [, that reading would be inconsistent. gcc is kind enough to point out the probable error in the source code. The actual behavior is determined by the C library implementation and does not seem to be specified in the C Standard. As such, it is a form of undefined behavior that IMHO should really be documented as such in the Standard.