Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Doesn't %[] or %[^] specifier in scanf(),sscanf() or fscanf() store the input in null-terminated character array?

Here's what the Beez C guide (LINK) tells about the %[] format specifier:

It allows you to specify a set of characters to be stored away (likely in an array of chars). Conversion stops when a character that is not in the set is matched.

I would appreciate if you can clarify some basic questions that arise from this premise:

1) Are the input fetched by those two format specifiers stored in the arguments(of type char*) as a character array or a character array with a \0 terminating character (string)? If not a string, how to make it store as a string , in cases like the program below where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?

2) My program seems to suggest that processing stops for the %[^|] specifier when the negated character | is encountered.But when it starts again for the next format specifier,does it start from the negated character where it had stopped earlier?In my program I intend to ignore the | hence I used %*c.But I tested and found that if I use %c and an additional argument of type char,then the character | is indeed stored in that argument.

3) And lastly but crucially for me,what is the difference between passing a character array for a %s format specifier in printf() and a string(NULL terminated character array)?In my other program titled character array vs string,I've passed a character array(not NULL terminated) for a %s format specifier in printf() and it gets printed just as a string would.What is the difference?

//Program to illustrate %[^] specifier

#include<stdio.h>

int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10];

sscanf(ptr, "%[^|]%*c%[^|]%*c%s", type,fruit1, fruit2);
printf("%s,%s,%s",type,fruit1,fruit2);
}

//character array vs string

#include<stdio.h>

int main()
{
char test[10]={'J','O','N'};
printf("%s",test);
}

Output JON

//Using %c instead of %*c

#include<stdio.h>

int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10],char_var;

sscanf(ptr, "%[^|]%c%[^|]%*c%s", type,&char_var,fruit1, fruit2);
printf("%s,%s,%s,and the character is %c",type,fruit1,fruit2,char_var);

}

Output fruit,apple,lemon,and the character is |

like image 337
Rüppell's Vulture Avatar asked Mar 25 '23 01:03

Rüppell's Vulture


2 Answers

  1. It is null terminated. From sscanf():

    The conversion specifiers s and [ always store the null terminator in addition to the matched characters. The size of the destination array must be at least one greater than the specified field width.

  2. The excluded characters are unconsumed by the scan set and remain to be processed. An alternative format specifier:

    if (sscanf(ptr, "%9[^|]|%9[^|]|%9s", type,fruit1, fruit2) == 3)
    
  3. The array is actually null terminated as remaining elements will be zero initialized:

    char test[10]={'J','O','N' /*,0,0,0,0,0,0,0*/ };
    

If it was not null terminated then it would keep printing until a null character was found somewhere in memory, possibly overruning the end of the array causing undefined behaviour. It is possible to print a non-null terminated array:

    char buf[] = { 'a', 'b', 'c' };
    printf("%.*s", 3, buf);
like image 57
hmjd Avatar answered Apr 19 '23 17:04

hmjd


1) Are the input fetched by those two format specifiers stored in the arguments(of type char*) as a character array or a character array with a \0 terminating character (string)? If not a string, how to make it store as a string , in cases like the program below where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?

They're stored in ASCIIZ format - with a NUL/'\0' terminator.

2) My program seems to suggest that processing stops for the %[^|] specifier when the negated character | is encountered.But when it starts again for the next format specifier,does it start from the negated character where it had stopped earlier?In my program I intend to ignore the | hence I used %*c.But I tested and found that if I use %c and an additional argument of type char,then the character | is indeed stored in that argument.

It shouldn't consume the next character. Show us your code or it didn't happen ;-P.

3) And lastly but crucially for me,what is the difference between passing a character array for a %s format specifier in printf() and a string(NULL terminated character array)?In my other program titled character array vs string,I've passed a character array(not NULL terminated) for a %s format specifier in printf() and it gets printed just as a string would.What is the difference?

(edit: the following addresses the question above, which talks about array behaviours generally and is broader than the code snippet in the question that specifically posed the case char[10] = "abcd"; and is safe)

%s must be passed a pointer to a ASCIIZ text... even if that text is explicitly in a char array, it's the mandatory presence of the NUL terminator that defines the textual content and not the array length. You must NUL terminate your character array or you have undefined behaviour. You might get away with it sometimes - e.g. strncpy into the array will NUL terminate it if-and-only-if there's room to do so, and static arrays start with all-0 content so if you only overwrite before the final character you'll have a NUL, your char[10] example happens to have elements for which values aren't specified populated with NULs, but you should generally take responsibility for ensuring that something is ensuring NUL termination.

like image 44
Tony Delroy Avatar answered Apr 19 '23 18:04

Tony Delroy