Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

string input and output in C

Tags:

c

string

I have this snippet of the code:

char* receiveInput(){
    char *s;
    scanf("%s",s);

    return s;
}

int main()
{
    char *str = receiveInput();
    int length = strlen(str);

    printf("Your string is %s, length is %d\n", str, length);

    return 0;
}

I receive this output:

Your string is hellàÿ", length is 11

my input was:

helloworld!

can somebody explain why, and why this style of the coding is bad, thanks in advance

like image 835
rookie Avatar asked Feb 02 '11 09:02

rookie


2 Answers

Several questions have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):

can somebody explain why, and why this style of the coding is bad

I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatable with ordinary (and easier to do correctly) read operations (like fgets and company).

First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?

If you'd like to read in an entire line, as you may often be wont to do, scanf provides... with "%[^\n]". What? What is that? When did this become Perl?

But the real problem is that neither of those are safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will only read 9 characters, and add a terminating nul-character automatically. So that's good... for when our array size never needs to change.

What if we want to pass the size of our array as an argument to scanf? printf can do this:

char string[] = "Hello, world!";
printf("%.*s\n", sizeof string, string); // prints whole message;
printf("%.*s\n", 6, string); // prints just "Hello,"

Want to do the same thing with scanf? Here's how:

static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%us", bufsize);
scanf(tmp, buffer);

That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.

Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:

fgets(buffer, bufsize, stdin);

Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.

Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:

char *readline(FILE *file)
{
    size_t size  = 80; // start off small
    size_t curr  = 0;
    char *buffer = malloc(size);
    while(fgets(buffer + curr, size - curr, file))
      {
        if(strchr(buffer + curr, '\n')) return buffer; // success
        curr = size - 1;
        size *= 2;
        char *tmp = realloc(buffer, size);
        if(tmp == NULL) /* handle error */;
        buffer = tmp;
      }
    /* handle error */;
}

The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.

Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.

We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in and just store the number, do the math with as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.

Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:

int i;
char buf[BUSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);

That will read the "2" but then fgets will read an empty line because scanf didn't read the newline! Okay, take two:

...
scanf("%i\n", &i);
...

You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away it:

...
scanf("%i", &i);
getchar();
...

Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?

Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).

The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.

like image 196
Chris Lutz Avatar answered Sep 29 '22 09:09

Chris Lutz


scanf doesn't allocate memory for you.

You need to allocate memory for the variable passed to scanf.

You could do like this:

char* receiveInput(){
    char *s = (char*) malloc( 100 );
    scanf("%s",s);
    return s;
}

But warning:

  1. the function that calls receiveInput will take the ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving the ownership away in this way is usually not considered a good practice).

    An easy fix is getting the allocated memory as a parameter.

  2. if the input string is longer than 99 (in my case) your program will suffer of buffer overflow (which is what it's already happening).

    An easy fix is to pass to scanf the length of your buffer:

    scanf("%99s",s);
    

A fixed code could be like this:

// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
    scanf("%99s",s);
    return s;
}
int main()
{
    char str[100];
    receiveInput( str );
    int length = strlen(str);

    printf("Your string is %s, length is %d\n", str, length);

    return 0;
}
like image 35
peoro Avatar answered Sep 29 '22 09:09

peoro