
Determining Length of Char String in C - if user inputs the string's contents

Tags: c

I know in C you can declare a string and the number of characters like below,

char mystring[50];

with '50' being the number of characters.

However, what is the proper procedure if the user is going to input the contents of the string (via scanf("%s", mystring);)? Do I leave it as,

char mystring[0];

leaving it as '0' since I have no clue how many characters the user will input?

Or do I do,

char mystring[400];

giving up to 400 characters for the user to input?

Zach Smith asked Nov 29 '09 03:11


4 Answers

You've hit upon the exact problem with scanf() and %s - what happens when you don't know how much input there is?

If you try char mystring[0];, your program may well compile (a zero-length array is not valid standard C, but GCC, for example, accepts it as an extension). It will not work, though. The array has no room for even the terminating \0, so anything scanf() stores in it is written out of bounds. That is undefined behavior: a segfault is the likely outcome, but silent memory corruption is also possible.

So, point 1: you should always allocate a size for your string. I can think of very few circumstances (okay, none) where you would want to say char mystring[0] rather than char *mystring.

Next, when you use scanf(), you never want to use a bare "%s" specifier, because it does no bounds checking on the size of the string. So even if you have:

char mystring[512];
scanf("%s", mystring);

if the user enters more than 511 characters (since the 512th is \0), you will go out of the bounds of your array. The way to remedy this is:

scanf("%511s", mystring);
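As a minimal sketch of the width-limited read described above: the helper below reads from any FILE * so it can be tried without a keyboard, and passing stdin gives the scanf() behaviour. The names read_word and WORD_MAX are invented here for illustration. Input beyond the width is not discarded; it stays in the stream and is returned by the next read.

```c
#include <stdio.h>

#define WORD_MAX 15  /* assumed limit, chosen only for illustration */

/* Reads one whitespace-delimited word of at most WORD_MAX characters
   from `in`. Returns 1 on success, 0 on end of input or read error. */
int read_word(FILE *in, char word[WORD_MAX + 1])
{
    /* The literal width 15 must match WORD_MAX: the buffer needs one
       extra byte for the terminating '\0' that fscanf() appends. */
    return fscanf(in, "%15s", word) == 1;
}
```

Calling read_word(stdin, buf) with a 16-byte buf never writes past the array, however much the user types.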

This is all to say that C doesn't have a facility to automatically resize a string if there is more input than you're expecting. This is the kind of thing you have to do manually.

One way to deal with this is by using fgets().

You could say:

while (fgets(mystring, 512, stdin))
{
   /* process input */
}

You may then use sscanf() to parse mystring.

Try the above code with a buffer of size 5 (and 5 as the size passed to fgets()). After 4 characters have been read, the code loops again to retrieve the rest of the input. "Processing" could include code to reallocate a string to a bigger size and then append the newest input from fgets().

The above code isn't perfect: it will keep looping for input of any length, so you might want an internal hard limit (e.g., loop a maximum of 10 times).
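One way the "process input" step could look, as a sketch rather than a definitive implementation: each piece returned by fgets() is appended to a buffer grown with realloc(), and a caller-supplied cap plays the role of the hard limit. The names read_line, CHUNK and max_len are assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64  /* assumed chunk size, chosen only for illustration */

/* Reads one line from `in` in CHUNK-sized pieces, growing the buffer as
   needed. Returns a malloc'd string without the trailing newline, or NULL
   on end of input with no data, on allocation failure, or if the line is
   longer than max_len characters. The caller frees the result. */
char *read_line(FILE *in, size_t max_len)
{
    char chunk[CHUNK];
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        int done = (n > 0 && chunk[n - 1] == '\n');
        if (done)
            chunk[--n] = '\0';            /* strip the newline */

        if (len + n > max_len) {          /* hard limit exceeded: give up */
            free(line);
            return NULL;
        }
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL) {                /* out of memory */
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1); /* copies the terminator too */
        len += n;

        if (done)
            break;
    }
    return line;  /* still NULL if fgets() read nothing at all */
}
```

Note that when the limit is exceeded, the unread remainder of the line is left in the stream; a real program would need to decide whether to discard it.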

poundifdef answered Nov 15 '22 07:11


Whatever size you pick, the user can always enter more characters than your buffer holds, overflowing it (a common source of security vulnerabilities). You can, however, specify a "field width" to scanf, like so:

scanf("%50s", mystring);

In this case your buffer should be 51 characters, to account for the 50 character field plus the null terminator. Or make your buffer 50 characters and tell scanf 49 is the width.
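To keep the field width and the buffer size from drifting apart, the width can be built into the format string from a single constant using preprocessor stringification. This is a sketch under assumptions: FIELD_WIDTH, STR and read_name are hypothetical names, and the helper takes a FILE * (pass stdin for keyboard input).

```c
#include <stdio.h>

#define FIELD_WIDTH 50   /* hypothetical limit; the buffer is one larger */
#define STR_(x) #x
#define STR(x) STR_(x)   /* expands FIELD_WIDTH before stringizing it */

/* Reads one word of at most FIELD_WIDTH characters from `in`.
   Returns 1 on success, 0 on end of input or read error. */
int read_name(FILE *in, char name[FIELD_WIDTH + 1])
{
    /* Adjacent string literals are concatenated, so the format
       becomes "%50s" at compile time. */
    return fscanf(in, "%" STR(FIELD_WIDTH) "s", name) == 1;
}
```

Changing FIELD_WIDTH then updates both the array size and the scanf width together. The constant must be a plain decimal literal for this to work.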

John Zwinck answered Nov 15 '22 08:11


There is a function called ggets() which is not part of the standard C library. It's a fairly simple function. It initializes a char array using malloc(). It then reads characters from stdin one char at a time. It keeps track of how many characters were read and expands the char array with realloc() when it runs out of space.

It is available here: http://cbfalconer.home.att.net/download/index.htm

I would suggest you read the code and re-implement yourself.

jonescb answered Nov 15 '22 07:11


This is cbfalconer's code (http://cbfalconer.home.att.net/download/index.htm) with a couple of minor modifications, combined into one file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INITSIZE   112  /* power of 2 minus 16, helps malloc */
#define DELTASIZE (INITSIZE + 16)

enum {OK = 0, NOMEM};

int fggets(char* *ln, FILE *f)
{
   int     cursize, ch, ix;
   char   *buffer, *temp;

   *ln = NULL; /* default */
   if (NULL == (buffer = malloc(INITSIZE))) return NOMEM;
   cursize = INITSIZE;

   ix = 0;
   while ((EOF != (ch = getc(f))) && ('\n' != ch)) {
      if (ix >= (cursize - 1)) { /* extend buffer */
         cursize += DELTASIZE;
         if (NULL == (temp = realloc(buffer, (size_t)cursize))) {
            /* ran out of memory, return partial line */
            buffer[ix] = '\0';
            *ln = buffer;
            return NOMEM;
         }
         buffer = temp;
      }
      buffer[ix++] = ch;
   }
   if ((EOF == ch) && (0 == ix)) {
      free(buffer);
      return EOF;
   }

   buffer[ix] = '\0';
   if (NULL == (temp = realloc(buffer, (size_t)ix + 1))) {
      *ln = buffer;  /* without reducing it */
   }
   else *ln = temp;
   return OK;
} /* fggets */
/* End of ggets.c */

int main(void)
{
   char *line;
   int   cnt = 0;

   /* Read stdin line by line until EOF or an allocation failure. */
   while (OK == fggets(&line, stdin)) {
      /* Report the line number and length on stderr, echo the line on stdout. */
      fprintf(stderr, "%4d %4d\n", ++cnt, (int)strlen(line));
      (void)puts(line);
      free(line);
   }
   free(line); /* partial line after NOMEM; NULL (a no-op) after EOF */
   return 0;
} /* main */
/* END file tggets.c */

I tested it, and it read each input line in full regardless of length (as long as memory is available).

Brian T Hannan answered Nov 15 '22 08:11