Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading Strings with Undefined Length in C

Tags:

c

string

pointers

first(as always) I want to apologize about my english, it may not be clear enough.

I'm not that good at C programming, and I was asked to read a "string" input with undefined length.

This is my solution

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *newChar();
char *addChar(char *, char);
char *readLine(void);

int main() {
  char *palabra;
  palabra = newChar();

  palabra = readLine();
  printf("palabra=%s\n", palabra);

  return 0;
}

char *newChar() {
  char *list = (char *) malloc(0 * sizeof (char));
  *list = '\0';
  return list;
}

char *addChar(char *lst, char num) {
  int largo = strlen(lst) + 1;
  realloc(&lst, largo * sizeof (char));
  *(lst + (largo - 1)) = num;
  *(lst + largo) = '\0';
  return lst;
}

char *readLine() {
  char c;
  char *palabra = newChar();

  c = getchar();
  while (c != '\n') {
    if (c != '\n') {
      palabra = addChar(palabra, c);
    }
    c = getchar();
  }
  return palabra;
}

Please, I'd appreciate that you help me by telling me if it's a good idea or giving me some other idea(and also telling me if it's a "correct" use for pointers).

Thanks in advance


EDIT: Well, thanks for you answers,they were very useful. Now I post edited(and I hope better) code, maybe could be useful for someone new to C(like me) and be feedbacked again.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


void reChar(char **, int *);
void readLine(char **, int *);

int main() {
    char *palabra = NULL;
    int largo = 0;

    reChar(&palabra, &largo);
    readLine(&palabra, &largo);
    printf("palabra=%s\n", palabra, largo);

    system("pause");
    return 0;
}

void reChar(char **lst, int *largo) {
    (*largo) += 4;
    char *temp = (char*) realloc(*lst, (*largo) * sizeof (char));

    if (temp != NULL) {
        *lst = temp;
    } else {
        free(*lst);
        puts("error (re)allocating memory");
        exit(1);
    }
}

void readLine(char **lst, int *largo) {
    int c;
    int pos = 0;

    c = getchar();
    while (c != '\n' && c != EOF) {
        if ((pos + 1) % 4 == 0) {
            reChar(lst, largo);
        }
        (*lst)[pos] =(char) c;
        pos++;
        c = getchar();
    }
    (*lst)[pos] = '\0';
}

PS:

  • It seem enough to slow increase size of "palabra".

  • I'm not sure if capture getchar() into a int and then cast it into a char is the correct way to hadle EOF pitfall

like image 446
Marco Aviles Avatar asked Aug 30 '10 06:08

Marco Aviles


Video Answer


2 Answers

  1. Look up the definition of POSIX getline().

  2. Remember that you need to capture the return value from realloc(); it is not guaranteed that the new memory block starts at the same position as the old one.

  3. Know that malloc(0) may return a null pointer, or it may return a non-null pointer that is unusable (because it points to zero bytes of memory).

  4. You may not write '*list = '\0'; when list points to zero bytes of allocated memory; you don't have permission to write there. If you get a NULL back, you are likely to get a core dump. In any case, you are invoking undefined behaviour, which is 'A Bad Idea™'. (Thanks)

  5. The palabra = newChar(); in main() leaks memory - assuming that you fix the other problems already discussed.

  6. The code in readLine() doesn't consider the possibility of getting EOF before getting a newline; that is bad and will result in a core dump when memory allocation (finally) fails.

  7. Your code will exhibit poor performance because it allocates one character at a time. Typically, you should allocate considerably more than one extra character at a time; starting with an initial allocation of perhaps 4 bytes and doubling the allocation each time you need more space might be better. Keep the initial allocation small so that the reallocation code is properly tested.

  8. The return value from getchar() is an int, not a char. On most machines, it can return 256 different positive character values (even if char is a signed type) and a separate value, EOF, that is distinct from all the char values. (The standard allows it to return more than 256 different characters if the machine has bytes that are bigger than 8 bits each.) (Thanks) The C99 standard §7.19.7.1 says of fgetc():

    If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

    (Emphasis added.) It defines getchar() in terms of getc(), and it defines getc() in terms of fgetc().

  9. (Borrowed: Thanks). The first argument to realloc() is the pointer to the start of the currently allocated memory, not a pointer to the pointer to the start of the currently allocated memory. If you didn't get a compilation warning from it, you are not compiling with enough warnings set on your compiler. You should turn up the warnings to the maximum. You should heed the compiler warnings - they are normally indicative of bugs in your code, especially while you are still learning the language.

  10. It is often easier to keep the string without a null terminator until you know you have reached the end of the line (or end of input). When there are no more characters to be read (for the time being), then append the null so that the string is properly terminated before it is returned. These functions do not need the string properly terminate while they are reading, as long as you keep track of where you are in the string. Do make sure you have enough room at all times to add the NUL '\0' to the end of the string, though.

See Kernighan & Pike 'The Practice of Programming' for a lot of relevant discussions. I also think Maguire 'Writing Solid Code' has relevant advice to offer, for all it is somewhat dated. However, you should be aware that there are those who excoriate the book. Consequently, I recommend TPOP over WSC (but Amazon has WSC available from $0.01 + p&p, whereas TPOP starts at $20.00 + p&p -- this may be the market speaking).


TPOP was previously at http://plan9.bell-labs.com/cm/cs/tpop and http://cm.bell-labs.com/cm/cs/tpop but both are now (2015-08-10) broken. See also Wikipedia on TPOP.

like image 187
Jonathan Leffler Avatar answered Sep 20 '22 13:09

Jonathan Leffler


  • You are always allocating one byte less than you are using. For example in the beginning you allocate space for zero characters and then try to set the (non-existing) first character to '\0'.

  • realloc doesn't take a pointer to a pointer as first parameter. It's supposed to be used like this:

    lst = realloc(lst, largo * sizeof (char));
    
  • If you want to handle out-of-memory conditions you would have to check if malloc() or realloc() returned NULL.

  • It would be more efficient to allocate a bigger buffer in the beginning and increase it in bigger steps instead of reallocating every added character separately.

like image 29
sth Avatar answered Sep 23 '22 13:09

sth