
How do strings and char arrays work in C?

No guides I've seen seem to explain this very well.

I mean, you can allocate memory for a char*, or write char[25] instead? What's the difference? And then there are literals, which can't be manipulated? What if you want to assign a fixed string to a variable? Like, stringVariable = "thisIsALiteral", then how do you manipulate it afterwards?

Can someone set the record straight here? And in the last case, with the literal, how do you take care of null-termination? I find this very confusing.


EDIT: The real problem seems to be that, as I understand it, you have to juggle these different constructs in order to accomplish even simple things. For instance, only char * can be passed as an argument or return value, but only char[] can be assigned a literal and modified. I feel like it's obvious that we frequently/always need to be able to do both, and that's where my pitfall is.

temporary_user_name Avatar asked Oct 04 '12 00:10



3 Answers

What is the difference between an allocated char* and char[25]?

The lifetime of a malloc-ed string is not limited by the scope of its declaration. In plain language, you can return a malloc-ed string from a function; you cannot do the same with a char[25] allocated in automatic storage, because its memory is reclaimed upon return from the function.
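As a minimal sketch of that lifetime difference, the function below returns a heap copy of a string. The name duplicate_string is a hypothetical helper invented for illustration, not a standard library function, and the caller is assumed to free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, modifiable copy of src.
   The caller owns the result and must free() it. Returns NULL on failure. */
char *duplicate_string(const char *src) {
    assert(src != NULL);
    char *copy = malloc(strlen(src) + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;                        /* allocation failed */
    }
    strcpy(copy, src);
    return copy;                            /* still valid after we return */
}

/* The broken alternative, shown only as a comment:
 *
 *   char *bad_copy(void) {
 *       char buf[25] = "hello";
 *       return buf;   // buf's lifetime ends here; using the result is undefined
 *   }
 */
```

A caller would write something like char *s = duplicate_string("hello"); modify s freely, and call free(s) when done.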

Can literals be manipulated?

String literals cannot be modified in place: attempting to do so is undefined behavior, and compilers typically place them in read-only storage. To manipulate one, first copy it into modifiable storage (static, automatic, or dynamic). This cannot be done:

char *str = "hello";
str[0] = 'H'; // <<== WRONG! This is undefined behavior.

This will work:

char str[] = "hello";
str[0] = 'H'; // <<=== This is OK

This works too:

char *str = malloc(6);
strcpy(str, "hello");
str[0] = 'H'; // <<=== This is OK too

How do you take care of null termination of string literals?

The C compiler takes care of null termination for you: every string literal has an extra character at the end, set to \0.
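The extra character can be observed with sizeof, which counts it, while a length count stops just before it. The sketch below assumes two hypothetical helpers, count_chars and literal_array_size, written only for illustration; count_chars mirrors what the standard strlen does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts characters up to, but not including,
   the terminating '\0'. */
size_t count_chars(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Hypothetical helper: size in bytes of an array initialized from "hello".
   The compiler sizes the array to hold 5 characters plus the '\0'. */
size_t literal_array_size(void) {
    char s[] = "hello";
    assert(s[5] == '\0');   /* the terminator was added for us */
    return sizeof s;
}
```

With these definitions, count_chars("hello") is 5 while literal_array_size() and sizeof("hello") are both 6.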

Sergey Kalinichenko Avatar answered Nov 12 '22 15:11



Your question refers to three different constructs in C: char arrays, char pointers allocated on the heap, and string literals. These are all different in subtle ways.

  • Char arrays, which you get by declaring char foo[25] inside a function. That memory is allocated on the stack and exists only within the scope where you declared it, but exactly 25 bytes have been allocated for you. You may store whatever you want in those bytes, but if you want a string, remember to null-terminate it, which leaves room for at most 24 characters.

  • Character pointers defined with char *bar hold only an address, and that address is not valid until you assign one. To make use of them you need to point them to something: either an existing array as before (bar = foo) or newly allocated space (bar = malloc(sizeof(char) * 25);). If you do the latter, you should eventually free the space.

  • String literals behave differently depending on how you use them. If you use them to initialize a char array char s[] = "String"; then you're simply declaring an array large enough to exactly hold that string (and the null terminator) and putting that string there. It's the same as declaring a char array and then filling it up.

    On the other hand, if you assign a string literal to a char * then the pointer is pointing to memory you are not supposed to modify. Attempting to modify it may or may not crash, and leads to undefined behavior, which means you shouldn't do it.
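The three constructs in the list above can be put side by side in one short sketch. The function name three_kinds_demo is hypothetical, and the buffer size of 25 is simply carried over from the examples above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: builds "String" in each kind of storage and modifies
   the copies that may legally be modified. Returns 1 on success, 0 if
   malloc fails. */
int three_kinds_demo(void) {
    char foo[25];                  /* automatic array: 25 bytes, contents indeterminate */
    strcpy(foo, "String");         /* 6 characters plus '\0' fit in 25 bytes */

    char *bar = malloc(25);        /* heap memory: lives until free() */
    if (bar == NULL) {
        return 0;
    }
    strcpy(bar, "String");

    char s[] = "String";           /* array of exactly 7 bytes, copied from the literal */
    const char *lit = "String";    /* points at the literal itself: treat as read-only */

    foo[0] = 's';                  /* all three copies are modifiable */
    bar[0] = 's';
    s[0] = 's';
    /* lit[0] = 's';  would not compile thanks to const; without const it
       would compile but be undefined behavior */

    assert(sizeof s == 7);
    int ok = strcmp(foo, bar) == 0 && strcmp(bar, s) == 0
             && strcmp(lit, "String") == 0;
    free(bar);                     /* only the malloc-ed memory needs freeing */
    return ok;
}
```

Declaring the pointer to a literal as const char * is a design choice rather than a requirement: it lets the compiler reject accidental writes.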

epsalon Avatar answered Nov 12 '22 17:11



Since other aspects are answered already, I would only add to the question "what if you want the flexibility of function passing using char * but the modifiability of char []?"

You can allocate an array and pass it to a function as a char *. This is often loosely called pass by reference: strictly speaking, C passes the pointer by value, and the array name converts to the address of its first element, so the array itself is not copied. As a consequence, any change made inside the function modifies the original array.

void fun(char *a) {
   a[0] = 'y'; // changes hello to yello
}

int main(void) {
   char arr[6] = "hello"; // Note that it's not char *arr
   fun(arr); // arr now contains yello
   return 0;
}

The same could have been done for an array allocated with malloc

char * arr = malloc(6);
strcpy(arr, "hello");

fun(arr); // note that fun remains same.

Later you can free the malloc-ed memory

free(arr);

char *a is just a pointer that can store an address, which might be that of a single variable or of the first element of an array. Beware: we have to assign a valid address to this pointer before actually using it.

In contrast, char arr[SIZE] inside a function creates an array on the stack, i.e. it also allocates SIZE bytes. So you can directly access arr[3] (assuming 3 is less than SIZE) without any issues.

This is why it makes sense to allow assigning any address to a but not to arr: the name arr is the only way to reach the array's memory, so re-pointing it would make that memory unreachable.
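A short sketch of that asymmetry, using a hypothetical function named retarget_demo: the pointer can be re-pointed at will, while the array can only have its contents overwritten.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns the first character the pointer ends up at. */
char retarget_demo(void) {
    char first[] = "hello";
    char second[] = "world";

    char *a = first;          /* a holds the address of first[0] */
    a = second;               /* fine: a pointer can be assigned a new address */

    /* first = second;           would not compile: an array is not assignable */
    strcpy(first, second);    /* copy the contents instead; both are 6 bytes */
    assert(strcmp(first, "world") == 0);

    return a[0];
}
```

Here retarget_demo() returns 'w', because a was re-pointed at second before being read.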

fkl Avatar answered Nov 12 '22 15:11
