char a[]="string"; // a is an array of characters. char *p="string"; // p is a string literal having static allocation. Any attempt to modify contents of p leads to Undefined Behavior since string literals are stored in read-only section of memory.
char* means a pointer to a character. In C strings are an array of characters terminated by the null character.
Software Engineering C C uses char type to store characters and letters. However, the char type is integer type because underneath C stores integer numbers instead of characters.In C, char values are stored in 1 byte in memory,and value range from -128 to 127 or 0 to 255.
The type of both the variables is a pointer to char or (char*) , so you can pass either of them to a function whose formal argument accepts an array of characters or a character pointer.
The difference here is that
char *s = "Hello world";
will place "Hello world"
in the read-only parts of the memory, and making s
a pointer to that makes any writing operation on this memory illegal.
While doing:
char s[] = "Hello world";
puts the literal string in read-only memory and copies the string to newly allocated memory on the stack. Thus making
s[0] = 'J';
legal.
First off, in function arguments, they are exactly equivalent:
void foo(char *x);
void foo(char x[]); // exactly the same in all respects
In other contexts, char *
allocates a pointer, while char []
allocates an array. Where does the string go in the former case, you ask? The compiler secretly allocates a static anonymous array to hold the string literal. So:
char *x = "Foo";
// is approximately equivalent to:
static const char __secret_anonymous_array[] = "Foo";
char *x = (char *) __secret_anonymous_array;
Note that you must not ever attempt to modify the contents of this anonymous array via this pointer; the effects are undefined (often meaning a crash):
x[1] = 'O'; // BAD. DON'T DO THIS.
Using the array syntax directly allocates it into new memory. Thus modification is safe:
char x[] = "Foo";
x[1] = 'O'; // No problem.
However the array only lives as long as its contaning scope, so if you do this in a function, don't return or leak a pointer to this array - make a copy instead with strdup()
or similar. If the array is allocated in global scope, of course, no problem.
This declaration:
char s[] = "hello";
Creates one object - a char
array of size 6, called s
, initialised with the values 'h', 'e', 'l', 'l', 'o', '\0'
. Where this array is allocated in memory, and how long it lives for, depends on where the declaration appears. If the declaration is within a function, it will live until the end of the block that it is declared in, and almost certainly be allocated on the stack; if it's outside a function, it will probably be stored within an "initialised data segment" that is loaded from the executable file into writeable memory when the program is run.
On the other hand, this declaration:
char *s ="hello";
Creates two objects:
char
s containing the values 'h', 'e', 'l', 'l', 'o', '\0'
, which has no name and has static storage duration (meaning that it lives for the entire life of the program); ands
, which is initialised with the location of the first character in that unnamed, read-only array.The unnamed read-only array is typically located in the "text" segment of the program, which means it is loaded from disk into read-only memory, along with the code itself. The location of the s
pointer variable in memory depends on where the declaration appears (just like in the first example).
Given the declarations
char *s0 = "hello world";
char s1[] = "hello world";
assume the following hypothetical memory map (the columns represent characters at offsets 0 to 3 from the given row address, so e.g. the 0x00
in the bottom right corner is at address 0x0001000C + 3
= 0x0001000F
):
+0 +1 +2 +3 0x00008000: 'h' 'e' 'l' 'l' 0x00008004: 'o' ' ' 'w' 'o' 0x00008008: 'r' 'l' 'd' 0x00 ... s0: 0x00010000: 0x00 0x00 0x80 0x00 s1: 0x00010004: 'h' 'e' 'l' 'l' 0x00010008: 'o' ' ' 'w' 'o' 0x0001000C: 'r' 'l' 'd' 0x00
The string literal "hello world"
is a 12-element array of char
(const char
in C++) with static storage duration, meaning that the memory for it is allocated when the program starts up and remains allocated until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior.
The line
char *s0 = "hello world";
defines s0
as a pointer to char
with auto storage duration (meaning the variable s0
only exists for the scope in which it is declared) and copies the address of the string literal (0x00008000
in this example) to it. Note that since s0
points to a string literal, it should not be used as an argument to any function that would try to modify it (e.g., strtok()
, strcat()
, strcpy()
, etc.).
The line
char s1[] = "hello world";
defines s1
as a 12-element array of char
(length is taken from the string literal) with auto storage duration and copies the contents of the literal to the array. As you can see from the memory map, we have two copies of the string "hello world"
; the difference is that you can modify the string contained in s1
.
s0
and s1
are interchangeable in most contexts; here are the exceptions:
sizeof s0 == sizeof (char*)
sizeof s1 == 12
type of &s0 == char **
type of &s1 == char (*)[12] // pointer to a 12-element array of char
You can reassign the variable s0
to point to a different string literal or to another variable. You cannot reassign the variable s1
to point to a different array.
C99 N1256 draft
There are two different uses of character string literals:
Initialize char[]
:
char c[] = "abc";
This is "more magic", and described at 6.7.8/14 "Initialization":
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So this is just a shortcut for:
char c[] = {'a', 'b', 'c', '\0'};
Like any other regular array, c
can be modified.
Everywhere else: it generates an:
So when you write:
char *c = "abc";
This is similar to:
/* __unnamed is magic because modifying it gives UB. */
static char __unnamed[] = "abc";
char *c = __unnamed;
Note the implicit cast from char[]
to char *
, which is always legal.
Then if you modify c[0]
, you also modify __unnamed
, which is UB.
This is documented at 6.4.5 "String literals":
5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence [...]
6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
6.7.8/32 "Initialization" gives a direct example:
EXAMPLE 8: The declaration
char s[] = "abc", t[3] = "abc";
defines "plain" char array objects
s
andt
whose elements are initialized with character string literals.This declaration is identical to
char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration
char *p = "abc";
defines
p
with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to usep
to modify the contents of the array, the behavior is undefined.
GCC 4.8 x86-64 ELF implementation
Program:
#include <stdio.h>
int main(void) {
char *s = "abc";
printf("%s\n", s);
return 0;
}
Compile and decompile:
gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o
Output contains:
char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata
Conclusion: GCC stores char*
it in .rodata
section, not in .text
.
Note however that the default linker script puts .rodata
and .text
in the same segment, which has execute but no write permission. This can be observed with:
readelf -l a.out
which contains:
Section to Segment mapping:
Segment Sections...
02 .text .rodata
If we do the same for char[]
:
char s[] = "abc";
we obtain:
17: c7 45 f0 61 62 63 00 movl $0x636261,-0x10(%rbp)
so it gets stored in the stack (relative to %rbp
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With