char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is the string is broken into substrings, each terminating with a '\0', where the '\0' replaces any characters contained in string s2. The first call uses the string to be tokenized as s1; subsequent calls use NULL as the first argument. A pointer to the beginning of the current token is returned; NULL is returned if there are no more tokens.
Hi,
I have been trying to use strtok
just now and found out that if I pass in a char*
into s1
, I get a segmentation fault. If I pass in a char[]
, strtok
works fine.
Why is this?
I googled around and the reason seems to be something about how char*
is read only and char[]
is writeable. A more thorough explanation would be much appreciated.
The C function strtok() is a string tokenization function that takes two arguments: an initial string to be parsed and a const -qualified character delimiter. It returns a pointer to the first character of a token or to a null pointer if there is no token.
The strtok() function is used in tokenizing a string based on a delimiter. It is present in the header file “string. h” and returns a pointer to the next token if present, if the next token is not present it returns NULL. To get all the tokens the idea is to call this function in a loop.
strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as argument to strtok() . Therefore the original string gets affected. strtok() breaks the string means it replaces the delimiter character with NULL and returns a pointer to the beginning of that token.
The first time the strtok() function is called, it returns a pointer to the first token in string1. In later calls with the same token string, the strtok() function returns a pointer to the next token in the string. A NULL pointer is returned when there are no more tokens. All tokens are null-ended.
What did you initialize the char *
to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok
writes into the string you give it - overwriting the separator character with null
and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, becasue strtok
keeps a reference to the rest of the string, it's not reeentrant - you can use it only on one string at a time. It's best avoided, really - consider strsep(3) instead - see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string so has the same read-only/segfault issue)
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x
with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x
references to int
value 0
.
Note the subtle difference here: the memory location that reference point x
holds is constant (and cannot be changed). However, the value that x
points can be changed. You do it in your code through assignment, e.g. x = 15;
. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name
with a constant reference to a region of memory I've allocated to hold a char
pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and NULL
. Then I need to replace the part of the code which says "Tom"
with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char
's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char *
is not read-only. Only a const char *
is read-only. But your problem in this case isn't that char *
s are read-only, it's that your pointer references a read-only regions of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With