
C's strtok() and read-only string literals

Tags:

c

string

strtok

char *strtok(char *s1, const char *s2)

Repeated calls to this function break string s1 into "tokens"; that is, the string is broken into substrings, each terminated with a '\0', where the '\0' replaces the delimiter (any character contained in string s2) that ends each token. The first call uses the string to be tokenized as s1; subsequent calls use NULL as the first argument. A pointer to the beginning of the current token is returned; NULL is returned if there are no more tokens.

Hi,

I have been trying to use strtok just now and found out that if I pass a char* as s1, I get a segmentation fault. If I pass a char[], strtok works fine.

Why is this?

I googled around and the reason seems to be something about how char* is read-only and char[] is writable. A more thorough explanation would be much appreciated.

Asked by Gilbert, Nov 07 '08 17:11


People also ask

What does strtok() do in C?

The C function strtok() is a string tokenization function that takes two arguments: the string to be parsed and a const-qualified string of delimiter characters. It returns a pointer to the first character of a token, or a null pointer if there are no more tokens.

What is strtok(), and how can a user-defined strtok() be implemented?

The strtok() function tokenizes a string based on a set of delimiter characters. It is declared in the header <string.h> and returns a pointer to the next token if there is one; if there is no next token, it returns NULL. To get all the tokens, call it in a loop.

Does strtok modify the string?

strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as its argument. Therefore the original string is modified: strtok() overwrites the delimiter character that ends each token with the null character '\0' and returns a pointer to the beginning of that token.

How do I use strings with strtok?

The first time the strtok() function is called, it returns a pointer to the first token in string1. In later calls, with NULL as the first argument, strtok() returns a pointer to the next token in the same string. A NULL pointer is returned when there are no more tokens. All tokens are null-terminated.


2 Answers

What did you initialize the char * to?

If it's something like

char *text = "foobar";

then you have a pointer to some read-only characters.

For

char text[7] = "foobar";

then you have a seven element array of characters that you can do what you like with.

strtok writes into the string you give it, overwriting the separator character with a null character ('\0') and keeping a pointer to the rest of the string.

Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.

Also, because strtok keeps a reference to the rest of the string, it's not reentrant: you can use it on only one string at a time. It's best avoided, really. Consider strsep(3) instead; see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string, so it has the same read-only/segfault issue).

Answered by The Archetypal Paul, Sep 20 '22 16:09


An important point that's implied but not stated explicitly:

Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstandings about the underlying mechanisms, so I like to make things as plain as possible.

As you know, when you write your C program, the compiler sets up much of it ahead of time based on the syntax. When you declare a variable anywhere in your code, e.g.:

int x = 0;

The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.

When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.

Note the subtle difference here: the memory location that x refers to is fixed (it cannot be changed). However, the value stored at that location can be changed. You do that in your code through assignment, e.g. x = 15;. Also note that this single line of code actually amounts to two separate steps: one performed by the compiler and one performed when the program runs.

When you have a statement like:

char *name = "Tom";

The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.

But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and '\0' (the null character). Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.

When your program is run, the final step occurs: setting the value of the char pointer (which isn't constant) to the memory address of that automatically created string (which is constant).

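So a char * is not read-only. Only a const char * forbids writing through it. But your problem in this case isn't that char *s are read-only; it's that your pointer references a read-only region of memory. (Formally, modifying a string literal is undefined behavior in C; a segfault is the typical result when the literal is stored in read-only memory.)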

I bring all this up because understanding this point is what lets you read a function's definition in the library documentation and work out the problem yourself, rather than having to ask us. I've also simplified some of the details in the hope of making the issue easier to understand.

I hope this was helpful. ;)

Answered by Jason L, Sep 22 '22 16:09