Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does strtok() split the string into tokens in C?

People also ask

How does strtok () work in C?

The strtok() function parses the string up to the first instance of the delimiter character, replaces the character in place with a null byte ( '\0' ), and returns the address of the first character in the token. Subsequent calls to strtok() begin parsing immediately after the most recently placed null character.

How does strtok () works how can you make it thread-safe?

If you create some sort of mutex or alternate guard to protect the use of the strtok function, and only call strtok from a thread which has obtained a lock on said mutex, then strtok is safe to use. However, it is probably better just to use the thread-safe strtok_r as others have said.

What is strtok () and implement user defined strtok ()?

The strtok() function is used in tokenizing a string based on a delimiter. It is present in the header file “string. h” and returns a pointer to the next token if present, if the next token is not present it returns NULL. To get all the tokens the idea is to call this function in a loop.


the strtok runtime function works like this

the first time you call strtok you provide a string that you want to tokenize

char s[] = "this is a string";

in the above string space seems to be a good delimiter between words so lets use that:

char* p = strtok(s, " ");

what happens now is that 's' is searched until the space character is found, the first token is returned ('this') and p points to that token (string)

in order to get next token and to continue with the same string NULL is passed as first argument since strtok maintains a static pointer to your previous passed string:

p = strtok(NULL," ");

p now points to 'is'

and so on until no more spaces can be found, then the last string is returned as the last token 'string'.

more conveniently you could write it like this instead to print out all tokens:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
  puts(p);
}

EDIT:

If you want to store the returned values from strtok you need to copy the token to another buffer e.g. strdup(p); since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.


strtok() divides the string into tokens. i.e. starting from any one of the delimiter to next one would be your one token. In your case, the starting token will be from "-" and end with next space " ". Then next token will start from " " and end with ",". Here you get "This" as output. Similarly the rest of the string gets split into tokens from space to space and finally ending the last token on "."


strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.

This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.


strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)

From the POSIX strtok page:

This function uses static storage to keep track of the current string position between calls.

There is a thread-safe variant (strtok_r) that doesn't do this type of magic.


strtok will tokenize a string i.e. convert it into a series of substrings.

It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.

The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok will return the next token it finds. Till it reaches the end of the main string, when it returns a null. Another rule is that you pass the string in only the first time, and NULL for the subsequent times. This is a way to tell strtok if you are starting a new session of tokenizing with a new string, or you are retrieving tokens from a previous tokenizing session. Note that strtok remembers its state for the tokenizing session. And for this reason it is not reentrant or thread safe (you should be using strtok_r instead). Another thing to know is that it actually modifies the original string. It writes '\0' for teh delimiters that it finds.

One way to invoke strtok, succintly, is as follows:

char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;

for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
    printf("token=%s\n", token);
}

Result:

this
is
the
string
I
want
to
parse

The first time you call it, you provide the string to tokenize to strtok. And then, to get the following tokens, you just give NULL to that function, as long as it returns a non NULL pointer.

The strtok function records the string you first provided when you call it. (Which is really dangerous for multi-thread applications)


strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.