Could someone explain to me what the differences are between strtok() and strsep()? What are the advantages and disadvantages of each? And why would I pick one over the other?
The C function strtok() is a string tokenization function that takes two arguments: the string to be parsed (or a null pointer to continue parsing the previous string) and a const-qualified string of delimiter characters. It returns a pointer to the first character of the next token, or a null pointer if there are no more tokens.
The strsep() function also splits a string into tokens at any of a given set of delimiter characters. It is useful when you have a long string that you want to tokenize based on a delimiter.
One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().
Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.
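As a rough illustration of interleaving, here is a minimal sketch that runs an inner strsep() scan while an outer one is still in progress. The helper name count_fields and the "records separated by ';', fields separated by ','" format are assumptions made up for this example; the _DEFAULT_SOURCE line is only there so that glibc declares strsep().

```c
#define _DEFAULT_SOURCE /* lets glibc declare strsep(); BSD and macOS declare it by default */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty fields in text such as
 * "a,b;c,d,e", where ';' separates records and ',' separates fields.
 * The buffer is modified in place, so it must be writable. */
size_t count_fields(char *text)
{
    assert(text != NULL);

    size_t count = 0;
    char *records = text; /* cursor for the outer scan */
    char *record;

    while ((record = strsep(&records, ";")) != NULL) {
        char *fields = record; /* independent cursor for the inner scan */
        char *field;

        while ((field = strsep(&fields, ",")) != NULL) {
            if (*field != '\0')
                count++;
        }
    }
    return count;
}
```

Each scan keeps its position in its own pointer (records and fields), so neither disturbs the other. The same nesting written with plain strtok() would not work, because the inner calls overwrite the single hidden position that the outer loop depends on.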
The manual page for strsep() at kernel.org says:
The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.
Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() treats a run of adjacent delimiters as a single separator between tokens, whereas strsep() expects a single delimiter between tokens and interprets adjacent delimiters as an empty token.
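To make the empty-token difference concrete, here is a small sketch that counts tokens both ways. The function names are invented for this example, strtok_r() stands in for strtok() because it splits the same way, and _DEFAULT_SOURCE is an assumption about building against glibc.

```c
#define _DEFAULT_SOURCE /* lets glibc declare strsep() and strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts tokens as strsep() sees them, empty ones included. */
size_t count_with_strsep(char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t n = 0;
    char *cursor = text;

    while (strsep(&cursor, delims) != NULL)
        n++;
    return n;
}

/* Counts tokens as strtok_r() sees them; runs of delimiters collapse. */
size_t count_with_strtok_r(char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t n = 0;
    char *state = NULL;

    for (char *tok = strtok_r(text, delims, &state); tok != NULL;
         tok = strtok_r(NULL, delims, &state))
        n++;
    return n;
}
```

For the input "a,,b" with "," as the delimiter, the strsep() version counts 3 tokens ("a", an empty string, and "b"), while the strtok_r() version counts 2. Both functions modify the buffer, so each needs its own writable copy of the input.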
Both strsep() and strtok() modify their input strings, and neither lets you identify which delimiter character marked the end of the token (because both write a NUL byte, '\0', over the separator after the end of the token).
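If you need to know which delimiter ended a token, or you must leave the original string untouched, one possible workaround is sketched below. Both helper names are hypothetical; peek_delimiter() uses only standard strcspn(), and first_token_copy() tokenizes a heap copy so that it also works on string literals.

```c
#define _DEFAULT_SOURCE /* lets glibc declare strsep() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the delimiter that would end the first
 * token of text, or '\0' if the token runs to the end of the string.
 * Nothing is modified. */
char peek_delimiter(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);
    return text[strcspn(text, delims)];
}

/* Hypothetical helper: returns a heap copy of the first token of text.
 * The caller frees the result; NULL means the allocation failed. */
char *first_token_copy(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, size);

    char *cursor = copy;
    /* strsep() returns the start of the copy and writes '\0' over the
     * delimiter, so the copy now reads as just the first token. */
    return strsep(&cursor, delims);
}
```

Calling peek_delimiter("key=value;x", "=;") returns '=', which is exactly the information that is lost once strsep() or strtok() has overwritten the separator.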
Which to use:

- strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't mind about portability.
- strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
- strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().

Why is strtok() poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.
That's because:
- If a calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
- If your function calls any function that calls strtok(), that will break your function's use of strtok().
- If your program is multithreaded, at most one thread can be using strtok() at any given time, across a sequence of strtok() calls.

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".
Use instead:

- strsep() if it is available.
- POSIX strtok_r() if it is available.
- Microsoft strtok_s() if it is available.
- Annex K also has strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().

BSD strsep():
char *strsep(char **stringp, const char *delim);
POSIX strtok_r():
char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);
Microsoft strtok_s():
char *strtok_s(char *strToken, const char *strDelimit, char **context);
Annex K strtok_s():
char *strtok_s(char * restrict s1, rsize_t * restrict s1max, const char * restrict s2, char ** restrict ptr);
Note that this has 4 arguments, not 3 as in the other two variants on strtok().
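Given those signatures, a portability shim might look like the sketch below. It is not taken from any library: my_strsep, next_token and count_words are hypothetical names, my_strsep() is a fallback written to follow the BSD strsep() behaviour described above, and using _MSC_VER to detect Microsoft's library is an assumption that may need adjusting for your toolchain.

```c
#define _POSIX_C_SOURCE 200809L /* asks POSIX systems to declare strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Microsoft's three-argument strtok_s() takes the same arguments as
 * POSIX strtok_r(), so one macro can cover both. */
#if defined(_MSC_VER)
#define next_token(s, delims, state) strtok_s((s), (delims), (state))
#else
#define next_token(s, delims, state) strtok_r((s), (delims), (state))
#endif

/* Hypothetical fallback for platforms without strsep(). All state lives
 * in *stringp, which the caller owns, so scans can be interleaved. */
char *my_strsep(char **stringp, const char *delims)
{
    assert(stringp != NULL && delims != NULL);

    char *start = *stringp;
    if (start == NULL)
        return NULL; /* previous call consumed the last token */

    char *end = start + strcspn(start, delims);
    if (*end != '\0') {
        *end = '\0';        /* overwrite the delimiter */
        *stringp = end + 1; /* next token starts right after it */
    } else {
        *stringp = NULL;    /* no delimiter left: this was the last token */
    }
    return start;
}

/* Counts whitespace-separated words with whichever variant was selected. */
size_t count_words(char *text)
{
    assert(text != NULL);

    size_t n = 0;
    char *state = NULL;

    for (char *w = next_token(text, " \t\n", &state); w != NULL;
         w = next_token(NULL, " \t\n", &state))
        n++;
    return n;
}
```

The Annex K strtok_s() is deliberately left out of the macro, since its extra rsize_t argument means it cannot be swapped in by name alone.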
From The GNU C Library manual - Finding Tokens in a String:
One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.
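The test the manual recommends can be sketched as follows. The function name split_nonempty and its output-array interface are assumptions for this example, not part of any library.

```c
#define _DEFAULT_SOURCE /* lets glibc declare strsep() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores pointers to up to max_out non-empty tokens
 * of text in out[] and returns how many were stored. The tokens point
 * into text, which is modified in place. */
size_t split_nonempty(char *text, const char *delims,
                      char **out, size_t max_out)
{
    assert(text != NULL && delims != NULL && out != NULL);

    size_t n = 0;
    char *cursor = text;
    char *token;

    while (n < max_out && (token = strsep(&cursor, delims)) != NULL) {
        if (*token == '\0')
            continue; /* empty string from adjacent, leading or trailing delimiters */
        out[n++] = token;
    }
    return n;
}
```

With the empty-string check in place, the tokens collected are the same ones strtok_r() would return, while the code still gets strsep()'s caller-owned state.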