Could someone explain to me the differences between strtok() and strsep()? What are the advantages and disadvantages of each, and why would I pick one over the other?


What are the differences between strtok and strsep in C

2 Answers

One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().

Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.

The manual page for strsep() at kernel.org says:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() treats a run of adjacent delimiters as a single separator, so it never returns an empty token, whereas strsep() expects exactly one delimiter between tokens and interprets adjacent delimiters as enclosing an empty token.

Both strsep() and strtok() modify their input strings and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).

When to use them?

You would use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when you don't care about portability.
You would use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
You would only use strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().

Why is strtok() poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.

That's because:

If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
If your function calls any function that calls strtok(), that will break your function's use of strtok().
If your program is multithreaded, at most one thread can be using strtok() at any given time, and that restriction spans the whole sequence of strtok() calls that tokenize a string, not just a single call.

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".

You can use strsep() if it is available.
You can use POSIX's strtok_r() if it is available.
You can use Microsoft's strtok_s() if it is available.
Nominally, you could use the ISO/IEC 9899:2011 Annex K.3.7.3.1 function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().

BSD strsep():

char *strsep(char **stringp, const char *delim);

POSIX strtok_r():

char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state);

Microsoft strtok_s():

char *strtok_s(char *strToken, const char *strDelimit, char **context);

Annex K strtok_s():

char *strtok_s(char * restrict s1, rsize_t * restrict s1max,
               const char * restrict s2, char ** restrict ptr);

Note that this has 4 arguments, not 3 as in the other two variants of strtok().


answered Sep 30 '22 21:09

Jonathan Leffler

From The GNU C Library manual - Finding Tokens in a String:

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.

answered Sep 30 '22 23:09

George Gaál


Tags: c, strtok, strsep

mizuki
