
What are the differences between strtok and strsep in C

Tags: c, strtok, strsep

Could someone explain the differences between strtok() and strsep()? What are the advantages and disadvantages of each, and why would I pick one over the other?

mizuki asked Aug 28 '11 02:08




2 Answers

One major difference between strtok() and strsep() is that strtok() is standardized (by the C standard, and hence also by POSIX) but strsep() is not standardized (by C or POSIX; it is available in the GNU C Library, and originated on BSD). Thus, portable code is more likely to use strtok() than strsep().

Another difference is that calls to the strsep() function on different strings can be interleaved, whereas you cannot do that with strtok() (though you can with strtok_r()). So, using strsep() in a library doesn't break other code accidentally, whereas using strtok() in a library function must be documented because other code using strtok() at the same time cannot call the library function.

The manual page for strsep() at kernel.org says:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.

Thus, the other major difference is the one highlighted by George Gaál in his answer: strtok() permits multiple delimiters between tokens and treats a run of them as a single separator, whereas strsep() expects a single delimiter between tokens and interprets adjacent delimiters as an empty token.
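The empty-field difference is easy to see by counting what each function returns for the same input. The following is a minimal sketch, not a reference implementation: the helper names count_fields_strsep(), count_fields_strtok_r() and copy_string() are made up for illustration, and the _DEFAULT_SOURCE define assumes glibc (BSD-derived systems normally expose strsep() without it).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes strsep() under strict -std modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy, because both tokenizers write into their input. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Counts the fields strsep() reports, empty ones included; -1 on allocation failure. */
int count_fields_strsep(const char *input, const char *delims)
{
    assert(input != NULL && delims != NULL);
    char *buf = copy_string(input);
    if (buf == NULL)
        return -1;
    int count = 0;
    char *cursor = buf;               /* strsep() advances this; buf is kept for free() */
    while (strsep(&cursor, delims) != NULL)
        count++;
    free(buf);
    return count;
}

/* Counts the tokens strtok_r() reports; runs of delimiters collapse into one separator. */
int count_fields_strtok_r(const char *input, const char *delims)
{
    assert(input != NULL && delims != NULL);
    char *buf = copy_string(input);
    if (buf == NULL)
        return -1;
    int count = 0;
    char *state = NULL;
    for (char *tok = strtok_r(buf, delims, &state); tok != NULL;
         tok = strtok_r(NULL, delims, &state))
        count++;
    free(buf);
    return count;
}
```

For the input "a,,b" with "," as the delimiter, the strsep() version counts three fields (the middle one empty) and the strtok_r() version counts two tokens.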

Both strsep() and strtok() modify their input strings and neither lets you identify which delimiter character marked the end of the token (because both write a NUL '\0' over the separator after the end of the token).
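If you do need to know which delimiter ended a token, one possible workaround is a small strsep()-style helper built on the standard strcspn(). This is only a sketch: next_field() and its ended_by parameter are hypothetical names, not part of any library, and like the real functions it modifies the buffer it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strsep()-like helper. Returns the next field (possibly empty),
 * or NULL when the input is exhausted. *ended_by receives the delimiter that
 * terminated the field, or '\0' if the field ran to the end of the string.
 */
char *next_field(char **stringp, const char *delims, char *ended_by)
{
    assert(stringp != NULL && delims != NULL && ended_by != NULL);
    char *start = *stringp;
    if (start == NULL)
        return NULL;

    char *end = start + strcspn(start, delims);  /* first delimiter or the NUL */
    *ended_by = *end;                            /* remember it before overwriting */
    if (*end != '\0') {
        *end = '\0';
        *stringp = end + 1;                      /* continue after the delimiter */
    } else {
        *stringp = NULL;                         /* nothing left for the next call */
    }
    return start;
}
```

Called repeatedly on a writable copy of "key=value;rest" with delimiters "=;", this yields "key" (ended by '='), "value" (ended by ';') and "rest" (ended by '\0').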

When to use them?

  • You would use strsep() when you want empty tokens rather than allowing multiple delimiters between tokens, and when portability is not a concern.
  • You would use strtok_r() when you want to allow multiple delimiters between tokens and you don't want empty tokens (and POSIX is sufficiently portable for you).
  • You would only use strtok() when someone threatens your life if you don't do so. And you'd only use it for long enough to get you out of the life-threatening situation; you would then abandon all use of it once more. It is poisonous; do not use it. It would be better to write your own strtok_r() or strsep() than to use strtok().
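Writing your own reentrant tokenizer is not much work. Here is a minimal sketch of a strtok_r() work-alike that uses only the standard strspn() and strcspn(); the name my_strtok_r() is hypothetical, and the code is meant as an illustration of the idea rather than a drop-in replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strtok_r() work-alike. All position information lives in
 * *state, which the caller owns, so independent scans do not interfere.
 */
char *my_strtok_r(char *s, const char *delims, char **state)
{
    assert(delims != NULL && state != NULL);
    if (s == NULL)
        s = *state;                 /* continue a previous scan */
    if (s == NULL)
        return NULL;

    s += strspn(s, delims);         /* skip leading delimiters */
    if (*s == '\0') {
        *state = s;                 /* exhausted: later calls also return NULL */
        return NULL;
    }

    char *end = s + strcspn(s, delims);  /* end of this token */
    if (*end != '\0')
        *end++ = '\0';              /* terminate the token and step past the delimiter */
    *state = end;
    return s;
}
```

It is used like strtok_r(): pass the writable buffer on the first call and NULL afterwards, with the same state pointer each time.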

Why is strtok() poisonous?

The strtok() function is poisonous if used in a library function. If your library function uses strtok(), it must be documented clearly.

That's because:

  1. If any calling function is using strtok() and calls your function that also uses strtok(), you break the calling function.
  2. If your function calls any function that calls strtok(), that will break your function's use of strtok().
  3. If your program is multithreaded, at most one thread can be using strtok() at any given time, and that restriction holds across an entire sequence of strtok() calls.
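The first failure mode can be reproduced with a short sketch. The function names below are made up for illustration: an outer loop splits text on ';' and calls an inner helper that splits each piece on spaces. The strtok() version loses its place after the first inner call; the strtok_r() version keeps a separate state variable per scan and is unaffected. The _POSIX_C_SOURCE define is an assumption for compilers run in a strict standard mode.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: request strtok_r() on strict compilers */
#endif
#include <assert.h>
#include <string.h>

/* Inner tokenizer standing in for a "library function" that uses strtok(). */
static int count_words_strtok(char *line)
{
    int n = 0;
    for (char *w = strtok(line, " "); w != NULL; w = strtok(NULL, " "))
        n++;
    return n;
}

/* Broken: the inner call overwrites the outer loop's saved position. */
int total_words_strtok(char *text)
{
    assert(text != NULL);
    int total = 0;
    for (char *line = strtok(text, ";"); line != NULL; line = strtok(NULL, ";"))
        total += count_words_strtok(line);
    return total;
}

/* Same inner tokenizer, but with its own caller-owned state. */
static int count_words_strtok_r(char *line)
{
    int n = 0;
    char *state = NULL;
    for (char *w = strtok_r(line, " ", &state); w != NULL;
         w = strtok_r(NULL, " ", &state))
        n++;
    return n;
}

/* Works: outer and inner scans each keep their own state pointer. */
int total_words_strtok_r(char *text)
{
    assert(text != NULL);
    int total = 0;
    char *state = NULL;
    for (char *line = strtok_r(text, ";", &state); line != NULL;
         line = strtok_r(NULL, ";", &state))
        total += count_words_strtok_r(line);
    return total;
}
```

Given a writable buffer holding "a b;c d", total_words_strtok() stops after the first segment and reports 2 words, while total_words_strtok_r() reports all 4.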

The root of this problem is the saved state between calls that allows strtok() to continue where it left off. There is no sensible way to fix the problem other than "do not use strtok()".

  • You can use strsep() if it is available.
  • You can use POSIX's strtok_r() if it is available.
  • You can use Microsoft's strtok_s() if it is available.
  • Nominally, you could use the ISO/IEC 9899:2011 Annex K.3.7.3.1 function strtok_s(), but its interface is different from both strtok_r() and Microsoft's strtok_s().

BSD strsep():

char *strsep(char **stringp, const char *delim); 

POSIX strtok_r():

char *strtok_r(char *restrict s, const char *restrict sep, char **restrict state); 

Microsoft strtok_s():

char *strtok_s(char *strToken, const char *strDelimit, char **context); 

Annex K strtok_s():

char *strtok_s(char * restrict s1, rsize_t * restrict s1max, const char * restrict s2, char ** restrict ptr);

Note that this has 4 arguments, not 3 as in the other two variants on strtok().
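Because the POSIX and Microsoft variants take the same three arguments, one way to paper over the naming difference is a thin wrapper. The following is a sketch under assumptions: portable_strtok() is a hypothetical name, and the _MSC_VER check assumes Microsoft's compiler and C runtime. The Annex K variant is deliberately not handled, since its 4-argument interface does not fit this shape.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: request strtok_r() on strict compilers */
#endif
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: selects a reentrant tokenizer by platform. */
char *portable_strtok(char *s, const char *delims, char **state)
{
    assert(delims != NULL && state != NULL);
#if defined(_MSC_VER)
    return strtok_s(s, delims, state);   /* Microsoft's 3-argument variant */
#else
    return strtok_r(s, delims, state);   /* POSIX */
#endif
}
```

Callers then use portable_strtok() exactly as they would strtok_r(), passing the buffer on the first call and NULL on later calls.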

Jonathan Leffler answered Sep 30 '22 21:09


From The GNU C Library manual - Finding Tokens in a String:

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.

George Gaál answered Sep 30 '22 23:09