Why should strtok() be deprecated?

Tags: c, token, strtok

I hear from a lot of programmers that strtok may be deprecated in the near future. Some say it already is. Why is it a bad choice? strtok() works great for tokenizing a given string. Does it have anything to do with time and space complexity? The best link I found on the internet was this, but it doesn't satisfy my curiosity. Please suggest alternatives if possible.

388

asked Jun 02 '17 20:06

Pushan Gupta

2 Answers

Why is it a bad choice?

The fundamental technique for solving problems by programming is to construct abstractions which can be used reliably to solve sub-problems, and then compose solutions to those sub-problems into solutions to larger problems.

strtok's behaviour works directly against these goals in a variety of ways; it is a poor abstraction that is unreliable because it composes poorly.

The fundamental problem of tokenization is: given a position in a string, give the position of the end of the token beginning at that position. If strtok did only that, it would be great. It would have a clear abstraction, it would not rely on hidden global state, it would not modify its inputs.
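As a rough illustration of that idea, here is a minimal sketch of a position-based tokenizer built on the standard strspn and strcspn functions. The name next_token and its parameters are made up for this example; it is one possible shape, not a standard API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function).
   Given a string and a starting position, skip leading delimiters and
   report where the next token starts and ends (end is one past the last
   character). Nothing is written to `s`, and no state survives between
   calls. `pos` must not exceed strlen(s).
   Returns 1 if a token was found, 0 if only delimiters or nothing remain. */
static int next_token(const char *s, size_t pos, const char *delims,
                      size_t *start, size_t *end)
{
    pos += strspn(s + pos, delims);          /* skip leading delimiters */
    if (s[pos] == '\0')
        return 0;                            /* no token left */
    *start = pos;
    *end = pos + strcspn(s + pos, delims);   /* scan to the next delimiter */
    return 1;
}
```

To walk a whole string, a caller would feed each returned end position back in as the next pos. Because the caller owns the position, any number of strings can be tokenized at the same time, and string literals are safe to pass.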

To see the limitations of strtok, imagine trying to tokenize a language where we wish to separate tokens by spaces, unless the token is enclosed in " ", in which case we wish to apply a different tokenization rule to the contents of the quoted area, and then pick up with the space separation rule after. strtok composes very poorly with itself, and is therefore only useful for the most trivial of tokenization tasks.
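The composition problem shows up even in a much simpler case than quoted strings. The sketch below (the function name is invented for illustration) splits a buffer into ';'-separated records and tries to split each record into space-separated fields with a nested strtok loop.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: counts how many ';'-separated records the OUTER
   loop gets to see when each record is split into fields by a nested
   strtok loop. `buf` is modified in place, as strtok always does. */
static int count_records_nested_strtok(char *buf)
{
    int records = 0;
    for (char *rec = strtok(buf, ";"); rec != NULL; rec = strtok(NULL, ";")) {
        records++;
        /* The inner call overwrites strtok's single hidden position,
           so the outer loop's place in `buf` is lost. */
        for (char *f = strtok(rec, " "); f != NULL; f = strtok(NULL, " ")) {
            printf("field: %s\n", f);
        }
    }
    return records;
}
```

Called on a writable buffer holding "a b;c d", the outer loop stops after the first record: once the inner loop runs off the end of "a b", the next strtok(NULL, ";") has nothing left to resume from and returns NULL, so "c d" is never visited.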

Does it have to do anything with the time and space complexities?

No.

Suggest any alternatives if possible.

Lexers are not hard to write; just write one!

Bonus points if you write an immutable lexer. An immutable lexer is a little struct that contains a reference to the string being lexed, the current position of the lexer, and any state needed by the lexer. To extract a token you call a "next token" method, pass in the lexer, and you get back the token and a new lexer. The new lexer can then be used to lex the next token, and you discard the previous lexer if you wish.

The immutable lexer technique is easier to reason about than lexers which modify state. And you can debug them by saving the discarded lexers in a list, and now you have the complete history of tokenization operations open to inspection at once.
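Here is a minimal sketch of what such an immutable lexer could look like in C, assuming a toy grammar of space-separated words plus "..." quoted tokens. All type and function names (Lexer, Token, LexResult, lex_next) are invented for this example.

```c
#include <stddef.h>

typedef struct {
    const char *text;  /* string being lexed; never modified */
    size_t pos;        /* index of the next unread character */
} Lexer;

typedef struct {
    const char *start; /* points into the original text */
    size_t len;        /* token length, excluding any quotes */
    int quoted;        /* 1 if the token was enclosed in "..." */
} Token;

typedef struct {
    int ok;            /* 0 when the input is exhausted */
    Token token;
    Lexer next;        /* a NEW lexer positioned after the token */
} LexResult;

/* Takes the lexer by value and returns a new one; the caller's copy is
   left untouched and can be reused or stored for debugging. */
static LexResult lex_next(Lexer lx)
{
    LexResult r = { 0, { NULL, 0, 0 }, lx };
    const char *s = lx.text;
    size_t i = lx.pos;

    while (s[i] == ' ')                       /* skip separators */
        i++;
    if (s[i] == '\0') {                       /* nothing left to lex */
        r.next.pos = i;
        return r;
    }
    if (s[i] == '"') {                        /* quoted rule */
        size_t begin = ++i;
        while (s[i] != '\0' && s[i] != '"')
            i++;
        r.token = (Token){ s + begin, i - begin, 1 };
        if (s[i] == '"')
            i++;                              /* consume the closing quote;
                                                 an unterminated quote runs
                                                 to the end of the input */
    } else {                                  /* plain word rule */
        size_t begin = i;
        while (s[i] != '\0' && s[i] != ' ' && s[i] != '"')
            i++;
        r.token = (Token){ s + begin, i - begin, 0 };
    }
    r.ok = 1;
    r.next.pos = i;
    return r;
}
```

A caller loops with something like `for (LexResult r = lex_next(lx); r.ok; r = lex_next(r.next))`. Since each step only produces new values, storing every r.next in an array gives the inspection history described above, and calling lex_next again on an old lexer yields the same token.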

154

answered Oct 06 '22 00:10

Eric Lippert

The limitation of strtok(char *str, const char *delim) is that it cannot tokenize multiple strings at the same time, because it keeps a static pointer recording how far it has parsed (so it is sufficient only when you work with one string at a time). A better and safer option is strtok_r(char *str, const char *delim, char **saveptr), which takes an explicit third argument in which the parse position is saved.
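As a sketch of how that looks in practice, the function below (its name is made up for this example) splits ';'-separated records and, inside each one, space-separated fields, giving each loop its own saveptr. Note that strtok_r comes from POSIX rather than ISO C, so this assumes a POSIX system, and that it still writes '\0' bytes into the buffer just as strtok does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations such as strtok_r */
#endif
#include <stdio.h>
#include <string.h>

/* Illustrative only: returns the number of ';'-separated records seen
   by the outer loop while a nested loop splits each record on spaces.
   `buf` must be writable; strtok_r modifies it in place. */
static int count_records_strtok_r(char *buf)
{
    int records = 0;
    char *outer_state = NULL;   /* parse position of the record loop */
    char *inner_state = NULL;   /* parse position of the field loop */

    for (char *rec = strtok_r(buf, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        records++;
        for (char *f = strtok_r(rec, " ", &inner_state); f != NULL;
             f = strtok_r(NULL, " ", &inner_state)) {
            printf("field: %s\n", f);
        }
    }
    return records;
}
```

With a writable buffer holding "a b;c d", both records are visited, because the inner loop only touches inner_state and leaves the outer loop's position alone.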

answered Oct 06 '22 01:10

Shashwat Kumar


