 

Obtaining zero-length string from strtok()

Tags: c, csv, strtok

I have a CSV file containing data such as

value;name;test;etc

which I'm trying to split by using strtok(string, ";"). However, this file can contain zero-length data, like this:

value;;test;etc

which strtok() skips. Is there a way to prevent strtok() from skipping zero-length fields like this?

Mauren asked Sep 16 '13 12:09



1 Answer

A possible alternative is to use the BSD function strsep() instead of strtok(), if available. From the man page:

The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 ("ISO C90")) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.

A simple example (also copied from that man page):

char *token, *string, *tofree;

tofree = string = strdup("value;;test;etc");
while ((token = strsep(&string, ";")) != NULL)
    printf("token=%s\n", token);

free(tofree);

Output:

token=value
token=
token=test
token=etc

so empty fields are handled correctly.
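If strsep() is not available on your platform, similar behaviour can be approximated with ISO C functions only. The following is a minimal sketch, not the BSD implementation: next_field() is a hypothetical helper name, and it assumes the input buffer is writable, just as strsep() and strtok() do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* next_field() is a hypothetical helper, not a standard function.
 * It mimics strsep() using only ISO C calls: it returns the next
 * field (possibly empty) and advances *stringp, or returns NULL
 * once the input is exhausted. The buffer is modified in place. */
char *next_field(char **stringp, const char *delims)
{
    assert(stringp != NULL && delims != NULL);

    char *start = *stringp;
    if (start == NULL)
        return NULL;                 /* no more fields */

    /* Skip over the run of characters that are not delimiters. */
    char *end = start + strcspn(start, delims);

    if (*end != '\0') {
        *end = '\0';                 /* terminate this field in place */
        *stringp = end + 1;          /* continue after the delimiter */
    } else {
        *stringp = NULL;             /* this was the last field */
    }
    return start;
}
```

It is called in the same kind of loop as strsep() above: pass the address of a char * that points into a writable copy of the line, and stop when it returns NULL. For "value;;test;etc" it yields "value", an empty string, "test" and "etc".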

Of course, as others already said, none of these simple tokenizer functions handles delimiters inside quotation marks correctly, so if that is an issue, you should use a proper CSV parsing library.
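To illustrate the quoting problem, here is a small sketch that counts fields in two ways. Both function names are hypothetical and exist only for this illustration. The quote-aware version assumes balanced double quotes and does no validation, so it is not a substitute for a real CSV parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of any library. */

/* Counts fields the way a strsep()-style split sees them:
 * every delimiter ends a field, quoted or not. */
size_t count_fields_naive(const char *s, char delim)
{
    assert(s != NULL);
    size_t n = 1;
    for (; *s != '\0'; s++)
        if (*s == delim)
            n++;
    return n;
}

/* Counts fields while ignoring delimiters inside double quotes.
 * Assumes quotes are balanced; performs no validation. */
size_t count_fields_quoted(const char *s, char delim)
{
    assert(s != NULL);
    size_t n = 1;
    int in_quotes = 0;
    for (; *s != '\0'; s++) {
        if (*s == '"')
            in_quotes = !in_quotes;   /* entering or leaving a quoted field */
        else if (*s == delim && !in_quotes)
            n++;
    }
    return n;
}
```

For a line such as a;"b;c";d the naive count is 4 while the quote-aware count is 3, which is exactly the mismatch a simple tokenizer would produce.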

Martin R answered Sep 28 '22 07:09