I want to split a char *string based on a multi-character delimiter. I know that strtok() is used to split a string, but it only works with single-character delimiters. I want to split a char *string based on a substring such as "abc" or any other substring. How can that be achieved?
Finding the point at which the desired sequence occurs is pretty easy; strstr supports that:
char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");
So, at that point, pos points to the first location of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string, and 2) stores a pointer to the "current" location in the string internally.
If we didn't mind doing roughly the same, we could do something like this:
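#include <string.h>

/* Like strtok, but splits on a whole substring rather than a set of characters.
   Pass the string on the first call and NULL on later calls. */
char *multi_tok(char *input, const char *delimiter) {
    static char *string;   /* where the next token starts */

    if (input != NULL)
        string = input;

    if (string == NULL)
        return NULL;

    char *end = strstr(string, delimiter);
    char *temp = string;

    if (end == NULL) {
        /* no more delimiters: the rest of the string is the last token */
        string = NULL;
        return temp;
    }

    *end = '\0';                        /* terminate this token in place */
    string = end + strlen(delimiter);   /* resume just past the delimiter */
    return temp;
}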
This does work. For example:
#include <stdio.h>

int main(void) {
    char input[] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
    return 0;
}
produces roughly the expected output (each piece keeps the spaces that surrounded the delimiter):
this is
a big
input string
to split up
Nonetheless, it's clumsy, difficult to make thread-safe (you have to make its internal string variable thread-local) and generally just a crappy design. Using (for one example) an interface something like strtok_r, we can fix at least the thread-safety issue:
#include <stdio.h>
#include <string.h>

/* The caller owns the tokenizer state, as with strtok_r. */
typedef char *multi_tok_t;

char *multi_tok(char *input, multi_tok_t *string, const char *delimiter) {
    if (input != NULL)
        *string = input;

    if (*string == NULL)
        return NULL;

    char *end = strstr(*string, delimiter);
    char *temp = *string;

    if (end == NULL) {
        /* no more delimiters: the rest of the string is the last token */
        *string = NULL;
        return temp;
    }

    *end = '\0';                         /* terminate this token in place */
    *string = end + strlen(delimiter);   /* resume just past the delimiter */
    return temp;
}

multi_tok_t init(void) { return NULL; }

int main(void) {
    multi_tok_t s = init();
    char input[] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, &s, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
    return 0;
}
I guess I'll leave it at that for now though--to get a really clean interface, we really want to reinvent something like coroutines, and that's probably a bit much to post here.