
C strtok() split string into tokens but keep old data unaltered

Tags: c, strtok

I have the following code:

#include <stdio.h>
#include <string.h>

int main (void) {
    char str[] = "John|Doe|Melbourne|6270|AU";

    char fname[32], lname[32], city[32], zip[32], country[32];
    char *oldstr = str;

    strcpy(fname, strtok(str, "|"));
    strcpy(lname, strtok(NULL, "|"));
    strcpy(city, strtok(NULL, "|"));
    strcpy(zip, strtok(NULL, "|"));
    strcpy(country, strtok(NULL, "|"));

    printf("Firstname: %s\n", fname);
    printf("Lastname: %s\n", lname);
    printf("City: %s\n", city);
    printf("Zip: %s\n", zip);
    printf("Country: %s\n", country);
    printf("STR: %s\n", str);
    printf("OLDSTR: %s\n", oldstr);

    return 0;
}

Execution output:

$ ./str
Firstname: John
Lastname: Doe
City: Melbourne
Zip: 6270
Country: AU
STR: John
OLDSTR: John

Why can't I keep the old data in either str or oldstr? What am I doing wrong, and how can I keep the original data unaltered?

bsteo asked Jun 14 '13 09:06


People also ask

Does strtok affect the original string?

strtok() doesn't create a new string and return it; it returns a pointer to the token inside the string you pass as its argument, so the original string is affected. To break the string into tokens, strtok() overwrites each delimiter character with a null character (\0) and returns a pointer to the beginning of the token.

What happens to string after strtok?

Each token ends at the first character that appears in the delimiter string, and strtok() overwrites that character with a null character. If no delimiter is found, the token ends at the string's terminating null character, and subsequent calls to strtok() return a null pointer.

What can I use instead of strtok in C?

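Use strtok_r(). It behaves like strtok(), but it keeps its position in a caller-supplied pointer, so you can tokenize several strings "simultaneously". Note that strtok_r() is a POSIX function rather than part of ISO C, and it still modifies the string it parses.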

Why multiple call to strtok () is not safe?

The strtok() function keeps its parsing position in internal static state shared by all callers, so it is not reentrant and is not guaranteed to be thread safe.


2 Answers

When you call strtok(), it finds the next token and overwrites the delimiter that follows it with a null character (\0), so the string itself is modified.

Your str becomes:

char str[] = John0Doe0Melbourne062700AU;
                 
  Str array in memory 
+------------------------------------------------------------------------------------------------+
|'J'|'o'|'h'|'n'|0|'D'|'o'|'e'|0|'M'|'e'|'l'|'b'|'o'|'u'|'r'|'n'|'e'|0|'6'|'2'|'7'|'0'|0|'A'|'U'|0|
+------------------------------------------------------------------------------------------------+
                 ^  replace | with \0  (ASCII value is 0)

The diagram matters because the character '0' and the number 0 are different: in the figure, the digits of 6270 are characters, written in single quotes, while each unquoted 0 is the null byte \0.

When you print str with %s, printf() prints characters up to the first \0, so the output is just John.
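The in-place modification described above can be made visible by dumping the raw bytes of the array after tokenizing. This is a minimal sketch, not part of the original answer; the helper names render_raw and tokenize_in_place are hypothetical and exist only for illustration.

```c
#include <string.h>

/* Hypothetical helper: write the first n bytes of buf into out, showing
   each null byte as the two characters \0.
   out must have room for 2 * n + 1 bytes. */
void render_raw(const char *buf, size_t n, char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\0') {
            out[j++] = '\\';
            out[j++] = '0';
        } else {
            out[j++] = buf[i];
        }
    }
    out[j] = '\0';
}

/* Hypothetical helper: run strtok() over buf until no tokens remain and
   return how many were found. buf is modified in place. */
size_t tokenize_in_place(char *buf, const char *delims)
{
    size_t count = 0;
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;
    return count;
}
```

Calling tokenize_in_place(str, "|") on the str from the question and then render_raw(str, sizeof str, out) should show every | replaced by \0, which is why %s stops after John.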

To keep your original str unchanged, first copy it into a temporary variable such as tempstr, and then pass tempstr to strtok():

char str[] = "John|Doe|Melbourne|6270|AU";
/* calloc() is declared in <stdlib.h>; +1 leaves room for the terminating '\0' */
char* tempstr = calloc(strlen(str)+1, sizeof(char));
strcpy(tempstr, str);

Now use tempstr in place of str in the strtok() calls, and free(tempstr) once you no longer need the tokens.
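The copy-then-tokenize idea can be wrapped so that the caller's string is only ever read. The following is a rough sketch under that assumption; copy_string and count_fields are hypothetical names introduced here, not functions from the original answer or from the C library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of s, or NULL if malloc fails.
   The caller owns the copy and must free() it. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;      /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical helper: count the delimiter-separated fields in src.
   strtok() only ever sees the scratch copy, so src stays intact.
   Returns -1 if the copy could not be allocated. */
int count_fields(const char *src, const char *delims)
{
    char *scratch = copy_string(src);
    if (scratch == NULL)
        return -1;

    int count = 0;
    for (char *tok = strtok(scratch, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(scratch);
    return count;
}
```

Because src is a const char *, the compiler helps enforce that the original is never handed to strtok() directly; count_fields("John|Doe|Melbourne|6270|AU", "|") should report 5 fields.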

Grijesh Chauhan answered Sep 30 '22 00:09


Because oldstr is just a pointer to str, the assignment does not make a new copy of your string; both names refer to the same bytes.

Copy it before passing str to strtok():

    char *oldstr = malloc(sizeof(str));
    strcpy(oldstr, str);

Your corrected version:

#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>

int main (void) {
    char str[] = "John|Doe|Melbourne|6270|AU";
    char fname[32], lname[32], city[32], zip[32], country[32];

    /* sizeof(str) includes the terminating '\0' because str is an array */
    char *oldstr = malloc(sizeof(str));
    if (oldstr == NULL)
        return 1;
    strcpy(oldstr, str);

    ...................
    free(oldstr);
    return 0;
}

EDIT:

As @CodeClown mentioned, in your case it's better to use a bounded copy such as strncpy. Keep in mind that strncpy does not null-terminate the destination when the source is too long, so you have to terminate it yourself. Instead of fixing the sizes of fname etc. beforehand, you can also use pointers in their place and allocate exactly as much memory as each field requires. That way you avoid writing past the end of a buffer.
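One way to sketch the bounded copy is shown below. It swaps in snprintf for strncpy, because snprintf always null-terminates the destination; that substitution is a choice made for this illustration, and copy_field is a hypothetical helper name, not something from the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity dstsz), always
   null-terminating the result.
   Returns 0 on success, or -1 if dst is unusable, src is NULL, or the
   text had to be truncated to fit. */
int copy_field(char *dst, size_t dstsz, const char *src)
{
    if (dst == NULL || dstsz == 0)
        return -1;
    if (src == NULL) {          /* strtok() returns NULL when no token is left */
        dst[0] = '\0';
        return -1;
    }
    int n = snprintf(dst, dstsz, "%s", src);
    return (n >= 0 && (size_t)n < dstsz) ? 0 : -1;
}
```

With this in place, a line such as strcpy(fname, strtok(str, "|")) could become copy_field(fname, sizeof fname, strtok(str, "|")), which also tolerates a missing field instead of passing NULL to strcpy.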

Another idea would be to assign the results of strtok to pointers (fname, lname, etc.) instead of copying them into arrays. Judging by the accepted answer, strtok seems designed to be used that way.

Caution: with this approach, any further change to str is reflected in fname, lname, etc., because they point into str's data rather than to new memory blocks. So use oldstr for other manipulations.

#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>

int main (void) {
    char str[] = "John|Doe|Melbourne|6270|AU";
    char *fname, *lname, *city, *zip, *country;

    /* Copy the string before strtok() modifies str */
    char *oldstr = malloc(sizeof(str));
    if (oldstr == NULL)
        return 1;
    strcpy(oldstr, str);

    /* Each pointer refers to a token inside str; nothing is copied */
    fname = strtok(str, "|");
    lname = strtok(NULL, "|");
    city = strtok(NULL, "|");
    zip = strtok(NULL, "|");
    country = strtok(NULL, "|");

    printf("Firstname: %s\n", fname);
    printf("Lastname: %s\n", lname);
    printf("City: %s\n", city);
    printf("Zip: %s\n", zip);
    printf("Country: %s\n", country);
    printf("STR: %s\n", str);
    printf("OLDSTR: %s\n", oldstr);
    free(oldstr);
    return 0;
}
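If even a temporary copy of the whole string is unwanted, the input can be scanned without writing to it at all. The sketch below replaces strtok with strcspn, which only measures the distance to the next delimiter; next_field is a hypothetical helper introduced here. Unlike strtok, this version reports empty fields (for example between two adjacent | characters) instead of skipping them.

```c
#include <string.h>

/* Hypothetical helper: copy the field starting at *cursor into out
   (capacity outsz) and advance *cursor past the following delimiter.
   The input string is only read, never written.
   Returns 1 if a field was copied, or 0 if the input is exhausted or
   the field does not fit in out. */
int next_field(const char **cursor, const char *delims, char *out, size_t outsz)
{
    const char *p = *cursor;
    if (p == NULL)
        return 0;

    size_t len = strcspn(p, delims);   /* length up to a delimiter or '\0' */
    if (len >= outsz)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';

    /* Step over the delimiter, or mark the input as exhausted. */
    *cursor = (p[len] != '\0') ? p + len + 1 : NULL;
    return 1;
}
```

A caller would start with const char *cur = str and call next_field(&cur, "|", fname, sizeof fname), then the same for lname, city, zip and country; str can then be printed in full afterwards because it was never modified.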
pinkpanther answered Sep 30 '22 01:09