 

How do I do what strtok() does in C, in Python?

Tags:

python

I am learning Python and trying to find an efficient way to tokenize a string of comma-separated numbers into a list. Well-formed cases work as I expect, but less well-formed ones do not.

If I have this:

A = '1,2,3,4'
B = [int(x) for x in A.split(',')]

B results in [1, 2, 3, 4]

which is what I expect, but if the string is something more like

A = '1,,2,3,4,'

If I use the same list comprehension for B as above, I get an exception. I think I understand why (some of the x values are empty strings, which int() cannot convert), but I expect there is still a fairly elegant way to parse this, so that tokenizing the string A works more like calling strtok(A,",\n\t") iteratively in C would.

To be clear about what I am asking: I am looking for an elegant, efficient, typical way in Python to have all of the following example strings:

A='1,,2,3,\n,4,\n'
A='1,2,3,4'
A=',1,2,3,4,\t\n'
A='\n\t,1,2,3,,4\n'

return the same list:

B=[1,2,3,4]

via some sort of compact expression.
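
A minimal sketch of one compact expression, assuming the delimiters are exactly the ones in strtok(A,",\n\t") above. re.split with a character class followed by + treats each run of delimiters as a single separator, and the `if t` filter drops the empty strings that leading or trailing delimiters leave behind. The helper name parse_ints is hypothetical.

```python
import re

def parse_ints(a):
    # Hypothetical helper: split on runs of ',', newline or tab (the strtok delimiter set),
    # then skip the empty strings left by leading/trailing delimiters.
    return [int(t) for t in re.split(r"[,\n\t]+", a) if t]

for a in ['1,,2,3,\n,4,\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']:
    print(parse_ints(a))  # [1, 2, 3, 4] for each sample string
```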

asked Jan 18 '09 23:01 by Tall Jeff


People also ask

What does strtok return in C?

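The first call, strtok(string1, string2), returns a pointer to the first token in string1; each later call with a NULL first argument returns a pointer to the next token. A token ends at the first character that is contained in the delimiter string pointed to by string2; if no such character is found, the token ends at the terminating null character. Once no tokens remain, strtok() returns a NULL pointer.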
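
In Python, the closest analogue to that call-by-call behaviour is an iterator: next(it, None) returns the next token, or None (playing the role of the NULL pointer) once the tokens run out. A sketch, assuming a non-empty delimiter string; token_iter is a hypothetical name. re.finditer matches each maximal run of non-delimiter characters, which is what strtok hands back as a token.

```python
import re

def token_iter(s, delims):
    # Hypothetical helper; assumes delims is non-empty.
    # Each match is a maximal run of characters not in delims.
    pattern = "[^" + re.escape(delims) + "]+"
    return (m.group() for m in re.finditer(pattern, s))

it = token_iter("\n\t,1,2,3,,4\n", ",\n\t")
first = next(it, None)   # like strtok(A, ",\n\t")
rest = []
tok = next(it, None)     # like strtok(NULL, ",\n\t")
while tok is not None:
    rest.append(tok)
    tok = next(it, None)
print(first, rest)       # 1 ['2', '3', '4']
```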

How do you implement strtok?

Steps: Create a function strtok() that accepts a string and a delimiter string as arguments and returns a char pointer. Create a static variable, input, to hold the state of the string between calls. If the function is called with a non-NULL string (that is, tokens are being extracted for the first time), initialize input with it.
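
Following those steps, a Python sketch might look like this. The function name strtok and the module-level _remaining variable are illustrative assumptions; _remaining stands in for C's static variable, so, like the C function, this version is not reentrant (two interleaved parses would overwrite each other's state).

```python
_remaining = None  # plays the role of C's static variable between calls

def strtok(s, delims):
    """Hypothetical strtok emulation: pass a string to start, None to continue."""
    global _remaining
    if s is not None:          # first call: initialize the saved state
        _remaining = s
    if _remaining is None:     # called with None before any string was given
        return None
    i, n = 0, len(_remaining)
    while i < n and _remaining[i] in delims:      # skip leading delimiters
        i += 1
    if i == n:                 # nothing left: C's strtok returns NULL here
        _remaining = None
        return None
    j = i
    while j < n and _remaining[j] not in delims:  # find the end of the token
        j += 1
    token = _remaining[i:j]
    _remaining = _remaining[j + 1:]  # resume after the delimiter ('' at end of string)
    return token

# Used the way the C loop is written
out = []
tok = strtok('1,,2,3,\n,4,\n', ',\n\t')
while tok is not None:
    out.append(int(tok))
    tok = strtok(None, ',\n\t')
print(out)  # [1, 2, 3, 4]
```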

What can I use instead of strtok?

Use strtok_r(). It behaves the same as strtok, but lets you work with multiple strings "simultaneously". Its prototype is char *strtok_r(char *str, const char *delim, char **saveptr); strtok_r() is a reentrant version of strtok().

What is the difference between strtok and strtok_r?

The strtok_r() function is a reentrant version of strtok(). The context pointer (saveptr in the prototype above) must be provided on each call. The strtok_r() function may also be used to nest two parsing loops within one another, as long as separate context pointers are used.
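
In Python, a generator gives the same nesting ability without any extra bookkeeping: each generator object keeps its own position, much as each strtok_r() caller keeps its own saveptr. A sketch with a hypothetical tokens() helper, parsing rows separated by ';' and values separated by ',' in two nested loops:

```python
def tokens(s, delims):
    """Yield each maximal run of characters not in delims, in order.

    Hypothetical helper: each generator keeps its own position, the way
    strtok_r() keeps its position in the caller's saveptr.
    """
    i, n = 0, len(s)
    while i < n:
        while i < n and s[i] in delims:      # skip leading delimiters
            i += 1
        start = i
        while i < n and s[i] not in delims:  # scan to the end of the token
            i += 1
        if start < i:
            yield s[start:i]

# Nested parsing loops: rows separated by ';', values by ','
data = "1,2;3,4;;5"
rows = []
for row in tokens(data, ";"):
    rows.append([int(v) for v in tokens(row, ",")])
print(rows)  # [[1, 2], [3, 4], [5]]
```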


2 Answers

How about this:

A = '1, 2,,3,4  '
B = [int(x) for x in A.split(',') if x.strip()]

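x.strip() trims whitespace from the string, which makes it empty if the string is all whitespace. An empty string is "false" in a boolean context, so it is filtered out by the if part of the list comprehension. The pieces that survive may still carry surrounding whitespace (such as ' 2'), but int() ignores leading and trailing whitespace, so they convert cleanly.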
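
A quick check of this expression against the four sample strings from the question (the list name cases is only for illustration). Because it splits on ',' alone, a newline between two digits, for example '1\n2', would reach int() as a single piece and raise ValueError.

```python
# The four sample strings from the question
cases = ['1,,2,3,\n,4,\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']

# The expression from this answer, applied to each case:
# split on ',' only, skip whitespace-only pieces, and let int()
# tolerate leftover whitespace in pieces such as '4\n'.
results = [[int(x) for x in A.split(',') if x.strip()] for A in cases]
for B in results:
    print(B)  # [1, 2, 3, 4]
```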

answered Oct 08 '22 14:10 by Dave Ray


For the sake of completeness, I will answer this seven-year-old question. Here is a C program that uses strtok:

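#include <stdio.h>
#include <string.h>

int main(void)
{
    /* strtok writes '\0' into the buffer, so it must be a writable array */
    char myLine[] = "This is;a-line,with pieces";
    char *p;
    /* first call passes the string; later calls pass NULL to continue */
    for (p = strtok(myLine, " ;-,"); p != NULL; p = strtok(NULL, " ;-,"))
    {
        printf("piece=%s\n", p);
    }
    return 0;
}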

This can be accomplished in Python with re.split:

import re

myLine = "This is;a-line,with pieces"
# Raw string: the regex engine receives "\-" as an escaped hyphen inside the class
for p in re.split(r"[ ;\-,]", myLine):
    print("piece=" + p)
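
Unlike strtok, re.split returns an empty string wherever two delimiters are adjacent, or where the line starts or ends with a delimiter. The sketch below uses a modified sample line with doubled delimiters (an assumption for illustration) and shows two ways to drop those empties: filter them out, or map each delimiter to a space with str.translate and call str.split() with no arguments, which splits on runs of whitespace and never yields empty strings. The second trick also splits on tabs and newlines, so it only matches strtok when those characters should count as delimiters too.

```python
import re

# Modified sample line with doubled delimiters (illustrative assumption)
myLine = "This is;;a-line,,with  pieces"

raw = re.split(r"[ ;\-,]", myLine)
print(raw)        # ['This', 'is', '', 'a', 'line', '', 'with', '', 'pieces']

# Option 1: keep re.split and drop the empty strings, as strtok would skip them
filtered = [p for p in raw if p]

# Option 2: no regex; turn ';', '-' and ',' into spaces, then split on whitespace runs
via_split = myLine.translate(str.maketrans(";-,", "   ")).split()

for p in filtered:
    print("piece=" + p)
```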

answered Oct 08 '22 16:10 by user1683793