
Source code for str.split?

I would like to see how str.split() is implemented in Python. Here's what I tried:

> inspect.getsource(str.split)

TypeError: <method 'split' of 'str' objects> is not a module, 
class, method, function, traceback, frame, or code object

Copying the approach from another Stack Overflow question did not work either: Code for Greatest Common Divisor in Python

john mangual asked Oct 30 '16 19:10



1 Answer

inspect.getsource can only retrieve source for objects implemented in Python; it is not written to handle code written in the implementation language (C, in the case of CPython). str.split is a built-in, i.e. it is written in C.
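To see the difference for yourself, here is a minimal sketch that asks inspect.getsource for two objects and reports what happens. The helper name source_or_reason is hypothetical, and textwrap.dedent is used only as a convenient example of a standard-library function written in Python (this assumes your installation ships the .py sources, which is the usual case).

```python
import inspect
import textwrap


def source_or_reason(obj):
    """Return the Python source of obj, or a note saying why it is unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError) as exc:
        # TypeError: obj is implemented in C, so there is no Python source.
        # OSError: obj is Python code, but the source file cannot be found.
        return f"unavailable: {exc}"


# str.split is implemented in C, so this takes the TypeError path.
print(source_or_reason(str.split))

# textwrap.dedent is written in Python, so its source is printed.
print(source_or_reason(textwrap.dedent))
```

For C-implemented objects, the only way to read the code is to look at the interpreter's C sources.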

The source code for str.split is divided into separate functions, depending on whether a sep argument is supplied.

The first function, split_whitespace, handles the case where no sep argument is supplied and the string is split on runs of whitespace. Its implementation is fairly straightforward: the bulk of the work is a while loop that skips leading whitespace, scans ahead to the next whitespace character (if any), and splits there. I've added some comments to illustrate this:

i = j = 0;
while (maxcount-- > 0) {
    /* Increment counter past all leading whitespace in 
       the string. */
    while (i < str_len && STRINGLIB_ISSPACE(str[i]))
        i++;
    /* If only whitespace remained, there is nothing left to split. */
    if (i == str_len) break;

    /* After leading white space, increment counter 
       while the character is not a whitespace. 
       If this ends before i == str_len, it points to 
       a white space character. */
    j = i; i++;
    while (i < str_len && !STRINGLIB_ISSPACE(str[i]))
        i++;
#ifndef STRINGLIB_MUTABLE
    /* Case where no split should be done, return the string. */
    if (j == 0 && i == str_len && STRINGLIB_CHECK_EXACT(str_obj)) {
        /* No whitespace in str_obj, so just use it as list[0] */
        Py_INCREF(str_obj);
        PyList_SET_ITEM(list, 0, (PyObject *)str_obj);
        count++;
        break;
    }
#endif
    /* Make the split based on the incremented counters. */
    SPLIT_ADD(str, j, i);
}
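If the C is hard to follow, the same loop can be sketched in Python. This is a rough translation for illustration, not CPython's actual code: the function name and the handling of a negative maxcount are my own choices, the exact-type shortcut under #ifndef STRINGLIB_MUTABLE is omitted, and the tail after the loop (which keeps the unsplit remainder when maxcount runs out) is not part of the excerpt above; it is added here so the results match str.split.

```python
def split_whitespace(s, maxcount=-1):
    """Pure-Python sketch of the whitespace-splitting loop (illustrative only)."""
    n = len(s)
    if maxcount < 0:
        maxcount = n  # a string of length n can never yield more than n pieces
    parts = []
    i = 0
    while maxcount > 0:
        maxcount -= 1
        # Skip any leading whitespace.
        while i < n and s[i].isspace():
            i += 1
        # Only whitespace was left: nothing more to split.
        if i == n:
            break
        # j marks the start of a word; advance i to the next whitespace or the end.
        j = i
        i += 1
        while i < n and not s[i].isspace():
            i += 1
        parts.append(s[j:i])
    # Assumption: if the split limit was reached, keep the rest of the string
    # (minus its leading whitespace) as the final piece.
    if i < n:
        while i < n and s[i].isspace():
            i += 1
        if i != n:
            parts.append(s[i:])
    return parts


print(split_whitespace("  a b  c "))
print(split_whitespace("a b  c  ", 1))
```

Comparing the output with "  a b  c ".split() and "a b  c  ".split(None, 1) is a quick way to check that the sketch behaves like the built-in.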

Similarly, split_char handles the case where a single character is supplied as sep. Its implementation is again fairly straightforward; once you have read through split_whitespace, you should not find it too difficult.
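As a rough idea of what to expect, here is a Python sketch of the single-character case. It assumes the C function simply scans the string one character at a time and cuts at each match; the name and signature are borrowed for illustration and the real code may differ in detail.

```python
def split_char(s, ch, maxcount=-1):
    """Pure-Python sketch of splitting on a one-character separator."""
    if len(ch) != 1:
        raise ValueError("this sketch expects a single-character separator")
    n = len(s)
    if maxcount < 0:
        maxcount = n  # at most n separators can occur in a string of length n
    parts = []
    i = j = 0  # i: start of the current piece, j: scan position
    while j < n and maxcount > 0:
        maxcount -= 1
        while j < n:
            if s[j] == ch:
                parts.append(s[i:j])
                i = j = j + 1  # continue just past the separator
                break
            j += 1
    # Whatever follows the last separator is the final piece (possibly empty).
    parts.append(s[i:])
    return parts


print(split_char("a,b,,c", ","))
print(split_char("a,b,c", ",", 1))
```

Unlike the whitespace case, adjacent separators are not merged, so empty strings can appear in the result, just as with "a,b,,c".split(",").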

There's also the split function, which handles separators that are more than one character long. It searches the string for occurrences of the separator and splits at each one.
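The search-and-cut idea can be sketched in Python as follows. Here str.find stands in for whatever substring search the C code uses, and the function name split_sep is hypothetical; treat this as an illustration of the approach rather than a description of the actual implementation.

```python
def split_sep(s, sep, maxcount=-1):
    """Pure-Python sketch of splitting on a multi-character separator."""
    if not sep:
        raise ValueError("empty separator")
    if maxcount < 0:
        maxcount = len(s)  # more splits than characters is impossible
    parts = []
    start = 0
    while maxcount > 0:
        maxcount -= 1
        # str.find is a stand-in for the C-level substring search.
        pos = s.find(sep, start)
        if pos < 0:
            break
        parts.append(s[start:pos])
        start = pos + len(sep)  # resume just past the separator
    # The text after the last separator is the final piece.
    parts.append(s[start:])
    return parts


print(split_sep("a--b----c", "--"))
print(split_sep("a--b--c", "--", 1))
```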

Dimitris Fasarakis Hilliard answered Oct 19 '22 09:10