Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the computational cost of count operation on strings Python?

For example:

'hello'.count('e')

Is this O(n)? I'm guessing the way it works is it scans 'hello' and increments a counter each time the letter 'e' is seen. How can I know this without guessing? I tried reading the source code here, but got stuck upon finding this:

def count(s, *args):
    """count(s, sub[, start[,end]]) -> int

    Return the number of occurrences of substring sub in string
    s[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

    """
    return s.count(*args)

Where can I read about what's executed in s.count(*args)?

edit: I understand what *args does in the context of Python functions.

like image 440
nacho Avatar asked Mar 07 '16 22:03

nacho


People also ask

Does count work with strings Python?

The Python count() method can be used to count the number of times a particular item appears in a list or a string. When used with a string, the count() method counts the number of times a substring appears in a larger string.

How do you count strings in Python?

The count() is a built-in function in Python. It will return you the count of a given element in a list or a string. In the case of a string, the counting begins from the start of the string till the end. It is also possible to specify the start and end index from where you want the search to begin.

Is Len function expensive in Python?

Python follows the idea that keeping the length as an attribute is cheap and easy to maintain. len() is actually a function that calls the method '__len__()'. This method is defined in the predefined classes of iterable data structures.

What does the count () method do in Python?

The count() method returns the number of elements with the specified value.


2 Answers

str.count is implemented in native code, in the stringobject.c file, which delegates to either stringlib_count, or PyUnicode_Count which itself delegates to stringlib_count again. stringlib_count ultimately uses fastsearch to search for occurrences of the substring in the string and counting those.

For one-character strings (e.g. your 'e'), it is short-circuited to the following code path:

for (i = 0; i < n; i++)
    if (s[i] == p[0]) {
        count++;
        if (count == maxcount)
            return maxcount;
    }
return count;

So yes, this is exactly as you assumed a simple iteration over the string sequence and counting the occurences of the substring.

For search strings longer than a single character it gets a bit more complicated, due to handling overlaps etc., and the logic is buried deeper in the fastsearch implementation. But it’s essentially the same: a linear search through the string.

So yes, str.count is in linear time, O(n). And if you think about it, it makes a lot of sense: In order to know how often a substring appears in a string, you need to look at every possible substring of the same length. So for a substring length of 1, you have to look at every character in the string, giving you a linear complexity.

Btw. for more information about the underlying fastsearch algorithm, see this article on effbot.org.


For Python 3, which only has a single Unicode string type, the links to the implementations are: unicode_count which uses stringlib_count which uses fastsearch.

like image 169
poke Avatar answered Jan 11 '23 04:01

poke


Much of python's library code is written in C. The code you are looking for is here:

http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup

static PyMethodDef
string_methods[] = {
    // ...
    {"count", (PyCFunction)string_count, METH_VARARGS, count__doc__},
    // ...
    {NULL,     NULL}                         /* sentinel */
};

static PyObject *
string_count(PyStringObject *self, PyObject *args) {
    ...
}
like image 21
AJNeufeld Avatar answered Jan 11 '23 04:01

AJNeufeld