For example:
'hello'.count('e')
Is this O(n)? I'm guessing the way it works is it scans 'hello'
and increments a counter each time the letter 'e'
is seen. How can I know this without guessing? I tried reading the source code here, but got stuck upon finding this:
def count(s, *args):
"""count(s, sub[, start[,end]]) -> int
Return the number of occurrences of substring sub in string
s[start:end]. Optional arguments start and end are
interpreted as in slice notation.
"""
return s.count(*args)
Where can I read about what's executed in s.count(*args)
?
edit: I understand what *args
does in the context of Python functions.
The Python count() method can be used to count the number of times a particular item appears in a list or a string. When used with a string, the count() method counts the number of times a substring appears in a larger string.
The count() is a built-in function in Python. It will return you the count of a given element in a list or a string. In the case of a string, the counting begins from the start of the string till the end. It is also possible to specify the start and end index from where you want the search to begin.
Python follows the idea that keeping the length as an attribute is cheap and easy to maintain. len() is actually a function that calls the method '__len__()'. This method is defined in the predefined classes of iterable data structures.
The count() method returns the number of elements with the specified value.
str.count
is implemented in native code, in the stringobject.c
file, which delegates to either stringlib_count
, or PyUnicode_Count
which itself delegates to stringlib_count
again. stringlib_count
ultimately uses fastsearch
to search for occurrences of the substring in the string and counting those.
For one-character strings (e.g. your 'e'
), it is short-circuited to the following code path:
for (i = 0; i < n; i++)
if (s[i] == p[0]) {
count++;
if (count == maxcount)
return maxcount;
}
return count;
So yes, this is exactly as you assumed a simple iteration over the string sequence and counting the occurences of the substring.
For search strings longer than a single character it gets a bit more complicated, due to handling overlaps etc., and the logic is buried deeper in the fastsearch
implementation. But it’s essentially the same: a linear search through the string.
So yes, str.count
is in linear time, O(n). And if you think about it, it makes a lot of sense: In order to know how often a substring appears in a string, you need to look at every possible substring of the same length. So for a substring length of 1, you have to look at every character in the string, giving you a linear complexity.
Btw. for more information about the underlying fastsearch algorithm, see this article on effbot.org.
For Python 3, which only has a single Unicode string type, the links to the implementations are: unicode_count
which uses stringlib_count
which uses fastsearch
.
Much of python's library code is written in C. The code you are looking for is here:
http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup
static PyMethodDef
string_methods[] = {
// ...
{"count", (PyCFunction)string_count, METH_VARARGS, count__doc__},
// ...
{NULL, NULL} /* sentinel */
};
static PyObject *
string_count(PyStringObject *self, PyObject *args) {
...
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With