Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Runtime of collections.Counter Equality

I am wondering what the big-O runtime complexity is for comparing two collections.Counter objects. Here is some code to demonstrate what I mean:

import collections
counter_1 = collections.Counter("abcabcabcabcabcabcdefg")
counter_2 = collections.Counter("xyzxyzxyzabc")
comp = counter_1 == counter_2  # What is the runtime of this comparison statement?

Is the runtime of the equality comparison in the final statement O(1)? Or is it O(num_of_unique_keys_in_largest_counter)? Or is it something else?

For reference, here is the source code for collections.Counter https://github.com/python/cpython/blob/0250de48199552cdaed5a4fe44b3f9cdb5325363/Lib/collections/init.py#L497

I do not see the class implementing an __eq()__ method.

Bonus points: If the answer to this question changes between python2 and python3, I would love to hear the difference?

like image 562
Ryan Avatar asked Feb 13 '26 10:02

Ryan


1 Answers

Counter is a subclass of dict, therefore the big O analysis is the one of dict, with the caveat that Counter objects are specialized to only hold int values (i/e they can not hold collections of values as dicts can); this simplifies the analysis.

Looking at the c code implementation of the equality comparison:

  • There is an early exit if the number of keys is different. (this does not influence big-O).
  • Then a loop that iterates over all the keys that exits early if the key is not found, or if the corresponding value is different. (again, this has no bearing on big-O).
  • if all keys are found, and the corresponding values are all equal, then the dictionaries are declared equal. The lookup and comparisons of each key-value pair is O(1); this operation is repeated at most n times (n being the number of keys)

In all, the time complexity is O(n), with n the number of keys.
This applies to both python 2 and 3.


from dictobject.c

/* Return 1 if dicts equal, 0 if not, -1 if error.
 * Gets out as soon as any difference is detected.
 * Uses only Py_EQ comparison.
 */
static int
dict_equal(PyDictObject *a, PyDictObject *b)
{
    Py_ssize_t i;

    if (a->ma_used != b->ma_used)
        /* can't be equal if # of entries differ */
        return 0;
    /* Same # of entries -- check all of 'em.  Exit early on any diff. */
    for (i = 0; i < a->ma_keys->dk_nentries; i++) {
        PyDictKeyEntry *ep = &DK_ENTRIES(a->ma_keys)[i];
        PyObject *aval;
        if (a->ma_values)
            aval = a->ma_values[i];
        else
            aval = ep->me_value;
        if (aval != NULL) {
            int cmp;
            PyObject *bval;
            PyObject *key = ep->me_key;
            /* temporarily bump aval's refcount to ensure it stays
               alive until we're done with it */
            Py_INCREF(aval);
            /* ditto for key */
            Py_INCREF(key);
            /* reuse the known hash value */
            b->ma_keys->dk_lookup(b, key, ep->me_hash, &bval);
            if (bval == NULL) {
                Py_DECREF(key);
                Py_DECREF(aval);
                if (PyErr_Occurred())
                    return -1;
                return 0;
            }
            cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);
            Py_DECREF(key);
            Py_DECREF(aval);
            if (cmp <= 0)  /* error or not equal */
                return cmp;
        }
    }
    return 1;
}
like image 144
Reblochon Masque Avatar answered Feb 15 '26 01:02

Reblochon Masque



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!