
Why do dict keys support list subtraction but not tuple subtraction?

Presumably dict_keys is supposed to behave as a set-like object, but it lacks the difference method, and its subtraction behaviour seems to diverge from that of a set.

>>> d = {0: 'zero', 1: 'one', 2: 'two', 3: 'three'}
>>> d.keys() - [0, 2]
{1, 3}
>>> d.keys() - (0, 2)
TypeError: 'int' object is not iterable

Why does the dict_keys class try to iterate over an integer here? Doesn't that violate duck-typing?


>>> dict.fromkeys(['0', '1', '01']).keys() - ('01',)
{'01'}
>>> dict.fromkeys(['0', '1', '01']).keys() - ['01',]
{'1', '0'}
asked Mar 03 '16 22:03 by wim




1 Answer

This looks to be a bug. The implementation is to convert the dict_keys to a set, then call .difference_update(arg) on it.
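As a rough Python sketch of that intended logic (the function name `intended_keys_sub` is hypothetical; the real implementation is C code inside CPython):

```python
def intended_keys_sub(keys_view, other):
    """Model of what `keys_view - other` is meant to compute."""
    result = set(keys_view)          # copy the keys into a real set
    result.difference_update(other)  # drop every element found in `other`
    return result


d = {0: 'zero', 1: 'one', 2: 'two', 3: 'three'}
print(intended_keys_sub(d.keys(), [0, 2]))  # list operand
print(intended_keys_sub(d.keys(), (0, 2)))  # tuple operand, same result intended
```

With this logic, the type of the right-hand iterable makes no difference, which is what duck-typing would lead you to expect.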

It looks like they misused _PyObject_CallMethodId (an optimized variant of PyObject_CallMethod) by passing a format string of just "O". The thing is, PyObject_CallMethod and friends are documented to require a Py_BuildValue format string that "should produce a tuple". With more than one format code, Py_BuildValue wraps the values in a tuple automatically, but with only one format code it doesn't build a tuple; it just creates the single value (in this case, because it's already a PyObject*, all it does is increment the reference count).

While I haven't tracked down where it might be doing this, I suspect somewhere in the internals it's identifying CallMethod calls that don't produce a tuple and wrapping them to make a one-element tuple, so the called function can actually receive the arguments in the expected format. When subtracting a tuple, the argument is already a tuple, so this fix-up code never activates; when passing a list, it does activate, and the argument becomes a one-element tuple containing the list.

difference_update takes varargs (as if it were declared def difference_update(self, *args)). So when it receives the unwrapped tuple, it thinks it's supposed to subtract away the elements from each entry in the tuple, not treat said entries as values to subtract away themselves. To illustrate, when you do:

mydict.keys() - (1, 2)

the bug is causing it to do (roughly):

result = set(mydict)
# We've got a tuple to pass, so all's well...
result.difference_update(*(1, 2)) # Unpack behaves like difference_update(1, 2)
# OH NO!

While:

mydict.keys() - [1, 2]

does:

result = set(mydict)
# [1, 2] isn't a tuple, so wrap
result.difference_update(*([1, 2],)) # Behaves like difference_update([1, 2])
# All's well
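
The two cases above can be reproduced on any Python version with a small model of the suspected call path. This is only a sketch: `buggy_keys_sub` is a hypothetical name, and the `isinstance` check stands in for the assumed fix-up step rather than for the actual C code.

```python
def buggy_keys_sub(keys_view, other):
    """Hypothetical model of the buggy call path described above."""
    result = set(keys_view)
    # Assumed fix-up: a non-tuple argument is wrapped in a one-element tuple,
    # while a tuple is used directly as the argument list.
    args = other if isinstance(other, tuple) else (other,)
    result.difference_update(*args)
    return result


d = {0: 'zero', 1: 'one', 2: 'two', 3: 'three'}

# List operand: ends up as difference_update([0, 2]).
list_result = buggy_keys_sub(d.keys(), [0, 2])

# Tuple operand: ends up as difference_update(0, 2), which iterates an int.
try:
    buggy_keys_sub(d.keys(), (0, 2))
    tuple_error = None
except TypeError as exc:
    tuple_error = str(exc)

print(list_result)
print(tuple_error)
```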

That's also why a tuple of str appears to work (incorrectly): - ('abc', '123') performs a call equivalent to:

result.difference_update(*('abc', '123'))
# or without unpacking:
result.difference_update('abc', '123')

and since strs are iterables of their characters, it just blithely removes entries for 'a', 'b', 'c', etc. instead of 'abc' and '123' like you expected.
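
The same effect can be seen with a plain set and the keys from the question, without involving dict views at all. In this sketch, `buggy` and `intended` are just illustrative variable names:

```python
keys = {'0', '1', '01'}

# What the buggy path effectively does for `- ('01',)`:
buggy = set(keys)
buggy.difference_update('01')        # the str is iterated: removes '0' and '1'

# What the operand was meant to express:
intended = set(keys)
intended.difference_update(('01',))  # removes the single key '01'

print(buggy)
print(intended)
```

The first result keeps only '01', matching the surprising output in the question, while the second keeps '0' and '1'.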

Basically, this is a bug; it has been reported to the CPython developers and is fixed in 3.6.0 (as well as later releases of 2.7, 3.4, and 3.5).

The correct behavior probably should have been to call the Id variant of PyObject_CallMethodObjArgs:

_PyObject_CallMethodIdObjArgs(result, &PyId_difference_update, other, NULL);

which wouldn't have the packing issues at all, and would run faster to boot. The smallest change would be to change the format string to "(O)" to force tuple creation even for a single item, but since the format string gains nothing, _PyObject_CallMethodIdObjArgs is better.
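
In Python terms, either fix amounts to always passing the right-hand operand as exactly one positional argument, whatever its type. A minimal sketch of that behaviour, with the hypothetical name `fixed_keys_sub`:

```python
def fixed_keys_sub(keys_view, other):
    """Hypothetical model of the corrected call path."""
    result = set(keys_view)
    # Always build a one-element argument tuple, as "(O)" would force,
    # so a tuple operand is no longer mistaken for the argument list.
    args = (other,)
    result.difference_update(*args)
    return result


d = {0: 'zero', 1: 'one', 2: 'two', 3: 'three'}
s = dict.fromkeys(['0', '1', '01'])

print(fixed_keys_sub(d.keys(), (0, 2)))   # tuple of ints no longer raises
print(fixed_keys_sub(s.keys(), ('01',)))  # removes '01', not '0' and '1'
```

On an interpreter that still has the bug, converting the operand to a list or set first (for example `d.keys() - set(other)`) should sidestep the tuple path, assuming the explanation above is right.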

answered Sep 22 '22 11:09 by ShadowRanger