Zen of Python: Errors should never pass silently. Why does zip work the way it does?

Tags: python

I use Python's zip function a lot in my code (mostly to create dicts, as below):

dict(zip(list_a, list_b)) 

I find it really useful, but sometimes it frustrates me because I end up in a situation where list_a is a different length from list_b. zip just goes ahead and pairs up the two lists until it reaches the end of the shorter one, ignoring the rest of the longer list. This seems like it should be treated as an error in most circumstances, and according to the Zen of Python, errors should never pass silently.
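A minimal sketch of the situation described above. The list contents are made up for illustration; only the difference in length matters.

```python
# Hypothetical data: three keys but only two values.
list_a = ["a", "b", "c"]
list_b = [1, 2]

# zip stops at the end of the shorter list, so the key "c" is
# dropped without any warning or exception.
result = dict(zip(list_a, list_b))
print(result)
```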

Given that this is such an integral function, I'm curious as to why it's been designed this way? Why isn't it treated as an error if you try to zip together two lists of different lengths?

chris asked Sep 22 '16 00:09


2 Answers

As of Python 3.10, zip() accepts a new, optional strict keyword argument. When strict=True is passed and the iterables turn out to have unequal lengths, zip() raises a ValueError. This is detailed in PEP 618 and mentioned in the Python 3.10 changelog.

L_W answered Nov 02 '22 04:11


Reason 1: Historical Reason

zip truncates unequal-length arguments because it was meant to improve on the way map handled them. This behavior is the reason zip exists at all.

Here's how you zipped sequences before zip existed (this is Python 2; map(None, ...) no longer works in Python 3):

>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]

This is terribly unintuitive, and it handles unequal-length lists badly: the shorter ones are padded with None. This was a major design concern, which you can see plain as day in PEP 201, the official proposal that introduced zip:

While the map() idiom is a common one in Python, it has several disadvantages:

  • It is non-obvious to programmers without a functional programming background.

  • The use of the magic None first argument is non-obvious.

  • It has arbitrary, often unintended, and inflexible semantics when the lists are not of the same length - the shorter sequences are padded with None:

    >>> c = (4, 5, 6, 7)
    >>> map(None, a, c)
    [(1, 4), (2, 5), (3, 6), (None, 7)]

So, no, this behaviour would not be treated as an error - it is why it was designed in the first place.


Reason 2: Practical Reason

Because it is pretty useful, is clearly specified and doesn't have to be thought of as an error at all.

By allowing unequal lengths, zip only requires that its arguments be iterable. This allows zip to work on generators, tuples, dictionary keys and literally anything else that implements __iter__(), precisely because it never asks about length.

This is significant, because generators do not support len(), so there is no way to check their length beforehand. Add an up-front length check, and you break zip's ability to work on generators, when it should. That's a fairly serious disadvantage, wouldn't you agree?
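To illustrate the point, here is a small sketch that zips a list with an endless generator. The squares() generator and the names list are hypothetical examples, not anything from the standard library.

```python
def squares():
    """Endless generator: it has no length and never finishes by itself."""
    n = 0
    while True:
        yield n * n
        n += 1

names = ["zero", "one", "two", "three"]

# zip pulls one item at a time from each argument and stops as soon as
# the list is exhausted, so the infinite generator causes no problem.
pairs = list(zip(names, squares()))
print(pairs)

# An up-front length check is not possible: len() rejects generators.
try:
    len(squares())
    has_len = True
except TypeError:
    has_len = False
```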


Reason 3: By Fiat

Guido van Rossum wanted it this way:

Optional padding. An earlier version of this PEP proposed an optional pad keyword argument, which would be used when the argument sequences were not the same length. This is similar behavior to the map(None, ...) semantics except that the user would be able to specify pad object. This has been rejected by the BDFL in favor of always truncating to the shortest sequence, because of the KISS principle. If there's a true need, it is easier to add later. If it is not needed, it would still be impossible to delete it in the future.

KISS trumps everything.
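For cases where padding really is wanted, the standard library offers itertools.zip_longest (Python 3), which behaves much like the old map(None, ...) idiom but lets the caller choose the fill value. A brief sketch, reusing the tuples from the examples above:

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# Built-in zip truncates to the shortest input.
truncated = list(zip(a, c))

# zip_longest pads the shorter input, with None by default...
padded = list(zip_longest(a, c))

# ...or with an explicit fill value chosen by the caller.
padded_zero = list(zip_longest(a, c, fillvalue=0))

print(truncated)
print(padded)
print(padded_zero)
```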

Akshat Mahajan answered Nov 02 '22 05:11