I have come across code like this:
from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(10)]
xs = [point.x for point in points]
ys = [point.y for point in points]
I think this code is not Pythonic because it repeats itself. If another dimension is added to the Point class, a whole new loop needs to be written, like:
zs = [point.z for point in points]
So I tried to make it more Pythonic by writing something like this:
xs, ys = zip(*[(point.x, point.y) for point in points])
If a new dimension is added, no problem:
xs, ys, zs = zip(*[(point.x, point.y, point.z) for point in points])
But this is almost 10 times slower than the first solution when there are millions of points, even though it has only one loop. I think it is because the * operator needs to unpack millions of arguments to the zip function, which is horrible. So my question is:
Is there a way to change the code above so that it is as fast as before and Pythonic (without using 3rd party libraries)?
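One straightforward option (a sketch on my part, not benchmarked here) is to keep a single explicit loop that fills every coordinate list in one pass, which avoids both the repetition and the giant argument unpacking:

xs, ys = [], []
for point in points:
    # one pass over the points, appending to each coordinate list
    xs.append(point.x)
    ys.append(point.y)

Adding a z dimension then only costs one extra list and one extra append line inside the same loop.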
Python's zip() function is defined as zip(*iterables). It takes any number of iterables as arguments (lists, tuples, dictionaries, sets, files, and so on) and returns an iterator of tuples, where the i-th tuple contains the i-th element from each iterable. So zip(*some_list) basically passes the contents of the list as individual arguments.
When using zip(), it is important to pay attention to the lengths of the iterables. If they are not the same length, the number of tuples zip() produces equals the length of the shortest iterable; the leftover elements of the longer ones are simply ignored.
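For example, a quick illustration of that truncation behaviour:

# zip() stops as soon as the shortest iterable is exhausted
pairs = list(zip(range(5), range(100)))
print(pairs)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
# the remaining 95 elements of range(100) are ignored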
I just tested several ways of zipping Point coordinates and compared their performance as the number of points grows.
Below are the functions I used in the test:
def hardcode(points):
    # a hand-crafted comprehension for each coordinate
    return [point.x for point in points], [point.y for point in points]

def using_zip(points):
    # using the "problematic" zip function
    return zip(*((point.x, point.y) for point in points))

def loop_and_comprehension(points):
    # building one comprehension per coordinate name in a plain loop
    zipped = []
    for coordinate in ('x', 'y'):
        zipped.append([getattr(point, coordinate) for point in points])
    return zipped

def nested_comprehension(points):
    # building the comprehensions from a list of coordinate names using
    # nested comprehensions
    return [
        [getattr(point, coordinate) for point in points]
        for coordinate in ('x', 'y')
    ]
Using timeit, I timed the execution of each function with different numbers of points, and here are the results:
comparing processing times using 10 points and 10000000 iterations
hardcode................. 14.12024447 [+0%]
using_zip................ 16.84289724 [+19%]
loop_and_comprehension... 30.83631476 [+118%]
nested_comprehension..... 30.45758349 [+116%]
comparing processing times using 100 points and 1000000 iterations
hardcode................. 9.30594717 [+0%]
using_zip................ 13.74953714 [+48%]
loop_and_comprehension... 19.46766583 [+109%]
nested_comprehension..... 19.27818860 [+107%]
comparing processing times using 1000 points and 100000 iterations
hardcode................. 7.90372457 [+0%]
using_zip................ 12.51523594 [+58%]
loop_and_comprehension... 18.25679913 [+131%]
nested_comprehension..... 18.64352790 [+136%]
comparing processing times using 10000 points and 10000 iterations
hardcode................. 8.27348382 [+0%]
using_zip................ 18.23079485 [+120%]
loop_and_comprehension... 18.00183383 [+118%]
nested_comprehension..... 17.96230063 [+117%]
comparing processing times using 100000 points and 1000 iterations
hardcode................. 9.15848662 [+0%]
using_zip................ 22.70730675 [+148%]
loop_and_comprehension... 17.81126971 [+94%]
nested_comprehension..... 17.86892597 [+95%]
comparing processing times using 1000000 points and 100 iterations
hardcode................. 9.75002857 [+0%]
using_zip................ 23.13891725 [+137%]
loop_and_comprehension... 18.08724660 [+86%]
nested_comprehension..... 18.01269820 [+85%]
comparing processing times using 10000000 points and 10 iterations
hardcode................. 9.96045920 [+0%]
using_zip................ 23.11653558 [+132%]
loop_and_comprehension... 17.98296033 [+81%]
nested_comprehension..... 18.17317708 [+82%]
comparing processing times using 100000000 points and 1 iterations
hardcode................. 64.58698246 [+0%]
using_zip................ 92.53437881 [+43%]
loop_and_comprehension... 73.62493845 [+14%]
nested_comprehension..... 62.99444739 [-2%]
We can see that the gap between the "hardcoded" solution and the solutions built with getattr from a list of coordinate names seems to shrink steadily as the number of points grows.
So, for a very large number of points it could be a good idea to generate the comprehensions from a list of coordinate names:
[[getattr(point, coordinate) for point in points]
for coordinate in ('x', 'y')]
However, for a small number of points this is the worst solution (of the ones I tested, at least).
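Another variant worth trying (not measured above, so treat it as a sketch rather than a proven winner) is operator.attrgetter, which fetches several attributes per point in a single call, though it still relies on the * unpacking the question worries about:

from operator import attrgetter

# attrgetter('x', 'y') maps a point to the tuple (point.x, point.y);
# zip(*...) then transposes those pairs into per-coordinate tuples
xs, ys = zip(*map(attrgetter('x', 'y'), points))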
For reference, here is the code I used to run the benchmark:
import timeit

...

def compare(nb_points, nb_iterations):
    reference = None
    points = [Point(randint(1, 100), randint(1, 100))
              for _ in range(nb_points)]
    print("comparing processing times using {} points and {} iterations"
          .format(nb_points, nb_iterations))
    for func in (hardcode, using_zip, loop_and_comprehension,
                 nested_comprehension):
        duration = timeit.timeit(lambda: func(points), number=nb_iterations)
        print('{:.<25} {:0=2.8f} [{:0>+.0%}]'
              .format(func.__name__, duration,
                      0 if reference is None else (duration / reference - 1)))
        if reference is None:
            reference = duration
    print("-" * 80)

compare(10, 10000000)
compare(100, 1000000)
compare(1000, 100000)
compare(10000, 10000)
compare(100000, 1000)
compare(1000000, 100)
compare(10000000, 10)
compare(100000000, 1)