Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: What's the fastest way to zip right to left, and is there no builtin for this?

Given 2 sequences of different lengths:

In [931]: a = [1,2,3]
In [932]: b = [4,5,6,7]

This is what i want

In [933]: c = zip(reversed(a),reversed(b))
In [934]: [x for x in reversed(c)]
Out[934]: [(1, 5), (2, 6), (3, 7)]

But I don't like the idea of using reversed on all my input parameters, and I also don't want to re-implement my own zip function.

So:

  1. Is there a faster/more efficient way to do this?
  2. Is there a simpler way to do this?
like image 557
Alex Gaudio Avatar asked Nov 04 '11 17:11

Alex Gaudio


People also ask

What can I use instead of zip in Python?

izip() : Make an iterator that aggregates elements from each of the iterables. Like zip() except that it returns an iterator instead of a list. Used for lock-step iteration over several iterables at a time.

What is zip (* data in Python?

Python's zip() function is defined as zip(*iterables) . The function takes in iterables as arguments and returns an iterator. This iterator generates a series of tuples containing elements from each iterable. zip() can accept any type of iterable, such as files, lists, tuples, dictionaries, sets, and so on.

Is zip faster than for loop Python?

When using write in both, there's no difference whatsoever. No, it's not faster. Only write seems to be faster than print . Your solution seemed to be about map instead of zip , not write instead of print .

What does * do in zip?

The "*" operator unpacks a list and applies it to a function. The zip function takes n lists and creates n-tuple pairs from each element from both lists: zip([iterable, ...]) This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables.


4 Answers

The following is faster and clearer than the other posted solutions:

s = zip(reversed(a), reversed(b))
s.reverse()                       # in-place

The reversed built-in creates an iterator those loops backwards at the same speed as forward iteration. No additional memory (full copies of the input) are produced and there are no slow trips around the Python eval-loop (present in the other solutions that use indicies).

like image 144
Raymond Hettinger Avatar answered Oct 02 '22 12:10

Raymond Hettinger


I would suggest:

>>> a = [1,2,3]
>>> b = [4,5,6,7]
>>> k = min(len(a),len(b))
>>> zip(a[-k:], b[-k:])
[(1, 5), (2, 6), (3, 7)]
like image 43
sam hocevar Avatar answered Oct 03 '22 12:10

sam hocevar


The following will work even if you don't know which is longer:

>>> zip(a[-len(b):], b[-len(a):])
[(1, 5), (2, 6), (3, 7)]

Here is an example of what happens with the smaller list:

>>> range(5)[-10:]
[0, 1, 2, 3, 4]

So even though the slice looks kind of funky, if you have a negative start value that is larger than the length of the list, it will give you the entire list.

like image 27
Andrew Clark Avatar answered Oct 02 '22 12:10

Andrew Clark


In the following solution, no copy of any of the input lists is done.

def rzip(a,b):
    m = min(len(a),len(b))
    for i in xrange(m):
        yield a[-m+i],b[-m+i]


for a,b in (([1,2,3,4,5,6,7],[8,9,10]),
            ([1,2,3],[8,9,10]),
            ([1,2,3,4],[1,2,3,4,5,6,7,8,9,10])):
    print list(rzip(a,b))

m being the length of the shortest list, -m+0 is always transformed into index 0 for this shortest list,
while -m+0 is transfomed in an index >= 0 for the longest list

result:

[(5, 8), (6, 9), (7, 10)]
[(1, 8), (2, 9), (3, 10)]
[(1, 7), (2, 8), (3, 9), (4, 10)]

Despite the fact the function rzip() isn't a one-liner, I think it is simple and I bet it is the faster solution

.

Edit

At first, in the function rzip() above (and eyquem1 in the following code) , I thought that creating the list of zipped elements by runing from correct position in each list from left to right would be a good procedure too. In the following code, eyquem1bis is the function rzip() slightly improved by a pre-calculation of the starting positions.

But I was interested by the Raymond Hettinger's claim that the exclusive use of zip() and iterator reversed() is the clearer and faster solution. So I imagined that using other iterators from left to right would be better than my function rzip() and I used iter() and islice() to write functions eyquem2 and eyquem3 in the following code.

I compared the execution's times of the diverse solutions proposed in most of the answers.

import random
from time import clock

lima,limb = 20000 , 50000

a = list(range(lima))
b = list(range(limb))
random.shuffle(a)
random.shuffle(b)


A,B,C,D,E,F,G,H,I = [],[],[],[],[],[],[],[],[],

n = 10
for essays in range(100):

    te = clock()
    for rep in range(n):
        k = min(len(a),len(b)) 
        sam1 = list(zip(a[-k:], b[-k:]))
    tref= (clock()-te)/100
    A.append((clock()-te,
              'Sam Hocevar, deducted normal slicing and zip'
              '\nk = min(len(a),len(b))'
              '\nlist(zip(a[-k:], b[-k:]))'))


    te = clock()
    for rep in range(n):
        k = min(len(a),len(b)) 
        sam2 = list(zip(a[len(a)-k:], b[len(b)-k:]))
    tref= (clock()-te)/100
    B.append((clock()-te,
              'Sam Hocevar, exactly normal slicing and zip'
              '\nk = min(len(a),len(b))'
              '\nlist(zip(a[len(a)-k:], b[len(b)-k:]))'))


    te = clock()
    for rep in range(n):
        fj = list(zip(a[-len(b):], b[-len(a):]))
    C.append((clock()-te,
              'F.J. , deducted tricky slicing and zip'
              '\nlist(zip(a[-len(b):], b[-len(a):]))'))


    te = clock()
    for rep in range(n):
        m = min(len(a),len(b))
        sleep = list(zip(a[len(a)-m:],b[len(b)-m:]))
    D.append((clock()-te,
              'sleeplessnerd, exactly normal slicing and zip'
              '\nm = min(len(a),len(b))'
              '\nlist(zip(a[len(a)-m:],b[len(b)-m:]))'))


    te = clock()
    for rep in range(n):
        m = min(len(a),len(b))
        ey1 = [ (a[-m+i],b[-m+i]) for i in range(m) ]
    E.append((clock()-te,
              'eyquem 1, deducted normal slicing and listcomp'
              '\nm = min(len(a),len(b))'
              '\n[ (a[-m+i],b[-m+i]) for i in range(m) ]'))


    te = clock()
    for rep in range(n):
        m = min(len(a),len(b))
        x,y = len(a)-m , len(b)-m
        ey1bis = [ (a[x+i],b[y+i]) for i in range(m)]
    F.append((clock()-te,
              'eyquem 1 improved, exactly normal slicing and listcomp'
              '\nm = min(len(a),len(b)'
              '\nx,y = len(a)-m , len(b)-m'
              '\n[(a[x+i],b[y+i]) for i in range(m)]'))


    te = clock()
    for rep in range(n):
        ita = iter(a)
        itb = iter(b)
        if len(b)>len(a):
            for av in range(len(b)-len(a)):
                itb.__next__()
        else:
            for av in range(len(a)-len(b)):
                itb.__next__()
        ey2 = list(zip(ita,itb))
    G.append((clock()-te,
              'eyquem 2, use of zip and iterator iter'
              '\nlist(zip(ita,itb))'))


    from itertools import islice
    te = clock()
    for rep in range(n):
        if len(b)>len(a):
            ey3 = list(zip(iter(a) , islice(b,len(b)-len(a),None)))
        else:
            ey3 = list(zip(islice(a,len(a)-len(b),None),iter(b)))
    H.append((clock()-te,
              'eyquem 3, use of zip and iterators iter AND islice'
              '\nlist(zip(iter(a),islice(b,len(b)-len(a),None)))'))


    te = clock()
    for rep in range(n):
        ray = list(reversed(list(zip(reversed(a),reversed(b)))))
    I.append((clock()-te,
              'Raymond Hettinger, use of zip and iterator reversed'
              '\nlist(reversed(list(zip(reversed(a),reversed(b)))))'))


print( 'len(a) == %d\nlen(b) == %d\n' % (len(a),len(b)) )

tempi = [min(x) for x in (A,B,C,D,E,F,G,H,I)]
tempi.sort(reverse=True)
tref = tempi[0][0]/100
print( '\n\n'.join('%.2f %%     %s' % (t/tref,ch) for t,ch in tempi) )


print('\nsam1==sam2==fj==sleep==ey1==ey1bis==ey2==ey3==ray  is ',sam1==sam2==fj==sleep==ey1==ey1bis==ey2==ey3==ray)

A result with lists of very different lengths:

len(a) == 20000
len(b) == 50000

100.00 %     Sam Hocevar, exactly normal slicing and zip
k = min(len(a),len(b))
list(zip(a[len(a)-k:], b[len(b)-k:]))

99.80 %     Sam Hocevar, deducted normal slicing and zip
k = min(len(a),len(b))
list(zip(a[-k:], b[-k:]))

98.02 %     sleeplessnerd, exactly normal slicing and zip
m = min(len(a),len(b))
list(zip(a[len(a)-m:],b[len(b)-m:]))

97.98 %     F.J. , deducted tricky slicing and zip
list(zip(a[-len(b):], b[-len(a):]))

82.30 %     eyquem 2, use of zip and iterator iter
list(zip(ita,itb))

69.61 %     eyquem 1, deducted normal slicing and listcomp
m = min(len(a),len(b))
[ (a[-m+i],b[-m+i]) for i in range(m) ]

67.62 %     eyquem 1 improved, exactly normal slicing and listcomp
m = min(len(a),len(b)
x,y = len(a)-m , len(b)-m
[(a[x+i],b[y+i]) for i in range(m)]

61.23 %     eyquem 3, use of zip and iterators iter AND islice
list(zip(iter(a),islice(b,len(b)-len(a),None)))

60.92 %     Raymond Hettinger, use of zip and iterator reversed
list(reversed(list(zip(reversed(a),reversed(b)))))

sam1==sam2==fj==sleep==ey1==ey1bis==ey2==ey3==ray  is  True

Result with two lists of similar lengths:

len(a) == 49500
len(b) == 50000

100.00 %     Sam Hocevar, deducted normal slicing and zip
k = min(len(a),len(b))
list(zip(a[-k:], b[-k:]))

99.39 %     F.J. , deducted tricky slicing and zip
list(zip(a[-len(b):], b[-len(a):]))

99.12 %     sleeplessnerd, exactly normal slicing and zip
m = min(len(a),len(b))
list(zip(a[len(a)-m:],b[len(b)-m:]))

98.10 %     Sam Hocevar, exactly normal slicing and zip
k = min(len(a),len(b))
list(zip(a[len(a)-k:], b[len(b)-k:]))

69.91 %     eyquem 1, deducted normal slicing and listcomp
m = min(len(a),len(b))
[ (a[-m+i],b[-m+i]) for i in range(m) ]

66.54 %     eyquem 1 improved, exactly normal slicing and listcomp
m = min(len(a),len(b)
x,y = len(a)-m , len(b)-m
[(a[x+i],b[y+i]) for i in range(m)]

58.94 %     Raymond Hettinger, use of zip and iterator reversed
list(reversed(list(zip(reversed(a),reversed(b)))))

51.29 %     eyquem 2, use of zip and iterator iter
list(zip(ita,itb))

51.17 %     eyquem 3, use of zip and iterators iter AND islice
list(zip(iter(a),islice(b,len(b)-len(a),None)))

sam1==sam2==fj==sleep==ey1==ey1bis==ey2==ey3==ray  is  True

I conclude that there are 3 groups of solutions:

  • the solutions of Sam Hocevar, FJ and sleeplessnerd have the longest executions.
    They use different combinations of slicing and function zip().
    By the way the overwhelmingly approved Sam's solution is the longest.

  • my function rzip() and rzip()-improved execute in 68 % of the time taken by the ones of the above group.
    They use only a list comprehension of elements obtained by their indices

  • my solutions eyquem2 and eyquem3 , and the Raymond Hettinger's solution, take less than 61 % of the said preceding times.
    They use iterators.

The execution's times of my solutions depend on the relative lengths of the list:
for lists of similar lengths, solutions eyquem2 and eyquem3 have execution's times divided by 2 !
But for very different lists , eyquem2 surprisingly takes as much as 82 % of the reference time.

Constant execution's time around 60 % of reference time of Raymond Hettinger's solution confirms his claim.
My solution eyquem3 with iter() and islice() is not so bad too, but it isn't so clear than Raymond Hettinger's one, though this latter is a little obscure too at first glance.

like image 29
eyquem Avatar answered Sep 30 '22 12:09

eyquem