Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Smart way to delete tuples

Tags:

python

tuples

I having a list of tuple as describes below (This tuple is sorted in decreasing order of the second value):

from string import ascii_letters
myTup = zip (ascii_letters, range(10)[::-1])
threshold = 5.5

>>> myTup
[('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), \
('i', 1), ('j', 0)]

Given a threshold, what is the best possible way to discard all tuples having the second value less than this threshold.

I am having more than 5 million tuples and thus don't want to perform comparison tuple by tuple basis and consequently delete or add to another list of tuples.

like image 665
Curious Avatar asked Dec 26 '22 17:12

Curious


2 Answers

Here's an exotic approach that wraps the list in a list-like object before performing bisect.

import bisect

def revkey(items):
    class Items:
        def __getitem__(self, index):
            assert 0 <= index < _len
            return items[_max-index][1]
        def __len__(self):
            return _len
        def bisect(self, value):
            return _len - bisect.bisect_left(self, value)
    _len = len(items)
    _max = _len-1
    return Items()

tuples = [('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), ('i', 1), ('j', 0)]

for x in range(-2, 12):
    assert len(tuples) == 10
    t = tuples[:]
    stop = revkey(t).bisect(x)
    del t[stop:]
    assert t == [item for item in tuples if item[1] >= x]
like image 23
Peter Otten Avatar answered Dec 29 '22 07:12

Peter Otten


Since the tuples are sorted, you can simply search for the first tuple with a value lower than the threshold, and then delete the remaining values using slice notation:

index = next(i for i, (t1, t2) in enumerate(myTup) if t2 < threshold)
del myTup[index:]

As Vaughn Cato points out, a binary search would speed things up even more. bisect.bisect would be useful, except that it won't work with your current data structure unless you create a separate key sequence, as documented here. But that violates your prohibition on creating new lists.

Still, you could use the source code as the basis for your own binary search. Or, you could change your data structure:

>>> myTup
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), 
 (6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
>>> index = bisect.bisect(myTup, (threshold, None))
>>> del myTup[:index]
>>> myTup
[(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]

The disadvantage here is that the deletion may occur in linear time, since Python will have to shift the entire block of memory back... unless Python is smart about deleting slices that start from 0. (Anyone know?)

Finally, if you're really willing to change your data structure, you could do this:

[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd'), (-5, 'e'), (-4, 'f'), 
 (-3, 'g'), (-2, 'h'), (-1, 'i'), (0, 'j')]
>>> index = bisect.bisect(myTup, (-threshold, None))
>>> del myTup[index:]
>>> myTup
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd')]

(Note that Python 3 will complain about the None comparison, so you could use something like (-threshold, chr(0)) instead.)

My suspicion is that the linear time search I suggested at the beginning is acceptable in most circumstances.

like image 178
senderle Avatar answered Dec 29 '22 09:12

senderle