Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast algorithm to calculate delta of two list

I have two list of album names, ordered by some score.

albums_today = ['album1', 'album2', 'album3']
albums_yesterday = ['album2', 'album1', 'album3']

How can I calculate the change of list order and get something like

{'album1':1, 'album2':-1, 'album3':0}
like image 235
satoru Avatar asked Nov 23 '10 00:11

satoru


3 Answers

>>> albums_today = ['album1', 'album2', 'album3']
>>> albums_yesterday = ['album2', 'album1', 'album3']
>>> D = dict((k,v) for v,k in enumerate(albums_yesterday))
>>> dict((k,D[k]-v) for v,k in enumerate(albums_today))
{'album1': 1, 'album3': 0, 'album2': -1}

In Python2.7 or Python3 it can be written even more simply

>>> albums_today = ['album1', 'album2', 'album3']
>>> albums_yesterday = ['album2', 'album1', 'album3']
>>> D = {k:v for v,k in enumerate(albums_yesterday)}
>>> {k:D[k]-v for v,k in enumerate(albums_today)}
{'album1': 1, 'album3': 0, 'album2': -1}
like image 133
John La Rooy Avatar answered Nov 10 '22 09:11

John La Rooy


you could also use the same algorithm as i wrote above and just use a single hashmap.

def findDelta1(today,yesterday):
 results = {}
 ypos = 0
 for i,title in enumerate(today):
      if title in results:
           results[title] = results[title] - i
      else:
           for ypos in xrange(ypos,len(yesterday)):
                if yesterday[ypos] == title:
                     results[title] = ypos - i
                     ypos = ypos + 1
                     break
                else:
                     results[yesterday[ypos]] = ypos
 return results

still O(N), potentially faster and less RAM than my version above.

like image 3
Tyson Avatar answered Nov 10 '22 09:11

Tyson


how about this:

def delta(a, b):
    rank_a = dict((k, v) for v, k in enumerate(a))
    rank_b = enumerate(b)
    return dict((k, rank_a[k]-i) for i, k in rank_b)

which only creates a single dict to look things up into.

Well, as long as every entry in both lists are present exactly once each, then we know that once we look a key up in the rank_a collection, we don't need it anymore. We can delete it. Also, to save space, we don't have to populate that collection until the moment a particular key is needed.

class LookupOnce:
    def __init__(self, seq):
        self.cache = {}
        self.seq = iter(seq)
    def get(self, key):
        if key in self.cache:
            value = self.cache[key]
            del self.cache[key]
            return value
        for v,k in self.seq:
            if k == key:
                return v
            self.cache[k] = v
        raise KeyError


def delta(a, b):
    rank_a = LookupOnce(enumerate(a))
    rank_b = enumerate(b)
    result = {}
    for i, k in rank_b:
        result[k] = i - rank_a.get(k)
    return result
like image 2
SingleNegationElimination Avatar answered Nov 10 '22 09:11

SingleNegationElimination