
Efficient list mapping in Python

I have the following input:

input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]

and I am trying to get the following output:

outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]

outputmapping = {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

Any tips on how to handle this with scalability in mind? (The input can get really large.)

Joey asked Feb 27 '23 03:02


2 Answers

You probably want something like:

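import collections
import itertools

def build_catalog(L):
    # Each unseen item gets the next integer ID on first lookup.
    # (Python 3 spelling; in Python 2 this was itertools.count().next.)
    counter = itertools.count().__next__
    names = collections.defaultdict(counter)
    result = []
    for t in L:
        new_t = [names[item] for item in t]
        result.append(new_t)
    # Invert the name -> ID mapping into ID -> name.
    catalog = dict((idx, name) for name, idx in names.items())
    return result, catalog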

Using it:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
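
Since the question mentions very large input, here is a minimal sketch of the same defaultdict idea written as a generator, assuming the rows can be consumed one at a time rather than held in a single list. The name `iter_encoded` and its `names` parameter are hypothetical, chosen only for illustration.

```python
import collections
import itertools


def iter_encoded(rows, names=None):
    """Yield each row as a list of integer IDs, one row at a time.

    `names` is a defaultdict that hands out the next free ID to every
    item it has not seen before; pass one in to keep it after the loop.
    """
    if names is None:
        names = collections.defaultdict(itertools.count().__next__)
    for row in rows:
        yield [names[item] for item in row]


# Hypothetical usage: the caller owns the mapping so it outlives the generator.
names = collections.defaultdict(itertools.count().__next__)
rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
encoded = list(iter_encoded(rows, names))
catalog = {idx: name for name, idx in names.items()}
```

Only the mapping of distinct items has to stay in memory; each encoded row can be written out or processed as soon as it is produced, instead of being collected with `list(...)` as in this small demonstration.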
Thomas Wouters answered Mar 03 '23 08:03


This class will automatically map objects to increasing integer values:

class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []

    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]

Example usage, for your input:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
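
If a full class feels like more than is needed, the same first-seen numbering can probably be done with a plain dict and `dict.setdefault`. This is a sketch under the assumption of Python 3.7+, where dicts preserve insertion order; the function name `encode` is hypothetical.

```python
def encode(rows):
    """Return (encoded_rows, id_to_item) for an iterable of item tuples."""
    ids = {}  # item -> integer ID, in first-seen order
    # len(ids) is evaluated before the lookup, so a new item gets the next free ID.
    encoded = [[ids.setdefault(item, len(ids)) for item in row] for row in rows]
    # Keys were inserted in ID order, so enumerate() reproduces the IDs.
    id_to_item = dict(enumerate(ids))
    return encoded, id_to_item


rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
encoded, id_to_item = encode(rows)
```

The trade-off is that `setdefault` computes `len(ids)` on every lookup, even for items already seen, which is cheap but not free.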
interjay answered Mar 03 '23 09:03