I have the following input:
input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
and I am trying to produce the following output:
outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]
outputmapping = {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
Any tips on how to handle this with scalability in mind? (The input can get really large.)
You probably want something like:
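import collections
import itertools

def build_catalog(L):
    # Each unseen name gets the next integer from the counter.
    # (Python 3 spelling; on Python 2 use itertools.count().next and names.iteritems().)
    counter = itertools.count().__next__
    names = collections.defaultdict(counter)
    result = []
    for t in L:
        new_t = [names[item] for item in t]
        result.append(new_t)
    # Invert the name -> index mapping into index -> name.
    catalog = dict((idx, name) for name, idx in names.items())
    return result, catalog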
Using it:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
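Since the question mentions very large inputs, a lazy variant may be worth considering. The following is only a sketch, assuming Python 3; `encode_stream` and `ids` are hypothetical names that do not come from the answer above. It yields one encoded row at a time instead of building the whole result list, and relies on `dict.setdefault` to hand out the next free integer the first time an item is seen.

```python
def encode_stream(rows, ids):
    """Lazily yield each row with its items replaced by integer ids.

    `ids` is a dict owned by the caller; it is filled in as new items appear.
    """
    for row in rows:
        # setdefault returns the existing id, or stores len(ids) as a new one.
        # len(ids) is evaluated before the insertion, so ids run 0, 1, 2, ...
        yield [ids.setdefault(item, len(ids)) for item in row]


rows = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
ids = {}
encoded = list(encode_stream(rows, ids))  # list() only for the small demo
mapping = {idx: name for name, idx in ids.items()}
```

With a truly large input, `rows` could itself be a generator (for example, one reading lines from a file), and each encoded row could be written out as it is produced rather than collected with `list()`. The `ids` dict still grows with the number of distinct items.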
This class will automatically map objects to increasing integer values:
class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []

    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]
Example usage, for your input:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
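Because the list of objects is indexed by id, the encoding can also be reversed without building the dict at all. A small sketch, with the values copied from the session above; `decode` is a hypothetical helper name, not part of the class.

```python
def decode(encoded, objects):
    """Turn rows of integer ids back into tuples of the original objects."""
    return [tuple(objects[i] for i in row) for row in encoded]


objects = ['dog', 'cat', 'mouse', 'ruby', 'python']  # same as map.objects above
encoded = [[0, 0, 1, 2], [1, 3, 4, 2]]
decoded = decode(encoded, objects)
```

Here `decoded` should match the original input, which is a quick way to check that the mapping round-trips.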