To illustrate what I mean, here is an example:
    messages = [
        ('Ricky', 'Steve', 'SMS'),
        ('Steve', 'Karl', 'SMS'),
        ('Karl', 'Nora', 'Email')
    ]
I want to convert this list, together with a definition of groups, into a list of integer tuples and a lookup dictionary, so that each element within a group gets a unique id. That id should be the element's index in the lookup table, like this:
    messages_int, lookup_table = create_lookup_list(
        messages, ('person', 'person', 'medium'))

    print messages_int
    [ (0, 1, 0),
      (1, 2, 0),
      (2, 3, 1) ]

    print lookup_table
    { 'person': ['Ricky', 'Steve', 'Karl', 'Nora'],
      'medium': ['SMS', 'Email']
    }
I wonder if there is an elegant and Pythonic solution to this problem.
I am also open to better terminology than create_lookup_list, etc.
defaultdict combined with the itertools.count().next method is a good way to assign identifiers to unique items. Here's an example of how to apply this in your case:
    from itertools import count
    from collections import defaultdict

    def create_lookup_list(data, domains):
        domain_keys = defaultdict(lambda: defaultdict(count().next))
        out = []
        for row in data:
            out.append(tuple(domain_keys[dom][val] for val, dom in zip(row, domains)))
        lookup_table = dict((k, sorted(d, key=d.get)) for k, d in domain_keys.items())
        return out, lookup_table
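Edit: note that in Python 3, count().next becomes count().__next__ (or, equivalently, functools.partial(next, count())). A plain lambda: next(count()) does not work, because it creates a new counter on every call and so always returns 0.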
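For reference, here is a minimal sketch of the same approach written for Python 3, assuming the example data from the question. Only the counter method and the print calls differ from the version above; the list and dict comprehensions are a stylistic choice, not a requirement.

```python
from collections import defaultdict
from itertools import count


def create_lookup_list(data, domains):
    # One counter per domain: looking up an unseen value calls the inner
    # default factory, which draws the next integer from that counter.
    domain_keys = defaultdict(lambda: defaultdict(count().__next__))
    out = [
        tuple(domain_keys[dom][val] for val, dom in zip(row, domains))
        for row in data
    ]
    # Order each domain's values by the id they were assigned.
    lookup_table = {k: sorted(d, key=d.get) for k, d in domain_keys.items()}
    return out, lookup_table


messages = [
    ('Ricky', 'Steve', 'SMS'),
    ('Steve', 'Karl', 'SMS'),
    ('Karl', 'Nora', 'Email'),
]
messages_int, lookup_table = create_lookup_list(
    messages, ('person', 'person', 'medium'))
print(messages_int)   # [(0, 1, 0), (1, 2, 0), (2, 3, 1)]
print(lookup_table)   # {'person': ['Ricky', 'Steve', 'Karl', 'Nora'], 'medium': ['SMS', 'Email']}
```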
Mine's about the same length and complexity:
    import collections

    def create_lookup_list(messages, labels):
        # Collect all the values
        lookup = collections.defaultdict(set)
        for msg in messages:
            for l, v in zip(labels, msg):
                lookup[l].add(v)

        # Make the value sets lists
        for k, v in lookup.items():
            lookup[k] = list(v)

        # Make the lookup_list
        lookup_list = []
        for msg in messages:
            lookup_list.append([lookup[l].index(v) for l, v in zip(labels, msg)])

        return lookup_list, lookup
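Because the values pass through a set here, the order of each lookup list (and therefore the ids) is arbitrary and may not match the example in the question. If first-seen order matters, one possible variation is sketched below. The name create_lookup_list_ordered is made up for illustration; it keeps a value-to-id dict per label, which also avoids the repeated list.index scans.

```python
def create_lookup_list_ordered(messages, labels):
    index = {}    # label -> {value: id}, for constant-time id lookup
    lookup = {}   # label -> list of values in first-seen order
    lookup_list = []
    for msg in messages:
        row = []
        for label, value in zip(labels, msg):
            ids = index.setdefault(label, {})
            values = lookup.setdefault(label, [])
            if value not in ids:
                # The next free id is the current length of the value list.
                ids[value] = len(values)
                values.append(value)
            row.append(ids[value])
        lookup_list.append(tuple(row))
    return lookup_list, lookup


messages = [
    ('Ricky', 'Steve', 'SMS'),
    ('Steve', 'Karl', 'SMS'),
    ('Karl', 'Nora', 'Email'),
]
ordered_ints, ordered_lookup = create_lookup_list_ordered(
    messages, ('person', 'person', 'medium'))
print(ordered_ints)    # [(0, 1, 0), (1, 2, 0), (2, 3, 1)]
print(ordered_lookup)  # {'person': ['Ricky', 'Steve', 'Karl', 'Nora'], 'medium': ['SMS', 'Email']}
```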