
Subset of dictionary keys

I've got a Python dictionary of the form {'ip1:port1' : <value>, 'ip1:port2' : <value>, 'ip2:port1' : <value>, ...}. The keys are strings of the form ip:port. The values are not important for this task.

I need a list of ip:port combinations in which each IP address appears only once; the port can be any of those that appear with that IP among the original keys. For the example above, two acceptable results are ['ip1:port1', 'ip2:port1'] and ['ip1:port2', 'ip2:port1'].

What is the most Pythonic way to do this?

Currently my solution is

def get_uniq_worker_ips(workers):
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    return [[worker for worker in workers.iterkeys() if worker.startswith(w)][0] for w in wip]

I don't like it, because for every IP it builds a temporary list of all matching keys and then discards everything but the first element.

asked Dec 19 '22 15:12 by wl2776

1 Answer

You can use itertools.groupby to group the keys by IP address:

import itertools

data = {'ip1:port1': "value1", 'ip1:port2': "value2", 'ip2:port1': "value3", 'ip2:port2': "value4"}
by_ip = {k: list(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0])}
by_ip
# {'ip1': ['ip1:port1', 'ip1:port2'], 'ip2': ['ip2:port1', 'ip2:port2']}

Then pick any one key from each IP's group, for example the first:

{v[0]: data[v[0]] for v in by_ip.values()}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

Or, shorter, use a generator expression that takes only the first key from each group:

one_by_ip = (next(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0]))
{key: data[key] for key in one_by_ip}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

However, note that groupby only groups consecutive items, so the input has to be sorted by IP first. If you want to avoid sorting all the keys in the dict, you can instead keep a set of the IPs you have already seen.

seen = set()
# seen.add(x) returns None (falsy), so not_seen(x) is True only the first time x appears
not_seen = lambda x: not (x in seen or seen.add(x))
{key: data[key] for key in data if not_seen(key.split(":")[0])}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

This is similar to your solution, but instead of looping over the unique IPs and searching the dict for a matching key for each one, you loop over the keys once and check whether you've already seen the IP.
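Since the question asks for a list of keys rather than a dict, the seen-set idea can also be wrapped in a small function. The following is a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order); the names unique_ip_workers and unique_ip_workers_via_dict are illustrative and do not come from the original posts.

```python
def unique_ip_workers(workers):
    """Return one 'ip:port' key per distinct IP, in a single pass."""
    seen = set()    # IPs that already have a key in the result
    result = []
    for key in workers:             # iterating a dict yields its keys
        ip = key.split(":", 1)[0]   # text before the first colon
        if ip not in seen:
            seen.add(ip)
            result.append(key)
    return result


def unique_ip_workers_via_dict(workers):
    """Variant: build a dict keyed by IP, so a later key for the
    same IP overwrites an earlier one, then return its values."""
    return list({key.split(":", 1)[0]: key for key in workers}.values())


# Sample data reused from the answer above.
data = {"ip1:port1": "value1", "ip1:port2": "value2",
        "ip2:port1": "value3", "ip2:port2": "value4"}

print(unique_ip_workers(data))
print(unique_ip_workers_via_dict(data))
```

The first function keeps the first key encountered for each IP, while the dict-based variant keeps the last one. Either is acceptable here, since any port that occurs with a given IP will do. Both compare the whole IP rather than a prefix, so 'ip1' and 'ip10' are treated as different addresses.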

answered Dec 21 '22 11:12 by tobias_k