I have the following Python 2.7 dictionary data structure (I do not control source data - comes from another system as is):
{
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
    112765083670: ...
}
The dictionary keys will always be unique. The dst, src, and alias values can be duplicated across records. Every record will always have a dst and a src, but not every record will necessarily have an alias, as seen in the third record.
In the sample data, either of the first two records would be removed (it doesn't matter to me which one). The third record would be considered unique: although its dst and src match the first two, it is missing an alias.
My goal is to remove all records where the dst, src, and alias have all been duplicated - regardless of the key.
How does this rookie accomplish this?
Also, my limited understanding of Python interprets the data structure as a dictionary with the values stored in dictionaries... a dict of dicts, is this correct?
The strategy is to convert the list of dictionaries to a list of tuples, where each tuple contains the items of one dictionary. Since tuples can be hashed (as long as their contents are hashable), you can remove duplicates with a set (a set comprehension works here; on older Python versions, pass a generator expression to set() instead).
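Applied to the dict of dicts in the question, the tuple-and-set idea might look like the sketch below. It is not the original answer's code: the names `records`, `fingerprint`, and `drop_duplicates` are made up for illustration, and the sample only mirrors the four complete records shown in the question. Because the inner values are lists, which cannot be hashed, each list is turned into a tuple first.

```python
# Hypothetical sample shaped like the data in the question.
records = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}


def fingerprint(inner):
    """Return a hashable, order-independent summary of one inner dict."""
    # Lists are unhashable, so every value list is converted to a tuple.
    return tuple(sorted((field, tuple(values)) for field, values in inner.items()))


def drop_duplicates(records):
    """Keep one record per distinct fingerprint."""
    seen = set()
    unique = {}
    # Sorting the keys makes the surviving record deterministic (lowest key wins).
    for key in sorted(records):
        mark = fingerprint(records[key])
        if mark not in seen:
            seen.add(mark)
            unique[key] = records[key]
    return unique


deduped = drop_duplicates(records)
print(sorted(deduped))
```

A record without an alias gets a different fingerprint from one with an alias, so the third record survives, as the question requires. Each membership test against the set takes roughly constant time, which matters if the real data is large.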
dict.fromkeys() is a built-in class method that builds a dictionary from the keys you specify. Because a dictionary cannot contain duplicate keys, repeated items in the list collapse into a single key, which removes the duplicates. The items must be hashable for this to work.
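As a small illustration of that behaviour, here is a sketch on a flat list of strings. The `addresses` list is invented for the example. Note that this trick does not work directly on the inner dictionaries from the question, since dicts and lists are unhashable and cannot be used as keys.

```python
# Hypothetical list of source addresses containing a repeat.
addresses = ['1.2.3.4', '4.3.2.1', '1.2.3.4']

# Each address becomes a key; repeated keys collapse into one.
unique_addresses = list(dict.fromkeys(addresses))

# Sorted before printing, because key order is only guaranteed
# to follow insertion order on Python 3.7 and later.
print(sorted(unique_addresses))
```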
You could go through each item (key-value pair) in the dictionary and add it to a result dictionary if its value is not already in the result dictionary.
input_raw = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

result = {}
for key, value in input_raw.items():
    # Keep the record only if no identical inner dict has been kept yet.
    if value not in result.values():
        result[key] = value

print result
One simple approach would be to create a reverse dictionary, using the concatenation of the string data in each inner dictionary as a key. Say you have the above data in a dictionary, d:
>>> import collections
>>> reverse_d = collections.defaultdict(list)
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str].append(key)
...
>>> duplicates = [keys for key_str, keys in reverse_d.iteritems() if len(keys) > 1]
>>> duplicates
[[112762853385, 112762853378]]
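One caveat with joining the strings: two different records can produce the same concatenated key. For example, dst '10.0.0.1' with src '11.2.3.4' and dst '10.0.0.11' with src '1.2.3.4' both join to '10.0.0.111.2.3.4'. A hedged variant of the same reverse-dictionary idea uses a tuple as the key, which keeps the fields separate. The names `FIELDS`, `group_duplicates`, `sample`, and `tricky` below are illustrative only, and unlike the snippet above this sketch compares every element of each list rather than just the first.

```python
import collections

FIELDS = ('dst', 'src', 'alias')


def group_duplicates(records):
    """Return lists of keys whose dst, src, and alias all match."""
    groups = collections.defaultdict(list)
    for key, inner in records.items():
        # A missing field becomes an empty tuple, so a record without
        # an alias never matches a record that has one.
        mark = tuple(tuple(inner.get(field, ())) for field in FIELDS)
        groups[mark].append(key)
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


# Hypothetical sample shaped like the data in the question.
sample = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

# Two distinct records whose joined strings would collide.
tricky = {
    1: {'dst': ['10.0.0.1'], 'src': ['11.2.3.4']},
    2: {'dst': ['10.0.0.11'], 'src': ['1.2.3.4']},
}

print(group_duplicates(sample))
print(group_duplicates(tricky))
```

With the tuple key, the two `tricky` records stay in separate groups, whereas the string-joined key would have reported them as duplicates.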
If you don't want a list of duplicates, but just want to create a duplicate-free dict, you could use a regular dictionary instead of a defaultdict and re-reverse it like so:
>>> reverse_d = {}
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str] = key
...
>>> new_d = dict((val, d[val]) for val in reverse_d.itervalues())