 

Removing Duplicates From Dictionary

I have the following Python 2.7 dictionary data structure (I do not control the source data; it comes from another system as is):

```python
{112762853378:
    {'dst': ['10.121.4.136'],
     'src': ['1.2.3.4'],
     'alias': ['www.example.com']
    },
 112762853385:
    {'dst': ['10.121.4.136'],
     'src': ['1.2.3.4'],
     'alias': ['www.example.com']
    },
 112760496444:
    {'dst': ['10.121.4.136'],
     'src': ['1.2.3.4']
    },
 112760496502:
    {'dst': ['10.122.195.34'],
     'src': ['4.3.2.1']
    },
 112765083670: ... }
```

The dictionary keys will always be unique. The dst, src, and alias values can be duplicated. Every record will always have a dst and a src, but not every record will necessarily have an alias, as seen in the third record.

In the sample data, either of the first two records would be removed (it doesn't matter to me which one). The third record would be considered unique because, although its dst and src are the same, it has no alias.
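Python's built-in dict equality already matches the rule described here: two inner dicts are equal only if they have the same keys and the same values, so a record without an alias never equals one that has an alias. A quick check using the first three sample records (the variable names are just for illustration):

```python
# The first three sample records from the question (the rest of the data is elided).
rec_a = {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']}
rec_b = {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']}
rec_c = {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']}

# Dict equality compares both keys and values, so the missing 'alias' key
# is enough to make rec_c differ from rec_a.
print(rec_a == rec_b)  # True
print(rec_a == rec_c)  # False
```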

My goal is to remove every record whose dst, src, and alias all duplicate those of another record, regardless of the key.

How does this rookie accomplish this?

Also, my limited understanding of Python interprets the data structure as a dictionary whose values are themselves dictionaries, i.e. a dict of dicts. Is this correct?
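The nesting can be checked directly with `type()`. In the sample, the outer dict maps integer keys to inner dicts, and each inner dict maps field names to lists (one-element lists here), so it is more precisely a dict of dicts of lists. A small check using two records from the sample:

```python
data = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

inner = data[112762853378]
print(type(data).__name__)          # dict  (outer level: key -> record)
print(type(inner).__name__)         # dict  (record: field name -> values)
print(type(inner['dst']).__name__)  # list  (each field holds a list)
```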

Bit Bucket asked Jan 05 '12 20:01




2 Answers

You could go through each of the items (the key-value pairs) in the dictionary and add each one to a result dictionary if its value is not already in the result dictionary.

```python
input_raw = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

result = {}
for key, value in input_raw.items():
    # Keep this record only if an identical inner dict has not been kept yet.
    if value not in result.values():
        result[key] = value

print(result)
```
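The loop above rescans `result.values()` for every record, so its work grows quadratically with the number of records. A minimal sketch of a linear-time variant, assuming every field value is a list of hashable items (strings in the sample): turn each inner dict into a hashable signature (a `frozenset` of `(field, tuple(values))` pairs) and track the signatures already seen in a set. The helper name `dedupe_records` is hypothetical. Iterating over sorted keys makes the result deterministic: for each group of duplicates, the record with the smallest key is kept.

```python
def dedupe_records(records):
    """Hypothetical helper: keep one record per distinct inner-dict content.

    Assumes each inner value is a list of hashable items, as in the sample.
    """
    seen = set()
    result = {}
    # sorted() fixes the visiting order, so the smallest key of each group wins.
    for key in sorted(records):
        inner = records[key]
        # Lists are unhashable, so each one becomes a tuple. Because the field
        # names are part of the signature, a record without 'alias' gets a
        # different signature from one that has it.
        signature = frozenset((field, tuple(values)) for field, values in inner.items())
        if signature not in seen:
            seen.add(signature)
            result[key] = inner
    return result


input_raw = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

print(sorted(dedupe_records(input_raw)))  # [112760496444, 112760496502, 112762853378]
```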
Andrew Cox answered Oct 02 '22 14:10



One simple approach is to build a reverse dictionary that uses the concatenation of the string data in each inner dictionary as its key. Say you have the above data in a dictionary, d:

```python
>>> import collections
>>> reverse_d = collections.defaultdict(list)
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str].append(key)
...
>>> duplicates = [keys for key_str, keys in reverse_d.iteritems() if len(keys) > 1]
>>> duplicates
[[112762853385, 112762853378]]
```
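One caveat with joining the strings: without a separator, different records can produce the same key (for example `'ab' + 'c'` and `'a' + 'bc'` both give `'abc'`), and only the first element of each list is used. Below is a sketch of the same reverse-dictionary idea with a tuple as the key instead, written so it also runs on Python 3 (`items()`/`values()` rather than `iteritems()`/`itervalues()`). Marking a missing field with `None` is an assumption made here to keep records without an alias distinct; `FIELDS` is just a name for the field list used above.

```python
import collections

d = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

FIELDS = ('dst', 'src', 'alias')

reverse_d = collections.defaultdict(list)
for key, inner_d in d.items():
    # A tuple keeps the fields separate, so values cannot run together;
    # tuple(inner_d[k]) keeps every list element, and None marks a missing field.
    key_tuple = tuple(tuple(inner_d[k]) if k in inner_d else None for k in FIELDS)
    reverse_d[key_tuple].append(key)

duplicates = [sorted(keys) for keys in reverse_d.values() if len(keys) > 1]
print(duplicates)  # [[112762853378, 112762853385]]

# Keep one key (the first one collected) from each group.
new_d = dict((keys[0], d[keys[0]]) for keys in reverse_d.values())
```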

If you don't want a list of duplicates but just want a dict without duplicates, you can use a regular dictionary instead of a defaultdict and re-reverse it like so:

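```python
>>> reverse_d = {}
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     # Later keys overwrite earlier ones, so the last key seen per key_str survives.
...     reverse_d[key_str] = key
...
>>> new_d = dict((val, d[val]) for val in reverse_d.itervalues())
```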
senderle answered Oct 02 '22 13:10
