Removing Duplicates From Dictionary

Tags: python, dictionary, duplicates

I have the following Python 2.7 dictionary data structure (I do not control source data - comes from another system as is):

    {112762853378:
        {'dst': ['10.121.4.136'],
         'src': ['1.2.3.4'],
         'alias': ['www.example.com']
        },
     112762853385:
        {'dst': ['10.121.4.136'],
         'src': ['1.2.3.4'],
         'alias': ['www.example.com']
        },
     112760496444:
        {'dst': ['10.121.4.136'],
         'src': ['1.2.3.4']
        },
     112760496502:
        {'dst': ['10.122.195.34'],
         'src': ['4.3.2.1']
        },
     112765083670: ...
    }

The dictionary keys will always be unique. Dst, src, and alias can be duplicates. All records will always have a dst and src but not every record will necessarily have an alias as seen in the third record.

In the sample data either of the first two records would be removed (doesn't matter to me which one). The third record would be considered unique since although dst and src are the same it is missing alias.

My goal is to remove all records where the dst, src, and alias have all been duplicated - regardless of the key.

How does this rookie accomplish this?

Also, my limited understanding of Python interprets the data structure as a dictionary with the values stored in dictionaries... a dict of dicts, is this correct?


asked Jan 05 '12 20:01

Bit Bucket

2 Answers

You could go through each item (key/value pair) in the dictionary and add it to a result dictionary only if its value is not already in the result dictionary.

    input_raw = {
        112762853378: {'dst': ['10.121.4.136'],
                       'src': ['1.2.3.4'],
                       'alias': ['www.example.com']},
        112762853385: {'dst': ['10.121.4.136'],
                       'src': ['1.2.3.4'],
                       'alias': ['www.example.com']},
        112760496444: {'dst': ['10.121.4.136'],
                       'src': ['1.2.3.4']},
        112760496502: {'dst': ['10.122.195.34'],
                       'src': ['4.3.2.1']},
    }

    result = {}

    # Keep a record only if an equal inner dict has not been kept already.
    # Inner dicts compare by content, so a record without 'alias' stays distinct.
    for key, value in input_raw.items():
        if value not in result.values():
            result[key] = value

    print(result)  # the parentheses also work on Python 2.7
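The loop above rescans `result.values()` for every record, which can get slow on large inputs. As a rough sketch of one alternative (not part of the original answer), each inner dict can be turned into a hashable signature and remembered in a set. The helper names `record_signature` and `drop_duplicate_records` are made up for illustration, and the sketch assumes every inner value is a list of strings, as in the question. It uses only calls that exist in both Python 2.7 and Python 3.

```python
def record_signature(record):
    """Return a hashable summary of one inner dict (hypothetical helper)."""
    # Lists are unhashable, so each value list becomes a tuple. Sorting by
    # field name makes the signature independent of dict ordering.
    return tuple(sorted((field, tuple(values)) for field, values in record.items()))


def drop_duplicate_records(records):
    """Return a new dict that keeps one record per distinct signature."""
    seen = set()
    result = {}
    for key, record in records.items():
        signature = record_signature(record)
        if signature not in seen:
            seen.add(signature)
            result[key] = record
    return result


input_raw = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'],
                   'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'],
                   'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

deduped = drop_duplicate_records(input_raw)
print(len(deduped))  # 3
```

Which of the two identical records survives depends on iteration order, which the question says does not matter. The record without an alias gets a different signature, so it is kept.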


answered Oct 02 '22 14:10

Andrew Cox

One simple approach would be to create a reverse dictionary using the concatenation of the string data in each inner dictionary as a key. So say you have the above data in a dictionary, d:

    >>> import collections
    >>> reverse_d = collections.defaultdict(list)
    >>> for key, inner_d in d.iteritems():
    ...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
    ...     reverse_d[key_str].append(key)
    ...
    >>> duplicates = [keys for key_str, keys in reverse_d.iteritems() if len(keys) > 1]
    >>> duplicates
    [[112762853385, 112762853378]]

If you don't want a list of duplicates or anything like that, but just want to create a duplicate-less dict, you could just use a regular dictionary instead of a defaultdict and re-reverse it like so:

    >>> reverse_d = {}
    >>> for key, inner_d in d.iteritems():
    ...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
    ...     reverse_d[key_str] = key
    ...
    >>> new_d = dict((val, d[val]) for val in reverse_d.itervalues())
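One caveat with a concatenated key: joining strings with no separator can make two different records look identical. The sketch below is an illustration rather than part of the original answer. It compares the joined key with a tuple key that uses `None` for a missing field. The names `joined_key`, `tuple_key`, and `find_duplicates`, and the two sample records, are hypothetical. Like the code above, it looks only at the first element of each list.

```python
from collections import defaultdict

FIELDS = ('dst', 'src', 'alias')


def joined_key(record):
    # Same idea as the concatenated key above: join the first element of
    # each field that is present, with no separator.
    return ''.join(record[f][0] for f in FIELDS if f in record)


def tuple_key(record):
    # One slot per field; None marks a missing field, so field boundaries
    # and absent aliases stay unambiguous.
    return tuple(record[f][0] if f in record else None for f in FIELDS)


def find_duplicates(records, key_func=tuple_key):
    """Group outer keys by record key and return groups with more than one member."""
    groups = defaultdict(list)
    for key, record in records.items():
        groups[key_func(record)].append(key)
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


# Hypothetical records chosen so that the joined strings collide.
a = {'dst': ['1.1.1.1'], 'src': ['11.1.1.1']}
b = {'dst': ['1.1.1.11'], 'src': ['1.1.1.1']}
sample = {1: a, 2: b}

print(joined_key(a) == joined_key(b))       # True: both are '1.1.1.111.1.1.1'
print(find_duplicates(sample, joined_key))  # [[1, 2]], a false positive
print(find_duplicates(sample))              # []
```

Because tuples are hashable, they can serve as dictionary keys directly, so the rest of the reverse-dictionary approach works unchanged.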

answered Oct 02 '22 13:10

senderle
