I was looking for a solution to finding the symmetric difference between two dictionaries in Python.
For example, if I have two dictionaries A and B, and I want to create a third dictionary C which contains all the items from A and B that are not found in the other, or in other words, unique.
I could not find a canonical answer, so I decided to open this question and give it my own answer. If you think you have a better method, I would love to see it.
Some data:
a = {'a': 1, 'b':2}
b = {'b': 2, 'c':3}
Desired output:
{'a': 1, 'c': 3}
To get the symmetric difference between two dictionaries use the following robust function:
def dict_symmetric_difference(a, b):
return {k: a[k] if k in a else b[k] for k in # break here to fit without scrolling
set(a.keys()).symmetric_difference(b.keys())}
Just the logic:
{k: a[k] if k in a else b[k] for k in set(a.keys()).symmetric_difference(b.keys())}
Here is a simpler version of the function for explanation:
def dict_symmetric_difference(a, b):
# first make sets of the dictionary keys
keys_in_a = set(a.keys())
keys_in_b = set(b.keys())
unique_keys = keys_in_a.symmetric_difference(keys_in_b) # get the unique keys
c = {} # start an empty dictionary
for key in unique_keys: # iterate over the keys
if key in a: # if the key is from a dictionary, take the value from there.
c[key] = a[key]
else: # the key is in b dictionary, take the value from there.
c[key] = b[key]
return c
Explanation of the a[k] if k in a else b[k]
expression:
It is a ternary operator which allows me to use it like so: a if condition else b
With this trick, I get the value for the key, no matter which dictionary it is in.
Using either Function:
>>> dict_symmetric_difference({'a': 1, 'b':2}, {'b':2, 'c':3})
{'a': 1, 'c': 3}
Here is some code that does timeit speed tests on the various algorithms.
The tests use pairs of dicts of equal sizes. The keys are short random letter strings, with varying proportions of shared keys between the dicts. The dicts are constructed from shuffled lists, so even if they contain lots of shared keys the underlying hash table structure of the two dicts should be rather different.
The exact amount of shared keys is random, the proportion of shared keys is controlled by the shared
arg of make_dicts
.
The main body of this code will run on Python 2.6+ and Python 3. I have Python 2.6.6 and Python 3.6.0 installed on this machine (which is a single core 32 bit machine with 2GB of RAM running on an old Debian derivative of Linux). Some of the dictionary symmetric difference functions use dictionary comprehensions, which aren't available in Python 2.6, so I couldn't test those functions on Python 2. Also, elmex_dsd_py2
won't run on Python 3, so I've commented it out. I was originally going to post Python 2.6 results too, but I had to reduce the output to fit within the message size limits.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
''' Dictionary symmetric difference
Speed tests of various implementations
See http://stackoverflow.com/q/42650081/4014959
Speed test code by PM 2Ring 2017.03.08
'''
from __future__ import print_function
from itertools import product
from random import random, seed, shuffle
from string import ascii_letters
from timeit import Timer
seed(163)
# The dict symmetric difference functions ------------------------------
def inbar_dsd_long(a, b):
# first make sets of the dictionary keys
keys_in_a = set(a.keys())
keys_in_b = set(b.keys())
# get the unique keys
unique_keys = keys_in_a.symmetric_difference(keys_in_b)
# start an empty dictionary
c = {}
# iterate over the keys
for key in unique_keys:
if key in a:
# if the key is from a dictionary, take the value from there.
c[key] = a[key]
else:
# the key is in b dictionary, take the value from there.
c[key] = b[key]
return c
def pm2r_dsd_py2(a, b):
return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys()))
#def elmex_dsd_py2(a, b):
#symm_diff = set(a) ^ set(b)
#return dict((k, v) for k, v in a.items() + b.items() if k in symm_diff)
def raymond_dsd(a, b):
c = a.copy()
c.update(b)
for k in (a.keys() & b.keys()):
del c[k]
return c
def inbar_dsd_short(a, b):
return {k: a[k] if k in a else b[k] for k in
set(a.keys()).symmetric_difference(b.keys())}
def pm2r_dsd_py3(a, b):
return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()}
def evkounis_dsd(a, b):
res = {k:v for k, v in a.items() if k not in b}
res.update({k:v for k, v in b.items() if k not in a})
return res
def elmex_dsd_py3(a, b):
symm_diff = set(a) ^ set(b)
return {k: v for k, v in list(a.items()) + list(b.items()) if k in symm_diff}
funcs = (
inbar_dsd_long,
pm2r_dsd_py2,
#elmex_dsd_py2,
raymond_dsd,
inbar_dsd_short,
pm2r_dsd_py3,
evkounis_dsd,
elmex_dsd_py3,
)
# ----------------------------------------------------------------------
# Random key strings
all_keys = [''.join(t) for t in product(ascii_letters, repeat=3)]
shuffle(all_keys)
def make_dicts(size, shared):
''' Make a pair of dicts of length `size`, with random key strings.
`shared` is a real number 0 <= shared <= 1 giving the approximate
ratio of shared keys.
'''
a, b = [], []
keys = iter(all_keys)
shared_count = 0
for i in range(size):
ka = next(keys)
if random() < shared:
kb = ka
shared_count += 1
else:
kb = next(keys)
a.append((ka, i))
b.append((kb, i))
shuffle(a)
shuffle(b)
return dict(a), dict(b), shared_count
def verify(a, b):
''' Verify that all functions return the same result '''
results = [func(a, b) for func in funcs]
last = results[-1]
print(all(last == u for u in results[:-1]))
def time_test(loops, reps):
''' Print timing stats for all the functions '''
timings = []
for func in funcs:
fname = func.__name__
setup = 'from __main__ import a, b, ' + fname
cmd = '{0}(a, b)'.format(fname)
t = Timer(cmd, setup)
result = t.repeat(reps, loops)
result.sort()
timings.append((result, fname))
timings.sort()
for result, fname in timings:
print('{0:16} {1}'.format(fname, result))
# ----------------------------------------------------------------------
print('Verifying')
size = 1000
a, b, shared_count = make_dicts(size, 0.1)
print('size: {0}, shared count: {1}'.format(size, shared_count))
verify(a, b)
# Timeit tests
reps = 3
fmt = '\nsize: {0}, shared count: {1}, loops: {2}'
for shared in (0.1, 0.25, 0.5, 0.75, 0.9):
print('\nSHARED: {0:0.2f}'.format(shared))
#for size in (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000):
for size in (10, 100, 1000, 10000):
a, b, shared_count = make_dicts(size, shared)
loops = 100000 // size
print(fmt.format(size, shared_count, loops))
time_test(loops, reps)
output
Verifying
size: 1000, shared count: 100
True
SHARED: 0.10
size: 10, shared count: 1, loops: 10000
raymond_dsd [0.13777699099955498, 0.13792390800153953, 0.1381044740010111]
evkounis_dsd [0.23560065399942687, 0.23752641000100994, 0.2455631840020942]
pm2r_dsd_py3 [0.23770248700020602, 0.23880975800057058, 0.24221741200017277]
inbar_dsd_long [0.25206301800062647, 0.285963577000075, 0.28780356199786183]
inbar_dsd_short [0.2636144610005431, 0.2653795980004361, 0.2666834120027488]
elmex_dsd_py3 [0.3290278729982674, 0.33175632400161703, 0.3384615989998565]
pm2r_dsd_py2 [0.3978280019982776, 0.43710133700005827, 0.4523775029992976]
size: 100, shared count: 14, loops: 1000
raymond_dsd [0.09872918600012781, 0.09888040100122453, 0.10413656799937598]
evkounis_dsd [0.1804931380029302, 0.1811683220003033, 0.18133216399655794]
pm2r_dsd_py3 [0.20522897000046214, 0.20773609400202986, 0.20979003499815008]
inbar_dsd_short [0.21217649699974572, 0.21281453499977943, 0.21295483400172088]
inbar_dsd_long [0.22985933599920827, 0.23097444899758557, 0.24446944000010262]
elmex_dsd_py3 [0.24242248500013375, 0.24477665499944123, 0.24785449900082313]
pm2r_dsd_py2 [0.3103436530000181, 0.31146229099977063, 0.3152951789998042]
size: 1000, shared count: 94, loops: 100
raymond_dsd [0.10726087399962125, 0.10726979699757067, 0.10853421000138042]
evkounis_dsd [0.19798667299983208, 0.19957152200004202, 0.20145120699817198]
pm2r_dsd_py3 [0.24767412599976524, 0.25033419099781895, 0.25519442899894784]
inbar_dsd_long [0.25753367499783053, 0.259813735003263, 0.2615334299989627]
inbar_dsd_short [0.25835196700063534, 0.2647503340012918, 0.26879757099959534]
elmex_dsd_py3 [0.3065065359987784, 0.3129320820007706, 0.3159641370002646]
pm2r_dsd_py2 [0.32748841799912043, 0.34595297499981825, 0.3797209490003297]
size: 10000, shared count: 987, loops: 10
raymond_dsd [0.2801321059996553, 0.2831085340003483, 0.28407657299976563]
evkounis_dsd [0.36119127300116816, 0.36392319399965345, 0.36926983400189783]
pm2r_dsd_py3 [0.5073807749977277, 0.5122791090034298, 0.5579565990010451]
inbar_dsd_short [0.5086212060014077, 0.5168500030013092, 0.5182715480004845]
inbar_dsd_long [0.602521363998676, 0.6031914080012939, 0.6047401769974385]
pm2r_dsd_py2 [0.6753699099972437, 0.6772755890015105, 0.6782451350009069]
elmex_dsd_py3 [0.7430517110005894, 0.7464511920006771, 0.7468688779990771]
SHARED: 0.25
size: 10, shared count: 3, loops: 10000
raymond_dsd [0.1376171269985207, 0.13765478899949812, 0.13801490599871613]
pm2r_dsd_py3 [0.20131645299989032, 0.20166713100115885, 0.20322838700303691]
inbar_dsd_long [0.20759937799812178, 0.2079929980027373, 0.21979623799779802]
evkounis_dsd [0.2186124869986088, 0.2202955180000572, 0.223359776999132]
inbar_dsd_short [0.23444793200178538, 0.23780764999901294, 0.23976211099943612]
elmex_dsd_py3 [0.3178573650002363, 0.3193927319989598, 0.32410190099835745]
pm2r_dsd_py2 [0.3520881920012471, 0.3543025139988458, 0.3581208620016696]
size: 100, shared count: 23, loops: 1000
raymond_dsd [0.10508492400185787, 0.10563860000183922, 0.10888238600091427]
evkounis_dsd [0.15686738300064462, 0.15824111300025834, 0.15863642399926903]
pm2r_dsd_py3 [0.1829918709991034, 0.184900373002165, 0.18732255400027498]
inbar_dsd_short [0.18875792199833086, 0.19031438200181583, 0.19102797700179508]
inbar_dsd_long [0.21139359699736815, 0.22990316799769062, 0.2418856490003236]
elmex_dsd_py3 [0.22641843899691594, 0.2265430750012456, 0.23143781299950206]
pm2r_dsd_py2 [0.2681290770015039, 0.2703527909980039, 0.27255326500016963]
size: 1000, shared count: 263, loops: 100
raymond_dsd [0.10895683100170572, 0.11233176399764488, 0.11593639900092967]
evkounis_dsd [0.17859331599902362, 0.17949835600302322, 0.18466946999978973]
pm2r_dsd_py3 [0.2147589500018512, 0.21515577800164465, 0.21701817199937068]
inbar_dsd_long [0.21823484400010784, 0.2254721450008219, 0.22556141600216506]
inbar_dsd_short [0.22114897099891095, 0.22157548800169025, 0.22668778500155895]
pm2r_dsd_py2 [0.2780861230021401, 0.27864550599770155, 0.28336624599978677]
elmex_dsd_py3 [0.28186336900034803, 0.2837228559983487, 0.29606761199829634]
size: 10000, shared count: 2480, loops: 10
raymond_dsd [0.278912030000356, 0.28916871899855323, 0.2898256120024598]
evkounis_dsd [0.33290919899809523, 0.3355702890003158, 0.3366183610014559]
pm2r_dsd_py3 [0.4445611189985357, 0.45341551800083835, 0.4544847100005427]
inbar_dsd_short [0.4466933030016662, 0.4632708070021181, 0.48025122500257567]
inbar_dsd_long [0.5405201060020772, 0.5567013979998592, 0.5911358039993502]
pm2r_dsd_py2 [0.586115582002094, 0.600204237998696, 0.6029243630000565]
elmex_dsd_py3 [0.7058123890019488, 0.7067292030005774, 0.7115862030004791]
SHARED: 0.50
size: 10, shared count: 6, loops: 10000
raymond_dsd [0.15135921700129984, 0.1533788429987908, 0.17841531700105406]
pm2r_dsd_py3 [0.15311526600271463, 0.15356177799912984, 0.15895434199774172]
inbar_dsd_long [0.16137141400031396, 0.1618921000008413, 0.17238240400183713]
inbar_dsd_short [0.1808154470018053, 0.18266997299724608, 0.1863039679992653]
evkounis_dsd [0.18221631199776311, 0.18251911100014695, 0.18520446800175705]
pm2r_dsd_py2 [0.2700158850020671, 0.2743520539988822, 0.28932957600045484]
elmex_dsd_py3 [0.28983224500188953, 0.2912340100010624, 0.2933940119983163]
size: 100, shared count: 51, loops: 1000
raymond_dsd [0.10294843999872683, 0.10327848499946413, 0.10685922099946765]
evkounis_dsd [0.13586801600104081, 0.13726477299860562, 0.142784658997698]
pm2r_dsd_py3 [0.1435330319982313, 0.14396326799760573, 0.14474550500017358]
inbar_dsd_short [0.15043617100309348, 0.15080328300246038, 0.1527250040016952]
inbar_dsd_long [0.1667091649978829, 0.17330403699816088, 0.17601154400108499]
pm2r_dsd_py2 [0.20728979400155367, 0.20776088099955814, 0.2079896369978087]
elmex_dsd_py3 [0.21078268400015077, 0.2123827169998549, 0.21517163300086395]
size: 1000, shared count: 491, loops: 100
raymond_dsd [0.11212847299975692, 0.11414236799828359, 0.11498476199994911]
evkounis_dsd [0.14059560900204815, 0.14112727400060976, 0.150327464001748]
pm2r_dsd_py3 [0.14733014900048147, 0.15143406900097034, 0.1542897660001472]
inbar_dsd_short [0.15075810700000147, 0.151888833999692, 0.15750856500017107]
inbar_dsd_long [0.16265833400029805, 0.16367860500031384, 0.17333104299905244]
pm2r_dsd_py2 [0.1993612549995305, 0.19947306600079173, 0.20446195700060343]
elmex_dsd_py3 [0.24682135100010782, 0.24862800600021728, 0.25419495800088043]
size: 10000, shared count: 4938, loops: 10
evkounis_dsd [0.2519790539990936, 0.2573451700009173, 0.2603536310016352]
raymond_dsd [0.2875208960031159, 0.2887761790007062, 0.30461744100102806]
pm2r_dsd_py3 [0.3364586130010139, 0.342166794998775, 0.3465069459998631]
inbar_dsd_short [0.3490315640010522, 0.6202766900023562, 0.7155317880024086]
inbar_dsd_long [0.42809327600116376, 0.4363977649991284, 0.4812496539998392]
pm2r_dsd_py2 [0.46369219400003203, 0.46809901899905526, 0.4706174610000744]
elmex_dsd_py3 [0.6603999830003886, 0.6629649060014344, 0.6652154759976838]
SHARED: 0.75
size: 10, shared count: 7, loops: 10000
pm2r_dsd_py3 [0.14004066000052262, 0.14024711000092793, 0.1411744200013345]
inbar_dsd_long [0.1457400300023437, 0.1463650259975111, 0.17371471199658117]
raymond_dsd [0.1495657380000921, 0.15151091000007, 0.1532108950013935]
inbar_dsd_short [0.16798981899773935, 0.1684792589985591, 0.17371860500134062]
evkounis_dsd [0.18283682300170767, 0.18351536599948304, 0.18536045300061232]
pm2r_dsd_py2 [0.24651207700298983, 0.24725952299922938, 0.3011513509991346]
elmex_dsd_py3 [0.27965197500088834, 0.2817374969999946, 0.28211258000010275]
size: 100, shared count: 83, loops: 1000
evkounis_dsd [0.10071835599956103, 0.10109729699979653, 0.1036734150002303]
inbar_dsd_long [0.10147314599817037, 0.1017698140021821, 0.11575333300061175]
pm2r_dsd_py2 [0.1257392070001515, 0.14690794800117146, 0.2597000979985751]
pm2r_dsd_py3 [0.16547765900031663, 0.17877282599874889, 0.1817621379996126]
elmex_dsd_py3 [0.18176361400037422, 0.18339519599976484, 0.18422297999859438]
inbar_dsd_short [0.18878075899920077, 0.1932126639985654, 0.201184026998817]
raymond_dsd [0.23026226100046188, 0.2342098570006783, 0.24134657600006904]
size: 1000, shared count: 751, loops: 100
inbar_dsd_short [0.0925550639985886, 0.09375216300031752, 0.09518678500171518]
pm2r_dsd_py3 [0.09365715600142721, 0.0952552939997986, 0.0984138530002383]
raymond_dsd [0.10659463599949959, 0.10675223399812239, 0.1076178000002983]
inbar_dsd_long [0.10787330499806558, 0.10813268299898482, 0.1191909779990965]
evkounis_dsd [0.11020168100003502, 0.11101243599841837, 0.11369209199983743]
pm2r_dsd_py2 [0.1283391249999113, 0.12977415000204928, 0.13450328500039177]
elmex_dsd_py3 [0.20605224600149086, 0.20856778099914663, 0.21231961700323154]
size: 10000, shared count: 7525, loops: 10
evkounis_dsd [0.19238157699874137, 0.19369199399807258, 0.20787687100164476]
pm2r_dsd_py3 [0.237352975000249, 0.2393961540001328, 0.24592895499881706]
inbar_dsd_short [0.24010049900243757, 0.24383026600116864, 0.246290401002625]
inbar_dsd_long [0.31666912799846614, 0.3353785740000603, 0.3762496050003392]
raymond_dsd [0.3268343650015595, 0.3270019219999085, 0.32956799900057376]
pm2r_dsd_py2 [0.3330148269997153, 0.34052117800092674, 0.3426254549995065]
elmex_dsd_py3 [0.6130798710000818, 0.6139247349965444, 0.6146237579996523]
SHARED: 0.90
size: 10, shared count: 10, loops: 10000
pm2r_dsd_py3 [0.09191049900255166, 0.09203974899719469, 0.09560386399971321]
inbar_dsd_long [0.09304381299807574, 0.09397280899793259, 0.10319281500051147]
inbar_dsd_short [0.0980829280015314, 0.09835117700276896, 0.0987546550022671]
raymond_dsd [0.14094099900103174, 0.14119526200011023, 0.14634641500015277]
evkounis_dsd [0.14480078699853038, 0.1466599049999786, 0.14705315900209825]
pm2r_dsd_py2 [0.16137886599972262, 0.16186897499937913, 0.1626489610025601]
elmex_dsd_py3 [0.24912584599951515, 0.2519607159993029, 0.2550744569998642]
size: 100, shared count: 88, loops: 1000
pm2r_dsd_py3 [0.08017906299937749, 0.08175948099960806, 0.08336899599817116]
inbar_dsd_short [0.08394136000060826, 0.08467326000027242, 0.08476182100275764]
inbar_dsd_long [0.09241838099842425, 0.0929719669984479, 0.10157853300188435]
evkounis_dsd [0.09769711500121048, 0.09770239999852492, 0.10219176600003266]
pm2r_dsd_py2 [0.11295593600152642, 0.11317849099941668, 0.11382339899864746]
raymond_dsd [0.11950065099881613, 0.11954410699763685, 0.16439275900120265]
elmex_dsd_py3 [0.17893833099878975, 0.18027151500064065, 0.18072834000122384]
size: 1000, shared count: 896, loops: 100
pm2r_dsd_py3 [0.06560493199867778, 0.06627220900190878, 0.06649829500020132]
inbar_dsd_short [0.067232484001579, 0.06832705600027111, 0.06892605100074434]
inbar_dsd_long [0.07928322799853049, 0.0793153419981536, 0.0874185499997111]
pm2r_dsd_py2 [0.08986150900091161, 0.09258468600091874, 0.09545781900305883]
evkounis_dsd [0.09216968399778125, 0.09272978199805948, 0.09716289000061806]
raymond_dsd [0.11052805100189289, 0.11131704600120429, 0.11136766299750889]
elmex_dsd_py3 [0.18965840600139927, 0.1898866600022302, 0.19107911399987643]
size: 10000, shared count: 9011, loops: 10
evkounis_dsd [0.1584843410018948, 0.16192917299849796, 0.16836377900108346]
pm2r_dsd_py3 [0.1789340169998468, 0.17990425000243704, 0.1874260629992932]
inbar_dsd_short [0.18104806900009862, 0.18631987900153035, 0.18891330599944922]
inbar_dsd_long [0.2561770180000167, 0.2672927259991411, 0.27309057399907033]
pm2r_dsd_py2 [0.26508888299940736, 0.2661178109992761, 0.2812051930013695]
raymond_dsd [0.3262405569985276, 0.32729987999846344, 0.3313657439975941]
elmex_dsd_py3 [0.5737760600022739, 0.5791283889993792, 0.5847248999998556]
A symmetric difference is equal to the union minus the intersection:
>>> a = {'a': 1, 'b':2}
>>> b = {'b': 2, 'c':3}
>>> c = a.copy()
>>> c.update(b)
>>> for k in (a.keys() & b.keys()):
del c[k]
>>> c
{'a': 1, 'c': 3}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With