Python: updating a large dictionary using another large dictionary

Tags:

python

I am trying to update some values of a large dictionary using values from another dictionary where they have similar keys (the same date but in a different format). The process I'm currently using is too slow and I want to reduce the bottleneck.

This is my current solution (it writes the updated dict to a file):

from dateutil import parser
File = open(r'E:Test1.txt','w')

b = {'1946-1-1':0,..........,'2012-12-31':5}
d = {'1952-12-12':5,........,'1994-7-2':10}

for key1, val1 in b.items():
    DateK1 = parser.parse(key1)
    Value = val1
    for key2, val2 in d.items():
        DateK2 = parser.parse(key2)
        if DateK1 == DateK2:
            d[key2] = Value        

Order= sorted(d.items(), key=lambda t: t[0])

for item in Order:
    File.write('%s,%s\n' % item)
File.close()
Eric Gentil asked Oct 03 '12 22:10


2 Answers

You should use the update method to merge dictionaries:

b.update(d)

At the moment you are iterating over d for every key in b, which is slow. You can avoid this by building two dictionaries whose keys match: parse each key into a datetime object, since datetime objects are hashable and equal dates hash the same:

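# Parse every key once so both dicts are keyed by datetime objects.
# (.iteritems() is Python 2; on Python 3 use .items().)
b1 = dict((parser.parse(k), v) for k, v in b.iteritems())
d1 = dict((parser.parse(k), v) for k, v in d.iteritems())

d1.update(b1)  # update d1 with the values from b1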
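
To illustrate why keying by datetime works, here is a minimal sketch using only the standard library. The two date strings and their strptime formats are made up for the example; they are not taken from the question's data.

```python
from datetime import datetime

# Two hypothetical spellings of the same date (formats are assumptions).
a = datetime.strptime('1952-12-5', '%Y-%m-%d')
c = datetime.strptime('05/12/1952', '%d/%m/%Y')

# Equal datetime objects compare equal and hash equal,
# so either one finds the same dictionary entry.
lookup = {a: 'value'}
same_date = (a == c)
same_hash = (hash(a) == hash(c))
found = lookup[c]
```

Once both dictionaries are keyed this way, matching a date is a single hash lookup rather than a scan over every key.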

Edit:

I've just realised that you're not quite doing an update: only the keys that appear in both dictionaries have their values changed. So instead (again iterating just once):

for k_d1 in d1:
    if k_d1 in b1:
        d1[k_d1] = b1[k_d1]
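
Put together for Python 3, the single-pass idea might look like the sketch below. It is not the answer's original code: update_shared, B_FORMAT, D_FORMAT and the sample data are hypothetical, and it assumes each dictionary's date format is known well enough to describe with a strptime pattern. It also keeps d's original key strings, since those are what the question's code writes to the file.

```python
from datetime import datetime

# Assumed formats, chosen for illustration only.
B_FORMAT = '%Y-%m-%d'   # e.g. '1952-12-12'
D_FORMAT = '%d/%m/%Y'   # e.g. '12/12/1952'

def update_shared(b, d, b_format=B_FORMAT, d_format=D_FORMAT):
    """Overwrite d's values with b's wherever both hold the same date."""
    # Parse each key of b exactly once: datetime -> value.
    b_by_date = {datetime.strptime(k, b_format): v for k, v in b.items()}
    # One pass over d; each membership test is a hash lookup.
    for key in d:
        date = datetime.strptime(key, d_format)
        if date in b_by_date:
            d[key] = b_by_date[date]  # existing key, so the dict size is unchanged
    return d

# Hypothetical sample data.
b = {'1946-1-1': 0, '1952-12-12': 7, '2012-12-31': 5}
d = {'12/12/1952': 5, '2/7/1994': 10}
update_shared(b, d)
```

Each key is parsed once, so the work grows with len(b) + len(d) instead of with one comparison per pair of keys as in the nested loops.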
Andy Hayden answered Oct 04 '22 03:10


Suggested changes:

  1. Use .iteritems() instead of .items(). In Python 2, .items() builds a list of pairs in memory before iterating over it, which is wasteful. (In Python 3, .items() already returns a lazy view and .iteritems() no longer exists.)
  2. You said that the date format is different between b and d. I'm guessing the month and day are switched? If so, you can still make big savings by computing what the d key would be and then checking membership.

Code with changes:

def switch_month_day(datestr):
  fields = datestr.split("-")
  return "%s-%s-%s" % (fields[0], fields[2], fields[1])

for key, val in b.iteritems():
  DateK = switch_month_day(key)
  if DateK in d:
    d[DateK] = val
wberry answered Oct 04 '22 04:10