 

How to split dictionary into multiple dictionaries fast

I have found a solution, but it is really slow:

def chunks(self, data, SIZE=10000):
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])

Do you have any ideas that don't use external modules (numpy etc.)?
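For anyone reading this on Python 3: xrange is gone and dict views can't be sliced, so the same approach needs range() and an explicit list(). Materialising the items once, outside the loop, also avoids rebuilding the whole list on every chunk, which is part of what makes the original slow. A sketch under those assumptions:

```python
def chunks(data, SIZE=10000):
    # Materialise the items once; dict views aren't sliceable in Python 3.
    items = list(data.items())
    for i in range(0, len(data), SIZE):
        yield dict(items[i:i+SIZE])

# Split a 10-item dict into chunks of 3: four dicts of sizes 3, 3, 3, 1.
parts = list(chunks({i: i for i in range(10)}, 3))
```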

asked Apr 05 '14 by badc0re


2 Answers

Since the dictionary is so big, it would be better to keep all the items involved as iterators and generators, like this:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}

Sample run:

for item in chunks({i: i for i in xrange(10)}, 3):
    print(item)

Output

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
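The function itself is already valid Python 3; only the xrange in the sample run is Python 2. A self-contained Python 3 sketch of the same idea (the scrambled key order in the output above comes from old CPython hash ordering and disappears on 3.7+, where dicts preserve insertion order):

```python
from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        # islice pulls the next SIZE keys from the shared iterator,
        # so each chunk is built lazily without copying the whole dict.
        yield {k: data[k] for k in islice(it, SIZE)}

parts = list(chunks({i: i for i in range(10)}, 3))
# parts -> [{0: 0, 1: 1, 2: 2}, {3: 3, 4: 4, 5: 5}, {6: 6, 7: 7, 8: 8}, {9: 9}]
```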
answered Sep 23 '22 by thefourtheye

Another method is zipping iterators:

>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}

Create a list containing the same dict iterator repeated several times (the number of repeats is the number of items in each resulting dict). Passing each iterator from the chunks list to izip_longest pulls the needed number of elements from the source dict per output tuple (ifilter is used to drop the None padding from the zip results). With a generator expression you can lower memory usage:

>>> chunks = [d.iteritems()]*3
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
 {'e': 5, 'd': 4, 'g': 7},
 {'h': 8, 'f': 6}]
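iteritems, izip_longest and ifilter don't exist in Python 3; a sketch of the same trick using their Python 3 counterparts (iter(d.items()), zip_longest, and the built-in filter):

```python
from itertools import zip_longest

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}

# Three references to the SAME items iterator: zip_longest advances it
# three steps per output tuple, padding the final tuple with None.
chunks = [iter(d.items())] * 3
parts = [dict(filter(None, group)) for group in zip_longest(*chunks)]
# parts -> [{'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5, 'f': 6}, {'g': 7, 'h': 8}]
```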
answered Sep 22 '22 by ndpu