
python - dictionary iterator for pool map

I am working with a set of frozensets and trying to find the minimal sets for each frozenset in the dictionary 'output'. I have 70k frozensets, so I am splitting this dictionary into chunks and parallelizing the task. When I pass the dictionary as input to my function, only a single key is sent, so I get an error. Can someone help me find what's wrong here?

output => {frozenset({'rfid', 'zone'}): 0, frozenset({'zone'}): 0, frozenset({'zone', 'time'}): 0}

def reduce(prob,result,output):
    print(output)
    for k in output.keys():
        pass  # Function to do something


def reducer(prob,result,output):
    print(output)
    p = Pool(4) #number of processes = number of CPUs
    func2 = partial(reduce,prob,result)
    reduced_values= p.map( func2,output,chunksize=4)
    p.close() # no more tasks
    p.join()  # wrap up current tasks
    return reduced_values

if __name__ == '__main__':
    final = reducer(prob,result,output)

{frozenset({'rfid', 'zone'}): 0, frozenset({'zone'}): 0, frozenset({'zone', 'time'}): 0}
frozenset({'rfid', 'zone'}) 
Error : AttributeError: 'frozenset' object has no attribute 'keys'

Updated as requested

from multiprocessing import Pool
from functools import partial
import itertools

output = {frozenset({'rfid', 'zone'}): 0, frozenset({'zone'}): 0, frozenset({'zone', 'time'}): 0}
prob = {'3': 0.3, '1': 0.15, '2': 0.5, '4': 0.05}
result = {'2': {frozenset({'time', 'zone'}), frozenset({'time', 'rfid'})}, '3': {frozenset({'time', 'rfid'}), frozenset({'rfid', 'zone'})}}

def reduce(prob,result,output):
    print(output)
    for k in output.keys():
        for ky,values in result.items():
            if any(k>=l for l in values):
                output[k] += sum((j for i,j in prob.items() if i == ky))
    return output


def reducer(prob,result,output):
    print(output)
    p = Pool(4) #number of processes = number of CPUs
    func2 = partial(reduce,prob,result)
    reduced_values= p.map( func2,output,chunksize=4)
    p.close() # no more tasks
    p.join()  # wrap up current tasks
    return reduced_values

if __name__ == '__main__':
    final = reducer(prob,result,output)


{frozenset({'zone', 'rfid'}): 0, frozenset({'zone'}): 0, frozenset({'time', 'zone'}): 0}
    for k in output.keys():
AttributeError: 'frozenset' object has no attribute 'keys'
frozenset({'zone', 'rfid'})

Full error details from the console:

{frozenset({'zone', 'time'}): 0, frozenset({'zone', 'rfid'}): 0, frozenset({'zone'}): 0}
frozenset({'zone', 'time'})
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "F:\Python34\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "F:\Python34\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\Dell\workspace\key_mining\src\variable.py", line 16, in reduce
    for k in output.keys():
AttributeError: 'frozenset' object has no attribute 'keys'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\***\variable.py", line 33, in <module>
    final = reducer(prob,result,output)
  File "C:\***\variable.py", line 27, in reducer
    reduced_values= p.map( func2,output,chunksize=4)
  File "F:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "F:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
AttributeError: 'frozenset' object has no attribute 'keys'
ds_user asked Sep 20 '14 13:09





2 Answers

The problem is that you're passing a dict object to map. When map iterates over output, it effectively does this:

for key in output:  # When you iterate over a dictionary, you just get the keys.
    func2(key)

So each time func2 is called, all that's contained in output is a single key (a frozenset) from the dictionary.
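The same behaviour can be reproduced without a pool at all. The following is a minimal sketch using the built-in map; the helper name received is a hypothetical stand-in for func2 and is not part of the original code.

```python
# The dictionary from the question.
output = {
    frozenset({'rfid', 'zone'}): 0,
    frozenset({'zone'}): 0,
    frozenset({'zone', 'time'}): 0,
}

def received(arg):
    """Hypothetical stand-in for func2: report whether arg supports .keys()."""
    try:
        arg.keys()
        return 'dict-like'
    except AttributeError as exc:
        return str(exc)

# Calling the function on the dict itself works...
print(received(output))

# ...but map() iterates over the dict and hands over one key at a time,
# so every call receives a frozenset and .keys() fails.
for message in map(received, output):
    print(message)
```

Pool.map iterates over its argument the same way, which is why the worker sees a frozenset rather than a dict.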

Based on your comments above, it seems you want to pass the entire dictionary to func2, but if you do that, nothing actually runs in parallel. You may be assuming that doing

pool.map(func2, output, chunksize=4)

will result in the output dictionary being split into four dictionaries, each chunk being passed to an instance of func2. But that's not what happens at all. Instead, each key from the dictionary is sent individually to func2.

chunksize is just used to tell the pool how many elements of output to send to each child process via inter-process communication at a time. It's only used for internal purposes; no matter what chunksize you use, func2 will only be called with a single element of output.
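A quick way to check this is to run the same map with different chunksize values and look at what the worker function receives. This is only a sketch: describe and call_types are hypothetical helper names, and the pool size of 2 is arbitrary.

```python
from multiprocessing import Pool

# The dictionary from the question.
output = {
    frozenset({'rfid', 'zone'}): 0,
    frozenset({'zone'}): 0,
    frozenset({'zone', 'time'}): 0,
}

def describe(item):
    """Return the type name of whatever the pool passes to the worker."""
    return type(item).__name__

def call_types(chunksize):
    """Map describe over the dict with the given chunksize."""
    with Pool(2) as p:
        return p.map(describe, output, chunksize=chunksize)

if __name__ == '__main__':
    # chunksize only changes how tasks are batched between processes;
    # each call to describe still receives one dictionary key.
    for size in (1, 4):
        print(size, call_types(size))
```

Both runs should report the same list of type names, one entry per key, regardless of chunksize.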

If you want to actually pass chunks of the dict, you need to do something like this:

# Break the output dict into lists of up to 4 (key, value) pairs each
items = list(output.items())
chunksize = 4
chunks = [items[i:i + chunksize] for i in range(0, len(items), chunksize)]
reduced_values = p.map(func2, chunks)

That will pass a list of (key, value) tuples from the output dict to func2. Then, inside func2, you can turn the list back into a dict:

def reduce(prob,result,output):
    output = dict(output)  # Convert the list of (key, value) pairs back to a dict
    print(output)
    ...
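Putting the pieces together, a complete version might look like the sketch below. It reuses the data and loop body from the question; the names make_chunks and reduce_chunk, the chunk size of 2, and the final merge of the partial dicts are assumptions added for illustration. prob.get(ky, 0) is used as a shorter equivalent of the sum over prob.items() in the question.

```python
from functools import partial
from multiprocessing import Pool

output = {frozenset({'rfid', 'zone'}): 0, frozenset({'zone'}): 0,
          frozenset({'zone', 'time'}): 0}
prob = {'3': 0.3, '1': 0.15, '2': 0.5, '4': 0.05}
result = {'2': {frozenset({'time', 'zone'}), frozenset({'time', 'rfid'})},
          '3': {frozenset({'time', 'rfid'}), frozenset({'rfid', 'zone'})}}

def make_chunks(d, size):
    """Split a dict into lists of at most `size` (key, value) pairs."""
    items = list(d.items())
    return [items[i:i + size] for i in range(0, len(items), size)]

def reduce_chunk(prob, result, chunk):
    """Process one chunk; `chunk` is a list of (key, value) pairs."""
    partial_output = dict(chunk)  # Convert back to a dict
    for k in partial_output:
        for ky, values in result.items():
            # k >= l is the frozenset superset test from the question
            if any(k >= l for l in values):
                partial_output[k] += prob.get(ky, 0)
    return partial_output

def reducer(prob, result, output, size=2, processes=4):
    """Run reduce_chunk on each chunk in parallel and merge the results."""
    func2 = partial(reduce_chunk, prob, result)
    with Pool(processes) as p:
        partial_dicts = p.map(func2, make_chunks(output, size))
    merged = {}
    for d in partial_dicts:
        merged.update(d)
    return merged

if __name__ == '__main__':
    print(reducer(prob, result, output))
```

Each worker returns its own small dict, so the parent has to merge them; changes made to a dict inside a child process are not visible in the parent.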
dano answered Sep 29 '22 01:09



The problem is that output.keys() ends up being called on a frozenset.

set and frozenset are used to perform set operations such as union and intersection.

A frozenset object has no keys method, which is what you are trying to call at for k in output.keys():. When you pass output to map inside reducer, only the keys of output, i.e. [frozenset({...}), frozenset({...})], are iterated, so your function receives a single frozenset rather than the dictionary. Calling output.keys() then means frozenset({...}).keys(), which raises the error.

Nilesh answered Sep 29 '22 02:09
