I'm looking to geocode a large list of addresses using multiprocessing. I have the following code:
import multiprocessing
import geocoder

addresses = ['New York City, NY', 'Austin, TX', 'Los Angeles, CA', 'Boston, MA']  # and on and on

def geocode_worker(address):
    return geocoder.arcgis(address)

def main_process():
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    return pool.map(geocode_worker, addresses)

if __name__ == '__main__':
    main_process()
But it gives me this error:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/opt/anaconda3/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 599, in __getattr__
    if not self.ok:
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 536, in ok
    return len(self) > 0
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 422, in __len__
    return len(self._list)
The last three lines of the traceback repeat over and over, and the final line is:
RecursionError: maximum recursion depth exceeded while calling a Python object
Can anyone help me figure out why?
The problem is that the ArcgisQuery object returned by geocoder is not picklable - or rather, it's not unpicklable. Unpickling recreates the object without calling __init__, so self._list is never set. Any attribute access on the half-built object then falls through to the class's __getattr__, which checks self.ok; ok in turn calls len(self), which reads self._list. Since _list is still missing, that lookup falls through to __getattr__ again, and the cycle repeats until Python's recursion limit is exceeded.
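You can reproduce the recursion without geocoder at all. The Fragile class below is a hypothetical toy that mimics the pattern in geocoder/base.py (an attribute only set in __init__, a __getattr__ that consults self.ok, and an ok that needs that attribute):

```python
import pickle

class Fragile:
    """Hypothetical toy class mimicking geocoder's ArcgisQuery lookup pattern."""

    def __init__(self):
        self._list = []          # only ever set here; unpickling skips __init__

    def __getattr__(self, name):
        # Runs only when normal lookup fails. Mirroring geocoder/base.py, it
        # checks self.ok first; ok needs self._list, which does not exist yet
        # during unpickling, so looking up '_list' re-enters __getattr__.
        if not self.ok:
            raise AttributeError(name)
        return None              # (the real class returns a parsed field here)

    @property
    def ok(self):
        return len(self._list) > 0

data = pickle.dumps(Fragile())   # pickling itself succeeds
try:
    pickle.loads(data)           # unpickling recurses while looking for __setstate__
except RecursionError:
    print('RecursionError: maximum recursion depth exceeded')
```

Unpickling fails before the state dict is applied: pickle looks up __setstate__ on the fresh instance, that lookup falls into __getattr__, and the loop from the traceback begins.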
You can work around this by not passing the ArcgisQuery object itself from the worker processes back to the main process. Instead, pass just its underlying __dict__, then rebuild the ArcgisQuery objects in your main process:
import multiprocessing
import geocoder
from geocoder.arcgis import ArcgisQuery

addresses = ['New York City, NY', 'Austin, TX', 'Los Angeles, CA', 'Boston, MA']  # and on and on

def geocode_worker(address):
    out = geocoder.arcgis(address)
    return out.__dict__  # Only return the object's __dict__, which pickles cleanly

def main_process():
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    l = pool.map(geocode_worker, addresses)
    out = []
    for d in l:
        q = ArcgisQuery(d['location'])  # location is a required constructor arg
        q.__dict__.update(d)            # Load the rest of our state into the new object
        out.append(q)
    return out

if __name__ == '__main__':
    print(main_process())
If you don't actually need the whole ArcgisQuery object and only need some parts of it, you could just return those parts from the worker processes and avoid this hack entirely.
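For example, if all you need are the coordinates, the worker can return plain Python types, which pickle without any trouble. In this sketch the extract_fields helper and its field choices are hypothetical (latlng and ok are attributes geocoder result objects expose); a stand-in object replaces the live geocoder.arcgis call just to show the shape:

```python
from types import SimpleNamespace

def extract_fields(address, result):
    """Pull only plain, picklable fields off a geocoder result.

    In the worker you would call this as:
        return extract_fields(address, geocoder.arcgis(address))
    """
    return {'address': address, 'latlng': result.latlng, 'ok': result.ok}

# Stand-in for a live geocoder result, just to show the returned shape:
fake = SimpleNamespace(latlng=[30.26, -97.74], ok=True)
print(extract_fields('Austin, TX', fake))
```

Since the worker now returns an ordinary dict of builtins, pool.map can ship the results back to the main process with no rebuilding step.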
For what it's worth, it looks like geocoder could fix its pickling problem by implementing __getstate__ and __setstate__ on ArcgisQuery or its base class, like this:
def __getstate__(self):
    return self.__dict__

def __setstate__(self, state):
    self.__dict__.update(state)