
TypeError: __init__() missing 1 required positional argument: 'message' using Multiprocessing

I am running a piece of code using a multiprocessing pool. The code works on one data set and fails on another, so the issue is clearly data driven. That said, I am not sure where to begin troubleshooting, because the only error I receive is the one below. Any hints for a starting point would be most helpful. Both data sets are prepared using the same code, so I did not expect a difference, yet here I am.

Also see the comment from Robert: we differ on OS and Python version (I have 3.4, he has 3.6) and have quite different data sets, yet the error is identical down to the lines in the Python code.

My suspicions:

  1. There is a memory limit per core that is being enforced.
  2. There is some timeout after which the pool collects results, finds that the process has not finished, and gives up.

    Exception in thread Thread-9:
    Traceback (most recent call last):
      File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\threading.py", line 911, in _bootstrap_inner
        self.run()
      File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\threading.py", line 859, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\pool.py", line 429, in _handle_results
        task = get()
      File "C:\Program Files\Python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\connection.py", line 251, in recv
        return ForkingPickler.loads(buf.getbuffer())
    TypeError: __init__() missing 1 required positional argument: 'message'

pythOnometrist Avatar asked Apr 07 '17 01:04



3 Answers

I think the issue is that langdetect quietly sets up a hidden module-level detector factory (see https://github.com/Mimino666/langdetect/blob/master/langdetect/detector_factory.py#L120):

def init_factory():
    global _factory
    if _factory is None:
        _factory = DetectorFactory()
        _factory.load_profile(PROFILES_DIRECTORY)

def detect(text):
    init_factory()
    detector = _factory.create()
    detector.append(text)
    return detector.detect()


def detect_langs(text):
    init_factory()
    detector = _factory.create()
    detector.append(text)
    return detector.get_probabilities()

In my experience, this kind of hidden global state can cause problems with multiprocessing, because it runs afoul of the way multiprocessing tries to share resources across processes and manages namespaces in the workers and the parent process, though the exact mechanism in this case is a black box to me. I fixed it by adding a call to init_factory in my pool initializer function:

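import signal

import requests
from requests.adapters import HTTPAdapter
from langdetect.detector_factory import init_factory

def worker_init_corpus(stops_in):
    # Runs once in each worker process when the pool starts.
    global sess
    global stops
    sess = requests.Session()
    sess.mount("http://", HTTPAdapter(max_retries=10))
    stops = stops_in
    # Ignore Ctrl-C in the workers so the parent handles the interrupt.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    # Build langdetect's global factory inside this worker.
    init_factory()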

FYI: The "sess" logic gives each worker its own HTTP connection pool for requests, which addresses a similar issue when using that module with multiprocessing pools. If you don't do this, the workers do all their HTTP communication back up through the parent process, because that is where the hidden global HTTP connection pool lives by default, and everything becomes painfully slow. This is one of the issues I've run into that made me suspect a similar cause here.

Also, to reduce potential confusion: stops provides the stopword list I'm using to the mapped function, and the signal call makes pools exit cleanly when hit with a user interrupt (Ctrl-C). Otherwise the workers often get orphaned and keep chugging along after the parent process dies.

Then my pool is initialized like this:

self.pool = mp.Pool(mp.cpu_count()-2, worker_init_corpus, (self.stops,))

I also wrapped my call to detect in a try/except LangDetectException block:

try:
    posting_out["lang"] = detect(posting_out["job_description"])
except LangDetectException:
    posting_out["lang"] = "none"

But this doesn't fix it on its own. I'm pretty confident that the initialization is the fix.

Robert E Mealey Avatar answered Nov 02 '22 11:11



Thanks to Robert: focusing on langdetect revealed that one of my text entries was possibly empty:

LangDetectException: No features in text

Rookie mistake, possibly due to encoding errors. I am re-running after filtering those out and will keep you (Robert) posted.
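A minimal sketch of that filtering step, assuming the inputs are a plain list of strings. The helper name `split_usable` is hypothetical and not part of langdetect. It only catches missing or blank entries; text that is non-empty but still has no usable characters may raise the same exception, so a try/except around detect remains useful.

```python
def split_usable(texts):
    """Separate entries worth sending to the detector from blank ones.

    Returns (usable, skipped): usable holds (index, text) pairs so results
    can be matched back to the input, skipped holds the dropped indexes.
    """
    usable, skipped = [], []
    for index, text in enumerate(texts):
        # Keep only real strings that contain something besides whitespace.
        if isinstance(text, str) and text.strip():
            usable.append((index, text))
        else:
            skipped.append(index)
    return usable, skipped


usable, skipped = split_usable(["Hello world", "", "   ", None, "Bonjour"])
```

Keeping the original indexes makes it easy to report which rows were dropped and to check them for encoding problems.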

pythOnometrist Avatar answered Nov 02 '22 10:11



I was throwing a custom exception somewhere in the code, and it was being thrown in most of my processes (in the pool). About 90% of my processes went to sleep because this exception occurred in them. But instead of getting a normal traceback, I got this cryptic error. Mine was on Linux, though.
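One plausible mechanism behind the cryptic message, sketched here with hypothetical classes: a pool sends exceptions raised in workers back to the parent with pickle, and unpickling an exception calls its class again with `self.args`. If `__init__` requires more arguments than it forwarded to `Exception.__init__`, that call fails with the same TypeError as in the question. Whether the custom exception in a given code base follows this pattern is an assumption to check.

```python
import pickle


class TwoArgError(Exception):
    """Hypothetical exception: __init__ needs two arguments but forwards
    only one of them to Exception.__init__."""

    def __init__(self, code, message):
        super().__init__(message)  # self.args becomes (message,)
        self.code = code


class PicklableError(Exception):
    """Same idea, but every argument is forwarded, so self.args matches
    the __init__ signature and unpickling can rebuild the object."""

    def __init__(self, code, message):
        super().__init__(code, message)  # self.args becomes (code, message)
        self.code = code


def round_trip(exc):
    """Pickle and unpickle an exception, as happens to pool results."""
    return pickle.loads(pickle.dumps(exc))


try:
    round_trip(TwoArgError(5, "No features in text"))
    unpickle_error = None
except TypeError as err:
    # Unpickling called TwoArgError("No features in text"): one argument short.
    unpickle_error = str(err)

restored = round_trip(PicklableError(5, "No features in text"))
```

Because the failure happens in the parent's result-handling thread while it unpickles, the traceback points at multiprocessing internals and not at the line that raised the original exception.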

To debug this, I removed the pool and ran the code sequentially.
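A rough sketch of that debugging approach, with hypothetical names throughout: `parse_length` stands in for the real per-item work, and `safe_call` turns any exception into a plain traceback string so the original error stays readable.

```python
import traceback


def safe_call(func, item):
    """Run func(item); on failure return the formatted traceback as text."""
    try:
        return ("ok", func(item))
    except Exception:
        # A plain string always pickles, unlike some custom exceptions.
        return ("error", traceback.format_exc())


def parse_length(text):
    """Hypothetical stand-in for the real per-item work."""
    if not text.strip():
        raise ValueError("empty text")
    return len(text)


# Sequential run: no pool involved, so nothing has to be pickled.
results = [safe_call(parse_length, item) for item in ["hello", "   ", "hi"]]
failures = [detail for status, detail in results if status == "error"]
```

The same wrapper should also work inside a pool, for example by mapping `functools.partial(safe_call, parse_length)` over the items, since it returns only tuples and strings.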

Rohan Bhatia Avatar answered Nov 02 '22 10:11
