Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can python subprocess.Popen see select.poll and then later not? (select 'module' object has no attribute 'poll')

I'm using the (awesome) mrjob library from Yelp to run my python programs in Amazon's Elastic Map Reduce. It depends on subprocess in the standard python library. From my mac running python2.7.2, everything works as expected

However, when I switched to using the exact same code on Ubuntu LTS 11.04 also with python2.7.2, I encountered something strange:

mrjob loads the job, and then attempts to communicate with its child processes using subprocess and generates this error:

      File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/emr.py", line 1212, in _build_steps
        steps = self._get_steps()
      File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/runner.py", line 1003, in _get_steps
        stdout, stderr = steps_proc.communicate()
      File "/usr/lib/python2.7/subprocess.py", line 754, in communicate
        return self._communicate(input)
      File "/usr/lib/python2.7/subprocess.py", line 1302, in _communicate
        stdout, stderr = self._communicate_with_poll(input)
      File "/usr/lib/python2.7/subprocess.py", line 1332, in _communicate_with_poll
        poller = select.poll()
    AttributeError: 'module' object has no attribute 'poll'

This appears to be a problem with subprocess and not mrjob.

I dug into /usr/lib/python2.7/subprocess.py and found that during import it runs:

    if mswindows:
        ... snip ...
    else:
        import select
        _has_poll = hasattr(select, 'poll')

By editing that, I verified that it really does set _has_poll==True. And this is correct; easily verified on the command line.

However, when execution progresses to using Popen._communicate_with_poll somehow the select module has changed! This is generated by printing dir(select) right before it attempts to use select.poll().

    ['EPOLLERR', 'EPOLLET', 'EPOLLHUP', 'EPOLLIN', 'EPOLLMSG', 
    'EPOLLONESHOT', 'EPOLLOUT', 'EPOLLPRI', 'EPOLLRDBAND', 
    'EPOLLRDNORM', 'EPOLLWRBAND', 'EPOLLWRNORM', 'PIPE_BUF', 
    'POLLERR', 'POLLHUP', 'POLLIN', 'POLLMSG', 'POLLNVAL', 
    'POLLOUT', 'POLLPRI', 'POLLRDBAND', 'POLLRDNORM',
    'POLLWRBAND', 'POLLWRNORM', '__doc__', '__name__', 
    '__package__', 'error', 'select']

no attribute called 'poll'!?!? How did it go away?

So, I hardcoded _has_poll=False and then mrjob happily continues with its work, runs my job in AWS EMR, with subprocess using communicate_with_select... and I'm stuck with a hand-modified standard library...

Any advice? :-)

like image 615
user1181407 Avatar asked Jan 31 '12 21:01

user1181407


2 Answers

I had a similar problem and it turns out that gevent replaces the built-in select module with gevent.select.select which doesn't have a poll method (as it is a blocking method). However for some reason by default gevent doesn't patch subprocess which uses select.poll.

An easy fix is to replace subprocess with gevent.subprocess:

import gevent.monkey
gevent.monkey.patch_all(subprocess=True)

import sys
import gevent.subprocess
sys.modules['subprocess'] = gevent.subprocess

If you do this before importing the mrjob library, it should work fine.

like image 65
Tiago Queiroz Avatar answered Oct 01 '22 03:10

Tiago Queiroz


Sorry for writing a full answer instead of a comment, otherwise I'd lose code indentation.

I cannot help you directly since something seems very strictly tied to your code, but I can help you find out, by relying on the fact that Python modules can be arbitrary object, try something like that:

class FakeModule(dict):
    def __init__(self, origmodule):
        self._origmodule = origmodule
    self.__all__ = dir(origmodule)

    def __getattr__(self, attr):
    return getattr(self._origmodule, attr)


    def __delattr__(self, attr):
        if attr == "poll":
            raise RuntimeError, "Trying to delete poll!"
        self._origmodule.__delattr__(attr)


def replaceSelect():
    import sys
    import select
    fakeselect = FakeModule(select)

    sys.modules["select"] = fakeselect

replaceSelect()

import select
del select.poll

and you'll get an output like:

Traceback (most recent call last):
  File "domy.py", line 27, in <module>
    del select.poll
  File "domy.py", line 14, in __delattr__
    raise RuntimeError, "Trying to delete poll!"
RuntimeError: Trying to delete poll!

By calling replaceSelect() in your code you should be able to get a traceback of where somebody is deleting poll(), so you can understand why.

I hope my FakeModule implementation is good enough, otherwise you might need to modify it.

like image 34
Alan Franzoni Avatar answered Oct 01 '22 01:10

Alan Franzoni