"Online" monkey patching of a function

Tags:

Your program just paused on a pdb.set_trace().

Is there a way to monkey patch the function that is currently running, and "resume" execution?

Is this possible through call frame manipulation?

Some context:

Oftentimes, I will have a complex function that processes large quantities of data, without having a priori knowledge of what kind of data I'll find:

def process_a_lot(data_stream):
    #process a lot of stuff
    #...
    data_unit= data_stream.next()
    if not can_process(data_unit)
        import pdb; pdb.set_trace()
    #continue processing

This convenient construction launches a interactive debugger when it encounters unknown data, so I can inspect it at will and change process_a_lot code to handle it properly.

The problem here is that, when data_stream is big, you don't really want to chew through all the data again (let's assume next is slow, so you can't save what you already have and skip on the next run)

Of course, you can replace other functions at will once in the debugger. You can also replace the function itself, but it won't change the current execution context.

Edit: Since some people are getting side-tracked: I know there are a lot of ways of structuring your code such that your processing function is separate from process_a_lot. I'm not really asking about ways to structure the code as much as how to recover (in runtime) from the situation when the code is not prepared to handle the replacement.

614

asked Nov 05 '14 18:11

loopbackbee

1 Answers

First a (prototype) solution, then some important caveats.

# process.py

import sys
import pdb
import handlers

def process_unit(data_unit):
    global handlers
    while True:
        try:
            data_type = type(data_unit)
            handler = handlers.handler[data_type]
            handler(data_unit)
            return
        except KeyError:
            print "UNUSUAL DATA: {0!r}". format(data_unit)
            print "\n--- INVOKING DEBUGGER ---\n"
            pdb.set_trace()
            print
            print "--- RETURNING FROM DEBUGGER ---\n"
            del sys.modules['handlers']
            import handlers
            print "retrying"


process_unit("this")
process_unit(100)
process_unit(1.04)
process_unit(200)
process_unit(1.05)
process_unit(300)
process_unit(4+3j)

sys.exit(0)

And:

# handlers.py

def handle_default(x):
    print "handle_default: {0!r}". format(x)

handler = {
    int: handle_default,
    str: handle_default
}

In Python 2.7, this gives you a dictionary linking expected/known types to functions that handle each type. If no handler is available for a type, the user is dropped own into the debugger, giving them a chance to amend the handlers.py file with appropriate handlers. In the above example, there is no handler for float or complex values. When they come, the user will have to add appropriate handlers. For example, one might add:

def handle_float(x):
    print "FIXED FLOAT {0!r}".format(x)

handler[float] = handle_float

And then:

def handle_complex(x):
    print "FIXED COMPLEX {0!r}".format(x)

handler[complex] = handle_complex

Here's what that run would look like:

$ python process.py
handle_default: 'this'
handle_default: 100
UNUSUAL DATA: 1.04

--- INVOKING DEBUGGER ---

> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue

--- RETURNING FROM DEBUGGER ---

retrying
FIXED FLOAT 1.04
handle_default: 200
FIXED FLOAT 1.05
handle_default: 300
UNUSUAL DATA: (4+3j)

--- INVOKING DEBUGGER ---

> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue

--- RETURNING FROM DEBUGGER ---

retrying
FIXED COMPLEX (4+3j)

Okay, so that basically works. You can improve and tweak that into a more production-ready form, making it compatible across Python 2 and 3, et cetera.

Please think long and hard before you do it that way.

This "modify the code in real-time" approach is an incredibly fragile pattern and error-prone approach. It encourages you to make real-time hot fixes in the nick of time. Those fixes will probably not have good or sufficient testing. Almost by definition, you have just this moment discovered you're dealing with a new type T. You don't yet know much about T, why it occurred, what its edge cases and failure modes might be, etc. And if your "fix" code or hot patches don't work, what then? Sure, you can put in some more exception handling, catch more classes of exceptions, and possibly continue.

Web frameworks like Flask have debug modes that work basically this way. But those are debug modes, and generally not suited for production. Moreover, what if you type the wrong command in the debugger? Accidentally type "quit" rather than "continue" and the whole program ends, and with it, your desire to keep the processing alive. If this is for use in debugging (exploring new kinds of data streams, maybe), have at.

If this is for production use, consider instead a strategy that sets aside unhandled-types for asynchronous, out-of-band examination and correction, rather than one that puts the developer / operator in the middle of a real-time processing flow.

171

answered Sep 21 '22 20:09

Jonathan Eunice

Related questions
                            
                                How to create an activity plot from Pandas Dataframe (like the Github contribution plot)
                            
                                Python : Adding a code routine at each line of a block of code
                            
                                How to get IP address of the launched instance with Boto
                            
                                Celery dies with DBPageNotFoundError
                            
                                sqlalchemy, mixins, foreignkeys and declared_attr
                            
                                Subtract subgroup averages from individuals without resorting to for loop
                            
                                Efficiently grouping a list of coordinates points by location in Python
                            
                                Matplotlib timelines
                            
                                ImportError: "No modules named". But modules already installed in dist-packages
                            
                                Generating random numbers with a given probability density function
                            
                                How to download .gz files with requests in Python without decoding it?
                            
                                Python socket stress concurrency
                            
                                Full list of twitter "friends" using python and tweepy
                            
                                Functools.update_wrapper() doesn't work properly
                            
                                Pandas: number of days elapsed since a certain date
                            
                                Flask JSON serializable error because of flask babel
                            
                                How can I exponentially scale the Y axis with matplotlib
                            
                                Operations on every row in pandas DataFrame
                            
                                Python numpy.var returning wrong values
                            
                                PIL(low) and multi-page TIFFS

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

"Online" monkey patching of a function

Tags:

python

function

monkeypatching

loopbackbee

People also ask

1 Answers

Jonathan Eunice

Recent Activity

Donate For Us