 

Determine if a python function has changed

Tags: python

Context

I am trying to cache executions in a data processing framework (kedro). For this, I want to compute a unique hash of a Python function so I can tell whether anything in the function body (or in the functions and modules it calls) has changed. I looked into __code__.co_code. While it nicely ignores comments, spacing etc., it also stays the same for two functions that are obviously different. E.g.

def a():
  a = 1
  return a

def b():
  b = 2
  return b

assert a.__code__.co_code != b.__code__.co_code

fails, so the bytecode of these two functions is equal.

The ultimate goal: Determine if either a function's code or any of its data inputs have changed. If not and the result already exists, skip execution to save runtime.

Question: How can one get a fingerprint of a function's code in Python?

Another idea brought forward by a colleague was this:

import dis

def compare_instructions(func1, func2):
    """Compare the instructions of two functions, ignoring line numbers."""
    func1_instructions = list(dis.get_instructions(func1))
    func2_instructions = list(dis.get_instructions(func2))

    # zip() stops at the shorter list, so check the lengths first
    assert len(func1_instructions) == len(func2_instructions)

    # compare every attribute of the instructions except starts_line
    for ins1, ins2 in zip(func1_instructions, func2_instructions):
        assert ins1.opname == ins2.opname
        assert ins1.opcode == ins2.opcode
        assert ins1.arg == ins2.arg
        assert ins1.argval == ins2.argval
        assert ins1.argrepr == ins2.argrepr
        assert ins1.offset == ins2.offset

    return True

This seems rather like a hack. Other tools like pytest-testmon try to solve this as well, but they appear to rely on a number of heuristics.

pascalwhoop asked Nov 16 '22 03:11



1 Answer

__code__.co_code holds only the raw bytecode, which refers to constants by their index in __code__.co_consts rather than by value. Apart from the constants, your two functions are identical, so their bytecode is the same.

__code__.co_consts holds the constants themselves, so it needs to be included in the comparison:

assert a.__code__.co_code != b.__code__.co_code \
       or a.__code__.co_consts != b.__code__.co_consts
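To make the index-versus-value point concrete, here is a minimal sketch using dis. The functions first and second and the helper load_consts are hypothetical names introduced for illustration, and string constants are used on the assumption that they are loaded with LOAD_CONST; the exact bytecode can vary between CPython versions.

```python
import dis


def first():
    x = "one"
    return x


def second():
    y = "two"
    return y


def load_consts(func):
    """Return (index, value) pairs for every LOAD_CONST instruction."""
    return [
        (ins.arg, ins.argval)
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_CONST"
    ]


# The index (what the bytecode stores) matches; the value (from co_consts) does not.
print(load_consts(first))
print(load_consts(second))
print(first.__code__.co_consts, second.__code__.co_consts)
```

The instruction argument is only a position in co_consts, which is why comparing co_code alone cannot tell the two functions apart.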

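The documentation of the inspect module points to a few other attributes that matter for 'sameness'. For example, default arguments are stored on the function object (__defaults__), not in the code object, so they must be compared as well for the functions below to be considered different.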

def a(a1, a2=1):
    return a1 * a2

def b(b1, b2=2):
    return b1 * b2
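A comparison that also covers defaults could look like the sketch below. same_function is a hypothetical helper, not part of any library, and the function c is added only to show a positive match. It checks the current function only and does not follow the functions it calls.

```python
def same_function(f, g):
    """Hypothetical helper: compare bytecode, constants and default arguments."""
    fc, gc = f.__code__, g.__code__
    return (
        fc.co_code == gc.co_code
        and fc.co_consts == gc.co_consts
        # defaults live on the function object, not on the code object
        and f.__defaults__ == g.__defaults__
        and f.__kwdefaults__ == g.__kwdefaults__
    )


def a(a1, a2=1):
    return a1 * a2


def b(b1, b2=2):
    return b1 * b2


def c(c1, c2=1):
    return c1 * c2


assert not same_function(a, b)  # only the default value differs
assert same_function(a, c)      # only the names differ
```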

One way to fingerprint a function is to use the built-in hash function. Assuming the same function definitions as in the OP's example:

def finger_print(func):
    return hash(func.__code__.co_consts) + hash(func.__code__.co_code)

assert finger_print(a) != finger_print(b)
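One caveat for the caching use case: by default, hash() of bytes and str objects is salted per interpreter process, so the value above is not guaranteed to be the same in the next run. If the fingerprint has to be stored and compared later, a hashlib digest is one option. The sketch below uses hypothetical helpers (_feed, stable_fingerprint) and assumes that the constants have a deterministic repr() and that the same Python version is used, since bytecode differs between versions. It does not follow called functions or modules.

```python
import hashlib
import types


def _feed(hasher, code):
    """Feed the bytecode and constants of a code object into the hasher."""
    hasher.update(code.co_code)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # nested function, lambda or comprehension: recurse instead of
            # using repr(), which would include a memory address
            _feed(hasher, const)
        else:
            hasher.update(repr(const).encode("utf-8"))


def stable_fingerprint(func):
    """Hypothetical helper: digest that does not depend on the hash seed."""
    hasher = hashlib.sha256()
    _feed(hasher, func.__code__)
    # defaults live on the function object, not on the code object
    defaults = (func.__defaults__, func.__kwdefaults__)
    hasher.update(repr(defaults).encode("utf-8"))
    return hasher.hexdigest()


def a():
    a = 1
    return a


def b():
    b = 2
    return b


assert stable_fingerprint(a) != stable_fingerprint(b)
```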
Guy Gangemi answered Dec 01 '22 01:12
