
Python: How to profile code written with numba.njit() decorators

I have a fairly complex computational code that I'm trying to speed up and multi-thread. In order to optimize the code, I'm trying to work out which functions are taking the longest or being called the most.

I haven't really profiled code before, so I could be missing something. However, I know many existing profiling modules don't really play nice with numba's njit() decorator, due to its recompilation with LLVM.

So my question would be this: What's the best way to profile code in which most functions have the njit() decorator, with a few non-jitted control functions?

I've come across data_profiler before; however, it doesn't seem to be in the conda repository anymore, and I wouldn't know how to build it from source for conda, or whether it would still be compatible with modern versions of its dependencies.

Yoshi asked Mar 18 '19

1 Answer

If it may help, even if only as a tool of last resort, let's give it a try:

Having spent a few tens of man*years in QuantFX module development, using both numba and other vectorisation / jit-acceleration tools, let me share a few pieces of experience that proved handy for our similarly motivated profiling.

Unlike the data_profiler mentioned above, which reports in milliseconds, we enjoyed microsecond resolution, provided as a side-effect of using the ZeroMQ module for distributed signalling / messaging infrastructure.

ZeroMQ implements all its services in a core engine, called a Context, yet there is one small utility that is free to re-use independently of that infrastructure: a Stopwatch, a microsecond-resolution timer class.
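(A caveat for newer installations: Stopwatch has been deprecated in later pyzmq releases and may be absent from your version. A minimal stdlib stand-in, with the same .start() / .stop() finite-state contract, can be sketched like this; the class name UsecStopwatch is mine, not part of any library:)

```python
import time

class UsecStopwatch:
    """A minimal stand-in for zmq.Stopwatch: .start(), then .stop()
    returns the elapsed wall-clock time in integer microseconds."""
    def __init__( self ):
        self._t0 = None
    def start( self ):
        if self._t0 is not None:
            raise RuntimeError( "Stopwatch already started" )
        self._t0 = time.perf_counter_ns()
    def stop( self ):
        if self._t0 is None:
            raise RuntimeError( "Stopwatch was not started" )
        elapsed_us = ( time.perf_counter_ns() - self._t0 ) // 1000
        self._t0  = None                 # re-arm for the next .start()
        return elapsed_us
```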

So, nothing could stop us from:

from zmq import Stopwatch as MyClock   # the pyzmq package imports as zmq

aClock_A = MyClock(); aClock_B = MyClock(); aClock_C = MyClock(); print( "ACK: A,B,C made" )

# may use 'em when "framing" a code-execution block:
aClock_A.start(); _ = sum( [ aNumOfCollatzConjectureSteps( N ) for N in range( 10**10 ) ] ); TASK_A_us = aClock_A.stop()
print( "INF: Collatz-task took {0:} [us] ".format( TASK_A_us ) )

# may add 'em into call-signatures and pass 'em and/or re-use 'em inside whatever our code
aReturnedVALUE = aNumbaPreCompiledCODE(  1234,
                                        "myCode with a need to profile on several levels",
                                        aClock_A, #     several, 
                                        aClock_B, # pre-instantiated,
                                        aClock_C  #     Stopwatch instances, so as
                                        )         #  to avoid chained latencies

This way one can, if indeed pushed into using this as the tool of last resort, "hard-wire" into one's own source code any structure of Stopwatch-based profiling. The only restriction is the need to conform to the finite-state automaton of the Stopwatch instance: once the .start() method has been called, only the .stop() method may come next, and, similarly, calling .stop() on a not-yet-.start()-ed instance will quite naturally throw an exception.

The common try-except-finally scaffolding will help to ensure that all Stopwatch instances do get .stop()-ed again, even if exceptions have happened.
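Such scaffolding might be sketched as a small wrapper like the one below (the helper name profiled_call is mine; it falls back to a stdlib timer with the same .start() / .stop() contract where pyzmq's Stopwatch is not available):

```python
import time

try:
    from zmq import Stopwatch            # pyzmq's microsecond timer, if present
except ImportError:                      # fallback with the same contract
    class Stopwatch:
        def start( self ): self._t0 = time.perf_counter_ns()
        def stop( self ):  return ( time.perf_counter_ns() - self._t0 ) // 1000

def profiled_call( fun, *args ):
    """Call fun( *args ), returning ( result, elapsed_us );
    the finally-block guarantees exactly one .stop() per .start(),
    even when fun raises an exception."""
    aClock = Stopwatch()
    aClock.start()
    try:
        result = fun( *args )
    except Exception:
        result = None                    # ... or log / re-raise as appropriate
    finally:
        elapsed_us = aClock.stop()       # conforms to the Stopwatch FSA
    return result, elapsed_us
```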

The structure of the "hard-wired" profiling depends on your code-execution "Hot-Spots under test". It even allows "cross-boundary" profiling of call-related overheads, i.e. the time spent between the native Python call of the @jit-decorated, numba-LLVM-ed code and the start of the 1st line "inside" the numba-compiled code (that is, how long the call-invocation and the parameter analysis take, whether driven by a list of call-signatures or principally avoided by enforcing a single, explicit call-signature).
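To put a number on that call-boundary overhead, one may time the very first call, which includes the LLVM compilation, against a subsequent "warm" call. The sketch below (function name aSumOfSquares is mine) falls back to a no-op decorator where numba is not installed, in which case both timings simply show the bare call cost:

```python
import time

try:
    from numba import njit               # real LLVM-compiled path, if available
except ImportError:
    def njit( fun ):                     # fallback: plain-python no-op decorator
        return fun

@njit
def aSumOfSquares( n ):
    s = 0
    for i in range( n ):
        s += i * i
    return s

aClock_t0     = time.perf_counter_ns()
aSumOfSquares( 10**6 )                   # 1st call: may include compile time
FIRST_CALL_us = ( time.perf_counter_ns() - aClock_t0 ) // 1000

aClock_t0     = time.perf_counter_ns()
aSumOfSquares( 10**6 )                   # 2nd call: pre-compiled, "warm"
WARM_CALL_us  = ( time.perf_counter_ns() - aClock_t0 ) // 1000

print( "INF: 1st call {0:} [us], warm call {1:} [us]".format( FIRST_CALL_us,
                                                              WARM_CALL_us ) )
```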

Good luck. I hope this helps.

user3666197 answered Oct 01 '22